I want to copy a value to the rows above and below it, within the rows that share a specific characteristic (in this case the same ID). If the variable is missing for the whole group, no action is required. I really have no clue how to do this. Any help would be greatly appreciated. Thanks in advance!
Have:
ID var
1 .
1 1
1 1
1 1
1 .
1 .
1 .
2 .
2 .
2 .
2 .
2 .
3 .
3 .
3 1
3 .
4 .
4 .
4 .
4 .
Want:
ID var
1 1
1 1
1 1
1 1
1 1
1 1
1 1
2 .
2 .
2 .
2 .
2 .
3 1
3 1
3 1
3 1
4 .
4 .
4 .
4 .
Just merge the data with itself.
data want;
merge have(drop=var) have(keep=id var where=(var=1));
by id;
run;
Another method: proc timeseries if you have SAS/ETS.
proc timeseries data=have out=want(drop=time);
by id;
var var / setmissing=maximum;
run;
But if it absolutely has to be the next value, you can run through proc timeseries twice: once to get the next value in-line and another to get the rest.
proc timeseries data=have out=have2;
by id;
var var / setmissing=next;
run;
proc timeseries data=have2 out=want(drop=time);
by id;
var var / setmissing=previous;
run;
Simplest way is SQL.
proc sql;
/* take an aggregate statistic of the column and assign it to a new variable; */
/* select the detail variables too, so SAS remerges the statistic onto every  */
/* row instead of generating a summary table                                 */
create table want as
select ID, var, max(var) as filled_var
from have
group by ID;
quit;
Or in a data step.
/* sort by ID and descending VAR; any empty values are now at the end */
proc sort data=have;
by id descending var;
run;
data want;
set have;
by id;
/* use RETAIN to hold the first (largest) value across the rows of each ID */
retain filled_var;
if first.id then filled_var = var;
/* drop the old variable */
keep id filled_var;
run;
data have;
input ID var;
datalines;
1 .
1 1
1 1
1 1
1 .
1 .
1 .
2 .
2 .
2 .
2 .
2 .
3 .
3 .
3 1
3 .
4 .
4 .
4 .
4 .
;
data want(drop = v);
do _N_ = 1 by 1 until (last.ID);
set have;
by ID;
v = max(v, var);
end;
do _N_ = 1 to _N_;
set have;
var = v;
output;
end;
run;
I'd like to assign a value to an empty field based on the values of other records. Here is my dataset:
data temp;
input ID date :$10. type typea :$10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 . 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 . 1 cb
5 . 2 b
;
run;
My goal is the following: for every empty entry of the variable "date", assign the date of the record that has the same ID and the same type but a different typea. If no other record meets these criteria, leave the date empty. The output should therefore be:
data temp;
input ID date :$10. type typea :$10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 10/12/2006 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 11/09/2008 1 cb
5 . 2 b
;
run;
I tried something like this, based on another answer on SO (SAS: get the first value where a condition is verified by group), but it doesn't work:
proc sort data=temp;
by ID type typea ;
run;
data temp;
set temp;
by ID type typea ;
if cat(first.ID, first.type, first.typea) then date_store=date;
if cat(ID eq ID and type ne type and typea eq typea) then do;
date_change_type1to2=date_store;
end;
run;
Do you have any hints? Thanks a lot!
You could use the UPDATE statement to carry the DATE values forward within a group.
data have;
input ID type typea :$10. date :yymmdd. ;
format date yymmdd10.;
datalines;
1 1 a 2006-11-10
2 2 a 2006-12-10
2 2 b .
3 5 p 2007-01-20
4 1 r .
5 1 ca 2008-09-11
5 1 cb .
5 2 b .
;
data want;
update have(obs=0) have;
by id type ;
output;
run;
If there are also missing values of TYPEA, those will be carried forward too. If you don't want that to happen, you could re-read just those variables after the update.
data want;
update have(obs=0) have;
by id type ;
set have(keep=typea);
output;
run;
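The UPDATE approach only carries a date forward, so it relies on the dated record coming first within each ID/TYPE group. As an order-independent alternative, here is a sketch using PROC SQL's remerging of summary statistics. It assumes each ID/TYPE group has at most one distinct non-missing DATE (if there are several, MAX picks the latest), and FILLED_DATE is a column name chosen only for illustration.

```sas
/* Same sample data as in the UPDATE answer */
data have;
  input ID type typea :$10. date :yymmdd10.;
  format date yymmdd10.;
  datalines;
1 1 a 2006-11-10
2 2 a 2006-12-10
2 2 b .
3 5 p 2007-01-20
4 1 r .
5 1 ca 2008-09-11
5 1 cb .
5 2 b .
;

/* MAX(date) is computed per ID/TYPE group and remerged onto every row */
/* (SAS writes a NOTE about remerging). COALESCE keeps an existing     */
/* date and fills only the missing ones; groups with no date at all    */
/* stay missing.                                                       */
proc sql;
  create table want as
  select ID, type, typea,
         coalesce(date, max(date)) as filled_date format=yymmdd10.
  from have
  group by ID, type
  order by ID, type, typea;
quit;
```

Unlike UPDATE, this fills only DATE, so missing TYPEA values are never carried forward.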
Hello, so this is a sample of my data (there is an additional column, LBCAT = URINALYSIS, for that panel of tests).
I've been asked to include only the panels of tests where LBNRIND is populated for any of the tests, and to remove the rest. Some subjects have multiple test results at different visit timepoints and others have only one. I can't use a simple WHERE LBNRIND ne '' in the data step, because I need the entire panel of urinalysis tests and not just that particular test result. What would be the best approach here? I think transposing the data would be too messy, but maybe I could put the variables in an array/macro and use a DO loop over that panel of tests?
Update: I've tried the code below, but it doesn't keep the corresponding tests where lb_nrind is populated. Applying sum(lb_nrind > '') in the HAVING clause gives the same result as applying lb_nrind > '' directly.
proc sql;
create table want as
select * from labUA
group by ptno and day and lb_cat
having sum(lb_nrind > '') > 0 ;
data want2;
do _n_ = 1 by 1 until (last.ptno);
set labUA;
by ptno period day hour ;
if not flag_group then flag_group = (lb_nrind > '');
end;
do _n_ = 1 to _n_;
set want;
if flag_group then output;
end;
drop flag_group;
run;
You can use an SQL HAVING clause to retain the rows of a group that meets some aggregate condition. In your case the group might be patient ID and panel ID, and the condition would be that at least one LBNRIND is not null.
Example:
Consider this example, where a group of rows is kept only if at least one row in the group meets the criterion result7=77.
Both code blocks use the SAS feature that a logical evaluation is 1 for true and 0 for false.
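To see that feature in isolation, here is a minimal data step (the variable names are illustrative only). A comparison evaluates to 1 or 0, so summing comparisons counts how many are true, which is what sum(result7=77) does within each group.

```sas
data _null_;
  is_true   = (3 = 3);                          /* true comparison  -> 1 */
  is_false  = (3 = 4);                          /* false comparison -> 0 */
  n_matches = sum(77 = 77, 78 = 77, 77 = 77);   /* 1 + 0 + 1 = 2         */
  put is_true= is_false= n_matches=;            /* writes the values to the log */
run;
```

The log shows is_true=1 is_false=0 n_matches=2.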
SQL
data have;
infile datalines missover;
input id test $ parm $ result1-result10;
datalines;
1 A P 1 2 . 9 8 7 . . . .
1 B Q 1 2 3
1 C R 4 5 6
1 D S 8 9 . . . 6 77
1 E T 1 1 1
1 F U 1 1 1
1 G V 2
2 A Z 3
2 B K 1 2 3 4 5 6 78
2 C L 4
2 D M 9
3 G N 8
4 B Q 7
4 D S 6
4 C 1 1 1 . . 5 0 77
;
proc sql;
create table want as
select * from have
group by id
having sum(result7=77) > 0
;
quit;
DOW Loop
data want;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if not flag_group then flag_group = (result7=77);
end;
do _n_ = 1 to _n_;
set have;
if flag_group then output;
end;
drop flag_group;
run;
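Applied to the question's own variable names (labUA, ptno, day, lb_cat, lb_nrind), the SQL version might look like the sketch below. It assumes a panel is identified by ptno, day and lb_cat; the tiny labUA dataset, including the lb_test column and its values, is hypothetical toy data rather than the poster's. Note that GROUP BY columns are separated by commas: ptno and day and lb_cat is parsed as a single logical expression, not as three grouping columns.

```sas
/* Hypothetical toy data: one patient, two urinalysis panels.  */
/* Only the day 1 panel has a populated lb_nrind.              */
data labUA;
  infile datalines missover;
  input ptno day lb_cat :$10. lb_test :$8. lb_nrind :$8.;
  datalines;
101 1 URINALYSIS PH
101 1 URINALYSIS GLUC HIGH
101 8 URINALYSIS PH
101 8 URINALYSIS GLUC
;

/* Keep every row of a panel if at least one row in it has lb_nrind */
/* populated: (lb_nrind ne ' ') is 1 or 0, so the sum counts them.  */
proc sql;
  create table want as
  select *
  from labUA
  group by ptno, day, lb_cat
  having sum(lb_nrind ne ' ') > 0;
quit;
```

Both rows of the day 1 panel are kept and the whole day 8 panel is dropped.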
I have tried some of the solutions here already, but I am still unable to get the desired output.
The data I have is given below (ID is unique):
data have;
input id code_1 code_2 code_3 code_4 randa randb randc$;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;;;;;
run;
I need the frequency of only the presence of each code (code_1, code_2, etc.), i.e. how many IDs have a value of 1.
The desired output:
Variable Frequency
code_1 3
code_2 1
code_3 2
code_4 3
I tried the solution from a linked answer; the code is below:
ods output onewayfreqs=preds;
proc freq data=have;
tables _all_;
run;
ods output close;
proc tabulate data=preds;
class table frequency;
tables table,frequency;
run;
Output:
                  Frequency
                  1     2     3
                  N     N     N
Table
Table code_1      1     .     1
Table code_2      1     .     1
Table code_3      .     2     .
Table code_4      1     .     1
Table id          4     .     .
Table randa       4     .     .
Table randb       4     .     .
Table randc       4     .     .
I also tried the code below:
proc freq data=have order=freq;
array codes code_:;
do _n_ = 1 to dim(codes);
table codes(_n_)/list missing out=var1_freq;
end;
run;
But I don't know how to write the code properly.
I do get output with the code below, but only for one code at a time:
proc freq data=have order=freq ;
tables code_1/list missing out=var1_freq;
run;
But how do I get it for multiple codes? Many thanks for your help!
The OUT= option of the TABLES statement only produces output for the last table request, so you won't get all four codes.
You can count the 1-valued code_* variables after transposing.
data have;
input id code_1 code_2 code_3 code_4 randa randb randc $ ;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;
data idcodes / view=idcodes;
set have;
array codes code_1-code_4;
do _n_ = 1 to dim (codes);
variable = vname(codes(_n_));
flag = codes(_n_);
output;
end;
keep id variable flag;
run;
proc freq data=idcodes;
where flag;
table variable / out=freqs(keep=variable count);
run;
Presuming codes are only 0/1, you could also sum the codes and transpose the result.
proc means noprint data=have;
var code_:;
output out=flagsum sum=;
run;
proc transpose data=flagsum out=want(rename=(_name_=variable col1=frequency));
var code_:;
run;
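If you'd rather stay close to the ODS OUTPUT attempt in the question, a sketch is to capture OneWayFreqs for just the code variables and keep the rows where the code value is 1. This assumes the usual layout of the OneWayFreqs table (a Table column such as "Table code_1", plus one column per analysis variable that is non-missing only on that variable's own rows) and that the codes are 0/1; the column names in WANT are chosen to match the desired output.

```sas
data have;
  input id code_1 code_2 code_3 code_4 randa randb randc $;
  datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;

/* One one-way table per code variable, all captured in one dataset */
ods output OneWayFreqs=codefreqs;
proc freq data=have;
  tables code_1-code_4;
run;

/* On each row only the current table's variable is non-missing, so */
/* COALESCE picks up that row's code value; keep the value-1 rows.   */
data want;
  set codefreqs;
  where coalesce(code_1, code_2, code_3, code_4) = 1;
  length variable $32;
  variable = scan(table, 2, ' ');   /* "Table code_1" -> "code_1" */
  keep variable frequency;
run;
```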
I have the following dataset
data have;
input SUBJID VISIT$ PARAMN ABLF$ AVAL;
cards;
1 screen 1 . 151
1 random 1 YES .
1 visit1 1 . .
1 screen 2 . 65.5
1 random 2 YES 65
1 visit1 2 . .
1 screen 3 . .
1 random 3 YES 400
1 visit1 3 . 420
;
run;
I want to create another variable, BASE, that captures the value of AVAL (when there is an actual value) where ABLF=YES, and then drag it down until a new PARAMN is encountered.
Basically, I want the output to look like this:
SUBJID VISIT PARAMN ABLF AVAL BASE
1 screen 1 . 151 .
1 random 1 YES . .
1 visit1 1 . . .
1 screen 2 . 65.5 65
1 random 2 YES 65 65
1 visit1 2 . . 65
1 screen 3 . . 400
1 random 3 YES 400 400
1 visit1 3 . 420 400
I used the following code:
data want;
set have;
by SUBJID PARAMN;
if first.PARAMN and ABLF=' ' then BASE=.;
if ABLF='YES' then BASE=AVAL;
retain BASE;
run;
However, when I run this, the data don't come out exactly as shown above.
RETAIN does not look like the right tool for this. RETAIN can only move data forward in the file. It cannot move it backwards.
It looks like there is just one observation per SUBJID/PARAMN with the "BASE" value, so just merge it back onto the data.
data want;
merge have
have(keep=subjid paramn aval ablf rename=(aval=BASE ablf=xx)
where=(xx='YES'))
;
by SUBJID PARAMN;
drop xx;
run;
PROC SQL:
proc sql;
select a.*,b.aval as BASE from have a left join have(drop=visit where=(ablf='YES')) b
on a.subjid=b.subjid and a.paramn=b.paramn;
quit;
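Double DO loop:
data want;
/* first pass: find the ABLF='YES' value for this SUBJID/PARAMN group; */
/* TEMP is not retained, so it starts missing for every new group      */
do until(last.paramn);
set have;
by subjid paramn notsorted;
if ablf='YES' then temp=aval;
end;
/* second pass: re-read the same group and write BASE on every row */
do until(last.paramn);
set have;
by subjid paramn notsorted;
base=temp;
output;
end;
drop temp;
run;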
My dataset looks like this:
variable
1
.
3
.
5
.
7
.
9
How do I replace the missing (even) values with the correct ones, so that the resulting data look like this?
1
2
3
4
5
6
7
8
9
Do you mean that the data look like this?
var
---
1
.
3
.
etc
and you want each missing value to be one more than the value before it? If so:
data one;
input var;
datalines;
1
.
3
.
5
;
run;
data two (drop=prev_var);
set one;
retain prev_var;
if missing(var) then do;
var = prev_var + 1;
end;
prev_var=var;
run;
proc print data = two noobs; run;