I have a dataset with some volumes in a column and I want to create a second column that contains the average of the previous three observations. Is this possible?
e.g.
data have;
input Vol Avg_pre_4;
datalines;
228 .
141 .
125 .
101 164.67
116 122.33
107 114
74 108
118 99
127 99.67
123 106.33
;
run;
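The LAG function maintains an automatic built-in queue of prior values. As long as they are called on every observation, LAG, LAG2, and LAG3 return the values from one, two, and three observations back. MEAN ignores missing arguments, so the first three rows have to be set to missing explicitly:
VOL_AVG_OF_PRIOR3 = MEAN ( lag(Vol), lag2(Vol), lag3(Vol) );
if _n_ < 4 then VOL_AVG_OF_PRIOR3 = .;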
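For context, the two statements above can be dropped into a full DATA step. This is only a minimal sketch: the output data set name want is an assumption, and the input is reduced to the Vol column from the question.

```sas
/* Sample volumes from the question (Vol only) */
data have;
  input Vol;
  datalines;
228
141
125
101
116
107
74
118
127
123
;
run;

/* "want" is a hypothetical output name */
data want;
  set have;
  /* LAG functions are called unconditionally so their queues stay in step */
  VOL_AVG_OF_PRIOR3 = mean(lag(Vol), lag2(Vol), lag3(Vol));
  /* Fewer than three prior observations exist for rows 1 to 3 */
  if _n_ < 4 then VOL_AVG_OF_PRIOR3 = .;
run;

proc print data=want;
run;
```

The fourth row should come out as (228 + 141 + 125) / 3, which is about 164.67. Keep the LAG calls out of IF/THEN branches, because each call only updates its queue when it actually executes.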
I have a data set and am trying to add four new variables using the existing ones. I keep getting an error that says the code is incomplete. I'm having trouble seeing where it is incomplete. How do I fix this?
data dataset;
input ID $
Height
Weight
SBP
DBP
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
run;
You did not end your input statement with a semicolon. input reads variables from external data (in this case, in-line data with the datalines statement). New variables are not created within input in the way you've specified.
Use input to read in the five variables of your data. After that, create new variables based on those five read-in variables:
data dataset;
input ID $
Height
Weight
SBP
DBP
;
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
;
run;
Correcting 2 errors should fix this:
Add a semicolon after the last field being read in from the datalines, which is DBP.
Use ** rather than ^ to raise a value to a power. (A previous version of this question used the ^ symbol for exponents.)
For reference, the SAS arithmetic operators are described in the SAS documentation.
After making the 2 corrections above I ran the revised code below without any errors.
data dataset;
input ID $
Height
Weight
SBP
DBP;
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
run;
I am performing Principal Components Analysis in SAS Enterprise Guide and wish to compute factor/component scores on some holdout.
KeepCombinedLR is my primary source of truth. I have another dataset, with the exact same variables, that I would like to be scored without including it in the actual factor analyses.
proc factor data = KeepCombinedLR
simple
method = prin
priors = one
rotate = varimax reorder
mineigen = 1
nfactors = 25
out = FactorScores;
var var1--var40;
run;
data Fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;
proc factor data=Fitness outstat=FactOut
method=prin rotate=varimax score;
var Age Weight RunTime RunPulse RestPulse;
title 'Factor Scoring Example';
run;
proc print data=FactOut;
title2 'Data Set from PROC FACTOR';
run;
proc score data=Fitness score=FactOut out=FScore;
var Age Weight RunTime RunPulse RestPulse;
run;
proc print data=FScore;
title2 'Data Set from PROC SCORE';
run;
PROC SCORE will score your data for you. Point DATA= at your 'holdout' data set and SCORE= at the OUTSTAT= data set written by PROC FACTOR.
https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_score_examples01.htm&docsetVersion=14.3&locale=en
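Applied to the question's own setup, the same pattern might look like the sketch below. The names Holdout, FactStat, and HoldoutScores are assumptions made for illustration; KeepCombinedLR and var1--var40 come from the question, and the holdout is assumed to store the variables in the same order so that the var1--var40 list resolves to the same columns.

```sas
/* Fit the components on the primary data only and save the statistics */
proc factor data=KeepCombinedLR
            outstat=FactStat      /* hypothetical name for the statistics data set */
            method=prin
            priors=one
            rotate=varimax reorder
            mineigen=1
            nfactors=25
            score;                /* request scoring coefficients, as in the example above */
  var var1--var40;
run;

/* Score the holdout with the coefficients estimated above */
proc score data=Holdout           /* hypothetical name of the holdout data set */
           score=FactStat
           out=HoldoutScores;     /* hypothetical output name */
  var var1--var40;
run;
```

The holdout never enters PROC FACTOR, so it has no influence on the extracted components.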
I have a dataset that looks like this
ID Model_Value Count_Model
111 24 2
222 12 9
234 88 6
111 88 8
222 24 10
222 88 17
I want it to look like this:
ID Model_12 Model_24 Model_88
111 0 2 8
222 9 10 17
234 0 0 6
I don't think I am searching online for the correct terms. I initially thought a transform might work, but I still want each row to represent the ID, not the model.
How do I go about creating this output from what I have?
Ok I believe this is it! Thank you @mjsqu !!
I was able to do this with the help of this link: http://www.sascommunity.org/mwiki/images/d/dd/PROC_Transpose_slides.pdf
data test_transpose ;
input ID_P Model_Value Count_Model ; /* list input: values are separated by blanks */
cards;
111 24 2
222 12 9
234 88 6
111 88 8
222 24 10
222 88 17
run;
proc print data=test_transpose;
run;
proc sort data=test_transpose out=test_transpose_S;
By ID_P;
run;
proc transpose
data = test_transpose_S
out = test_transpose_result (drop=_name_)
prefix=Model_;
var Count_Model;
BY ID_P;
id Model_Value;
run;
proc print data=test_transpose_result ;
run;
Output of the original sorted dataset and the transpose!
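PROC TRANSPOSE leaves a missing value wherever an ID has no row for a given model, while the desired output in the question shows 0 there. A minimal sketch of one way to fill those gaps follows; the data set name want and the loop index i are assumptions, and the Model_: name prefix list relies on the transposed columns being the only variables whose names start with Model_.

```sas
/* "want" is a hypothetical output name */
data want;
  set test_transpose_result;
  /* Collect every transposed column (names beginning with Model_) */
  array m {*} Model_:;
  do i = 1 to dim(m);
    if missing(m{i}) then m{i} = 0;   /* no row for this ID/model pair */
  end;
  drop i;
run;

proc print data=want;
run;
```

The columns appear in the order in which the model values are first encountered, so a RETAIN or LENGTH statement before SET would be needed if a specific column order matters.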
I am having trouble with how to compare two data sets in SAS, but one data set might have extra observations. I want to get rid of these extra observations and just compare the rest of the two data sets as they are. Let me give an example:
Data Set 1
ID Value1 Value2
105 1 A
105 2 B
105 3 C
*105 4 D
106 10 E
106 20 F
106 30 G
107 50 H
107 60 I
Data Set 2
ID Value1 Value2
105 1 A
105 2 B
105 3 C
106 10 E
106 20 F
106 30 G
107 50 H
107 60 I
Both data sets are equal except for the observation with ID=105, Value1=4 (marked with an asterisk for visual convenience) that is in Data Set 1, but not in Data Set 2.
I need to compare both data sets with these types of observations gone from my first data set and check if those observations are equal for ID and Value1. And yes, the ID value is repeated for some observations. They are not duplicates though as they have different "Value1" values associated with them.
Is there an easy way to do this?
data a1;
input ID value1 value2$;
datalines;
105 1 A
105 2 B
105 3 C
105 4 D
106 10 E
106 20 F
106 30 G
107 50 H
107 60 I
run;
data b1;
input ID value1 value2$;
datalines;
105 1 A
105 2 B
105 3 C
106 10 E
106 20 F
106 30 G
107 50 H
107 60 I
run;
data a2(rename=(value1=value1_a value2=value2_a));
set a1;
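newID=catx('_', ID, value1); /* delimiter keeps e.g. ID=10, value1=51 distinct from ID=105, value1=1 */
run;
data b2(rename= ( value1=value1_b value2=value2_b));
set b1;
newID=catx('_', ID, value1); /* same key definition as in a2 */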
run;
proc sort data=a2;
by newID;
run;
proc sort data=b2;
by newid;
run;
data c1;
merge a2(in=a) b2(in=b);
by newID;
from_a=a;
from_b=b;
run;
/* Check the unmatched records */
data unmatched;
set c1;
where from_a^=1 or from_b^=1;
run;
proc print data=unmatched;
run;
For the matched records:
data matched;
set c1;
where from_a=1 and from_b=1;
run;
proc print data=matched;
run;
Use PROC COMPARE with BY or ID
proc sort data=data1;
by id value1 value2;
run;
proc sort data=data2;
by id value1 value2;
run;
proc compare base=data1 compare=data2;
id id value1;
run;
This is documented under Comparing datasets with an ID variable:
http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n14cxqy1h9hof4n1cq4xmhv2atgs.htm
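As a rough sketch of how this could look on the sample data, the step below reuses the a1 and b1 data sets created in the other answer. Adding the LISTOBS option is my own choice here, to list the observations that appear in only one of the two data sets.

```sas
/* Both inputs must be sorted by the ID variables */
proc sort data=a1;
  by ID value1;
run;

proc sort data=b1;
  by ID value1;
run;

/* Rows are matched on ID and value1; value2 is compared for the matched rows */
proc compare base=a1 compare=b1 listobs;
  id ID value1;
run;
```

With the sample data, the row with ID=105 and value1=4 should be reported as present only in the base data set, and the remaining rows are compared pairwise.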
data have;
input ID Herpes;
datalines;
111 1
111 .
111 1
111 1
111 1
111 .
111 .
254 0
254 0
254 1
254 .
254 1
331 1
331 1
331 1
331 0
331 1
331 1
;
Where 1=Positive, 0=Negative, .=Missing/Not Indicated
Observations are sorted by ID (random numbers, no meaning) and date of visit (not included because not needed from here forward). Once you have Herpes, you always have Herpes. How do I adjust the Herpes variable (or create a new one) so that once a Positive is indicated (Herpes=1), all following obs will show Herpes=1 for that ID?
I want the resulting set to look like this:
111 1
111 1 (missing changed to 1)
111 1
111 1
111 1
111 1 (missing changed to 1)
111 1 (missing changed to 1)
254 0
254 0
254 1
254 1 (missing changed to 1 following positive at prior visit)
254 1
331 1
331 1
331 1
331 1 (patient-indicated negative/0 changed to 1 because of prior + visit)
331 1
331 1
The code below should do the trick. The key is to use BY-group processing together with the RETAIN statement.
proc sort data=have;
by id;
run;
data want;
set have;
by id;
retain uh_oh .;
if first.id then do;
uh_oh = .;
end;
if herpes then do;
uh_oh = 1;
end;
if uh_oh then do;
herpes = 1;
end;
drop uh_oh;
run;
You could create a new variable that keeps a running sum of the herpes flag within ID:
proc sort data=have;
by id;
run;
data have_too;
set have;
by id;
if first.id then sum_herpes_in_id = 0;
sum_herpes_in_id + herpes;
run;
That way it's always positive from the first time herpes=1 within id. You can access these observations in other datasteps / procs with where sum_herpes_in_id;.
And for free, you also have the total number of herpes flags per id (if that's of any use).
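If a plain 0/1 indicator is preferred over the running count, it can be derived from the sum in one more step. This is only a sketch; the names want_flag and herpes_ever are assumptions.

```sas
/* "want_flag" and "herpes_ever" are hypothetical names */
data want_flag;
  set have_too;
  /* 1 from the first positive visit onward within an ID, 0 before it */
  herpes_ever = (sum_herpes_in_id > 0);
run;

proc print data=want_flag;
run;
```

The original herpes variable is left untouched, so the patient-reported values remain available alongside the derived flag.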
This can also be done in SQL. Here is an example using UPDATE to update the table in place, where dtvar stands for the visit-date variable that was left out of the sample data. (This could also be done in base SAS with MODIFY.)
proc sql undopolicy=none;
update have H
set herpes=1 where exists (
select 1 from have V
where h.id=v.id
and h.dtvar ge v.dtvar
and v.herpes=1
);
quit;
The SAS version using modify. BY doesn't work in a one-dataset modify for some reason, so you have to do your own version of first.id.
data have;
modify have;
drop _:;
retain _t _i;
if _i ne id then _t=.;
_i=id;
_t = _t or herpes;
if _t then herpes=1;
run;