I am trying to distinguish recorded subjects who have confirmed hypertension, who are unaware of their status, and who do not have hypertension. My current code only displays BPSTATUS 1 and 2; BPSTATUS 3 never shows up. HTN: 1 = hypertension, 0 = no hypertension. HAE: 1 = aware of hypertension, 2 = unaware of hypertension.
data new;
set mergedata;
HTN=.;
If SBP >= 140 or DBP >= 90 then HTN = 1;
else if 0 < SBP < 140 and 0 < DBP < 90 then HTN = 0;
run;
proc print data=new;
run;
data new2;
set new;
BPSTATUS=.;
*3-level variable BPSTATUS;
*diagnosed first;
if HTN=1 or HAE2=1 then BPSTATUS=1;
*undiagnosed;
if HTN=1 and HAE2=2 then BPSTATUS=2;
*normal;
if HTN=2 and HAE2=2 then BPSTATUS=3;
run;
proc print data=new2;
run;
proc freq data=new2;
table bpstatus;
run;
---------
BPSTATUS   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1          2354        67.76     2354                   67.76
2          1120        32.24     3474                   100.00
Frequency Missing = 4424
In this code:
data new;
set mergedata;
HTN=.;
If SBP >= 140 or DBP >= 90 then HTN = 1;
else if 0 < SBP < 140 and 0 < DBP < 90 then HTN = 0;
run;
You initialize HTN to be missing, then set it to either 1 or 0 based on some conditions. In subsequent code, you check for this:
if HTN=2 and HAE2=2 then BPSTATUS=3;
Based on the logic in your first program, HTN will never be 2, which means BPSTATUS will never be 3.
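A minimal sketch of the corrected second step, assuming HTN keeps the 1/0 coding from the first data step and HAE2 is coded 1 = aware, 2 = unaware as described in the question. The only change from the original logic is testing HTN=0 instead of HTN=2.

```sas
data new2;
   set new;
   BPSTATUS=.;
   /* diagnosed */
   if HTN=1 or HAE2=1 then BPSTATUS=1;
   /* undiagnosed */
   if HTN=1 and HAE2=2 then BPSTATUS=2;
   /* normal: HTN is coded 0, not 2, for "no hypertension" */
   if HTN=0 and HAE2=2 then BPSTATUS=3;
run;

proc freq data=new2;
   table bpstatus;
run;
```

Rows where HTN or HAE2 is missing still end up with a missing BPSTATUS, so some of the missing frequency may remain.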
Every Subject has a baseline. Once the difference between the value and the baseline exceeds 5, that value becomes the baseline for all future comparisons until another value exceeds this new baseline by 5.
This is what I want the output data to look like:
This is what I'm getting
This is my current code, which gets me as close as anything I've tried. I've tried different combinations of retain, lag(), and ifn (suggested in this post)
Data Have;
Input Visit usubjid Baseline Value;
datalines;
1 1 112.2 112.2
2 1 112.2 113.7
3 1 112.2 112
3 1 112.2 108
4 1 112.2 109
5 1 112.2 107
7 1 112.2 106
8 1 112.2 107
;
run;
proc sort;by usubjid;run;
data want;
Length chg $71;
retain chg;
set Have;
length prevchg $71;
by usubjid;
prevchg=chg;
if first.usubjid then do; prevchg=''; end;
baseline=ifn(prevchg in ('Increase >= 5mm New', "Decrease >= 5mm"),lag(value),lag(baseline));
diff = value-baseline;
if visit > 1 then do;
if diff > 5 then do; chg='Increase >= 5mm New'; order = 3; end;
else if diff < -5 then do; chg = 'Decrease >= 5mm'; order = 6; end;
else if -5 <= diff <= 5 then do;
if prevchg in('Increase >= 5mm New', 'Increase > 5mm Persistent') then do; chg ='Increase > 5mm Persistent'; order = 4; end;
else do; chg = 'No Change (change >= -5 and <= 5mm)'; order = 5; end;
end;
end;
run;
Right now the code correctly updates the baseline to the previous value for the next visit, but then goes right back to the original baseline. I'm confident this has something to do with the way Lag() and Retain work with if/then, but I cannot figure out the solution. Here is an example of the issue:
You should be able to do this easily. The BASELINE variable CANNOT be on the input if you want to RETAIN its value.
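data want ;
/* drop the input BASELINE so SET does not overwrite the retained value */
set have(drop=baseline) ;
by usubjid;
retain baseline;
if first.usubjid then baseline=value;
difference = baseline - value;
output;
/* reset the baseline after a change of more than 5 in either direction */
if abs(difference) > 5 then baseline=value;
run;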
How does PROC STDIZE METHOD = RANGE work?
I thought that it would work like this:
Score = (Observation - Min) / ( Max - Min)
However, the range is [1,100] and there is never a 0, which is what I would expect when the minimum observation is subtracted from itself in the numerator.
I've tried reading the SAS documentation and running some trials in an Excel workbook.
PROC STDIZE
DATA = SASHELP.BASEBALL
METHOD = RANGE
OUT = BASEBALL_STDIZE
;
VAR CRHITS;
RUN;
range [0,100] expected, range [1,100] found
Obs _TYPE_ crhit2
1 LOCATION 34
2 SCALE 4222
3 ADD 0
4 MULT 1
5 N 322
6 NObsRead 322
7 NObsUsed 322
8 NObsMiss 0
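The statistics listed above (LOCATION 34, SCALE 4222, ADD 0, MULT 1) are consistent with the documented PROC STDIZE formula, result = ADD + MULT * (value - LOCATION) / SCALE, where METHOD=RANGE uses the minimum as the location and the range as the scale. The sketch below recomputes that score by hand so the two can be compared; the output data set names are made up for illustration, and it does not by itself explain where a [1,100] range would come from.

```sas
/* Manual rescaling: (value - min) / (max - min), which lies in [0,1] */
proc sql;
   create table manual_range as
   select Name, CrHits,
          (CrHits - min(CrHits)) / (max(CrHits) - min(CrHits)) as score_manual
   from sashelp.baseball;
quit;

/* MULT=100 multiplies the standardized value, giving a 0 to 100 scale */
proc stdize data=sashelp.baseball method=range mult=100
            out=baseball_0_100;
   var CrHits;
run;

/* Compare the minimum and maximum of both versions */
proc means data=manual_range min max;
   var score_manual;
run;

proc means data=baseball_0_100 min max;
   var CrHits;
run;
```

If the minimum of the standardized variable is not 0 here, the difference is more likely to come from a later step than from METHOD=RANGE itself.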
I am working on a dataset in SAS where, within each group (ID), the column Next_Row_score should hold the next observation's score. If there is no next observation in the group, Next_Row_score should be null. For better illustration, here is a sample dataset:
ID Score
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
Resultant output should be like -
ID Salary Next_Row_Salary
10 1000 1500
10 1500 2000
10 2000 .
20 3000 4000
20 4000 .
30 2500 2500
Thank you in advance for your help.
data want(drop=_: flag);
merge have have(firstobs=2 rename=(ID=_ID Score=_Score));
if ID=_ID then do;
Next_Row_Salary=_Score;
flag+1;
end;
else if ID^=_ID and flag>=1 then do;
Next_Row_Salary=.;
flag=.;
end;
else Next_Row_Salary=score;
run;
Try this:
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
proc sql noprint;
select count(*) into :obsHave
from have;
quit;
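data want2(rename=(id1=ID Score1=Salary) drop=ID id2 Score);
do i=1 to &obsHave;
set have point=i;
id1=ID;
Score1=Score;
j=i+1;
/* read the next row only if it exists; POINT= past the last row is an error */
if j <= &obsHave then do;
set have point=j;
id2=ID;
end;
else id2=.;
if id1=id2 then Next_Row_Salary = Score;
else Next_Row_Salary=.;
/* note: a group with a single row gets a missing value here */
output;
end;
stop;
run;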
There is a simpler (in my mind, at least) proc sql approach that doesn't involve loops:
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
/*count each observation's place in its ID group*/
data have2;
set have;
count + 1;
by id;
if first.id then count = 1;
run;
/*if an ID group has only one row, keep the original score, else take the next row's score*/
proc sql;
create table want as select distinct
a.id, a.score,
case when max(a.count) = 1 then a.score else b.score end as score2
from have2 as a
left join have2 (where = (count > 1)) as b
on a.id = b.id and a.count = b.count - 1
group by a.id;
quit;
I need to calculate max(Measure) over the last 3 months for each ID and month, without using PROC SQL. I was wondering whether I could do this using the RETAIN statement, but I have no idea how to compare the value of Measure in the current row with the two preceding rows.
I will also need to do the same for windows longer than 3 months, so any solution that does not require a separate step for each additional month would be much appreciated!
Here is the data I have:
data have;
input month ID $ measure;
cards;
201501 A 0
201502 A 30
201503 A 60
201504 A 90
201505 A 0
201506 A 0
201501 B 0
201502 B 30
201503 B 0
201504 B 30
201505 B 60
;
Here the one I need:
data want;
input month ID $ measure max_measure_3m;
cards;
201501 A 0 0
201502 A 30 30
201503 A 60 60
201504 A 90 90
201505 A 0 90
201506 A 0 90
201501 B 0 0
201502 B 30 30
201503 B 0 30
201504 B 30 30
201505 B 60 60
;
And here both tables: the one I have on the left and the one I need on the right
You can do this with an array that is sized to your moving window. I'm not sure how dynamic the window needs to be. If you need the max over 4 or 5 months as well as 3 months, I would recommend using PROC EXPAND instead of these methods. The documentation for PROC EXPAND has a good example of how to do this.
data want;
set have;
by id;
array _prev(0:2) _temporary_;
if first.id then
do;
call missing (of _prev(*));
count=0;
end;
count+1;
_prev(mod(count, 3))=measure;
max=max(of _prev(*));
drop count;
run;
proc expand data=test out=out method=none;
by id;
id month;
convert x = x_movave3 / transformout=(movave 3);
convert x = x_movave4 / transformout=(movave 4);
run;
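The PROC EXPAND step above is the generic moving-average example. A sketch adapted to the question's data might look like the following, assuming SAS/ETS is licensed, HAVE is already sorted by ID and month, and each ID has one row per month with no gaps. It swaps in the MOVMAX transformation for MOVAVE and leaves out the ID statement because month is stored as a plain number (201501) rather than a SAS date; the output names are made up for illustration.

```sas
/* Backward moving maximum over 3 and 4 rows within each ID */
proc expand data=have out=want_expand method=none;
   by id;
   convert measure = max_measure_3m / transformout=(movmax 3);
   convert measure = max_measure_4m / transformout=(movmax 4);
run;

proc print data=want_expand;
run;
```

Adding another window length is one more CONVERT statement rather than a separate step.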
Try this:
data want(drop=l1 l2 cnt tmp);
set have;
by id;
retain cnt max_measure_3m l1 l2;
if first.id then do;
max_measure_3m = 0;
cnt = 0;
l1 = .;
l2 = .;
end;
cnt = cnt + 1;
tmp = lag(measure);
if cnt > 1 then
l1 = tmp;
tmp = lag2(measure);
if cnt > 2 then
l2 = tmp;
/* MAX ignores missing values, so this is the largest of the current and two previous measures */
max_measure_3m = max(measure, l1, l2);
run;
I have a PROC REPORT output and want to add an asterisk based on the value of the cell being less than 1.96. I don't want colours, just an asterisk after the number. Can this be done with a format, or do I need an 'IF/ELSE' clause in the COMPUTE block?
data have1;
input username $ betdate : datetime. stake winnings winner;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90 0
player1 04NOV2008:09:03:44 100 40 1
player2 07NOV2008:14:03:33 120 -120 0
player1 05NOV2008:09:00:00 50 15 1
player1 05NOV2008:09:05:00 30 5 1
player1 05NOV2008:09:00:05 20 10 1
player2 09NOV2008:10:05:10 10 -10 0
player2 09NOV2008:10:05:40 15 -15 0
player2 09NOV2008:10:05:45 15 -15 0
player2 09NOV2008:10:05:45 15 45 1
player2 15NOV2008:15:05:33 35 -35 0
player1 15NOV2008:15:05:33 35 15 1
player1 15NOV2008:15:05:33 35 15 1
run;
PROC PRINT; RUN;
Proc rank data=have1 ties=mean out=ranksout1 groups=2;
var stake winner;
ranks stakeRank winnerRank;
run;
proc sql;
create table withCubedDeviations as
select *,
((stake - (select avg(stake) from ranksout1 where stakeRank = main.stakeRank and winnerRank = main.winnerRank))/(select std(stake) from ranksout1 where stakeRank = main.stakeRank and winnerRank = main.winnerRank)) **3 format=8.2 as cubeddeviations
from ranksout1 main;
quit;
PROC REPORT DATA=withCubedDeviations NOWINDOWS out=report;
COLUMN stakerank winnerrank, ( N stake=avg cubeddeviations skewness);
DEFINE stakerank / GROUP ORDER=INTERNAL '';
DEFINE winnerrank / ACROSS ORDER=INTERNAL '';
DEFINE cubeddeviations / analysis 'SumCD' noprint;
DEFINE N / 'Bettors';
DEFINE avg / analysis mean 'Avg' format=8.2;
DEFINE skewness / computed format=8.2 'Skewness';
COMPUTE skewness;
_C5_ = _C4_ * (_C2_ / ((_C2_ -1) * (_C2_ - 2)));
_C9_ = _C8_ * (_C6_ / ((_C6_ -1) * (_C6_ - 2)));
ENDCOMP;
RUN;
This is just an example, so this won't make statistical sense, but if the value for SKEWNESS is greater than 1 I need to add a single asterisk, two asterisks if it's greater than 5, and three asterisks if the value is greater than 10. Also, if the asterisks could be in superscript, that would be even better.
I've been testing the following, but to no avail:
PROC FORMAT;
picture onestar . = " " low - high = "9.9999^{super *}";*^{super***};
picture twostar . = " " low - high = "9.9999^{super **}";*^{super***};
picture threestar . = " " low - high = "9.9999^{super ***}";*^{super***};
run;
PROC REPORT DATA=withCubedDeviations NOWINDOWS out=report;
COLUMN stakerank winnerrank, ( N stake=avg cubeddeviations);
DEFINE stakerank / GROUP ORDER=INTERNAL '';
DEFINE winnerrank / ACROSS ORDER=INTERNAL '';
DEFINE cubeddeviations / analysis 'SumCD' noprint;
DEFINE N / 'Bettors';
DEFINE avg / mean 'Avg' format=8.2;
compute avg;
if _C3_ > 1.96 then call define('_C3_','format','onestar.');
endcomp;
RUN;
Thanks for any help.
I think this will do what you need:
proc format;
picture skewaskf
-1 <-<0 = '00009.99' (mult=100 prefix='-')
0-<1 = '00009.99' (mult=100)
1-<5 = '00009.99*'(mult=100)
5-<10= '00009.99**'(mult=100)
10-high='00009.99***'(mult=100);
run;
Extend for the negatives further.
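A quick way to check the picture format before wiring it into PROC REPORT is to apply it to a few test values. This is only a sketch: it assumes the skewaskf format above has already been created in the current session, and the data set name and values are made up for illustration.

```sas
/* Hypothetical test values spanning the ranges in the picture format */
data skew_demo;
   input skew;
   datalines;
-0.5
0.5
1.96
7.25
12
;
run;

proc print data=skew_demo;
   format skew skewaskf.;
run;
```

If the output looks right, the same format can be attached in the report with FORMAT=skewaskf. on the DEFINE statement for skewness. Picture formats truncate rather than round by default; the ROUND option on the PICTURE statement changes that.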