I am trying to find days matching to a reference number of days given or else to find the number of days close to the reference days.
I coded till here, however not sure how to go forward.
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 560 86 181
3 2016-02-12 560 82 263
3 2016-02-12 560 69 332
3 2016-02-12 560 77 409
So now I want to bring out the last value close to the reference days.
and the next total_days should start from ZERO again to find the next window. How can I do this?
Here is a code that I wrote
data want;
do until (totaldays <= ref_days);
set have;
by ID ref_days notsorted;
if first.id then totaldays=0;
else totaldays+lags;
end;
run;
Required Output:
ID Date ref_days lags total_days
1 2017-02-02 224 . 0
1 2017-02-02 224 84 84
1 2017-02-02 224 84 168
2 2015-01-21 213 300 388
3 2016-02-12 560 95 .
3 2016-02-12 300 86 181
3 2016-02-12 300 82 263
3 2016-02-12 300 69 .
3 2016-02-12 300 77 146
A while ago I did similar to this via Proc sql. It calculates all the distances and takes the closest one. It works with moderate size dataset. Hopefully it is of some use.
proc sql;
select * from
(
select *,
abs(t1.link-t2.link) as dist /*In your case these would be dateVars*/
from test1 t1
left join test2 t2
on 1=1) group by system1 having dist=min(dist);
;
quit;
There was some talk that the left join on 1=1 is a bit silly (as full outter join would suffice, or something.) However this worked for the problem in question.
Related
I am currently trying to create a side-by-side dual axis bar chart in proc sgplot for the data which is based on dates. I am currently stuck at last thing, where I am not able to shift the bars using discreteoffset option on vbar, because I am using Type=time on xaxis. If I comment this, then the bars are shifted but then the xaxis tick values look clumsy. So I wonder if there is any other option that can move the bars for Date/time Data? Following is my SAS code.
data input;
input people visits outcome date date9.;
datalines;
41 448 210 1-Jan-18
43 499 207 1-Feb-18
45 544 221 1-Mar-18
49 564 239 1-Apr-18
39 575 236 1-May-18
37 549 210 1-Jun-18
51 602 263 1-Jul-18
32 586 208 1-Aug-18
52 557 225 1-Sep-18
41 534 227 1-Oct-18
48 499 217 1-Nov-18
44 514 235 1-Dec-18
31 582 281 1-Jan-19
33 545 269 1-Feb-19
38 574 259 1-Mar-19
29 564 247 1-Apr-19
29 642 274 1-May-19
28 556 216 1-Jun-19
20 531 187 1-Jul-19
31 604 226 1-Aug-19
19 513 186 1-Sep-19
24 483 185 1-Oct-19
28 401 156 1-Nov-19
18 450 158 1-Dec-19
21 418 178 1-Jan-20
28 396 149 1-Feb-20
43 488 177 1-Mar-20
33 539 205 1-Apr-20
57 631 244 1-May-20
54 695 291 1-Jun-20
58 732 309 1-Jul-20
62 681 301 1-Aug-20
42 654 291 1-Sep-20
57 749 365 1-Oct-20
60 627 249 1-Nov-20
56 623 244 1-Dec-20
54 712 298 1-Jan-21
62 655 262 1-Feb-21
;
run;
proc sgplot data=input;
format date monyy7.;
styleattrs datacolors=(Red DarkBlue) datacontrastcolors=(black black) datalinepatterns=(solid);
vbar date / response=visits discreteoffset=-0.17 barwidth=0.3;
vbar date / response=outcome discreteoffset=0.17 barwidth=0.3;
vline date / response=people y2axis lineattrs=(color=black thickness=3);
xaxis display=(nolabel) /*fitpolicy=rotate valuesrotate=vertical*/ type=time /*interval=month*/;
yaxis grid label='Label1' values=(0 to 800 by 100);
y2axis label='Label2' values=(0 to 70 by 10);
keylegend / title="";
run;
Output I am getting:
Output I want: (With shifted bars, but it is changing dates)
Appreciate any help!
Thank you.
Reshape the data with transpose so the variables wanted side by side become categorical, i.e. name value pairs. The name can be used in vbar as the group= with groupdisplay=cluster.
Note: The xaxis type=time appears to perform special checks based on the format of the vbar variable, and will rendered a pretty two-line axis label when that format is date9. I've never seen this discussed in the documentation.
Example:
Uses name= in the plotting statements so the keylegend can look prettier.
proc transpose data=input out=plot;
by rowid date;
copy people;
var visits outcome;
run;
proc sgplot data=plot;
vbar date / response=col1 group=_name_ groupdisplay=cluster name='relatedcounts';
vline date / response=people group=_name_ y2axis lineattrs=(color=black thickness=3) name='people';
xaxis
type = time
interval = month
;
format date date9.;
yaxis grid label='Related counts' values=(0 to 800 by 100);
y2axis label='# People' values=(0 to 70 by 10);
keylegend 'relatedcounts' / title="";
run;
Will produce
Hello,
I want to write a dynamic program which helps me to flag the start and end dates of events that are nested within the consolidated dates that are present at the top of each Pt.ID in the attached example. I can easily do these if there is only one such consolidated period per Pt.ID. However, there could be more than one such consolidated periods per Pt. ID. (As shown for second Pt.ID, 1002). As shown in the example, the events that fall within the consolidated period/s are fagged as "Y" in the flag variable and if they don't fall within the consolidated period then they are flagged as "N" in this variable. How can I write a program that accounts for all of such consolidated periods per Pt.ID and then compare them with the dates for the rest of the events of a particular patient and flag events which fall within any of those consolidated periods?
Thank you.
So join the event records with the period records and calculate whether the event is within the period. Then you could take the MAX over all periods.
For example here is code for your sample that creates a binary 1/0 flag variable called INCLUDED.
data Sample;
infile datalines missover;
input Pt_ID Event_ID Category $ Start_Date : mmddyy10.
Start_Day End_date : mmddyy10. End_day Duration
;
format Start_date End_date mmddyy10.;
datalines;
1001 . Moderate 8/5/2016 256 9/3/2016 285 30
1001 1 Moderate 3/8/2016 106 3/16/2016 114 9
1001 2 Moderate 8/5/2016 256 8/14/2016 265 10
1001 3 Moderate 8/21/2016 272 8/24/2016 275 4
1001 4 Moderate 8/23/2016 274 9/3/2016 285 12
1002 . Severe 11/28/2016 13 12/19/2016 34 22
1002 . Severe 2/6/2017 83 2/28/2017 105 23
1002 1 Severe 11/28/2016 13 12/5/2016 20 8
1002 2 Severe 12/12/2016 27 12/19/2016 34 8
1002 3 Severe 1/9/2017 55 1/12/2017 58 4
1002 4 Severe 2/6/2017 83 2/13/2017 90 8
1002 5 Severe 2/20/2017 97 2/28/2017 105 9
1002 6 Severe 3/17/2017 122 3/24/2017 129 8
1002 7 Severe 5/4/2017 170 5/13/2017 179 10
1002 8 Severe 5/24/2017 190 5/30/2017 196 7
1002 9 Severe 6/9/2017 206 6/13/2017 210 5
;
proc sql ;
create table want as
select a.*
, max(b.start_date <= a.start_date and b.end_date >= a.end_date ) as Included
from sample a
left join sample b
on a.pt_id = b.pt_id and missing(b.event_id)
group by 1,2,3,4,5,6,7,8
order by a.pt_id, a.event_id, a.start_date , a.end_date
;
quit;
DATA OZONE;
INPUT MONTH $ STMF YKRS ##;
CARDS;
A 80 66 A 68 82 A 24 47 A 24 28 A 82 44 A 100 55
A 55 34 A 91 60 A 87 70 A 64 41 A . 67 A . 127 A 170 96 A . 56
JN 215 93 JN 230 106 JN . 49 JN 69 64 JN 98 83 JN 125 97
JN 72 51 JN 125 75 JN 143 104 JN 192 107 JN . 56 JN 122 68
JN 32 20 JN 23 35 JN 71 30 JN 38 31 JN 136 81 JN 169 119
JL 152 76 JL 201 108 JL 134 85 JL 206 96 JL 92 48 JL 101 60
JL 133 . JL 83 50 JL . 27 JL 60 37 JL 124 47 JL 142 71
JL 75 49 JL 103 59 JL . 53 JL 46 25 JL 68 45 JL . 78
S 38 23 S 80 50 S 80 34 S 99 58 S 71 35 S 42 24 S 52 27 S 33 17
;
run;
Proc Ttest data=Ozone PLOT=NONE ALPHA=0.01;
Where MONTH='JN';
Paired STMF*YKRS;
Run;
Question 2
Data Baseball;
Input ba league;
Datalines;
276 National League
288 National League
281 National League
290 National League
303 National League
257 American League
254 American League
263 American League
261 American League
Run;
Proc Ttest data=Baseball ALPHA=0.02 ;
Question 3
Proc Ttest data=ozone ALPHA=0.01 Plot=NONE;
Where Month='A'-'S';
Paired STMF*YKRS;
Run;
Question 2 Test to see if both leagues have different batting averages. Use a alpha = 0.02 in your conclusions and compute a 98% confidence interval for the means.
Question 3 From the first question test to see the differences the A and S average ozone values. Use alpha = 0.01 in your conclusions. Include a 99% confidence interval for the difference.
So my questions are for Question 2 kind of a stupid question but for whatever reason I am confused as to what you are suppose to do.
For question 3 (my main question) how do I use one proc ttest to check and see the differences between months A and S? I tried using a Where statement as you can see above, but of course that does not work I’m abit stomped on where to go from here. Also I omited a good bit of the month data in the ozone portion as I couldn't properly format all of the data without it looking extremely confusing.
Thanks for your help in advance!
I have a dataframe with variables: id, 2001a, 2001b, 2002a, 2002b, 2003a, 2003b, etc.
I am trying to figure out a way to pivot the data so the variables are: id, year, a, b
The 16.2 documentation refers to some reshaping and pivoting, but that seemed to speak more towards hierarchical columns.
Any suggestions?
I am thinking about creating a hierarchical dataframe, but am not sure how to map the year in the original variable names to a created hierarchical column
sample df:
id 2001a 2001b 2002a 2002b 2003a etc.
1 242 235 5735 23 1521
2 124 168 135 1361 1
3 436 754 1 24 5124
etc.
Here is a way to create hierarchical columns.
df = pd.DataFrame({'2001a': [242,124,236],
'2001b':[242,124,236],
'2002a': [242,124,236],
'2002b': [242,124,236],
'2003a': [242,124,236]})
df.columns = df.columns.str.split('(\d+)', expand=True)
df
2001 2002 2003
a b a b a
0 242 242 242 242 242
1 124 124 124 124 124
2 236 236 236 236 236
Currently I have two datasets with similar variable lists. Each dataset has a procedure variable. I want to compare the frequency of the procedure variable between datasets. I created a flag in both datasets to id the source dataset, and was going to merge but don't have a common identifier. How do I merge a dataset without deleting any observations? This isn't just a simple Merge without a By function, right?
Currently have:
Data.a Data.b
pproc proc1_numb
70 9
71 15
77 24
80 80
81 42
83 71
86 66
87 125
121 159
125 242
Want Output:
pproc freq
9 1
15 1
24 1
42 1
66 1
70 1
71 2
77 1
80 2
81 1
83 1
86 1
87 1
121 1
125 2
159 1
242 1
If I understand your question properly, you should just concatenate the two datasets into one and rename the variable. Then you can use PROC MEANS to get the frequencies. Something like this:
data all;
set a
b(rename=(proc1_numb=pproc));
run;
proc means nway data=all noprint;
class pproc;
output out=want(drop=_type_ rename=(_freq_=freq));
run;