I'd like to calculate the number of patients currently within an Emergency Room by hour and I'm having trouble conceptualizing an efficient code.
I have two time variables, 'Check In Time' and 'Release Time'. The date/time values themselves are arbitrary, but the 'release time' value will always come after the 'check in time' value.
I would like the output for a given day to look something like this:
Hour Midnight 1am 2am 3am 4am.....
# of Pts 34 56 89 23 29
So, for example, at 1am there were 56 patients in the ED, taking both check-in and release times into account.
My initial thought is to:
1) round the time variables
2) Write code that looks something like this...
data EDTimesl;
set EDDATA;
if checkin = '1am' and release = '2am' then OneAMToTwoAM = 1;
if checkin = '1am' and release = '3am' then OneAMToTwoAM = 1;
if checkin = '1am' and release = '3am' then TwoAMToThreeAM = 1;
....
run;
This, however, gives me pause because I feel there is a more efficient method!
Thanks in advance!
I found some code online that might answer the question! Please see below:
data have (keep=admitdate disdate);
/* generate some admission and discharge date time variables*/
year=2015; /* for example all of the admits are in 2015*/
format admitdate disdate datetime20.;
do day= 1 to 20;
do month=1 to 12;
hour = floor(24*ranuni(4445));
min = floor(50*ranuni(1234));
date = mdy(month,day,2015);
admitdate=dhms(date,hour,min,0);
/* random duration of stay*/
duration = 60 + floor(3000*ranuni(7777));
disdate = intnx('minute',admitdate,duration);
output;
end;
end;
run;
data occupancy;
set have;
format admitdate disdate datetime20.;
Do Occupanthour = (dhms(datepart(admitdate),hour(admitdate),0,0)) to
dhms(datepart(disdate),hour(disdate),0,0) by 3600;
HourOfDay = hour(OccupantHour);
DayOfWeek = Weekday(datepart(OccupantHour));
output;
end;
format OccupantHour datetime20.;
run;
Proc freq data=occupancy;
Tables HourOfDay;
run;
proc tabulate data=occupancy;
class DayOfWeek;
class HourOfDay;
tables HourOfDay,
(DayOfWeek All)*n;
run;
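To get the single-day layout asked for in the question, the expanded occupancy table can be filtered to one calendar date before counting. A minimal sketch, assuming the occupancy data set built above; the date literal and the output name hourly_counts are arbitrary choices for illustration:

```sas
/* Count occupants per hour for one chosen day (hypothetical date) */
proc freq data=occupancy noprint;
   where datepart(OccupantHour) = '15JAN2015'd;
   tables HourOfDay / out=hourly_counts(drop=percent);
run;

/* One row per hour that had at least one patient present */
proc print data=hourly_counts noobs;
run;
```

Hours with no patients at all will not appear in the output, so the result may have fewer than 24 rows.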
Assume you have a data file called VIRUS_PROLIF from an infectious disease research center. Each observation has three variables, COUNTRY, START_DATE, and DOUBLE_RATE, where START_DATE is the date on which the country registered its 100th case of COVID-19. For each country, DOUBLE_RATE is the number of days it takes for the number of cases to double in that country. Write the SAS code using DO UNTIL to calculate the date on which that country would be predicted to register 200,000 cases of COVID-19.
data VIRUS_PROLIF;
INPUT COUNTRY $ start_date mmddyy10. num_of_cases double_rate ;
*here doubling rate is 100% so if day 1 had 100 cases day 2 will have 200;
Datalines;
US 03/13/2020 100 100
;
run;
data VIRUS_PROLIF1 (drop=start_date);
set VIRUS_PROLIF;
do until (num_of_cases>200000);
double_rate+1;
num_of_cases+ (num_of_cases*1);
end;
run;
proc print data=VIRUS_PROLIF1;
run;
The key concept you're missing here is how to apply the growth rate. It uses the following formula, which works like compound interest on money.
If you have one dollar today and you earn 100% interest, it becomes
StartingAmount * (1 + interestRate), where the interest rate here is 100/100 = 1.
*fake data;
data VIRUS_PROLIF;
INPUT COUNTRY $ start_date mmddyy10. num_of_cases double_rate;
*here doubling rate is 100% so if day 1 had 100 cases day 2 will have 200;
Datalines;
US 03/13/2020 100 100
AB 03/17/2020 100 20
;
run;
data VIRUS_PROLIF1;
set VIRUS_PROLIF;
*assign date to starting date so both are in output;
date=start_date;
*save record to data set;
output;
do until (num_of_cases>200000);
*increment your day;
date=date+1;
*doubling rate is represented as a percent so add it to 1 to show the rate;
num_of_cases=num_of_cases*(1+double_rate/100);
*save record to data set;
output;
end;
*control date display;
format date start_date date9.;
run;
*check results;
proc print data=VIRUS_PROLIF1;
run;
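The inequality $200{,}000 < N_0 (1 + R/100)^k$ can be solved for the integer k without iteration, where N0 is NUM_OF_CASES and R is DOUBLE_RATE. The result is a number of days, so the predicted date is START_DATE plus this value.
day_of_200K = ceil (
LOG ( 200000 / NUM_OF_CASES )
/ LOG ( 1 + DOUBLE_RATE / 100 )
);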
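As a rough sketch of how the closed-form expression could be applied to every country at once, assuming the VIRUS_PROLIF data set created earlier (the names predicted, days_to_200k and date_200k are made up for illustration):

```sas
/* Closed-form prediction, no loop: assumes VIRUS_PROLIF as created above */
data predicted;
   set VIRUS_PROLIF;
   /* smallest whole number of days k with num_of_cases*(1+rate)**k >= 200000 */
   days_to_200k = ceil( log(200000 / num_of_cases) / log(1 + double_rate / 100) );
   /* SAS dates are day counts, so adding days gives the predicted date */
   date_200k = start_date + days_to_200k;
   format start_date date_200k date9.;
run;

proc print data=predicted;
run;
```

This gives one row per country instead of one row per simulated day, which may be all that is needed when only the final date matters.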
I'm trying to write SAS code that can loop over a dataset that contains event dates that looks like:
Data event;
input Date;
cards;
20200428
20200429
;
run;
And calculate averages for the prior three days from another dataset that contains dates and volume that looks like:
Data vol;
input Date Volume;
cards;
20200430 100
20200429 110
20200428 86
20200427 95
20200426 80
20200425 90
;
run;
For example, for date 20200428 the average should be 88.33 [(95+80+90)/3] and for date 20200429 the average should be 87.00 [(86+95+80)/3]. I want these values and the volume of the date to be saved on a new dataset that looks like the following if possible.
Data clean;
input Date Vol Avg;
cards;
20200428 86 88.33
20200429 110 87.00
;
run;
The actual data that I'm working with is from 1970-2010. I may also increase my average period from 3 days prior to 10 days prior, so I want to have flexible code. From what I've read I think a macro and/or call symput might work very well for this, but I'm not sure how to code these to do what I want. Honestly, I don't know where to start. Can anyone point me in the right direction? I'm open to any advice/ideas. Thanks.
A SQL statement is by far the most succinct code for obtaining your result set.
The query will join with 2 independent references to volume data. The first for obtaining the event date volume, and the second for computing the average volume over the three prior days.
The date data should be read in as a SAS date, so that the BETWEEN condition will be correct.
Data event;
input Date: yymmdd8.;
cards;
20200428
20200429
;
run;
Data vol;
input Date: yymmdd8. Volume;
cards;
20200430 100
20200429 110
20200428 86
20200427 95
20200426 80
20200425 90
;
run;
* SQL query with GROUP BY ;
proc sql;
create table want as
select
event.date
, volume_one.volume
, mean(volume_two.volume) as avg
from event
left join vol as volume_one
on event.date = volume_one.date
left join vol as volume_two
on volume_two.date between event.date-1 and event.date-3
group by
event.date, volume_one.volume
;
* alternative query using correlated sub-query;
create table want_2 as
select
event.date
, volume
, ( select mean(volume) as avg from vol where vol.date between event.date-1 and event.date-3 )
as avg
from event
left join vol
on event.date = vol.date
;
quit;
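Since the question mentions possibly moving from 3 to 10 prior days, the window length could be pulled out into a macro variable. This is only a sketch under the same assumptions as the queries above (event and vol read with SAS date values); the macro variable N and the names want_n and avg_prior_&N are illustrative, not part of the original answer:

```sas
%let N = 3;   /* hypothetical macro variable: number of prior days to average */

proc sql;
   create table want_n as
   select
      event.date format=yymmdd10.
    , volume_one.volume
    , mean(volume_two.volume) as avg_prior_&N
   from event
   left join vol as volume_one
     on event.date = volume_one.date
   left join vol as volume_two
     /* the N calendar days before the event date, excluding the event date */
     on volume_two.date between event.date - &N and event.date - 1
   group by
      event.date, volume_one.volume
   ;
quit;
```

Changing the %let line to 10 would average over the ten preceding calendar days without touching the query.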
If the volume data has date gaps, a better solution is to compute the rolling average of the N prior volumes separately. The gaps could come from weekends, holidays, or dates missing due to data entry problems or operator error. Conceptually, for the averaging, the only role of the date is to order the data.
After the rolling averages are computed, a simple join or merge can be done.
Example:
* Simulate some volume data that excludes weekends, holidays, and a 2% rate of missing dates;
data volumes(keep=date volume);
call streaminit(20200502);
do date = '01jan1970'd to today();
length holiday $25;
year = year(date);
holiday = 'NEWYEAR'; hdate = holiday(holiday, year); if date=hdate then continue;
holiday = 'USINDEPENDENCE'; hdate = holiday(holiday, year); if date=hdate then continue;
holiday = 'THANKSGIVING'; hdate = holiday(holiday, year); if date=hdate then continue;
holiday = 'CHRISTMAS'; hdate = holiday(holiday, year); if date=hdate then continue;
holiday = 'MEMORIAL'; hdate = holiday(holiday, year); if date=hdate then continue;
holiday = 'LABOR'; hdate = holiday(holiday, year); if date=hdate then continue;
holiday = 'EASTER'; hdate = holiday(holiday, year); if date=hdate then continue;
holiday = 'USPRESIDENTS'; hdate = holiday(holiday, year); if date=hdate then continue;
if weekday(date) in (1,7) then continue; *1=Sun, 7=Sat;
volume = 100 + ceil(75 * sin (date / 8));
if rand('uniform') < 0.02 then continue;
output;
end;
format date yymmdd10.;
run;
* Compute an N item rolling average from N prior values;
%let ROLLING_N = 5;
data volume_averages;
set volumes;
by date; * enforce sort order requirement;
array v[0:&ROLLING_N] _temporary_; %* <---- &ROLLING_N ;
retain index -1;
avg_prior_&ROLLING_N. = mean (of v(*)); %* <---- &ROLLING_N ;
OUTPUT;
index = mod(index + 1,&ROLLING_N); %* <---- Modular arithmetic, the foundation of rolling ;
v[index] = volume;
format v: 6.;
drop index;
run;
* merge;
data want_merge;
merge event(in=event_date) volume_averages;
by date;
if event_date;
run;
* join;
proc sql;
create table want_join as
select event.*, volume_averages.avg_prior_&ROLLING_N.
from event join volume_averages
on event.date = volume_averages.date;
quit;
You want to loop over a series of dates in an input data set. Therefore I use a PROC SQL statement where I select the distinct dates in this input data set into a macro variable.
This macro variable is then used to loop over. In your example the macro variable will thus be: 20200428 20200429. You can then use the %SCAN macro function to start looping over these dates.
For each date in the loop, we will then calculate the average: in your example the average of the 3 days prior to the looping date. As the number of days for which you want to calculate the average is variable, this is also passed as a parameter in the macro. I then use the INTNX function to calculate the lower bound of dates you want to select to calculate the average over. Then the PROC MEANS procedure is used to calculate the average volume over the days: lower bound - looping date.
I then put a minor data step in between to attach the looping date again to the calculated average. Finally everything is appended in a final data set.
%macro dayAverage(input = , range = , selectiondata = );
/* Input = input dataset
range = number of days prior to the selected date for which you want to calculate
the average
selectiondata = data where the volumes are in */
/* Create a macro variable with the dates for which you want to calculate the
average, to loop over */
proc sql noprint;
select distinct date into: datesrange separated by " "
from &input.;
quit;
/*Start looping over the dates for which you want to calculate the average */
%let I = 1;
%do %while (%scan(&datesrange.,&I.) ne %str());
/* Assign the current date in the loop to the variable currentdate */
%let currentdate = %scan(&datesrange.,&I.);
/* Create the minimum date in the range based on input parameter range */
%let mindate = %sysfunc(putn(%sysfunc(intnx(day,%sysfunc(inputn(&currentdate.,yymmdd8.)),-&range.)),yymmddn8.));
/* Calculate the mean volume for the selected date and selected range */
proc means data = &selectiondata.(where = (date >= &mindate. and date <
&currentdate.)) noprint ;
output out = averagecurrent(drop = _type_ _freq_) mean(volume)=average_volume;
run;
/* Add the current date to the calculated average */
data averagecurrent;
retain date average_volume;
set averagecurrent;
date = &currentdate.;
run;
/* Append the result to a final list */
proc datasets nolist;
append base = final data = averagecurrent force;
run;
%let I = %eval(&I. + 1);
%end;
%mend;
This macro can in your example be called as:
%dayAverage(input = event, range = 3, selectiondata = vol);
It will give you a data set in your work library called final.
Here's my data.
Each record/row documents the position the patient is in and the document date and time. So, except the first record in a day, we can calculate the time the position has been kept since the last record. The goal is to flag the records that indicate the patient has been in one of the positions: 'right', 'back', 'left', for at least 2 hours in the same day. The red are rows that should be flagged. To do this, I think I need to create a column that has the time at which the last time a different position was documented.
You are computing the run duration for a position. As your step goes through the data you will need to track the position and start of a run. Tracking can be accomplished with a retained variable.
data want;
set have;
by patientid date time; * add date and time to by statement so an error will log if the data is not in the required order;
if first.patientid then do;
run_position = position;
run_start = dhms (date,0,0,0) + time;
retain run_position run_start;
end;
if position = run_position then do;
hours_duration = intck ('hour', run_start, dhms(date,0,0,0) + time, 'continuous');
end;
else do;
* new run start;
run_position = position;
run_start = dhms (date,0,0,0) + time;
hours_duration = 0;
end;
flag_ge2hr = hours_duration >= 2;
run;
Use the NOTSORTED option to get this done more easily. Assuming the data is sorted By Date and time correctly, this is what you likely need.
data want;
set have;
by ID position NOTSORTED;
retain start_time;
if first.position then start_time = observation_time;
duration = observation_time - start_time;
if duration >= 2*60*60 then flag=1; *at least 2 hours: time is stored in seconds, so 2 hours * 60 minutes * 60 seconds per minute;
run;
Untested because no data.
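Since no data was posted, a small made-up data set can be used to try the step above. The sketch below assumes the variable names used in that step (ID, position, observation_time) and assumes observation_time is a SAS datetime; the values themselves are invented purely for testing:

```sas
/* Hypothetical sample data: one patient, positions documented over a morning */
data have;
   input ID position $ observation_time :datetime18.;
   format observation_time datetime18.;
   datalines;
1 back 01JAN2020:08:00:00
1 back 01JAN2020:09:30:00
1 back 01JAN2020:10:15:00
1 left 01JAN2020:10:45:00
1 left 01JAN2020:11:00:00
;
run;
```

With this input, the third row is the only one where the same position has been held for two hours or more (08:00 to 10:15), so it is the row expected to be flagged. Note that a run defined this way is not reset at midnight, so the "same day" requirement would need an extra check if stays span multiple days.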
I have monthly data with several observations per day. I have day, month and year variables. How can I retain data from only the first and the last 5 days of each month? I have only weekdays in my data so the first and last five days of the month changes from month to month, ie for Jan 2008 the first five days can be 2nd, 3rd, 4th, 7th and 8th of the month.
Below is an example of the data file. I wasn't sure how to share this so I just copied some lines below. This is from Jan 2, 2008.
Would a variation of first.variable and last.variable work? How can I retain observations from the first 5 days and last 5 days of each month?
Thanks.
1 AA 500 B 36.9800 NH 2 1 2008 9:10:21
2 AA 500 S 36.4500 NN 2 1 2008 9:30:41
3 AA 100 B 36.4700 NH 2 1 2008 9:30:43
4 AA 100 B 36.4700 NH 2 1 2008 9:30:48
5 AA 50 S 36.4500 NN 2 1 2008 9:30:49
If you want to examine the data and determine the minimum 5 and maximum 5 values then you can use PROC SUMMARY. You could then merge the result back with the data to select the records.
So if your data has variables YEAR, MONTH and DAY you can make a new data set that has the top and bottom five days per month using simple steps.
proc sort data=HAVE (keep=year month day) nodupkey
out=ALLDAYS;
by year month day;
run;
proc summary data=ALLDAYS nway;
class year month;
output out=MIDDLE
idgroup(min(day) out[5](day)=min_day)
idgroup(max(day) out[5](day)=max_day)
/ autoname ;
run;
proc transpose data=MIDDLE out=DAYS (rename=(col1=day));
by year month;
var min_day: max_day: ;
run;
proc sql ;
create table WANT as
select a.*
from HAVE a
inner join DAYS b
on a.year=b.year and a.month=b.month and a.day = b.day
;
quit;
/****
get some dates to play with
****/
data dates(keep=i thisdate);
offset = input('01Jan2015',DATE9.);
do i=1 to 100;
thisdate = offset + round(599*ranuni(1)+1); *** within 600 days from offset;
output;
end;
format thisdate date9.;
run;
/****
BTW: intnx('month',thisdate,1) = first day of next month. Deduct 1 to get the last day
of the current month.
intnx('month',thisdate,0,"BEGINNING") = first day of the current month
****/
proc sql;
create table first5_last5 AS
SELECT
*
FROM
dates /* replace with name of your data set */
WHERE
/* replace all occurrences of 'thisdate' with name of your date variable */
( intnx('month',thisdate,1)-5 <= thisdate <= intnx('month',thisdate,1)-1 )
OR
( intnx('month',thisdate,0,"BEGINNING") <= thisdate <= intnx('month',thisdate,0,"BEGINNING")+4 )
ORDER BY
thisdate;
quit;
* Create some data with the desired structure;
Data inData (drop=_:); * drop all variables starting with an underscore*;
format date yymmdd10. time time8.;
_instant = datetime();
do _i = 1 to 1E5;
date = datepart(_instant);
time = timepart(_instant);
yy = year(date);
mm = month(date);
dd = day(date);
*just some more random data*;
letter = byte(rank('a') +floor(rand('uniform', 0, 26)));
*select week days*;
if weekday(date) in (2,3,4,5,6) then output;
_instant = _instant + 1E5*rand('exponential');
end;
run;
* Count the days per month;
proc sql;
create view dayCounts as
select yy, mm, count(distinct dd) as _countInMonth
from inData
group by yy, mm;
quit;
* Select the days;
data first_5(drop=_:) last_5(drop=_:);
merge inData dayCounts;
by yy mm;
_newDay = dif(date) ne 0;
retain _nrInMonth;
if first.mm then _nrInMonth = 1;
else if _newDay then _nrInMonth + 1;
if _nrInMonth le 5 then output first_5;
if _nrInMonth gt _countInMonth - 5 then output last_5;
run;
Use the INTNX() function. You can use INTNX('month',...) to find the beginning and ending days of the month and then use INTNX('weekday',...) to find the first 5 week days and last five week days.
You can convert your month, day, year values into a date using the MDY() function. Let's assume that you do that and create a variable called TODAY. Then to test whether it is within the first 5 weekdays or last 5 weekdays of the month you could do something like this:
first5 = intnx('weekday',intnx('month',today,0,'B'),0) <= today
<= intnx('weekday',intnx('month',today,0,'B'),4) ;
last5 = intnx('weekday',intnx('month',today,0,'E'),-4) <= today
<= intnx('weekday',intnx('month',today,0,'E'),0) ;
Note that those ranges will include the week-ends, but it shouldn't matter if your data doesn't have those dates.
But you might have issues if your data skips holidays.
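Put together as a complete step, the idea might look like the sketch below. It assumes the input data set is called HAVE and carries numeric MONTH, DAY and YEAR variables, as the question describes; the output name WANT is arbitrary:

```sas
/* Sketch: assumes HAVE has numeric variables MONTH, DAY and YEAR */
data want;
   set have;
   today = mdy(month, day, year);   /* build a SAS date from the parts */
   first5 = intnx('weekday', intnx('month', today, 0, 'B'), 0) <= today
            <= intnx('weekday', intnx('month', today, 0, 'B'), 4);
   last5  = intnx('weekday', intnx('month', today, 0, 'E'), -4) <= today
            <= intnx('weekday', intnx('month', today, 0, 'E'), 0);
   if first5 or last5;              /* keep only rows in either window */
   format today date9.;
run;
```

Because the windows are computed from the calendar rather than from the data, a month with a holiday inside the window would yield fewer than five distinct days, which is the caveat noted above.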
I have the dataset below and need to find the week number of each date based on the financial year (e.g. April 2013 to March 2014). For example, 01AprXXX should fall in week 0 or 1 of the year, and the last week of the following March should be week 52 or 53. I have tried one way of working this out (the code is below as well).
I am just curious to know whether there is a better way to do this in SAS. Thanks in advance. Please let me know if this question is a duplicate; in that case I will delete it at the earliest. I searched for the concept but didn't find anything.
Also, my apologies for my English; it may not be grammatically correct, but I hope I am able to convey my point.
DATA
data dsn;
format date date9.;
input date date9.;
cards;
01Nov2015
08Sep2013
06Feb2011
09Mar2004
31Mar2009
01Apr2007
;
run;
CODE
data dsn2;
set dsn;
week_normal = week(date);
dat2 = input(compress("01Apr"||year(date)),date9.);
week_temp = week(dat2);
format dat2 date9.;
x1 = month(input(compress('01Jan'||(year(date)+1)),date9.)) ;***lower cutoff;
x2 = month(input(compress("31mar"||year(date)),date9.)); ***upper cutoff;
x3 = week(input(compress("31dec"||(year(date)-1)),date9.)); ***final week value for the previous year;
if month(dat2) <= month(date) <= month(input(compress("31dec"||year(date)),date9.)) then week_f = week(date) - week_temp;
else if x2 >= month(date) >= x1 then week_f = week_normal + x3 - week(input(compress("31mar"||(year(date)+1)),date9.)) ;
run;
RESULT
INTCK and INTNX are your best bets here. You could use them as I do below, or you could use the advanced functionality with defining your own interval type (fiscal year); that's described more in the documentation.
data dsn2;
set dsn;
week_normal = week(date);
dat2 = intnx('month12.4',date,0); *12 month periods, starting at month 4 (so 01APR), go to the start of the current one;
week_F = intck('week',dat2,date); *Can adjust what day this starts on by adding numbers to week, so 'week.1' etc. shifts the start of week day by one;
format dat2 date9.;
run;