I have a dataset like the following:
x y
16:00 1
17:00 2
18:00 2
19:00 3
20:00 4
21:00 5
22:00 6
23:00 1
24:00 1
01:00 2
02:00 3
03:00 1
04:00 7
...
I want to plot the relationship between x and y using the code below. I want the x axis to start at 16:00 and end at 04:00, but with this code the x axis starts at 00:00 and ends at 16:00. Can anyone show me how to adjust my code? (I don't want to type the order out value by value, e.g. order = ("16:00" ... "04:00").)
PROC SGPLOT DATA = data;
SERIES X = x Y = y;
axis order=("16:00:00"t to "03:00:00"t by hour);
TITLE 'Plot';
RUN;
The problem is that values on a numeric X axis cannot be placed out of order, and SAS stores a time as the number of seconds since midnight, so 1am < 11pm. You cannot go around the clock, so to speak.
A workaround is to turn the time values into datetimes: add a day component, then display only the time portion.
data have;
informat x time5. y best.;
format x time5.;
input x y;
datalines;
16:00 1
17:00 2
18:00 2
19:00 3
20:00 4
21:00 5
22:00 6
23:00 1
24:00 1
01:00 2
02:00 3
03:00 1
04:00 7
;
run;
data have;
retain day 0;
set have;
format x_new datetime.;
/*Count Days*/
if x = "24:00"t then
day = day + 1;
x_new = dhms(day,hour(x),minute(x),second(x));
run;
proc sgplot data=have;
series x=x_new y=y;
xaxis valuesformat=tod5.;
run;
Here I look for the 24:00 mark to increment the day count, then create a new variable that holds the day plus the time.
When plotting, tell SAS to use the TODw.d format which only displays the time portion.
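To see why the datetime trick restores the order, here is a small sketch that compares the stored values. The variable names are made up for illustration; the only assumption is that day 0 is 01JAN1960, which is how SAS counts dates.

```sas
data _null_;
    /* Plain time values: seconds since midnight, so 01:00 sorts before 23:00 */
    t_evening = '23:00't;
    t_morning = '01:00't;

    /* Datetime values: seconds since 01JAN1960:00:00:00.
       Putting 01:00 on the following day makes it the larger value. */
    dt_evening = dhms('01JAN1960'd, 23, 0, 0);
    dt_morning = dhms('02JAN1960'd,  1, 0, 0);

    put t_evening= t_morning=;
    put dt_evening= dt_morning=;

    /* TOD5. shows only the time-of-day part of a datetime value */
    put dt_evening= tod5. dt_morning= tod5.;
run;
```

The log should show the morning value smaller than the evening value as a time, but larger as a datetime, while the TOD5. output still reads as a clock time.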
Please find the dataset below:
ID amt order_type no_of_order
1 200 em 6
1 300 on 5
2 600 em 10
Output desired:
ID amt order_type no_of_order
1 500 on 11
2 600 em 10
I need to pick the order_type based on the highest amount within each ID.
How can this be achieved in SAS code?
Sounds like you want to get the sum of the two numeric variables for each value of ID and also select one value for ORDER_TYPE. You appear to want the value of ORDER_TYPE that had the largest AMT. Here is a simple way using PROC SUMMARY (the BY statement assumes the data are already sorted by ID).
data have;
input ID amt order_type $ no_of_order;
cards;
1 200 em 6
1 300 on 5
2 600 em 10
;
proc summary data=have ;
by id;
var amt no_of_order;
output out=want sum= idgroup(max(amt) out[1] (order_type)=);
run;
Results:
                                      no_of_    order_
Obs    ID    _TYPE_    _FREQ_    amt    order     type
 1      1       0         2      500      11       on
 2      2       0         1      600      10       em
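For comparison, the same result can be sketched in PROC SQL: total the numeric columns per ID, then join back to pick the ORDER_TYPE on the row with the largest AMT. This is only an illustration; the names WANT_SQL, TOTALS, TOTAL_AMT, TOTAL_ORDERS and MAX_AMT are made up, and unlike the IDGROUP approach it returns more than one row for an ID when two rows tie for the largest AMT.

```sas
proc sql;
    create table want_sql as
    select h.ID,
           t.total_amt    as amt,
           h.order_type,
           t.total_orders as no_of_order
    from have as h
         inner join
         /* One row per ID: the totals plus the largest single amount */
         (select ID,
                 sum(amt)         as total_amt,
                 sum(no_of_order) as total_orders,
                 max(amt)         as max_amt
          from have
          group by ID) as t
      on h.ID = t.ID
     and h.amt = t.max_amt   /* keep only the row holding the largest amount */
    order by h.ID;
quit;
```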
I have an unbalanced panel dataset of the following form (simplified):
data have;
input ID YEAR EARN LAG_EARN;
datalines;
1 1960 450 .
1 1961 310 450
1 1962 529 310
2 1978 10 .
2 1979 15 10
2 1980 8 15
2 1981 10 8
2 1982 15 10
2 1983 8 15
2 1984 10 8
3 1972 1000 .
3 1973 1599 1000
3 1974 1599 1599
;
run;
I now want to estimate the following model for each ID:
proc reg data=have;
by ID;
model EARN = LAG_EARN;
run;
However, I want to do this for rolling windows of some size, say windows of size 2. The window should only contain non-empty observations. For example, in the case of firm A (ID 1), the window is applicable from 1961 onwards and thus only one time (since only one year follows after 1961 and the window is supposed to be of size 2).
Finally, I want to get a table with year columns and firm rows. The table should indicate the following: The regression model (with window size 2) has been performed one time for firm A. The quantity of available years, has only allowed one estimation of this model. Put differently, in 1962 the coefficient of the regression model has a value of X based on the 2 year prior window. Applying the same logic to the other two firms, one can get the following table. "X" representing the respective estimated coefficient value in certain year for firm A/B/C based on the 2-year window and "n" indicating the non-existence of such a value:
data want;
input ID 1962 1974 1980 1981 1982 1983 1984;
datalines;
1 X n n n n n n
2 n n X X X X X
3 n X n n n n n
;
run;
I do not know how to execute this. Furthermore, I would like to create a macro that allows me to estimate different rolling window models while still creating analogous output dataframes. I would appreciate any help with it, since I have been struggling quite some time now.
Try this macro. It only outputs observations that have a non-missing value for the lag you specify.
%macro lag(data=, out=, window=);
data _want_;
set &data.;
by ID;
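LAG_EARN = lag&window.(earn);
/* Blank the first WINDOW rows of each ID so a lag never reaches back into the previous ID */
if first.ID then _row = 0;
_row + 1;
if _row <= &window. then call missing(lag_earn);
if(NOT missing(lag_earn));
drop _row;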
run;
proc sort data=_want_;
by year id;
run;
proc transpose data=_want_
out=&out.(drop=_NAME_);
by ID notsorted;
id year;
var lag_earn;
run;
proc sort data=&out.;
by id;
run;
%mend;
%lag(data=have, out=want, window=1);
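The macro above reshapes the lagged values but does not run the regressions themselves. A rough sketch of the rolling estimation could look like the following, assuming the HAVE data set from the question and a window counted in usable (non-missing) observations. The names USABLE, WINDOWS, EST, WANT_COEF, SEQ, WINDOW_END and the y prefix are made up for illustration.

```sas
%let window = 2;   /* window size, counted in usable observations */

/* Keep only rows with a non-missing lag and number them within each ID */
data usable;
    set have;
    where not missing(LAG_EARN);
    by ID;
    if first.ID then seq = 0;
    seq + 1;
run;

/* Stack the windows: every window is identified by the year it ends in */
proc sql;
    create table windows as
    select e.ID, e.YEAR as window_end, o.YEAR, o.EARN, o.LAG_EARN
    from usable as e
         inner join usable as o
      on e.ID = o.ID
     and o.seq between e.seq - &window. + 1 and e.seq
    where e.seq >= &window.
    order by e.ID, window_end, o.YEAR;
quit;

/* One regression per ID and window; OUTEST= collects the coefficients */
proc reg data=windows outest=est noprint;
    by ID window_end;
    model EARN = LAG_EARN;
run;
quit;

/* Firms in rows, window-end years in columns (y1962, y1974, ...) */
proc transpose data=est out=want_coef(drop=_NAME_) prefix=y;
    by ID;
    id window_end;
    var LAG_EARN;
run;
```

With the sample data this yields a slope for 1962 (ID 1), 1980 to 1984 (ID 2) and 1974 (ID 3), and missing values elsewhere. Wrapping the steps in a macro with WINDOW as a parameter would cover other window sizes.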
Assume you have a temporary SAS data set called EPISODES that contains information about hospital episodes. The data set contains the variables ID_NO (patient ID), ADMIT_DATE (date of admission), DISC_DATE (date of discharge), and TOTAL_COST.
Using this data set, create a new data set in which you will create a separate observation for each day of each hospital episode. If a patient had a hospital episode that was 3 days long, they would have three observations in the new data set from that episode -- one for each day.
Each observation in the new data set should have only three variables: the patient identifier ID_NO, the date for that particular day of hospitalization XDATE, and the cost for that day of hospitalization DAILY_COST = TOTAL_COST divided by the number of days in the episode.
My thought is to do this as a loop. Something like the following.
data new_data;
set input_data ;
do xdate = admit_date to disc_date;
daily_cost = .... ;
output new_data ( keep = xdate daily_cost id_no );
end;
run;
*This program block sets up our data set;
data episodes;
INPUT ID_NO $ ADMIT_DATE mmddyy10. TOTAL_COST DISC_DATE mmddyy10.;
DATALINES;
1 01/01/2017 3000 01/03/2017
2 01/01/2017 14000 01/14/2017
;
run;
data new_episodes (keep= ID_NO XDATE DAILY_COST);
set episodes;
NUM_DAYS= DISC_DATE-ADMIT_DATE;
DAILY_COST= TOTAL_COST/(DISC_DATE-ADMIT_DATE);
*Using the Do While loop to create a matrix of date observations;
XDATE=ADMIT_DATE;*initializing our variable;
do while(XDATE<DISC_DATE);
put XDATE=;
XDATE+1;
output;*outputting the date variable;
end;
format XDATE mmddyy10.;
run;
proc print data=new_episodes;
run;
Since SAS stores dates as the number of days since January 1, 1960, you can just use a DO loop to increment XDATE from ADMIT_DATE to DISC_DATE.
But you need to decide how to count dates. If you are admitted on Monday and discharged on Tuesday is that one day or two days? If it is one day then do you want XDATE records for Monday or Tuesday? Or both?
Let's make some test data:
data have;
input id_no $ admit_date :yymmdd. total_cost disc_date :yymmdd.;
format admit_date disc_date yymmdd10.;
put (_all_) (+0);
datalines;
1 2017-01-01 3000 2017-01-04
2 2017-01-01 5000 2017-01-06
3 2020-02-23 500 2020-02-23
;
Here is code that treats Monday to Tuesday as one day. So it doesn't output the discharge date (unless it is the same as the admission date).
data want;
set have ;
if admit_date=disc_date then daily_cost=total_cost;
else daily_cost = total_cost / (disc_date - admit_date);
do xdate=admit_date to max(admit_date,disc_date-1) ;
output;
end;
keep id_no xdate daily_cost;
format xdate yymmdd10.;
run;
Results:
                daily_
Obs    id_no     cost         xdate
 1       1       1000    2017-01-01
 2       1       1000    2017-01-02
 3       1       1000    2017-01-03
 4       2       1000    2017-01-01
 5       2       1000    2017-01-02
 6       2       1000    2017-01-03
 7       2       1000    2017-01-04
 8       2       1000    2017-01-05
 9       3        500    2020-02-23
If you want to treat a stay from Monday to Tuesday as 2 days then the code is easier.
data want;
set have ;
daily_cost = total_cost / (disc_date - admit_date + 1);
do xdate=admit_date to disc_date ;
output;
end;
keep id_no xdate daily_cost;
format xdate yymmdd10.;
run;
Results:
                 daily_
Obs    id_no      cost         xdate
  1      1      750.000    2017-01-01
  2      1      750.000    2017-01-02
  3      1      750.000    2017-01-03
  4      1      750.000    2017-01-04
  5      2      833.333    2017-01-01
  6      2      833.333    2017-01-02
  7      2      833.333    2017-01-03
  8      2      833.333    2017-01-04
  9      2      833.333    2017-01-05
 10      2      833.333    2017-01-06
 11      3      500.000    2020-02-23
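Whichever counting rule you choose, a quick sanity check is that the daily costs add back up to the total for each episode. A minimal sketch, assuming the HAVE and WANT data sets above and one episode per ID_NO as in the test data (N_DAYS and SUMMED_COST are illustrative names):

```sas
proc sql;
    /* SUMMED_COST should match TOTAL_COST up to floating-point rounding */
    select h.id_no,
           h.total_cost,
           count(w.xdate)    as n_days,
           sum(w.daily_cost) as summed_cost
    from have as h
         inner join want as w
      on h.id_no = w.id_no
    group by h.id_no, h.total_cost;
quit;
```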
This is my code:
DATA sales;
INFILE 'D:\Users\...\Desktop\Onions.dat';
INPUT VisitingTeam $ 1-20 ConcessionSales 21-24 BleacherSales 25-28
OurHits 29-31 TheirHits 32-34 OurRuns 35-37 TheirRuns 38-40;
PROC PRINT DATA = sales;
TITLE 'SAS Data Set Sales';
RUN;
This is the data, but the spacing may be incorrect.
Columbia Peaches 35 67 1 10 2 1
Plains Peanuts 210 . 2 5 0 2
Gilroy Garlics 151035 12 11 7 6
Sacramento Tomatoes 124 85 15 4 9 1
;
I need to add or delete a blank column at the 19th column. Can someone help?
Just open the dataset and then look at what the variable name is. Then do:
Data Want (drop=variable_name_you_are_dropping); /*This is your output dataset*/
Set have; /*this is your dataset you have*/
Run;
I am trying to construct a centered moving average in SAS.
My table is below:
date number average
01/01/2015 18 ...
01/01/2015 15 ...
01/01/2015 5 ...
02/01/2015 66 ...
02/01/2015 7 ...
03/01/2015 7 ...
04/01/2015 19 ...
04/01/2015 7 ...
04/01/2015 11 ...
04/01/2015 17 ...
05/01/2015 3 ...
06/01/2015 7 ...
... ... ...
I need the average of number over a surrounding window of (-2,+2) days, rather than (-2,+2) observations.
I know that for a centered moving average I can use:
convert number=av_number/transformout=(cmovave 3)
but here each day has a different number of observations.
Can anyone tell me how to compute a centered moving average over (-2,+2) days only in this case?
Thanks in advance!
Best
The suggestion from @Joe to aggregate to a daily level is the right approach; however, you have to be careful not to lose the number of entries per day, otherwise you won't calculate the correct moving average. In other words, you need to weight the daily value by the number of entries for that day.
I've taken 3 steps to calculate the moving average; it may be possible to do it in 2, but I can't see how.
Step 1 calculates the sum and count of number per day.
Step 2 calculates the moving 5-day sum for both variables.
Step 3 then divides the sum by the count to get the weighted 5-day average.
I've added the TRIM operation to exclude the first and last 2 records; you can include those if you wish. You'll probably want to drop some of the extra variables as well.
data have;
input date :ddmmyy10. number;
format date date9.;
datalines;
01/01/2015 18
01/01/2015 15
01/01/2015 5
02/01/2015 66
02/01/2015 7
03/01/2015 7
04/01/2015 19
04/01/2015 7
04/01/2015 11
04/01/2015 17
05/01/2015 3
06/01/2015 7
;
run;
proc summary data=have nway;
class date;
var number;
output out=daily_agg sum=;
run;
proc expand data=daily_agg out=daily_agg_mov_sum;
convert number=tot_number / transformout = (cmovsum 5 trim 2);
convert _freq_=tot_count / transformout = (cmovsum 5 trim 2);
run;
data want;
set daily_agg_mov_sum;
if not missing(tot_number) then av_number = tot_number / tot_count;
run;
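As a cross-check, the same (-2,+2) day window can be sketched directly in PROC SQL by joining each distinct date to every observation within two calendar days of it. This is only an illustration (WANT_SQL, DAYS and N_OBS are made-up names). It does not trim the first and last two days, and because it works on calendar dates it also copes with days that have no observations at all.

```sas
proc sql;
    create table want_sql as
    select d.date format=date9.,
           mean(h.number)  as av_number,   /* average over all rows in the window */
           count(h.number) as n_obs        /* how many rows fell in the window */
    from (select distinct date from have) as d
         inner join have as h
      on h.date between d.date - 2 and d.date + 2
    group by d.date
    order by d.date;
quit;
```

Averaging the raw rows in the window gives the same weighting as dividing the 5-day sum by the 5-day count.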