I have time series data that looks like the following:
date variable
01-Dec-2012 0.1
02-Dec-2012 0.1
03-Dec-2012 0.1
04-Dec-2012 0.1
05-Dec-2012 0.1
...
20-Dec-2012 0.1
21-Dec-2012 0.1
22-Dec-2012 0.1
I want to create a dummy variable that equals 1 if the date is in December and on or before the second Thursday, 0 if the date is in December and after the second Thursday, and missing if month(date) ^= 12.
Can anyone show me how to identify the second Thursday of December and solve this problem?
Use the NWKDOM function.
For example, this returns the third Friday of a month, where the month and year are extracted from a SAS date:
Friday3 = NWKDOM(3, 6, month(sas_date), year( sas_date));
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p1kdveu0ry8ltxn1m3um2ntxs7d5.htm
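Applied to the question above, a minimal sketch might look like the following. It assumes the data set is called have and the date variable is called date (both names are placeholders, as is the generated sample data). The weekday argument 5 stands for Thursday, since NWKDOM counts Sunday as 1.

```sas
/* Hypothetical sample data: one row per day, 25-Nov-2012 to 31-Dec-2012 */
data have;
  do date = '25nov2012'd to '31dec2012'd;
    output;
  end;
  format date date9.;
run;

data want;
  set have;
  if month(date) = 12 then
    /* nwkdom(2, 5, 12, year) = second Thursday (weekday 5) of December */
    dummy = (date <= nwkdom(2, 5, 12, year(date)));
  else dummy = .;   /* missing outside December */
run;
```

The comparison returns 1 or 0, so no separate IF/ELSE is needed for the two December cases.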
Here's another approach for people who don't have SAS 9.3+ and can't use nwkdom to do this:
Dummy = intck('week.5',intnx('month',date,0)-1,date-1) < 2;
How this works, from the inside working outwards:
intnx is used to find the first day of the month.
Subtract 1 to get the last day of the previous month.
Subtract 1 from date to get yesterday's date.
Using intck, count the number of Thursdays (week.5) in between these two dates. N.B. this includes yesterday if it was a Thursday, but not the last day of the previous month if that was a Thursday.
If this number is less than 2, date is currently less than or equal to the second Thursday of the month.
Sample usage:
data _null_;
do date = '01dec2011'd to '30dec2011'd;
Dummy = intck('week.5',intnx('month',date,0)-1,date-1) < 2;
put date weekdate. +1 dummy;
end;
run;
EDIT: now works correctly when the first day of the month is a Thursday.
I think this will solve your problem. I have a feeling there is a nicer solution, but it should work.
data YourData;
format date date9. ;
do i=1 to 100 ;
date=intnx('day', '17oct03'd,i);
var=rand('uniform');
output;
end;
drop i;
run;
Data Find;
set YourData;
Month=month(date);
day=day(date);
Weekday=WEEKDAY(date);
/* weekday=5 is Thursday */
if weekday=5 and month=12 then flag=1;
/* flag2 is a running count of December Thursdays; it is never reset, */
/* so this assumes the data covers a single December */
flag2+flag;
if month=12 and flag2 < 2 then Dummy=1;
else if month=12 and flag2=2 and flag=1 then Dummy=1;
else if month=12 then Dummy=0;
else Dummy=.;
run;
DATA proj4.gasQTR;
SET proj4.gasQTR;
INPUT Q1 Q2 Q3 Q4;
IF MONTH = 1 or 2 or 3 THEN Q1 = 1;
ELSE IF MONTH = 4 or 5 or 6 THEN Q2 = 2;
ELSE IF MONTH = 7 or 8 or 9 THEN Q3 = 3;
ELSE IF MONTH = 10 or 11 or 12 THEN Q4 = 4;
quarter = MONTH; FORMAT Quarter qtrw.;
RUN;
I am trying to get a 1-4 value for each quarter of each year. My error comes from Quarter qtrw.: 'ERROR 388-185 Expecting an arithmetic operator'
*Data is already in 1-4 format for the month variable
What am I doing wrong?
Any help would be appreciated!
Thank you!
You normally do not use both a SET statement (to retrieve data from an existing dataset) and an INPUT statement (to read values from a text file) in the same data step. If you do want to INPUT values from a text file, you must tell SAS where to find the text, either with an INFILE statement or by placing the text in-line with the code using a DATALINES (or CARDS) statement.
SAS treats any number that is not zero or missing as TRUE, so in MONTH = 1 or 2 or 3 the bare 2 and 3 make the whole condition always TRUE. Q1 will therefore always be set to 1, and Q2, Q3 and Q4 will always be missing (or, if they already existed, unchanged). To test whether a variable has any of several values, use the IN operator instead of the equality operator: month in (1 2 3)
You also should not read and write the same dataset in one step. If there are logic errors in your code you might destroy the original dataset. So hopefully you have a backup copy of proj4.gasQTR, or a program that can recreate it.
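As a quick check of that point, the sketch below evaluates both forms of the condition for every month and writes the results to the log. The variable names wrong and right are just illustrative.

```sas
data _null_;
  do month = 1 to 12;
    /* parsed as (month = 1) or 2 or 3, so it is true for every month */
    wrong = (month = 1 or 2 or 3);
    /* true only for months 1, 2 and 3 */
    right = (month in (1 2 3));
    put month= wrong= right=;
  end;
run;
```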
What is the format QTRW ? Is that something you created? Show its definition.
Assuming you have a variable named MONTH with integer values in the range 1 to 12 you can calculate QUARTER with integer values in the range 1 to 4 with a simple arithmetic function instead of coding a series of IF conditions.
data want;
set have;
quarter = ceil(month/3) ;
run;
If you actually have a DATE variable then perhaps all you were supposed to do was use the MONTH or QTR format to display the dates as the month number or quarter number that they fall into.
Try this program to see the impact of applying different formats to the same values.
data test;
do month=1 to 12;
date1=mdy(month,1,2022);
date2=date1;
date3=date1;
output;
end;
format date1 date9. date2 month. date3 qtr.;
run;
proc print;
run;
Use the in operator or repeat the equality for every case.
Example from the doc:
You can use the IN operator with character strings to determine whether a variable's value is among a list of character values. The following statements produce the same results:
if state in ('NY','NJ','PA') then region+1;
if state='NY' or state='NJ' or state='PA' then region+1;
Therefore
DATA proj4.gasQTR;
SET proj4.gasQTR;
IF MONTH = 1 or MONTH = 2 or MONTH = 3 THEN Q1 = 1;
ELSE IF MONTH = 4 or MONTH = 5 or MONTH = 6 THEN Q2 = 2;
ELSE IF MONTH = 7 or MONTH = 8 or MONTH = 9 THEN Q3 = 3;
ELSE IF MONTH = 10 or MONTH = 11 or MONTH = 12 THEN Q4 = 4;
quarter = MONTH; FORMAT Quarter qtrw.;
RUN;
is equivalent to
DATA proj4.gasQTR;
SET proj4.gasQTR;
IF MONTH in (1,2,3) THEN Q1 = 1;
ELSE IF MONTH in (4,5,6) THEN Q2 = 2;
ELSE IF MONTH in (7,8,9) THEN Q3 = 3;
ELSE IF MONTH in (10,11,12) THEN Q4 = 4;
quarter = MONTH; FORMAT Quarter qtrw.;
RUN;
We have code that we use to create quarterly reports of projects. One piece of it, a do loop, takes the startdate and enddate of each project in our dataset and creates an observation for each month and year in which the project took place. For example, if we have a project called "Employment Help" with a startdate of 01JAN2022 and an enddate of 01APR2022, the do loop will create 4 observations for this project with the month and year values 1 2022, 2 2022, 3 2022, and 4 2022. We use this to count how many projects happened during each quarter. The problem is that the do loop drops some projects without giving them a month or year value, so they are missing from our count. The dates are all in the same format.
Here is an example of some data that is pulled in. EXAMPLE 2 is properly pulled into the do loop; EXAMPLE 1 does not get pulled through.
Here is the code:
data test2;
set users3;
do i = 0 to (year(enddate)-year(startdate));
year = year(startdate)+i;
end;
do i = 0 to (month(enddate)-month(startdate));
month = month(startdate)+i;
drop i;
output;
end;
run;
Consider the following example:
data have;
input project$ startdate:date9. enddate:date9.;
format startdate enddate date9.;
datalines;
A 01JAN2022 01APR2022
B 01MAR2022 01JUN2022
C 01NOV2022 01JAN2023
;
run;
The third row will produce no output because the difference between the end month number and the start month number is negative (1 - 11), so the month loop never executes. Instead of two loops, one for year and one for month, use a single loop over all of the months from the start date. Use intnx() to generate the months with startdate as the reference month; i is the offset of each month from the start date. For example:
code                                 output
intnx('month', '01JAN2022'd, 0)      01JAN2022
intnx('month', '01JAN2022'd, 1)      01FEB2022
intnx('month', '01JAN2022'd, 2)      01MAR2022
Since you're incrementing by exactly one month for each date, you can get the year and month number in a single loop.
data want;
set have;
do i = 0 to intck('month', startdate, enddate);
month = month(intnx('month', startdate, i) );
year = year(intnx('month', startdate, i) );
output;
end;
drop i;
run;
Your code doesn't seem to handle projects that cross years, i.e., a project that started in 2021 and ended in 2022.
This should get you closer.
data have;
input startdate : date9. enddate : date9.;
format startdate enddate date9.;
cards;
01Jan2022 01Apr2022
01Sep2021 01Apr2022
;;;
run;
data want;
set have;
nmonths = intck('month', startdate, enddate) +1 ;
date = startdate;
do i = 1 to nmonths;
month = month(date);
year = year(date);
* write the row before advancing, so date matches month and year;
output;
date = intnx('month', startdate, i, 'b');
end;
run;
Assume you have a data file called VIRUS_PROLIF from an infectious disease research center. Each observation has 3 variables: COUNTRY, START_DATE, and DOUBLE_RATE, where START_DATE is the date on which the country registered its 100th case of COVID-19 and DOUBLE_RATE is the number of days it takes for the number of cases to double in that country. Write the SAS code using DO UNTIL to calculate the date on which each country would be predicted to register 200,000 cases of COVID-19.
data VIRUS_PROLIF;
INPUT COUNTRY $ start_date mmddyy10. num_of_cases double_rate ;
*here doubling rate is 100% so if day 1 had 100 cases day 2 will have 200;
Datalines;
US 03/13/2020 100 100
;
run;
data VIRUS_PROLIF1 (drop=start_date);
set VIRUS_PROLIF;
do until (num_of_cases>200000);
double_rate+1;
num_of_cases+ (num_of_cases*1);
end;
run;
proc print data=VIRUS_PROLIF1;
run;
The key concept you're missing here is how to apply the growth rate. It works like the following formula, similar to interest growth for money.
If you have one dollar today and you earn 100% interest, it becomes
StartingAmount * (1 + interestRate), where the interest rate here is 100/100 = 1.
*fake data;
data VIRUS_PROLIF;
INPUT COUNTRY $ start_date mmddyy10. num_of_cases double_rate;
*here doubling rate is 100% so if day 1 had 100 cases day 2 will have 200;
Datalines;
US 03/13/2020 100 100
AB 03/17/2020 100 20
;
run;
data VIRUS_PROLIF1;
set VIRUS_PROLIF;
*assign date to starting date so both are in output;
date=start_date;
*save record to data set;
output;
do until (num_of_cases>200000);
*increment your day;
date=date+1;
*doubling rate is represented as a percent so add it to 1 to show the rate;
num_of_cases=num_of_cases*(1+double_rate/100);
*save record to data set;
output;
end;
*control date display;
format date start_date date9.;
run;
*check results;
proc print data=VIRUS_PROLIF1;
run;
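The code above follows the asker's comment and treats double_rate as a daily percentage increase. If DOUBLE_RATE is instead read the way the question text defines it, as the number of days per doubling, a DO UNTIL sketch might look like this. The value 3 in the data line is an invented illustration, not real data, and the data set name predicted is a placeholder.

```sas
data predicted;
  input country $ start_date :mmddyy10. double_rate;
  cases = 100;              /* START_DATE is the day of the 100th case */
  date  = start_date;
  do until (cases >= 200000);
    cases = cases * 2;           /* one doubling ...            */
    date  = date + double_rate;  /* ... takes double_rate days  */
  end;
  format start_date date date9.;
  datalines;
US 03/13/2020 3
;
run;

proc print data=predicted;
run;
```

Each pass through the loop represents one full doubling period, so the predicted date is accurate only to within one period.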
The inequality $200{,}000 < N_0 (1 + R/100)^k$ can be solved for the integer k without iteration, where N_0 is NUM_OF_CASES and R is DOUBLE_RATE:
day_of_200K = ceil (
LOG ( 200000 / NUM_OF_CASES )
/ LOG ( 1 + DOUBLE_RATE / 100 )
);
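As a rough sketch, the closed form can be dropped into a data step and added to the start date to get the predicted date. The sample rows are the same made-up values used earlier in the thread; the names days_needed, date_200k and predicted are placeholders.

```sas
data VIRUS_PROLIF;
  input country $ start_date :mmddyy10. num_of_cases double_rate;
  datalines;
US 03/13/2020 100 100
AB 03/17/2020 100 20
;
run;

data predicted;
  set VIRUS_PROLIF;
  /* smallest whole number of days k with N0 * (1 + R/100)**k >= 200000 */
  days_needed = ceil( log(200000 / num_of_cases)
                    / log(1 + double_rate / 100) );
  date_200k = start_date + days_needed;
  format start_date date_200k date9.;
run;

proc print data=predicted;
run;
```

CEIL gives the first day on which the count is at least 200,000, so it differs from a strict "greater than" loop only when the count lands exactly on 200,000.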
I have monthly data with several observations per day. I have day, month and year variables. How can I retain data from only the first and the last 5 days of each month? I have only weekdays in my data, so the first and last five days of the month change from month to month; i.e., for Jan 2008 the first five days can be the 2nd, 3rd, 4th, 7th and 8th of the month.
Below is an example of the data file. I wasn't sure how to share this so I just copied some lines below. This is from Jan 2, 2008.
Would a variation of first.variable and last.variable work? How can I retain observations from the first 5 days and last 5 days of each month?
Thanks.
1 AA 500 B 36.9800 NH 2 1 2008 9:10:21
2 AA 500 S 36.4500 NN 2 1 2008 9:30:41
3 AA 100 B 36.4700 NH 2 1 2008 9:30:43
4 AA 100 B 36.4700 NH 2 1 2008 9:30:48
5 AA 50 S 36.4500 NN 2 1 2008 9:30:49
If you want to examine the data and determine the minimum 5 and maximum 5 values then you can use PROC SUMMARY. You could then merge the result back with the data to select the records.
So if your data has variables YEAR, MONTH and DAY you can make a new data set that has the top and bottom five days per month using simple steps.
proc sort data=HAVE (keep=year month day) nodupkey
out=ALLDAYS;
by year month day;
run;
proc summary data=ALLDAYS nway;
class year month;
output out=MIDDLE
idgroup(min(day) out[5](day)=min_day)
idgroup(max(day) out[5](day)=max_day)
/ autoname ;
run;
proc transpose data=MIDDLE out=DAYS (rename=(col1=day));
by year month;
var min_day: max_day: ;
run;
proc sql ;
create table WANT as
select a.*
from HAVE a
inner join DAYS b
on a.year=b.year and a.month=b.month and a.day = b.day
;
quit;
/****
get some dates to play with
****/
data dates(keep=i thisdate);
offset = input('01Jan2015',DATE9.);
do i=1 to 100;
thisdate = offset + round(599*ranuni(1)+1); *** within 600 days from offset;
output;
end;
format thisdate date9.;
run;
/****
BTW: intnx('month',thisdate,1) = first day of next month. Deduct 1 to get the last day
of the current month.
intnx('month',thisdate,0,"BEGINNING") = first day of the current month
****/
proc sql;
create table first5_last5 AS
SELECT
*
FROM
dates /* replace with name of your data set */
WHERE
/* replace all occurrences of 'thisdate' with name of your date variable */
( intnx('month',thisdate,1)-5 <= thisdate <= intnx('month',thisdate,1)-1 )
OR
( intnx('month',thisdate,0,"BEGINNING") <= thisdate <= intnx('month',thisdate,0,"BEGINNING")+4 )
ORDER BY
thisdate;
quit;
Create some data with the desired structure:
Data inData (drop=_:); * forget all variables starting with an underscore*;
format date yymmdd10. time time8.;
_instant = datetime();
do _i = 1 to 1E5;
date = datepart(_instant);
time = timepart(_instant);
yy = year(date);
mm = month(date);
dd = day(date);
*just some more random data*;
letter = byte(rank('a') +floor(rand('uniform', 0, 26)));
*select week days*;
if weekday(date) in (2,3,4,5,6) then output;
_instant = _instant + 1E5*rand('exponential');
end;
run;
Count the days per month:
proc sql;
create view dayCounts as
select yy, mm, count(distinct dd) as _countInMonth
from inData
group by yy, mm;
quit;
Select the days:
data first_5(drop=_:) last_5(drop=_:);
merge inData dayCounts;
by yy mm;
_newDay = dif(date) ne 0;
retain _nrInMonth;
if first.mm then _nrInMonth = 1;
else if _newDay then _nrInMonth + 1;
if _nrInMonth le 5 then output first_5;
if _nrInMonth gt _countInMonth - 5 then output last_5;
run;
Use the INTNX() function. You can use INTNX('month',...) to find the beginning and ending days of the month and then use INTNX('weekday',...) to find the first 5 week days and last five week days.
You can convert your month, day, year values into a date using the MDY() function. Let's assume that you do that and create a variable called TODAY. Then to test whether it is within the first 5 weekdays or last 5 weekdays of the month you could do something like this:
first5 = intnx('weekday',intnx('month',today,0,'B'),0) <= today
<= intnx('weekday',intnx('month',today,0,'B'),4) ;
last5 = intnx('weekday',intnx('month',today,0,'E'),-4) <= today
<= intnx('weekday',intnx('month',today,0,'E'),0) ;
Note that those ranges will include the week-ends, but it shouldn't matter if your data doesn't have those dates.
But you might have issues if your data skips holidays.
I have the dataset below, and I need to find the week number of each date based on the financial year (e.g. April 2013 to March 2014). For example, 01AprXXX should be in the 0th or 1st week of the year, and the last week of the following March should be week 52/53. I have tried one way to work this out (the code is below as well).
I am just curious to know whether there is a better way to do this in SAS.
Thanks in advance. Please let me know if this question is redundant; in that case I will delete it as soon as possible. I searched for the concept but didn't find anything.
Also, my apologies for my English; it may not be grammatically correct, but I hope I am able to convey my point.
DATA
data dsn;
format date date9.;
input date date9.;
cards;
01Nov2015
08Sep2013
06Feb2011
09Mar2004
31Mar2009
01Apr2007
;
run;
CODE
data dsn2;
set dsn;
week_normal = week(date);
dat2 = input(compress("01Apr"||year(date)),date9.);
week_temp = week(dat2);
format dat2 date9.;
x1 = month(input(compress('01Jan'||(year(date)+1)),date9.)) ;***lower cutoff;
x2 = month(input(compress("31mar"||year(date)),date9.)); ***upper cutoff;
x3 = week(input(compress("31dec"||(year(date)-1)),date9.)); ***final week value for the previous year;
if month(dat2) <= month(date) <= month(input(compress("31dec"||year(date)),date9.)) then week_f = week(date) - week_temp;
else if x2 >= month(date) >= x1 then week_f = week_normal + x3 - week(input(compress("31mar"||(year(date)+1)),date9.)) ;
run;
RESULT
INTCK and INTNX are your best bets here. You could use them as I do below, or you could use the advanced functionality with defining your own interval type (fiscal year); that's described more in the documentation.
data dsn2;
set dsn;
week_normal = week(date);
dat2 = intnx('month12.4',date,0); *12 month periods, starting at month 4 (so 01APR), go to the start of the current one;
week_F = intck('week',dat2,date); *Can adjust what day this starts on by adding numbers to week, so 'week.1' etc. shifts the start of week day by one;
format dat2 date9.;
run;
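As a small illustration of the same idea, the sketch below uses the shifted interval 'year.4' (years beginning in April), which should be equivalent to 'month12.4' here, and writes the fiscal-year start and week number for three of the sample dates to the log. The variable names fy_start and week_f are placeholders.

```sas
data _null_;
  do date = '31mar2009'd, '01apr2007'd, '01nov2015'd;
    /* most recent 1 April on or before date */
    fy_start = intnx('year.4', date, 0);
    /* number of week boundaries (Sundays) crossed since that 1 April */
    week_f = intck('week', fy_start, date);
    put date= date9. fy_start= date9. week_f=;
  end;
run;
```

Because INTCK counts boundaries, the first partial week of the fiscal year is week 0; add 1 if the numbering should start at 1.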