I need to input dates in YYYYMMDD format and create macro variables from these dates to use in a WHERE clause. The FINAL dataset should select one record from Sales but 0 observations are returned.
data work.FiscalYear2019;
input #1 fiscalYear $4. #5 StartDate mmddyy8.;
retain diff;
if fiscalYear = '2019' then do;
tday = today();
diff = tday - StartDate;
call symputx('FYTD_days',diff);
call symputx('CY_StartDate', StartDate);
call symputx('CY_EndDate', put(today(),mmddyy8.));
end;
else if fiscalYear = '2018' then do;
PY_EndDate = StartDate + diff;
call symput('PY_EndDate', put(PY_EndDate,mmddyy8.));
call symput('PY_StartDate', put(StartDate,mmddyy8.));
end;
datalines;
201912312018
201801012018
;
data work.Sales;
input #1 fiscalYear $4. #5 orderDate mmddyy8.;
format orderDate mmddyy6.;
datalines;
201902042019
201801012018
;
data final (WHERE=(orderDate >= &PY_StartDate AND
orderDate <= &PY_EndDate));
set Sales;
run;
I expect the FINAL dataset to contain one record from the Sales dataset but FINAL has 0 observations.
To use your macro variables as date values you need to either generate the macro variables as the raw number of days values, like you did with CY_StartDate, or generate them using the DATE format and enclose them in quotes and append the letter D to make a date literal.
Like this:
call symputX('PY_StartDate', put(StartDate,date9.));
call symputX('PY_EndDate', PY_EndDate);
...
data final
set Sales;
WHERE orderDate >= "&PY_StartDate"d
AND orderDate <= &PY_EndDate
;
run;
Also your subject line mentions YYYYMMDD informat and it does not appear in your code. Are you interpreting your source data properly? Does 201801012018 represent an 8 digit date in YMD order plus a four digit year? Or a four digit year plus an 8 digit date in MDY order?
You are just not referring to your macro variables in the last data step with the proper syntax. Those &PY_StartDate and &PY_EndDate variables are just strings after macro code is compiled, and you need to refer to them as date constants. So this should fix the issue:
data final (WHERE=(orderDate >= "&PY_StartDate"d AND
orderDate <= "&PY_EndDate"d));
set Sales;
run;
In the future, I recommend including options mprint; at the start of your code. This option displays the text generated by macro execution in your log, which can help greatly with debugging macros.
Related
I am new to SAS and I am struggling struggling with my code. I would love some help. Am I thinking about this the right way? I have a huge table and I want to extract that data from certain dates. My two dates: 1969-12-01 and 1948-01-01 my sample code:
data null;
call symput ('timenow',put (time(),time.));
call symput ('datenow',put (date(),date9.));
run;
title "The current time is timenow and the date is datenow";
proc print data=sashelp.buy;
run;
So first learn about your dataset. So for example run PROC CONTENTS.
proc contents data=sashelp.buy; run;
Which will show you that there is variable named DATE that has date values (number of days since 1960).
So to reference a specific date use a date literal. That is a quoted string in the style that the DATE informat can read followed by the letter D. You can then use a WHERE statement to filter the data.
data want;
set sashelp.buy;
where date = '31dec1969'd ;
run;
Which will not find any observations since that date does not appear in that dataset.
If you want to select for multiple dates you could either add more conditions using OR.
where (date = '31dec1969'd) or (date = '01jan1948'd);
You can also use the IN operator:
where date in ('31dec1969'd '01jan1948'd);
Note that if your variable contains datetime values (number of seconds) then to pick a specific date you would either need to use a range of datetime literals:
where datetime between '31dec1969:00:00'dt and '31dec1969:11:59:59'dt);
Or convert the number of seconds into number of days and compare to the date literal.
where datepart(datetime) = '31dec1969'd ;
Welcome to StackOverflow Sportsguy3090.
Here I make a dataset called sample with some sample dates. That dataset has a variable called name and another variable called date. Internally, SAS stores dates as the number of days until or after January 1st 1970. That is rough to look at. So I use the format statement to have the dates appear as a 10 character string with month/day/year.
data sample;
name = "Abe "; date = "01Dec1969"d; output;
name = "Betty"; date = "01Jan1948"d; output;
name = "Carl"; date = "06Jun1960"d; output;
name = "Doug"; date = "06Dec1969"d; output;
name = "Ed"; date = "01Jan1947"d; output;
format date mmddyy10.;
run;
The code below subsets the data and puts the good records into a new dataset called keepers. It only keeps the records that are in the date range (including the limit dates).
data keepers;
set sample;
where date between "01jan1948"d and "01Dec1969"d;
run;
I hope that helps.... if not send up another flare.
I would like to get only date and time separately from 01JAN13:08:29:00
Format & Infomat available in Dataset is:
Date Num 8 DATETIME.(format) ANYDTDTM40(informat)
And If I run datepart() on 01JAN13:08:29:00 I get output as 19359 (I don't want it.)
The DATEPART function extracts the date value from a datetime value. The date value as you have seen is simply a number. A date format must be applied to a variable holding a date value. Base SAS variables have only two value types, character and numeric.
data want;
now_dtm = datetime();
now_dt = datepart(now_dtm);
now_dt_unformatted = now_dt;
format now_dtm datetime.;
format now_dt date9.; * <----- this is what you need, format stored in data set header information;
run;
proc print data=want;
run;
* you can change the format temporarily during a proc step;
proc print data=want;
format now_dt yymmdd10.; * <---- changes format for duration of proc step;
format now_dt_unformatted mmddyy10.;
run;
Actually 19,359 is exactly the value you want. You started with the number of seconds since 1960 and converted it to the number of days since 1960.
data x ;
dt = '01JAN13:08:29:00'dt ;
date = datepart(dt);
time = timepart(dt);
put (dt date time) (=);
run;
Results
dt=1672648140 date=19359 time=30540
You just need to attach a format to your new variable so that SAS will display the value in a format that humans will recognize. You could use a format like DATE9. to have it show 19,359 as 01JAN2013. Similarly you need to attach a format to the time part to make it print in format that human's will interpret as a time.
format date date9. time time8. ;
I have a big database. There's a contract start date there. The problem is that in some time ago, several values had been imported there as a datetime format while the rest are just date9. In result now some sql queries or data queries shows weird results due to difference in seeing the "numbers" stored behind the contract start date.
Like when I want to get max(contract_start_date) (via sql, for example) I will get *************** instead of normal results.
My question is how can I unify this format difference? What I would like in the end is to make a new variable with unified format and then replace the existing contract start date with new one.
%let d_breakpoint=%sysfunc(putn('31dec2015'D, 13. -L));
%put &d_breakpoint;
%put %sysfunc(putn(&d_breakpoint, DATETIME. -L));
data indata;
format contract_start_date date9.;
do i=0 to 40;
contract_start_date = i*5000;
output;
end;
drop i;
run;
proc sql;
alter table indata add d_contract_start num format=date9.
;
update indata
set d_contract_start= case when contract_start_date > &d_breakpoint then contract_start_date/(24*60*60)
else contract_start_date end
;
quit;
proc sql;
select
min(d_contract_start) format=date9. as min
, max(d_contract_start) format=date9. as max
from indata
;
quit;
The variable has only one format, but one part of VALUES of that variable stored in table is not corresponding to that format - if the format is for DATE values (date as a number of days since 1jan1960) but some records store DATETIME values (number of seconds since midnight 1jan1960), the results are incorrect.
So you need to modify values to be of just one type - DATE or DATETIME.
The code above will change it to DATE values.
The idea is to define a breakpoint value - values above that will be treated as DATETIME values, the rest will be considered DATE values and will be kept like that.
In my example I've choosen DATE value of 31dec2015 (which is 20453) to be the breakpoint. So this represents 31dec2015 as DATE, while 01JAN60:05:40:53 as DATETIME.
Values below 20453 are considered DATE values, values above 20453 considered DATETIME values.
I have a question regarding moving average. I use Proc Expand (cmovave 3), but those three days can be non consecutive I suppose. I want to avoid missing data between days and use moving average for just those adjacent days.
Is there any way that I can do this? If I want to put it in another way 'how can I select a part of my data set where I have values for consecutive period (days)?'. I hope you give me some examples for this problem.
Use Expand to make sure you have all the values in the timeseries interval. Then use a data step to calculate the ma3 with the lagN() functions.
If you data already has the correct timeseries interval, then skip the PROC EXPAND step.
data test;
start = "01JAN2013"d;
format date date9.
value best.;
do i=1 to 365;
r = ranuni(1);
value = rannor(1);
date = intnx('weekday',start,i);
dummy=1;
if r > .33 then output;
end;
drop i start r;
run;
proc expand data=test out=test2 to=weekday ;
id date;
var dummy;
run;
data test(drop=dummy);
merge test2 test;
by date;
ma3 = (value + lag(value) + lag2(value))/3;
run;
I use the DUMMY variable so that EXPAND will convert the series to WEEKDAY. Then drop it afterwards.
I have two datasets in SAS that I would like to merge, but they have no common variables. One dataset has a "subject_id" variable, while the other has a "mom_subject_id" variable. Both of these variables are 9-digit codes that have just 3 digits in the middle of the code with common meaning, and that's what I need to match the two datasets on when I merge them.
What I'd like to do is create a new common variable in each dataset that is just the 3 digits from within the subject ID. Those 3 digits will always be in the same location within the 9-digit subject ID, so I'm wondering if there's a way to extract those 3 digits from the variable to make a new variable.
Thanks!
SQL(using sample data from Data Step code):
proc sql;
create table want2 as
select a.subject_id, a.other, b.mom_subject_id, b.misc
from have1 a JOIN have2 b
on(substr(a.subject_id,4,3)=substr(b.mom_subject_id,4,3));
quit;
Data Step:
data have1;
length subject_id $9;
input subject_id $ other $;
datalines;
abc001def other1
abc002def other2
abc003def other3
abc004def other4
abc005def other5
;
data have2;
length mom_subject_id $9;
input mom_subject_id $ misc $;
datalines;
ghi001jkl misc1
ghi003jkl misc3
ghi005jkl misc5
;
data have1;
length id $3;
set have1;
id=substr(subject_id,4,3);
run;
data have2;
length id $3;
set have2;
id=substr(mom_subject_id,4,3);
run;
Proc sort data=have1;
by id;
run;
Proc sort data=have2;
by id;
run;
data work.want;
merge have1(in=a) have2(in=b);
by id;
run;
an alternative would be to use
proc sql
and then use a join and the substr() just as explained above, if you are comfortable with sql
Assuming that your "subject_id" variable is a number then the substr function wont work as sas will try convert the number to a string. But by default it pads some paces on the left of the number.
You can use the modulus function mod(input, base) which returns the remainder when input is divided by base.
/*First get rid of the last 3 digits*/
temp_var = floor( subject_id / 1000);
/* then get the next three digits that we want*/
id = mod(temp_var ,1000);
Or in one line:
id = mod(floor(subject_id / 1000), 1000);
Then you can continue with sorting the new data sets by id and then merging.