I wrote the code below to get the third month from a date macro variable.
%let date=2017-01-01;
%let a_SASdate=%sysfunc(inputn(&date.,yymmdd10.)) ;
%let b=%sysfunc(putn(&a_SASdate.,yymmn6.)) ;
%let et=%sysfunc(intnx(month,%sysfunc(inputn(&date.,yymmdd10.)),2,s),yymmn6.);
%put &a_SASdate. &b. &et.;
I then wrote the code below to create a macro variable for each of the following months.
data new;
do i=1 to 12;
call symput('mon'||put(i,z2.),put(intnx('month',&et.,i),yymmn6.));
a=symget('mon'||put(i,z2.));
output;
end;
run;
Expected output
i a
1 201704
2 201705
3 201706
4 201707
5 201708
6 201709
7 201710
8 201711
9 201712
10 201801
11 201802
12 201803
But what I am getting is
1 251204
2 251205
3 251206
4 251207
5 251208
6 251209
7 251210
8 251211
9 251212
10 251301
11 251302
12 251303
What went wrong?
&et resolves to 201703, which is not a SAS date value. SAS reads it as a count of days since 01JAN1960, so intnx('month',&et.,i) starts from the wrong date and gives you wrong results. You have to convert &et to a SAS date first. In addition, if you just want to define a dataset variable, you don't need multiple macro variables.
data new;
do i=1 to 12;
call symput('a',put(intnx('month',input("&et",yymmn6.),i),yymmn6.));
a=symget('a');
output;
end;
run;
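To see the difference between the raw number and a real SAS date, a minimal sketch along these lines can help. It assumes &et holds 201703, as in the question; the variable names raw and real are only illustrative.

```sas
%let et=201703;  /* assumed value, as produced by the question's code */

data _null_;
  raw  = &et.;                    /* 201703 taken as days since 01JAN1960 */
  real = input("&et.", yymmn6.);  /* parsed as year 2017, month 03 */
  put raw= date9. real= date9.;   /* write both to the log as dates */
run;
```

The log should show raw as a date in the year 2512, which matches the 2512xx values in the question, and real as 01MAR2017.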
So you want the year and month for each of the next 12 months, counted from a given begin date. I came up with a slightly more compact solution:
%let date= '1jan17'd; /* Begin date */
data wanted;
do i=1 to 12;
d=intnx('month',&date.,i); /* begin date advanced by i months */
a=put(d, yymmn6.); /* character yyyymm value; a separate variable avoids a character-to-numeric conversion */
output;
end;
drop d;
run;
For more on the relevant functions, see the documentation for the INTNX function and the YYMMxw. format.
I am matching files based on ID numbers. I need to format a data set with the IDs to be matched so that the same ID number is not repeated in column A (because column B's ID is the surviving ID after the match is completed). My list of IDs has over 1 million observations, and the same ID may be repeated multiple times in either or both columns.
Here is an example of what I've got/need:
Sample Data
ID1 ID2
1 2
3 4
2 5
6 1
1 7
5 8
The surviving IDs would be:
2
4
5
error - 1 no longer exists
error - 1 no longer exists
8
WHAT I NEED
ID1 ID2
1 2
3 4
2 5
6 5
5 7
7 8
I am, probably very obviously, a SAS novice, but here is what I have tried, re-running over and over again because I have some IDs that are repeated upward of 50 times or more.
Proc sort data=Have;
by ID1;
run;
This sort makes the repeated ID1 values consecutive, so that I could use LAG to replace the destroyed ID1s with the surviving ID2 from the line above.
Data Want;
set Have;
by ID1;
lagID1=LAG(ID1);
lagID2=LAG(ID2);
If NOT first.ID1 THEN DO;
If ID1=lagID1 THEN ID1=lagID2;
KEEP ID1 ID2;
IF ID1=ID2 then delete;
end;
run;
That sort of works, but I still end up with some duplicates that won't resolve no matter how many times I run it (I would have looped it, but I don't know how), because they just switch back and forth between IDs that have other duplicates (I can get down to about 2,000 of these).
I have figured out that instead of using LAG, I need to replace all values after the current line with ID2 for each ID1 value, but I cannot figure out how to do that.
I want to read observation 1, find all later instances of the value of ID1, in either the ID1 or ID2 column, and replace that value with the current observation's ID2 value. Then I want to repeat that process with line 2 and so on.
For the example, I would want to look for any instances after line one of the value 1 and replace them with 2, since that is the surviving ID of that pair. The value 1 may appear further down multiple times in either of the columns, and I need all of them to be replaced. Line two would look for later values of 3 and replace them with 4, and so on. The end result should be that an ID number only ever appears once in the ID1 column (though it may appear multiple times in the ID2 column).
ID1 ID2
1 2
3 4
2 5
6 1
1 7
5 8
After first line has been read, data set would look as follows:
ID1 ID2
1 2
3 4
2 5
6 2
2 7
5 8
Reading observation two would make no changes since 3 does not appear again; after observation 3, the set would be:
ID1 ID2
1 2
3 4
2 5
6 5
5 7
5 8
Again, there would be no changes from observation four, but observation five would cause the final change:
ID1 ID2
1 2
3 4
2 5
6 5
5 7
7 8
I have tried using the following statement, but I can't even tell whether I am on completely the wrong track or just can't get the syntax figured out.
Data want;
Set have;
Do i=_n_;
ID=ID2;
Replace next var{EUID} where (EUID1=EUID1 AND EUID2=EUID1);
End;
Run;
Thanks for your help!
There is no need to work back and forth thru the data file. You just need to retain the replacement information so that you can process the file in a single pass.
One way to do that is to make a temporary array using the values of the ID variables as the index. That is easy to do for your simple example with small ID values.
So for example if all of the ID values are integers between 1 and 1000 then this step will do the job.
data want ;
set have ;
array xx (1000) _temporary_;
do while (not missing(xx(id1))); id1=xx(id1); end;
do while (not missing(xx(id2))); id2=xx(id2); end;
output;
xx(id1)=id2;
run;
You probably need to add a test to prevent cycles (1 -> 2 -> 1).
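One possible guard is sketched below, under the assumption that a row whose two IDs resolve to the same survivor can simply be dropped (the question's own attempt deletes rows where ID1=ID2). As far as I can tell, that is the case that would store an ID pointing at itself and make the lookup loops spin forever, for example rows "1 2" followed by "2 1".

```sas
data want;
  set have;
  array xx (1000) _temporary_;
  /* follow each ID to its current survivor */
  do while (not missing(xx(id1))); id1 = xx(id1); end;
  do while (not missing(xx(id2))); id2 = xx(id2); end;
  /* both sides already merged into the same ID: skip the row  */
  /* rather than record xx(id1)=id1, which would never resolve */
  if id1 = id2 then delete;
  output;
  xx(id1) = id2;
run;
```

The same test can be placed before the h.add() call if a hash object is used instead of the array.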
For a more general solution you should replace the array with a hash object instead. So something like this:
data want ;
if _n_=1 then do;
declare hash h();
h.definekey('old');
h.definedata('new');
h.definedone();
call missing(new,old);
end;
set have ;
do while (not h.find(key:id1)); id1=new; end;
do while (not h.find(key:id2)); id2=new; end;
output;
h.add(key: id1,data: id2);
drop old new;
run;
Here's an implementation of the algorithm you've suggested, using a modify statement to load and rewrite each row one at a time. It works with your trivial example but with messier data you might get duplicate values in ID1.
data have;
input ID1 ID2 ;
datalines;
1 2
3 4
2 5
6 1
1 7
5 8
;
run;
title "Before making replacements";
proc print data = have;
run;
/*Optional - should improve performance at cost of increased memory usage*/
sasfile have load;
data have;
do i = 1 to nobs;
do j = i to nobs;
modify have point = j nobs = nobs;
/* Make copies of target and replacement value for this pass */
if j = i then do;
id1_ = id1;
id2_ = id2;
end;
else do;
flag = 0; /* Keep track of whether we made a change */
if id1 = id1_ then do;
id1 = id2_;
flag = 1;
end;
if id2 = id1_ then do;
id2 = id2_;
flag = 1;
end;
if flag then replace; /* Only rewrite the row if we made a change */
end;
end;
end;
stop;
run;
sasfile have close;
title "After making replacements";
proc print data = have;
run;
Please bear in mind that as this modifies the dataset in place, interrupting the data step while it is running could result in data loss. Make sure you have a backup first in case you need to roll your changes back.
This seems like it should do the trick and is fairly straightforward. Let me know if it is what you are looking for:
data have;
input id1 id2;
datalines;
1 2
3 4
2 5
6 1
1 7
5 8
;
run;
%macro test();
proc sql noprint;
select count(*) into: cnt
from have;
quit;
%do i = 1 %to &cnt;
proc sql noprint;
select id1,id2 into: id1, :id2
from have
where monotonic() = &i;quit;
data have;
set have;
if (_n_ > input("&i",8.))then do;
if (id1 = input("&id1",8.))then id1 = input("&id2",8.);
if (id2 = input("&id1",8.))then id2 = input("&id2",8.);
end;
run;
%end;
%mend test;
%test();
This might be a little faster:
data have2;
input id1 id2;
datalines;
1 2
3 4
2 5
6 1
1 7
5 8
;
run;
%macro test2();
proc sql noprint;
select count(*) into: cnt
from have2;
quit;
%do i = 1 %to &cnt;
proc sql noprint;
select id1,id2 into: id1, :id2
from have2
where monotonic() = &i;
update have2 set id1 = &id2
where monotonic() > &i
and id1 = &id1;
quit;
proc sql noprint;
update have2 set id2 = &id2
where monotonic() > &i
and id2 = &id1;
quit;
%end;
%mend test2;
%test2();
I have a dataset like this:
DATA tmp;
INPUT
identifier $
d0101 d0102 d0103 d0104 d0105 d0106
d0107 d0108 d0109 d0110 d0111 d0112
;
DATALINES;
a 1 2 3 4 5 6 7 8 9 10 11 12
b 4 5 7 4 5 6 7 6 9 10 3 12
c 5 2 3 5 5 4 7 8 3 1 1 2
;
RUN;
And I'm trying to create a dataset like this:
DATA tmp;
INPUT
identifier $ day value
;
DATALINES;
a '01JAN2018'd 1
a '02JAN2018'd 2
a '03JAN2018'd 3
a '04JAN2018'd 4
a '05JAN2018'd 5
a '06JAN2018'd 6
a '07JAN2018'd 7
a '08JAN2018'd 8
a '09JAN2018'd 9
a '10JAN2018'd 10
a '11JAN2018'd 11
a '12JAN2018'd 12
b '01JAN2018'd 4
b '02JAN2018'd 5
b '03JAN2018'd 7
...
;
RUN;
I know the syntax for "melting" a dataset like this - I have completed a similar macro for columns that represent a particular value in each of the twelve months in a year.
What I'm struggling with is how to iterate through all days year-to-date (the assumption is that the have dataset has all days YTD as columns).
I'm used to Python, so something I might do there would be:
>>> import datetime
>>>
>>> def dates_ytd():
... end_date = datetime.date.today()
... start_date = datetime.date(end_date.year, 1, 1)
... diff = (end_date - start_date).days
... for x in range(0, diff + 1):
... yield end_date - datetime.timedelta(days=x)
...
>>> def create_date_column(dt):
... day, month = dt.day, dt.month
... day_fmt = '{}{}'.format('0' if day < 10 else '', day)
... month_fmt = '{}{}'.format('0' if month < 10 else '', month)
... return 'd{}{}'.format(month_fmt, day_fmt)
...
>>> result = [create_date_column(dt) for dt in dates_ytd()]
>>>
>>> result[:5]
['d1031', 'd1030', 'd1029', 'd1028', 'd1027']
>>> result[-5:]
['d0105', 'd0104', 'd0103', 'd0102', 'd0101']
Here is my SAS attempt:
%MACRO ITER_DATES_YTD();
DATA _NULL_;
%DO v_date = '01012018'd %TO TODAY();
%PUT d&v_date.;
* Will do "melting" logic here";
%END
%MEND ITER_DATES_YTD;
When I run this, using %ITER_DATES_YTD();, nothing is even printed to my log. What am I missing here? I basically want to iterate through "YTD" columns, like these d0101, d0102, d0103, ....
This is more a transposition problem than a macro / data step problem.
The core problem is that you have data in the metadata, meaning the 'date' is encoded in the column names.
Example 1:
Transpose the data, then use the d<mmdd> _name_ values to compute an actual date.
proc transpose data=have out=have_t(rename=col1=value);
by id;
run;
data want (keep=id date value);
set have_t;
* convert the day-in-year metadata held in the variable name into regular data;
date = input (cats(year(today()),substr(_name_,2)),yymmdd10.);
format date yymmdd10.;
run;
Example 2:
Do an array-based transposition. The D<mm><dd> variables play the role of value_at_date and are easily arrayed thanks to their consistent naming convention. The VNAME function retrieves the original variable name from the array reference, and a date value is computed from its <mm><dd> portion.
data want;
set have;
array value_at_date d:;
do index = 1 to dim(value_at_date);
date = input(cats(year(today()),substr(VNAME(value_at_date(index)),2)), yymmdd10.);
value = value_at_date(index);
output;
end;
format date yymmdd10.;
keep id date value;
run;
To iterate through dates in macro code, you have to convert the date to a number first and then format each number back into the date part you need.
%macro iterateDates();
data _null_;
%do i = %sysFunc(inputN(01012018,ddmmyy8.)) %to %sysFunc(today()) %by 1;
%put d%sysFunc(putN(&i, mmddyy4.)); /* mmdd, to match the d<mm><dd> column names */
%end;
run;
%mend iterateDates;
%iterateDates();
I think that a date literal such as '01012018'd is processed only in the data step, not in macro code. Keep in mind that macro code is executed first, and only then is the data step executed. You can think of it as building SAS code with SAS macros and then running it.
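For comparison, here is a minimal data-step sketch of the same iteration without any macro code. It assumes the columns are named d<mm><dd> as in the question and that the date literal is written in the '01JAN2018'd form; the variable name colname is only illustrative.

```sas
data _null_;
  /* a date literal is a plain number here, so it can drive a DO loop */
  do d = '01JAN2018'd to today();
    colname = cats('d', put(d, mmddyy4.));  /* e.g. d0101, d0102, ... */
    put colname=;                           /* written to the log */
  end;
run;
```

Because this loop runs while the data step executes, it can work with values such as today() directly, whereas a macro %DO loop only sees text.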
Hello, I am trying to access the columns in a library that have a specific date format and apply the YEAR function to them in my macro code, but the output contains duplicate values instead of the desired results. My code should return only the year from the input dates.
%macro dteyear(lib=,outdsn=);
proc sql noprint;
select distinct catx(".",libname,memname), name
into :dsns separated by " ", :varname separated by " "
from dictionary.columns
where libname = upcase("&lib") and format=('YYMMDD10.')
order by 1;
quit;
%put &dsns;
%put &varname;
%local olddsn curdsn curvbl i;
data &outdsn.;
set
%let olddsn=;
%do i=1 %to &sqlobs;
%let curdsn=%scan(&dsns,&i,%str( ));
%let curvbl=%scan(&varname,&i,%str( ));
%if &curdsn NE &olddsn
%then %do;
%if &olddsn NE
%then %do;
)
%end;
%let olddsn=&curdsn.;
&curdsn (keep=&curvbl
%end;
%else %do;
&curvbl
%end;
%end;
);
%do i=1 %to &sqlobs;
%scan(&varname,&i,%str( ))=year(&varname.);
%end;
run;
proc print data=&outdsn;run;
%MEND;
%dteyear(lib=dte3,outdsn=dtetst);
The input data is as follows:
1975-12-04
1977-11-03
1989-09-15
1998-06-17
1999-05-31
2000-08-14
2001-03-11
2007-03-11
2007-12-28
2008-10-07
2009-12-03
The duplicate output from my code is:
Obs RFDTC
1 1965-05-19
2 1965-05-19
3 1965-05-19
4 1965-05-19
5 1965-05-19
6 1965-05-19
7 1965-05-19
8 1965-05-19
9 1965-05-19
10 1965-05-19
11 1965-05-19
12 1965-05-19
13 1965-05-19
The basic problem is that the YEAR() function returns a 4-digit number, and the variable's format is YYMMDD10., so the result is formatted as a SAS date very close to 1960 (SAS's beginning of all time).
What I did in the code below was change the format to 4.0, so it displays as a 4-digit number.
If you want to have access to the original date variable, you'll have to create a new variable for the year. I'll leave that to you.
There was an additional problem--that is, YEAR(&varname.) inserts the entire list of variables, not just the one you're working with. It works if there is only one date variable, but not if there are more than one. I fixed this, too.
%macro dteyear(lib=,outdsn=);
proc sql noprint;
select distinct catx(".",libname,memname), name
into :dsns separated by " ", :varname separated by " "
from dictionary.columns
where libname = upcase("&lib") and format=('YYMMDD10.')
order by 1;
quit;
%put &dsns;
%put &varname;
%local olddsn curdsn curvbl i;
data &outdsn.;
set
%let olddsn=;
%do i=1 %to &sqlobs;
%let curdsn=%scan(&dsns,&i,%str( ));
%let curvbl=%scan(&varname,&i,%str( ));
%if &curdsn NE &olddsn
%then %do;
%if &olddsn NE
%then %do;
)
%end;
%let olddsn=&curdsn.;
&curdsn (keep=&curvbl
%end;
%else %do;
&curvbl
%end;
%end;
);
%do i=1 %to &sqlobs;
%let curvbl=%scan(&varname,&i,%str( ));
&curvbl=year(&curvbl.);
format &curvbl 4.0;
%end;
run;
proc print data=&outdsn;run;
%MEND;
data have;
input datevar yymmdd10.;
format datevar yymmdd10.;
cards;
1975-12-04
1977-11-03
1989-09-15
1998-06-17
1999-05-31
2000-08-14
2001-03-11
2007-03-11
2007-12-28
2008-10-07
2009-12-03
run;
options mprint;
%dteyear(lib=work,outdsn=want)
The result, then, is:
Obs datevar
1 1975
2 1977
3 1989
4 1998
5 1999
6 2000
7 2001
8 2007
9 2007
10 2008
11 2009
To convert a date value to just a year you can use the YEAR() function, but you also need to change the format attached to the variable since you will have essentially divided the value stored in it by 365 to convert it from the number of days to the number of years.
rfdtc = year(rfdtc);
format rfdtc 4. ;
Your macro is attempting to read many variables from many datasets and generate a single output dataset. I am not sure the resulting dataset will be of much value to you since it will look like a checker board of missing values. Also if the same variable name appears in more than one input dataset you will get corrupted values because of applying the YEAR() function to value that has already been converted from a date value to a year value.
For example you could end up generating a data step like this:
data WANT ;
set ds1 (keep=datevar1)
ds1 (keep=datevar2)
ds2 (keep=datevar3)
ds3 (keep=datevar3)
;
datevar1=year(datevar1);
datevar2=year(datevar2);
datevar3=year(datevar3);
datevar3=year(datevar3);
format datevar1 datevar2 datevar3 datevar3 4.;
run;
Since both input datasets DS2 and DS3 have a variable named DATEVAR3 you will be applying the YEAR() function to the value twice. That will convert everything to the year 1965.
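The effect can be reproduced in isolation with a small sketch like this one, which uses the first date from the question's input; the variable names y1 and y2 are only illustrative.

```sas
data _null_;
  d  = '04DEC1975'd;
  y1 = year(d);    /* 1975: a year number, no longer a date value */
  y2 = year(y1);   /* 1975 read as a day count falls in 1965 */
  /* show each result as a plain number and under the date format */
  put y1= 4. y1= yymmdd10. y2= 4. y2= yymmdd10.;
run;
```

By my arithmetic the log should show y1 as 1975 (1965-05-29 under YYMMDD10.) and y2 as 1965 (1965-05-19 under YYMMDD10.), the same value that fills the output in the question.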
To eliminate the problem with running the YEAR() function on the same value multiple times and losing the actual year perhaps you just want to apply the YEAR. format instead of converting the stored value.
format datevar1 datevar2 datevar3 datevar4 year. ;
That would still leave the underlying date values different. If you really need the values to be identical, perhaps you could convert each value to the first day of its year. You could use the INTNX() function
datevar1 = intnx('year',datevar1,0,'b');
or the MDY() function
datevar1 = mdy(1,1,year(datevar1));
I have the university edition of SAS.
I have data from treatment groups A, B, and C. I am trying to use DO loops to process the groups separately for comparison. I can do it in one nested DO loop when the data lengths are the same. But these groups have different numbers of observations and I am running into trouble. Here is my code:
data AirPoll1 (keep = Group Ozone);
label Group = "Treatment Group";
label Ozone = 'Ozone level (in ppb)';
do i=1 to 1;
input Group $@@
do j=1 to 15;
input Ozone @@;
output;
end;
end;
do i=1 to 1;
input Group $ @@;
do j=1 to 10;
input Ozone @@;
output;
end;
end;
do i=1 to 1;
input Group $ @@;
do j=1 to 11;
input Ozone @@;
output;
end;
end;
datalines;
A 4 6 3 4 7 8 2 3 4 1 8 9 5 6 3
B 5 3 6 2 1 2 4 3 2 4
C 8 9 7 8 6 7 6 7 9 8 9
;
run;
proc univariate data = AirPoll1;
Var Ozone;
by Group;
histogram Ozone;
run;
The error I am getting is:
ERROR 161-185: No matching DO/SELECT statement.
Is there a quick way to fix this?
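Quick fix indeed: you have left off the semicolon at the end of the first INPUT statement. Without it, the DO statement on the next line is swallowed by the INPUT statement, so one of the END statements has no DO left to match.
Happy programming!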
My initial dataset has 14,000 distinct STID values with 10^5 observations for each.
I would like to run some procedures BY each STID, output the modifications into a dataset per STID, and then stack all the STID datasets under each other into one big dataset WITHOUT needing to keep all the temporary STID datasets.
I started writing a macro:
data HAVE;
input stid $ NumVar1 NumVar2;
datalines;
a 5 45
b 6 2
c 5 3
r 2 5
f 4 4
j 7 3
t 89 2
e 6 1
c 3 8
kl 1 6
h 2 3
f 5 41
vc 58 4
j 5 9
ude 7 3
fc 9 11
h 6 3
kl 3 65
b 1 4
g 4 4
;
run;
/* to save all distinct values of THE VARIABLE stid into macro variables
where &N_VAR - total number of distinct variable values */
proc sql;
select count(distinct stid)
into :N_VAR
from HAVE;
select distinct stid
into :stid1 - :stid%left(&N_VAR)
from HAVE;
quit;
%macro expand_by_stid;
/*STEP 1: create datasets by STID*/
%do i=1 %to &N_VAR.;
data stid&i;
set HAVE;
if stid="&&stid&i";
run;
/*STEP 2: from here data modifications for each STID-data (with procs and data steps, e.g.)*/
data modified_stid&i;
set stid&i;
NumVar1_trans=NumVar1**2;
NumVar2_trans=NumVar1*NumVar2;
run;
%end;
/*STEP 3: from here should be some code lines that set together all created datasets under one another and delete them afterwards*/
data total;
set %do n=1 %to &N_VAR.;
modified_stid&n;
%end;
run;
proc datasets library=usclim;
delete <ALL DATA SETS by SPID>;
run;
%mend expand_by_stid;
%expand_by_stid;
But the last step does not work. How can I do it?
You're very close - all you need to do is remove the semicolon in the macro loop and put it after the %end in step 3, as below:
data total;
set
%do n=1 %to &N_VAR.;
modified_stid&n
%end;;
run;
This then produces the statement you were after:
set modified_stid1 modified_stid2 .... ;
instead of what your macro was originally generating:
set modified_stid1; modified_stid2; ...;
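A quick way to check what text a macro loop generates is to write it to the log with %put. The sketch below uses a hypothetical helper macro, list_members, that is not part of the original code.

```sas
/* hypothetical helper: emits the dataset names and nothing else */
%macro list_members(n);
  %local k;
  %do k=1 %to &n; modified_stid&k %end;
%mend list_members;

/* the loop contributes no semicolons of its own, so the single  */
/* semicolon ending %put plays the role of the one ending SET    */
%put set %list_members(3);
```

This should write a line listing modified_stid1, modified_stid2, and modified_stid3 after the word set, with no semicolons between them. Running the real macro with options mprint; shows the generated statements in the same way.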
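Finally, you can delete all the temporary datasets by using name prefixes such as stid: in the delete statement. Add modified_stid: as well if the modified copies should go too, and point library= at wherever the macro actually wrote them (one-level names normally go to WORK):
proc datasets library=usclim;
delete stid: modified_stid: ;
run;
quit;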