I have a dataset with each observation having two space-separated lists as string variables. I want a third variable showing the overlap between the string lists. Using another SO post, I've created a macro to calculate the overlap. I can't work out how to implement it in a DATA step to get the third variable.
This is my dataset, with dummy data:
data use;
infile datalines dlm='~~';
input list1:$100. list2:$100. expected_match:$10.;
datalines;
Homer Bart~~Homer Bart~~Full
Marge Lisa~~Lisa Marge~~Full
Homer Marge~~Marge~~Partial
Bart Lisa~~Bart~~Partial
Homer Marge Bart Lisa~~Maggie~~None
;;;;
run;
This is the macro, with tests (all of which pass):
%macro list_overlap(list1, list2);
%local i matches match_type;
%let matches = 0;
%do i = 1 %to %sysfunc(countw(&list1, %str( )));
%if %sysfunc(findw(&list2, %scan(&list1, &i,, s)))
%then %let matches = %eval(&matches + 1);
%end;
%if &matches = %sysfunc(countw(&list1, %str( )))
and %sysfunc(countw(&list1, %str( ))) = %sysfunc(countw(&list2, %str( )))
%then %let match_type = 'Full';
%else %if &matches = 0 %then %let match_type = 'None';
%else %let match_type = 'Partial';
match_type = &match_type%str(;)
%mend list_overlap;
%put NOTE: %list_overlap(Homer Bart,Homer Bart);
%put NOTE: %list_overlap(Marge Lisa,Lisa Marge);
%put NOTE: %list_overlap(Homer Marge,Marge);
%put NOTE: %list_overlap(Bart Lisa,Bart);
%put NOTE: %list_overlap(Homer Marge Bart List,Maggie);
This is how I'm trying to implement it in a DATA step:
data matches;
set use;
call execute(catt('%list_overlap(', list1, ',', list2, ')'));
run;
I'm getting the following error with this case:
NOTE: Line generated by the CALL EXECUTE routine.
1 + match_type = 'Full';
__________
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
I've tried other ways too, but this is the closest I've got.
Looks like you want the RESOLVE() function instead of CALL EXECUTE.
data matches;
set use;
match_type = resolve(cats('%list_overlap(',list1,',',list2,')'));
run;
But with your current definition of the macro that will include all of characters the macro generates, match_type = 'Full';, into the value of the MATCH_TYPE variable. So remove the superfluous characters the macro is currently generating so that it only generates the value you want to save.
... %then Full;
%else %if matches eq 0 %then None;
%else Partial;
Your problem here is that call execute isn't doing what you think, I suspect.
What's happening:
Data step runs, call execute lines generated
Then the macro stuff is call executed, and you have:
Code example:
data matches;
set use;
call execute(stuff);
run;
match_type = 'Full';
That's not legal - that's a data step line but not in a data step.
Instead of doing all of that work in macro land, do it in data step land. Works just as well, and gets done what you want.
Something like this:
%macro list_overlap(list1,list2);
matches=0;
length match_type $7;
do i = 1 to countw(&list1,' ');
if findw(&list2,scan(&list1,i,' '))
then matches = matches + 1;
end;
if matches eq countw(&list1,' ') then match_type = 'Full';
else if matches eq 0 then match_type = 'None';
else match_type = 'Partial';
%mend list_overlap;
Something like that, I can't test it right now, but that should generally work. Then don't call execute the macro, just call it normally.
data matches;
set use;
%list_overlap(list1,list2);
run;
I'm SAS user.
I want to assign year columns using date values.
for example, here is my code, below.
I want to make Y_2010, Y_2011, Y_2012 , Y_2013, Y_2014 in work.total data set.
but there is only Y_2014 as a result.
How can I change the code as I can get right result which I intended first?
options mcompilenote = all;
%let a = Y_ ;
%macro B(YMIN, YMAX) ;
%do i = &YMIN %to &YMAX ;
DATA TOTAL ;
SET SASUSER.EMPDATA ;
IF YEAR(HIREDATE) = &i THEN &a&i = 1 ;
ELSE &a&i = 0 ;
RUN;
%end;
%mend;
%B (2010, 2014) ;
Because you are repeatedly re-creating the output dataset only the final version is available. To fix the macro move the %DO loop inside the DATA step so that you are generating all of the variables in a single data step.
%macro B(YMIN, YMAX) ;
DATA TOTAL ;
SET SASUSER.EMPDATA ;
%do i = &YMIN %to &YMAX ;
IF YEAR(HIREDATE) = &i THEN &a&i = 1 ;
ELSE &a&i = 0 ;
%end;
RUN;
%mend;
But there is no need to a macro for this. Just use normal SAS statements. For example you could use an ARRAY statement to define the variables and then loop over the array and set the values. Note that the result of a boolean expression in SAS is 0 when false and 1 when true so you can eliminate the IF/THEN/ELSE statement and just use an assignment statement.
DATA TOTAL ;
SET SASUSER.EMPDATA ;
array &a &a&ymin - &a&ymax;
do i=&ymin to &ymax ;
&a[i-&ymin+1] = (year(hiredata)=i);
end;
drop i;
RUN;
I have a data set with one row for each country and 100 columns (10 variables with 10 data years each).
For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive).
This is what I have so far, but I know its wrong because of the nest loop, and its has same value for recent1 recent2 recent3 however I haven't figured out how to create recent1 recent2 recent3 without two loops.
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004 -- MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
%let rc = 1;
%do i = 2013 %to 2004 %by -1;
%do rc = 1 %to 3 %by 1;
%if MATERNAL_CARE_&i. ne . %then %do;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
%end;
%end; run; %mend; %test();
You don't need to use a macro to do this - just some arrays:
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004-MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
array mc {*} MATERNAL_CARE_2004-MATERNAL_CARE_2013;
array recent {*} recent1-recent3;
do i = 2013 to 2004 by -1;
do rc = 1 to 3 by 1;
if mc[i] ne . then do;
recent[rc] = mc[i];
end;
end;
run;
Maybe I don't get your request, but according to your description:
"For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive)" I created this sample dataset with dt1 and dt2 and 2 locations.
The output will be 2 datasets (and generally the number of the variables starting with DT) named DS1 and DS2 with 3 observations for each country, the first one for the first variable, the second one for the second variable.
This is the sample dataset:
data sample_ds;
length city $10 dt1 dt2 8.;
infile datalines dlm=',';
input city $ dt1 dt2;
datalines;
MS,5,0
MS,3,9
MS,3,9
MS,2,0
MS,1,8
MS,1,7
CA,6,1
CA,6,.
CA,6,.
CA,2,8
CA,1,5
CA,0,4
;
This is the sample macro:
%macro help(ds=);
data vars(keep=dt:); set &ds; if _n_ not >0; run;
%let op = %sysfunc(open(vars));
%let nvrs = %sysfunc(attrn(&op,nvars));
%let cl = %sysfunc(close(&op));
%do idx=1 %to &nvrs.;
proc sort data=&ds(keep=city dt&idx.) out=ds&idx.(where=(dt&idx. ne .)) nodupkey; by city DESCENDING dt&idx.; run;
data ds&idx.; set ds&idx.;
retain cnt;
by city DESCENDING dt&idx.;
if first.city then cnt=0; else cnt=cnt+1;
run;
data ds&idx.(drop=cnt); set ds&idx.(where=(cnt<3)); rename dt&idx.=act&idx.; run;
%end;
%mend;
You will run this macro with:
%help(ds=sample_ds);
In the first statement of the macro I select the variables on which I want to iterate:
data vars(keep=dt:); set &ds; if _n_ not >0; run;
Work on this if you want to make this work for your code, or simply rename your variables as DT1 DT2...
Let me know if it is correct for you.
When writing macro code, always keep in mind what has to be done when. SAS processes your code stepwise.
Before your sas code is even compiled, your macro variables are resolved and your macro code is executed
Then the resulting SAS Base code is compiled
Finally the code is executed.
When you write %if MATERNAL_CARE_&i. ne . %then %do, this is macro code interpreded before compilation.
At that time MATERNAL_CARE_&i. is not a variable but a text string containing a macro variable.
The first time you run trhough your %do i = 2013 %to 2004 by -1, it is filled in as MATERNAL_CARE_2013, the second as MATERNAL_CARE_2012., etc.
Then the macro %if statement is interpreted, and as the text string MATERNAL_CARE_1 is not equal to a dot, it is evaluated to FALSE
and recent_&rc. = MATERNAL_CARE_&i. is not included in the code to pass to your compiler.
You can see that if you run your code with option mprint;
The resolution;
options mprint;
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_: recent_:;
** The : acts as a wild card here **;
%do i = 2013 %to 2004 %by -1;
if MATERNAL_CARE_&i. ne . then do;
%do rc = 1 %to 3 %by 1;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
end;
%end;
run;
%mend;
%test();
Now, before compilation of if MATERNAL_CARE_&i. ne . then do, only the &i. is evalueated and if MATERNAL_CARE_2013 ne . then do is passed to the compiler.
The compiler will see this as a test if the SAS variable MATERNAL_CARE_1 has value missing, and that is just what you wanted;
Remark:
It is not essential that I moved the if statement above the ``. It is just more efficient because the condition is then evaluated less often.
It is however essential that you close your %ifs and %dos with an %end and your ifs and dos with an end;
Remark:
you do not need %let rc = 1, because %do rc = 1 to 3 already initialises &rc.;
For completeness SAS is compiled stepwise:
The next PROC or data step and its macro code are only considered when the preveous one is executed.
That is why you can write macro variables from a data step or sql select into that will influence the code you compile in your next step,
somehting you can not do for instance with C++ pre compilation;
Thanks everyone. Found a hybrid solution from a few solutions posted.
data sample_ds;
infile datalines dlm=',';
input country $ maternal_2004 maternal_2005
maternal_2006 maternal_2007 maternal_2008 maternal_2009 maternal_2010 maternal_2011 maternal_2012 maternal_2013;
datalines;
MS,5,0,5,0,5,.,5,.,5,.
MW,3,9,5,0,5,0,5,.,5,0
WE,3,9,5,0,5,.,.,.,.,0
HU,2,0,5,.,5,.,5,0,5,0
MI,1,8,5,0,5,0,5,.,5,0
HJ,1,7,5,0,5,0,.,0,.,0
CJ,6,1,5,0,5,0,5,0,5,0
CN,6,1,.,5,0,5,0,5,0,5
CE,6,5,0,5,0,.,0,5,.,8
CT,2,5,0,5,0,5,0,5,0,9
CW,1,5,0,5,0,5,.,.,0,7
CH,0,5,0,5,0,.,0,.,0,5
;
%macro test(var);
data &var._recent;
set sample_ds;
keep country &var._1 &var._2 &var._3;
array mc {*} &var._2004-&var._2013;
array recent {*} &var._1-&var._25;
count=1;
do i = 10 to 1 by -1;
if mc[i] ne . then do;
recent[count] = mc[i];
count=count+1;
end;
end;
run;
%mend;
I'm trying to write robust code to assign values to macro variables. I want the names of the macro variables to depend on values coming from the variable 'subgroup'. So subgroup could equal 1, 2, or 45 etc. and thus have macro variable names trta_1, trta_2, trt_45 etc.
Where I am having difficulty is calling the macro variable name. So instead of calling e.g. &trta_1 I want to call &trta_%SCAN(&subgroups, &k), which resolves to trta_1 on the first iteration. I've used a %SCAN function in the macro variable name, which is throwing up a warning 'WARNING: Apparent symbolic reference TRTA_ not resolved.'. However, the macro variables have been created with values assigned.
How can I resolve the warning? Is there a function I could run with the %SCAN function to get this to work?
data data1 ;
input subgroup trta trtb ;
datalines ;
1 30 58
2 120 450
3 670 3
run;
%LET subgroups = 1 2 3 ;
%PUT &subgroups;
%MACRO test;
%DO k=1 %TO 3;
DATA test_&k;
SET data1;
WHERE subgroup = %SCAN(&subgroups, &k);
CALL SYMPUTX("TRTA_%SCAN(&subgroups, &k)", trta, 'G');
CALL SYMPUTX("TRTB_%SCAN(&subgroups, &k)", trtb, 'G');
RUN;
%PUT "&TRTA_%SCAN(&subgroups, &k)" "&TRTB_%SCAN(&subgroups, &k)";
%END;
%MEND test;
%test;
Using the structure you've provided the following will achieve the result you're looking for.
data data1;
input subgroup trta trtb;
datalines;
1 30 58
2 120 450
3 670 3
;
run;
%LET SUBGROUPS = 1 2 3;
%PUT &SUBGROUPS;
%MACRO TEST;
%DO K=1 %TO 3;
%LET X = %SCAN(&SUBGROUPS, &K) ;
data test_&k;
set data1;
where subgroup = &X ;
call symputx(cats("TRTA_",&X), trta, 'g');
call symputx(cats("TRTB_",&X), trtb, 'g');
run;
%PUT "&&TRTA_&X" "&&TRTB_&X";
%END;
%MEND TEST;
%TEST;
However, I'm not sure this approach is particularly robust. If your list of subgroups changes you'd need to change the 'K' loop manually, you can determine the upper bound of the loop by dynamically counting the 'elements' in your subgroup list.
If you want to call the macro variables you've created later in your code, you could a similar method.
data data2;
input subgroup value;
datalines;
1 20
2 25
3 15
45 30
;
run ;
%MACRO TEST2;
%DO K=1 %TO 3;
%LET X = %SCAN(&SUBGROUPS, &K) ;
data data2 ;
set data2 ;
if subgroup = &X then percent = value/&&TRTB_&X ;
format percent percent9.2 ;
run ;
%END;
%MEND TEST2;
%TEST2 ;
Effectively, you're re-writing data2 on each iteration of the loop.
This should cover your requirements. You can load and unload an array of macro variable without a macro. I have included an alternate method of unloading a macro variable array with a macro for comparison.
Load values into macro variables including Subgroup number within macro variable name e.g. TRTA_45.
data data1;
input subgroup trta trtb;
call symput ('TRTA_'||compress (subgroup), trta);
call symput ('TRTB_'||compress (subgroup), trtb);
datalines;
1 30 58
2 120 450
3 670 3
45 999 111
;
run;
No need for macro to load or refer to macro variables.
%put TRTA_45: &TRTA_45.;
%let Subgroup_num = 45;
%put TRTB__&subgroup_num.: &&TRTB_&subgroup_num.;
If you need to loop through the macro variables then you can use Proc SQL to generate a list of subgroups.
proc sql noprint;
select subgroup
, count (*)
into :subgroups separated by ' '
, :No_Subgroups
from data1
;
quit;
%put Subgroups: &subgroups.;
%put No_Subgroups: &No_Subgroups.;
Use a macro to loop through the macro variable array and populate a table.
%macro subgroups;
data subgroup_data_macro;
%do i = 1 %to &no_subgroups.;
%PUT TRTA_%SCAN(&subgroups, &i ): %cmpres(&TRTA_%SCAN(&subgroups, &i ));
%PUT TRTB_%SCAN(&subgroups, &i ): %cmpres(&TRTB_%SCAN(&subgroups, &i ));
subgroup = %SCAN(&subgroups, &i );
TRTA = %cmpres(&TRTA_%SCAN(&subgroups, &i ));
TRTB = %cmpres(&TRTB_%SCAN(&subgroups, &i ));
output;
%end;
run;
%mend subgroups;
%subgroups;
Or use a data step (outside a macro) to loop through the macro variable array and populate a table.
data subgroup_data_sans_macro;
do i = 1 to &no_subgroups.;
subgroup = SCAN("&subgroups", i );
TRTA = input (symget (compress ('TRTA_'||subgroup)),20.);
TRTB = input (symget (compress ('TRTB_'||subgroup)),20.);
output;
end;
run;
Ensure both methods (within and without a macro) produce the same result.
proc compare
base = subgroup_data_sans_macro
compare = subgroup_data_macro
;
run;
I want to use a Macro call in a data step. Below is the macro and its invocation in a data step but this isnt working. Can you guys please suggest a way to make it work.
%macro xscan(string, delimiter, word_number);
%let len1=%length(&string); /*Computing the length of the string*/
%let len=%eval(&len1+1);
%let sub=%scan(&string,&word_number,"&delimiter");
%if &word_number ge 0 %then %do;
%let pos=%index(&string,&sub); /* Locate the position while reading left to right*/
%end;
%if &word_number lt 0 %then %do;
data _null_;
pos=find("&string","&sub",-&len);
call symput("pos",pos);
run;
%end;
%let strg=%substr(&string,&pos); /* Extract the substring*/
%put the string is &strg;
%mend;
data work.in_data;
length in_string $50;
in_string = “a bb ccc dddd bb eeeee”;
output;
in_string = “aa b cc aa dee”;
output;
run;
data work.out_data;
set work.in_data;
length sub_str $50;
start_word_num = -(_n_ +1);
sub_str = %xscan(in_string,’ ‘, start_word_num);
run;
proc print; run;
If the macro is to be used inside a datastep, write it more simply using just datastep functions instead of making it complicated with macro functions. There are plenty of SAS string functions which will allow you to accomplish what it appears you want in far less code.
Data steps are very flexible and they allow you to manipulate your data in a very flexible day, I would recommend you trying to refactor the code to use only a datastep, however, if you still want to use it the way you have it right now use call execute .
Here is an example:
data _null_ ;
input name $ value !$ ;
call execute
( ‘%global ‘
llname~l ‘;’ II
‘%let’ Ilnamell =’ [1
value II ‘;’
);
cards ;
abc xyz
;
Refer to these docs for further reading :
http://www2.sas.com/proceedings/sugi30/154-30.pdf and
http://www2.sas.com/proceedings/sugi22/CODERS/PAPER70.PDF