I have a dataset with each observation having two space-separated lists as string variables. I want a third variable showing the overlap between the string lists. Using another SO post, I've created a macro to calculate the overlap. I can't work out how to implement it in a DATA step to get the third variable.
This is my dataset, with dummy data:
data use;
infile datalines dlm='~~';
input list1:$100. list2:$100. expected_match:$10.;
datalines;
Homer Bart~~Homer Bart~~Full
Marge Lisa~~Lisa Marge~~Full
Homer Marge~~Marge~~Partial
Bart Lisa~~Bart~~Partial
Homer Marge Bart Lisa~~Maggie~~None
;;;;
run;
This is the macro, with tests (all of which pass):
%macro list_overlap(list1, list2);
%local i matches match_type;
%let matches = 0;
%do i = 1 %to %sysfunc(countw(&list1, %str( )));
%if %sysfunc(findw(&list2, %scan(&list1, &i,, s)))
%then %let matches = %eval(&matches + 1);
%end;
%if &matches = %sysfunc(countw(&list1, %str( )))
and %sysfunc(countw(&list1, %str( ))) = %sysfunc(countw(&list2, %str( )))
%then %let match_type = 'Full';
%else %if &matches = 0 %then %let match_type = 'None';
%else %let match_type = 'Partial';
match_type = &match_type%str(;)
%mend list_overlap;
%put NOTE: %list_overlap(Homer Bart,Homer Bart);
%put NOTE: %list_overlap(Marge Lisa,Lisa Marge);
%put NOTE: %list_overlap(Homer Marge,Marge);
%put NOTE: %list_overlap(Bart Lisa,Bart);
%put NOTE: %list_overlap(Homer Marge Bart Lisa,Maggie);
This is how I'm trying to implement it in a DATA step:
data matches;
set use;
call execute(catt('%list_overlap(', list1, ',', list2, ')'));
run;
I'm getting the following error with this case:
NOTE: Line generated by the CALL EXECUTE routine.
1 + match_type = 'Full';
__________
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
I've tried other ways too, but this is the closest I've got.
Looks like you want the RESOLVE() function instead of CALL EXECUTE.
data matches;
set use;
match_type = resolve(cats('%list_overlap(',list1,',',list2,')'));
run;
But with your current definition of the macro, that will put all of the characters the macro generates, match_type = 'Full';, into the value of the MATCH_TYPE variable. So remove the superfluous characters the macro currently generates, so that it only generates the value you want to save.
... %then Full;
%else %if &matches eq 0 %then None;
%else Partial;
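Putting the two pieces together, a minimal sketch of a value-only macro plus the RESOLVE() call might look like the following. The name list_overlap_value is made up for illustration, and the logic is otherwise the asker's. The explicit LENGTH statement is an addition, so that MATCH_TYPE does not default to the 200 characters RESOLVE() would otherwise give it.

```sas
/* Hypothetical name: a variant of the asker's macro that emits only the value */
%macro list_overlap_value(list1, list2);
  %local i matches n1 n2;
  %let matches = 0;
  %let n1 = %sysfunc(countw(&list1, %str( )));
  %let n2 = %sysfunc(countw(&list2, %str( )));
  %do i = 1 %to &n1;
    %if %sysfunc(findw(&list2, %scan(&list1, &i, %str( )))) %then
      %let matches = %eval(&matches + 1);
  %end;
  /* No "match_type =" prefix and no trailing semicolon: only the value is left */
  %if &matches = &n1 and &n1 = &n2 %then Full;
  %else %if &matches = 0 %then None;
  %else Partial;
%mend list_overlap_value;

data matches;
  set use;
  length match_type $7;
  /* RESOLVE() runs the macro once per observation and returns the generated text */
  match_type = resolve(cats('%list_overlap_value(', list1, ',', list2, ')'));
run;
```

Because the macro call is rebuilt and resolved for every row, this is convenient rather than fast; for large tables a pure data step approach is likely the better fit.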
Your problem here is that call execute isn't doing what you think, I suspect.
What's happening:
The data step runs, and CALL EXECUTE queues up the code that the macro generates.
After the data step finishes, that queued code runs, so in effect you have submitted:
data matches;
set use;
call execute(stuff);
run;
match_type = 'Full';
That's not legal: it's a data step statement, but it is not inside a data step.
Instead of doing all of that work in macro land, do it in data step land. Works just as well, and gets done what you want.
Something like this:
%macro list_overlap(list1,list2);
matches=0;
length match_type $7;
do i = 1 to countw(&list1,' ');
if findw(&list2,scan(&list1,i,' '))
then matches = matches + 1;
end;
if matches eq countw(&list1,' ') and countw(&list1,' ') eq countw(&list2,' ') then match_type = 'Full';
else if matches eq 0 then match_type = 'None';
else match_type = 'Partial';
%mend list_overlap;
Something like that, I can't test it right now, but that should generally work. Then don't call execute the macro, just call it normally.
data matches;
set use;
%list_overlap(list1,list2);
run;
I have a lot of 6-digit numbers in a SAS program:
898300 898311 898312 898313 898314 898315 898316 898317 898321 898322 898323 898324 898331 898332 898333 898341 898342 898343
898400 898401 898402 898403 898500 898501 898502 898503 898600 898601 898602 898603 898604 898605 898606 898607 898608 898609
898610 898611 898612 898613 898614 898615 898616 898617 898700 898701 898702 898703 898704 898705 898706 898800 898801 898901
I would like to do a quick find and replace using Ctrl+H such that all the 6-digit numbers are "quoted":
"898300" "898311" "898312" ...
etc.
I think doing a regular expression search is the way to go, but I am not able to identify the specific syntax. Anyone who knows what to do?
Thanks
I'm sure this could be done in Notepad (replace all runs of spaces with a single space, then replace each single space with " "), but seeing as you tagged SAS, here is a SAS solution.
First, compile this macro:
/***
Converts a space delimited string into one with custom quotes / delimiters
#usage
%put %get_quoted_str(in_str=blah blah blah
,dlm=%str(,)
,quote=%str(%') );
returns: 'blah','blah','blah'
##
***/
%macro get_quoted_str(IN_STR=,DLM=,QUOTE=);
%local i item buffer;
%let i=1;
%do %while (%qscan(&IN_STR,&i,%str( )) ne %str() ) ;
%let item=%scan(&IN_STR,&i,%str( ));
%if %bquote(&QUOTE) ne %then %let item=&QUOTE%trim(&item)&QUOTE;
%else %let item=%trim(&item);
%if (&i = 1) %then %let buffer =&item;
%else %let buffer =&buffer&DLM&item;
%let i = %eval(&i+1);
%end;
&buffer
%mend;
then call as follows:
%put %get_quoted_str(IN_STR=898300 898311 898312 898313 898314 898315 898316 898317 898321 898322 898323 898324
898331 898332 898333 898341 898342 898343 898400 898401 898402 898403 898500 898501 898502 898503 898600 898601
898602 898603 898604 898605 898606 898607 898608 898609 898610 898611 898612 898613 898614 898615 898616 898617
898700 898701 898702 898703 898704 898705 898706 898800 898801 898901
,DLM=%str( ),QUOTE=%str(%")
);
which gives:
"898300" "898311" "898312" "898313" "898314" "898315" "898316" "898317" "898321" "898322" "898323" "898324" "898331" "898332"
"898333" "898341" "898342" "898343" "898400" "898401" "898402" "898403" "898500" "898501" "898502" "898503" "898600" "898601"
"898602" "898603" "898604" "898605" "898606" "898607" "898608" "898609" "898610" "898611" "898612" "898613" "898614" "898615"
"898616" "898617" "898700" "898701" "898702" "898703" "898704" "898705" "898706" "898800" "898801" "898901"
The above can then be copied and pasted back into the program.
As long as the result is shorter than %SYSFUNC() limits I normally just use TRANWRD() function call for this. Compress multiple blanks to one first using COMPBL().
%let list=A B C D ;
%let qlist="%sysfunc(tranwrd(%sysfunc(compbl(&list)),%str( )," "))" ;
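Since the question asked about a regular expression, the same substitution can also be sketched inside SAS with PRXCHANGE. This runs in a data step rather than in the editor's Ctrl+H dialog, and the shortened list below is only sample input.

```sas
data _null_;
  length list quoted $200;
  list = '898300 898311   898312 898313';
  /* Wrap every standalone run of six digits in double quotes. */
  /* The -1 tells PRXCHANGE to replace every match, not just the first. */
  quoted = prxchange('s/\b(\d{6})\b/"$1"/', -1, list);
  put quoted=;
run;
```

The pattern only touches the digits, so the original spacing between the numbers is left as it was.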
When calling CATT() function with %sysfunc, is there a way to stop it from evaluating an expression?
For example given the code:
%let date=10-13-2015;
%put %sysfunc(catt(The date Is:,&date));
I would like it to return:
The date Is:10-13-2015
Because 10-13-2015 is just a text string. But instead CATT() sees hyphen as a subtraction sign and evaluates it as a numeric expression, returning:
The date Is:-2018
I have tried macro quoting, but doesn't change anything, I suppose because I need to somehow hide the values from CATT(). Seems if any argument to CATT looks like an expression, it will be treated as such.
Another example:
%let value=2 and 3;
%put %sysfunc(catt(The value Is:,&value));
The value Is:1
Provided you can do so, just remove the comma. There's no need to separate it into an individual parameter (unless you're using catx() rather than catt()):
%let date=10-13-2015;
%put %sysfunc(catt(The date Is: &date));
Personally, I think the best way to work is to store the date as a SAS date value and then use the second (optional) parameter of %sysfunc to apply the formatting. This provides better flexibility.
%let date = %sysfunc(mdy(10,13,2015));
%put The date Is: %sysfunc(sum(&date),mmddyyd10.);
If you are insistent on the original approach and are using catx(), then I don't know how to do it exactly. The closest I could get was to insert a piece of text so it couldn't be interpreted as an expression, and then remove that text afterwards using tranwrd. Pretty ugly, and it leaves a space:
%let date=10-13-2015;
%let tmp=%sysfunc(catx(#, The date Is: , UNIQUE_STRING_TO_REMOVE&date ));
%let want=%sysfunc(tranwrd(&tmp, UNIQUE_STRING_TO_REMOVE, ));
%put &want;
Gives:
The date Is:# 10-13-2015
I also tried every combination of macro quoting, and scanned through the entire SAS function list and couldn't see any other viable options.
I don't see an easy way around this, unfortunately. I do see that you could in theory pass this through an FCMP function, though since FCMP doesn't allow true variable arguments, that isn't ideal either, but...
proc fcmp outlib=work.funcs.funcs;
function catme(delim $, in_string $) $;
length _result $1024;
length _new_delim $1;
_new_delim = scan(in_string,1,delim);
do _i = 1 to countc(in_string,delim);
_result = catx(_new_delim, _result, scan(in_string,_i+1,delim));
end;
return(_result);
endfunc;
quit;
options cmplib=work.funcs;
%let date=10-13-2015;
%put %sysfunc(catme(|,:|The date Is| &date.));
Or add quotes to the argument and then remove them after the CATx.
%sysfunc(dequote(%sysfunc(catt(.... ,"&date."))))
All messy.
The problem with %SYSFUNC() evaluating the arguments is not limited to the CAT() series of functions. Any function that accepts numeric values will result in SAS attempting to evaluate the expression provided.
This can be a useful feature. For example:
%let start_dt=10OCT2012 ;
%put %sysfunc(putn("&start_dt"d +1,date9));
You don't need to use CAT() functions to work with macro variables. Just expand the values next to each other and they are "concatenated".
%let date=10-13-2015;
%put The date Is:&date;
If you want to make a macro that works like the CATX() function then that is also not hard to do.
%macro catx /parmbuff ;
%local dlm return i ;
%if %length(&syspbuff) > 2 %then %do;
%let syspbuff = %qsubstr(&syspbuff,2,%length(&syspbuff)-2);
%let dlm=%qscan(&syspbuff,1,%str(,),q);
%let return=%qscan(&syspbuff,2,%str(,),q);
%do i=3 %to %sysfunc(countw(&syspbuff,%str(,),q));
%let return=&return.&dlm.%qscan(&syspbuff,&i,%str(,),q);
%end;
%end;
&return.
%mend catx;
%put %catx(|,a,b,c);
a|b|c
%put "%catx(",",a,b,c,d)";
"a","b","c","d"
Slightly less insane function-style macro without the dosubl:
%macro catx() /parmbuff;
%local dlm i params OUTSTR QWORD;
%let SYSPBUFF = %qsubstr(&SYSPBUFF,2,%length(&SYSPBUFF)-2);
%let dlm = %qscan(&SYSPBUFF,1,%str(,));
%let params = %qsubstr(&SYSPBUFF,%index(&SYSPBUFF,%str(,))+1);
%let i = 1;
%let QWORD = %scan(&PARAMS,&i,%str(,));
%let OUTSTR = &QWORD;
%do %while(&QWORD ne);
%let i = %eval(&i + 1);
%let QWORD = %scan(&PARAMS,&i,%str(,));
%if &QWORD ne %then %let OUTSTR = &OUTSTR.&DLM.&QWORD;
%end;
%unquote(&OUTSTR)
%mend catx;
%put %catx(%str( ),abc,10 - 1 + 2,def);
Somewhat more insane but apparently working option - use %sysfunc(dosubl(...)) and lots of macro logic to create a function-style macro that takes input in the same way as %sysfunc(catx(...)), but forces catx to treat all input as text by quoting it and calling it in a data step.
%macro catxt() /parmbuff;
%local rc dlm i params QPARAMS QWORD outstr;
%let SYSPBUFF = %qsubstr(&SYSPBUFF,2,%length(&SYSPBUFF)-2);
%let dlm = %qscan(&SYSPBUFF,1,%str(,));
%let params = %qsubstr(&SYSPBUFF,%index(&SYSPBUFF,%str(,))+1);
%let i = 1;
%let QWORD = "%scan(&PARAMS,&i,%str(,))";
%let QPARAMS = &QWORD;
%do %while(&QWORD ne "");
%let i = %eval(&i + 1);
%let QWORD = "%scan(&PARAMS,&i,%str(,))";
%if &QWORD ne "" %then %let QPARAMS = &QPARAMS,&QWORD;
%end;
%let rc = %sysfunc(dosubl(%str(
data _null_;
call symput("OUTSTR",catx("&dlm",%unquote(&QPARAMS)));
run;
)));
&OUTSTR
%mend catxt;
%put %catxt(%str( ),abc,10 - 1 + 2,def);
Although this uses a data step to execute catx, dosubl allows the whole thing to be run in any place where you could normally use %sysfunc(catx(...)).
I have a data set with one row for each country and 100 columns (10 variables with 10 data years each).
For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive).
This is what I have so far, but I know it's wrong because of the nested loop: it gives recent_1, recent_2 and recent_3 the same value. However, I haven't figured out how to create recent_1, recent_2 and recent_3 without two loops.
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004 -- MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
%let rc = 1;
%do i = 2013 %to 2004 %by -1;
%do rc = 1 %to 3 %by 1;
%if MATERNAL_CARE_&i. ne . %then %do;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
%end;
%end;
run;
%mend;
%test();
You don't need to use a macro to do this - just some arrays:
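data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004-MATERNAL_CARE_2013 recent_1-recent_3;
/* Index the array by year so mc[2013] is MATERNAL_CARE_2013 */
array mc {2004:2013} MATERNAL_CARE_2004-MATERNAL_CARE_2013;
array recent {3} recent_1-recent_3;
rc = 0;
/* Walk back from the latest year and stop once three values are found */
do i = 2013 to 2004 by -1 while (rc < 3);
if mc[i] ne . then do;
rc = rc + 1;
recent[rc] = mc[i];
end;
end;
run;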
Maybe I don't get your request, but according to your description:
"For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive)" I created this sample dataset with dt1 and dt2 and 2 locations.
The output will be 2 datasets named DS1 and DS2 (in general, one per variable whose name starts with DT), each with up to 3 observations per city: DS1 for the first variable, DS2 for the second.
This is the sample dataset:
data sample_ds;
length city $10 dt1 dt2 8.;
infile datalines dlm=',';
input city $ dt1 dt2;
datalines;
MS,5,0
MS,3,9
MS,3,9
MS,2,0
MS,1,8
MS,1,7
CA,6,1
CA,6,.
CA,6,.
CA,2,8
CA,1,5
CA,0,4
;
This is the sample macro:
%macro help(ds=);
data vars(keep=dt:); set &ds; if _n_ not >0; run;
%let op = %sysfunc(open(vars));
%let nvrs = %sysfunc(attrn(&op,nvars));
%let cl = %sysfunc(close(&op));
%do idx=1 %to &nvrs.;
proc sort data=&ds(keep=city dt&idx.) out=ds&idx.(where=(dt&idx. ne .)) nodupkey; by city DESCENDING dt&idx.; run;
data ds&idx.; set ds&idx.;
retain cnt;
by city DESCENDING dt&idx.;
if first.city then cnt=0; else cnt=cnt+1;
run;
data ds&idx.(drop=cnt); set ds&idx.(where=(cnt<3)); rename dt&idx.=act&idx.; run;
%end;
%mend;
You will run this macro with:
%help(ds=sample_ds);
In the first statement of the macro I select the variables on which I want to iterate:
data vars(keep=dt:); set &ds; if _n_ not >0; run;
Work on this if you want to make this work for your code, or simply rename your variables as DT1 DT2...
Let me know if it is correct for you.
When writing macro code, always keep in mind what has to be done when. SAS processes your code stepwise:
Before your SAS code is even compiled, your macro variables are resolved and your macro code is executed.
Then the resulting Base SAS code is compiled.
Finally the code is executed.
When you write %if MATERNAL_CARE_&i. ne . %then %do, this is macro code, interpreted before compilation.
At that time MATERNAL_CARE_&i. is not a variable but a text string containing a macro variable.
The first time you run through your %do i = 2013 %to 2004 %by -1, it is filled in as MATERNAL_CARE_2013, the second time as MATERNAL_CARE_2012, etc.
Then the macro %if statement is interpreted, and as the text string MATERNAL_CARE_2013 is never equal to a dot, it always evaluates to TRUE,
so recent_&rc. = MATERNAL_CARE_&i. is passed to your compiler for every year and every recent_ variable, whatever the data contains. The last assignment wins, which is why all three variables end up with the same value.
You can see that if you run your code with options mprint;
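As a small illustration of that timing difference, here is a sketch that puts a macro %if and a data step if side by side. The macro name timing_demo and the variable x_2013 are invented for the demonstration.

```sas
options mprint;

%macro timing_demo(i);
  data _null_;
    x_&i = .;   /* the variable exists and is missing */

    /* Macro %IF: compares the text x_&i with the text "." while the code */
    /* is being generated. The two strings differ, so the test is true.   */
    %if x_&i ne . %then %do;
      put "Generated by the macro IF, whatever x_&i contains";
    %end;

    /* Data step IF: tests the value of the variable at run time. */
    if x_&i ne . then put "x_&i is not missing";
    else put "x_&i is missing";
  run;
%mend timing_demo;

%timing_demo(2013)
```

With MPRINT on, the log shows that the first PUT statement made it into the generated step even though x_2013 is missing, while the data step IF reports the variable as missing.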
The solution:
options mprint;
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_: recent_:;
** The : acts as a wild card here **;
%do i = 2013 %to 2004 %by -1;
if MATERNAL_CARE_&i. ne . then do;
%do rc = 1 %to 3 %by 1;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
end;
%end;
run;
%mend;
%test();
Now, before compilation of if MATERNAL_CARE_&i. ne . then do, only the &i. is evaluated and if MATERNAL_CARE_2013 ne . then do is passed to the compiler.
The compiler will see this as a test of whether the SAS variable MATERNAL_CARE_2013 has a missing value, and that is just what you wanted.
Remark:
It is not essential that I moved the if statement above the inner %do loop. It is just more efficient because the condition is then evaluated less often.
It is however essential that you close your %ifs and %dos with an %end and your ifs and dos with an end;
Remark:
You do not need %let rc = 1, because %do rc = 1 %to 3 already initialises &rc.
For completeness: SAS is compiled stepwise.
The next PROC or data step and its macro code are only considered when the previous one has executed.
That is why you can write macro variables from a data step or an SQL select into that will influence the code you compile in your next step,
something you cannot do, for instance, with C++ precompilation.
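A minimal sketch of that step-by-step behaviour, assuming the standard SASHELP.CLASS sample table is available; the macro variable n_obs and the dataset first_half are names chosen for the example.

```sas
/* Step 1: count the rows and store the count in a macro variable */
data _null_;
  set sashelp.class end=last;
  if last then call symputx('n_obs', _n_);
run;

/* Step 1 has finished, so &n_obs now exists and can shape the next step */
%put NOTE: sashelp.class has &n_obs rows;

/* Step 2: %EVAL does integer division, so this keeps the first half of the rows */
data first_half;
  set sashelp.class(obs=%eval(&n_obs / 2));
run;
```

The second data step is only tokenized and compiled after the first one has run, which is why &n_obs resolves there.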
Thanks everyone. Found a hybrid solution from a few solutions posted.
data sample_ds;
infile datalines dlm=',';
input country $ maternal_2004 maternal_2005
maternal_2006 maternal_2007 maternal_2008 maternal_2009 maternal_2010 maternal_2011 maternal_2012 maternal_2013;
datalines;
MS,5,0,5,0,5,.,5,.,5,.
MW,3,9,5,0,5,0,5,.,5,0
WE,3,9,5,0,5,.,.,.,.,0
HU,2,0,5,.,5,.,5,0,5,0
MI,1,8,5,0,5,0,5,.,5,0
HJ,1,7,5,0,5,0,.,0,.,0
CJ,6,1,5,0,5,0,5,0,5,0
CN,6,1,.,5,0,5,0,5,0,5
CE,6,5,0,5,0,.,0,5,.,8
CT,2,5,0,5,0,5,0,5,0,9
CW,1,5,0,5,0,5,.,.,0,7
CH,0,5,0,5,0,.,0,.,0,5
;
%macro test(var);
data &var._recent;
set sample_ds;
keep country &var._1 &var._2 &var._3;
array mc {*} &var._2004-&var._2013;
array recent {*} &var._1-&var._25;
count=1;
do i = 10 to 1 by -1;
if mc[i] ne . then do;
recent[count] = mc[i];
count=count+1;
end;
end;
run;
%mend;
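For reference, a call for the sample data above could look like this. The prefix maternal matches the sample variable names, and the WHERE clause just picks two rows that are easy to check by hand.

```sas
/* Creates MATERNAL_RECENT with maternal_1 to maternal_3 */
%test(maternal)

proc print data=maternal_recent noobs;
  where country in ('MS', 'CE');
run;
```

Reading the MS row from 2013 backwards, the first three non-missing values are 5, 5 and 5 (2012, 2010, 2008); for CE they are 8, 5 and 0 (2013, 2011, 2010), so those are the values to expect in maternal_1 to maternal_3.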
I want to use a macro in a %let call. Below is the macro code and how I want to invoke it. Please help me achieve it.
%macro xscan(string, delimiter, word_number);
%let len1=%length(&string); /*Computing the length of the string*/
%let len=%eval(&len1+1);
%let sub=%scan(&string,&word_number,"&delimiter"); /*Fetch the string specified by word_number*/
%if &word_number ge 0 %then %do;
%let pos=%index(&string,&sub); /* Locate the position while reading left to right*/
%end;
%if &word_number lt 0 %then %do;
data _null_;
pos=find("&string","&sub",-&len); /* Locate the position while reading from right to left*/
call symput("pos",pos);
run;
%end;
%let strg=%substr(&string,&pos); /* Extract the substring*/
%put the string is &strg;
%mend;
%let sub_str = %xscan(a bb ccc dddd bb eeeee, %str( ), -2);
%put The value of sub_str = &sub_str;
Desired implementation:
data work.in_data;
length in_string $50;
in_string = "a bb ccc dddd bb eeeee";
output;
in_string = "aa b cc aa dee";
output;
run;
data work.out_data;
set work.in_data;
length sub_str $50;
start_word_num = -(_n_ +1);
sub_str = %xscan(in_string,' ', start_word_num);
run;
proc print; run;
I'm posting a new answer since the other answer answers a slightly different question.
Here, your macro really is intended to perform data step techniques, not macro techniques. You cannot (easily) use a macro to edit variable contents; a macro is intended to write SAS code, not to modify variables. You could use PROC FCMP to solve this problem, and I may well do so if I have more time, but for now here's the proper solution with just data step techniques and a normal (non-functional) macro.
First, write the data step technique to accomplish it. This is a fairly messy but effective solution. It only works for negative start_word_num; if left or right is desired it would need some modification to the loop parameters. I suggest using this as a starting point and improving it for your needs.
data work.out_data;
set work.in_data;
length sub_str $50;
start_word_num = -(_n_ +1);
do _t = countc(trimn(in_string),' ')+1 to countc(trimn(in_string),' ')+start_word_num+2 by -1;
sub_str = catx(' ',scan(in_string,_t,' '),sub_str);
put _t= sub_str=;
end;
put in_string= sub_str=;
run;
Now, move the loop into a macro.
%macro xscan(word_num, initial_string, result);
&result.=' ';
do _t = countc(trimn(&initial_string.),' ')+1 to countc(trimn(&initial_string.),' ')+&word_num.+2 by -1;
&result. = catx(' ',scan(&initial_string.,_t,' '),&result.);
end;
%mend xscan;
data work.out_data;
set work.in_data;
length sub_str $50;
start_word_num = -(_n_ +1);
%xscan(start_word_num,in_string,sub_str);
put in_string= sub_str=;
run;
You have two problems. First off, a function-style macro must not contain any data steps (or procs or anything else). If you do need to execute a data step, you have to use FCMP with run_macro. However, here you can use %SYSFUNC to accomplish what you are doing in the data step.
Second, you need to actually return the value. Ultimately a macro resolves to text, so you need to resolve
%let x = %xscan(...);
to
%let x = bb eeeee;
So you need to simply have bb eeeee as open text in your macro.
This should accomplish both things:
options mprint symbolgen;
%macro xscan(string, delimiter, word_number);
%local len1 len sub pos;
%let len1=%length(&string); /*Computing the length of the string*/
%let len=%eval(&len1+1);
%let sub=%scan(&string,&word_number,"&delimiter"); /*Fetch the string specified by word_number*/
%if &word_number ge 0 %then %do;
%let pos=%index(&string,&sub); /* Locate the position while reading left to right*/
%end;
%else %if &word_number lt 0 %then %do;
%let pos=%sysfunc(find(&string,&sub,-&len)); /* Locate the position while reading from right to left*/
%end;
%substr(&string,&pos) /* Extract the substring*/
%mend;
%let sub_str = %xscan(a bb ccc dddd bb eeeee, %str( ), -2);
%put The value of sub_str = &sub_str;
(Note, I don't necessarily know this does what you really want, but it does what the code appears to be doing.)
Some tips for function-style macros, courtesy of Rob Penridge:
Define all of your macro variables using a %local statement like so: %local len1 len sub pos;. That way you do not overwrite global macro variables.
Use /* THIS STYLE FOR COMMENTING */. Using other comment styles may cause the line to end.
The secret to making the macro work is the line that uses %substr at the end. This resolves to bb eeeee being left in open code. Since that is all that is left, that is what calling the macro resolves to.
Do not put a semicolon on the line that is actually returned, as it may be undesirable when the function-style macro is used.
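Those tips can be seen together in a deliberately tiny function-style macro. The name add_one is hypothetical and only serves as an illustration.

```sas
%macro add_one(n);
  /* Keep working variables local so no global macro variable is overwritten */
  %local result;
  %let result = %eval(&n + 1);
  /* The value is left as open text, with no semicolon after it */
  &result
%mend add_one;

%let answer = %add_one(41);
%put answer=&answer;
```

The %put should write answer=42 to the log. Because the macro generates nothing but the value, it can also be used mid-statement, for example x = %add_one(41); inside a data step.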