Unable to apply a SAS macro to a column - sas

I want to apply a macro I have written to each individual row in SAS:
DATA cars1;
INPUT make $ model $ mpg weight price;
CARDS;
AMC Concord 22 2930 4099
AMC Pacer 17 3350 4749
AMC Spirit 22 2640 3799
Buick Century 20 3250 4816
Buick Electra 15 4080 7827
;
RUN;
%macro calculate1 (var_name, var_value);
%If &var_name < 20 %then
%do;
&var_value + &var_name;
%end;
%else %if &var_name >= 20 %then
%do;
&var_value - &var_name;
%end;
%mend ;
Data cars2; Set cars1;
varnew = %calculate1(mpg, weight);
Run;
When I run this code I get the difference between the two columns even when the MPG value is < 20, although according to the code I should get the sum of the columns whenever MPG is < 20.
I know I could use IF conditions on the columns, but I want to try to use macros to do this.
Please help me apply my macro on the columns.
Thanks in advance.

You most likely do not need macro code yet.
Macro writes SAS source code before run-time, it does not evaluate data step expressions at runtime.
Learn to write DATA Step code before attempting to abstract it to macro.
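To see why the original macro always takes the same branch, it can help to run the macro comparison on its own. The sketch below uses a hypothetical helper named show_compare (not part of the question). It assumes an ASCII system, where the text mpg should sort after the text 20, so the %if is expected to be false no matter what the MPG column holds.

```sas
%macro show_compare(var_name);
  /* &var_name resolves to the text mpg, never to a value read from the data */
  %if &var_name < 20 %then %put Macro condition is TRUE for the text &var_name;
  %else %put Macro condition is FALSE for the text &var_name;
%mend show_compare;

%show_compare(mpg)
```

Because the comparison is made on text while the macro is generating code, the same branch is emitted for every row of the data set.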
DATA Step
This might contain source code statements (the if/then/else) that you want macro to generate
data cars2;
set cars1;
* calculate;
if mpg < 20 then varnew = weight + mpg;
else
if mpg >= 20 then varnew = weight - mpg;
run;
How would this be abstracted? Determine which components of the if/then/else would be reused in a different context or with different variables. If you can't identify any reuse, don't code a macro.
Consider abstraction #1 (as pseudo-code) that is to generate a complete statement
if PARAMETER_2 < 20 then RESULT_VAR = PARAMETER_1 + PARAMETER_2;
else
if PARAMETER_2 >= 20 then RESULT_VAR = PARAMETER_1 - PARAMETER_2;
or abstraction #2 that is to generate source code for an expression
ifn (PARAMETER_2 < 20, PARAMETER_1 + PARAMETER_2, PARAMETER_1 - PARAMETER_2)
But why hardcode 20 into the macro? Why not make that a parameter as well? If you go that route the abstraction goes too far and starts templating language elements that should simply be written as non-macro code. (One might suppose that in a functional language such as LISP there is no such thing as too much abstraction.)
Abstraction #1 as macro
%macro calculate(result_var, parameter_1, parameter_2);
/* generate DATA Step source code, using the passed parameters */
if &PARAMETER_2 < 20 then &RESULT_VAR = &PARAMETER_1 + &PARAMETER_2;
else
if &PARAMETER_2 >= 20 then &RESULT_VAR = &PARAMETER_1 - &PARAMETER_2;
%mend;
data cars2;
set cars1;
%calculate(varnew,weight,mpg);
run;
Abstraction #2 as macro
%macro calculate(parameter_1, parameter_2);
/* generate source code that is valid as right hand side of assignment */
ifn (&PARAMETER_2 < 20, &PARAMETER_1 + &PARAMETER_2, &PARAMETER_1 - &PARAMETER_2)
%mend;
data cars2;
set cars1;
varnew = %calculate(weight,mpg);
run;

Related

Finding specific values for all variables in a table using SAS EG

I have a table which contains one key id and 100 variables (x1, x2, x3 ... x100), and I need to check every variable for the values -9999, -8888, -7777 and -6666.
For one variable i use
proc sql;
select keyid, x1
from mytable
where x1 in(-9999,-8888,-7777,-6666);
quit;
This is the data I am trying to get, but it is just for one variable.
I do not have time to copy and paste this basic query for all 100 variables.
I have searched the forum, but the answers I have found are a bit far from what I actually need,
and since I am new to SAS I cannot write a macro.
Can you help me please?
Thanks.
Try this. Just made up some sample data that resembles what you describe :-)
data have;
do key = 1 to 1e5;
array x x1 - x100;
do over x;
x = rand('integer', -10000, -5000);
end;
output;
end;
run;
data want;
set have;
array x x1 - x100;
do over x;
if x in (-9999, -8888, -7777, -6666) then do;
output;
leave;
end;
end;
run;
Don't use SQL. Instead use normal SAS code so you can take advantage of SAS syntax like ARRAYs and variable lists.
So make an array containing the variables you want to look at. Then loop over the array. There is no need to keep looking once you find one.
data want;
set mytable;
array list var1 varb another_var x1-x10 Z: ;
found=0;
do index=1 to dim(list) until (found);
found = ( list[index] in (-9999 -8888 -7777 -6666) );
end;
if found;
run;
And if you want to search all of the numeric variables you can even use the special variable list _NUMERIC_ when defining the array:
array list _numeric_;
Thank you for your help. I have found a solution and wanted to share it with you.
It has some points that need to be evaluated, but it is fine for me now (it gets the job done).
%LET LIB = 'LIBRARY';
%LET MEM = 'GIVENTABLE';
%PUT &LIB &MEM;
PROC SQL;
SELECT
NAME INTO :VARLIST SEPARATED BY ' '
FROM DICTIONARY.COLUMNS
WHERE
LIBNAME=&LIB
AND
MEMNAME=&MEM
AND
TYPE='num';
QUIT;
%PUT &VARLIST;
%MACRO COUNTS(INPUT);
%LOCAL i NEXT_VAR;
%DO i=1 %TO %SYSFUNC(COUNTW(&VARLIST));
%LET NEXT_VAR = %SCAN(&VARLIST, &i);
PROC SQL;
CREATE TABLE &NEXT_VAR AS
SELECT
COUNT(ID) AS NUMBEROFDESIREDVALUES
FROM &INPUT
WHERE
&NEXT_VAR IN (6666, 7777, 8888, 9999)
GROUP BY
&NEXT_VAR;
QUIT;
%END;
%MEND;
%COUNTS(GIVENTABLE);
The answer you provided to your own question gives more insight into what you really wanted. However, while the solution you offered works, it is not very efficient. The SQL statement runs once for each of the 100 variables in the source data, which means the source table is read 100 times. Another problem is that it creates 100 output tables. Why?
A better solution is to create 1 table that contains the counts for each of the 100 variables. Even better is to do it in 1 pass of the source data instead of 100.
data sum;
set have end=eof;
array x(*) x:;
array csum(100) _temporary_;
do i = 1 to dim(x);
x(i) = (x(i) in (-9999, -8888, -7777, -6666)); * flag (0 or 1) those meeting criteria;
csum(i) + x(i); * cumulative count;
if eof then do;
x(i) = csum(i); * move the final total to the orig variable;
end;
end;
if eof then output; * only output the final obs which has the totals;
drop key i;
run;
Partial result:
x1 x2 x3 x4 x5 x6 x7 x8 ...
90 84 88 85 81 83 59 71 ...
You can keep it in that form or you can transpose it.
proc transpose data=sum out=want (rename=(col1=counts))
name=variable;
run;
Partial result:
variable counts
x1 90
x2 84
x3 88
x4 85
x5 81
... ...

Big number anomaly in SAS

Does anybody know why the number stored in "numero" isn't the same as the one I put in the %let?
I use SAS Enterprise Guide 7.1.
Here's my program :
%let ident = 4644968792486317489 ;
data _null_ ;
numero= put(&ident.,z19.);
call symputx('numero',numero);
run;
%put &numero. ;
And the log :
30 %let ident = 4644968792486317489 ;
31
32 data _null_ ;
33 numero= put(&ident.,z19.);
34 call symputx('numero',numero);
35 run;
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
36
37 %put &numero. ;
4644968792486317056
Thanks in advance!
SAS stores numbers as 8-byte floating point values. Therefore there is a limit to the largest integer that can be stored exactly (more precisely, the largest integer below which every integer can be stored without gaps). They even publish a table with the maximum value.
And a function you can use to determine the maximum value.
3 %put %sysfunc(constant(exactint),comma23.);
9,007,199,254,740,992
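A minimal sketch of what that limit means in practice, assuming a platform that uses IEEE 8-byte doubles (Windows or Unix); the variable names exact, next and same are only for illustration. Adding 1 to the EXACTINT constant is expected to be lost to rounding, so the comparison flag should come out as 1.

```sas
data _null_;
  exact = constant('exactint');  /* largest integer below which no gaps occur */
  next  = exact + 1;             /* assumed not representable as a double */
  same  = (next = exact);        /* 1 if the added 1 was lost to rounding */
  put exact= comma25. next= comma25. same=;
run;
```

The 19-digit identifier in the question is far above this limit, which is why its last digits change when it is treated as a number.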
Looks like your "number" is really an identifier. So store it as character to begin with and you will not have these problems.
data want;
length numero $19;
numero = "&ident";
numero = translate(right(numero),'0',' ');
run;
Use the SAS MD5 function to anonymize strings. Don't forget MACRO is really just text processing.
%let ident = 4644968792486317489 ;
%let numero = %sysfunc(MD5(&ident), $hex32.); /* format the 16-byte binary digest as printable hex */
or in DATA Step
data ... ;
numero = MD5("&ident");
In certain situations you might associate a monotonic serial value to an identity value.
%let ident = 4644968792486317489 ;
%if not %symexist(i&ident) %then %do;
%let i&ident = %sysfunc(monotonic());
%put new serial;
%end;
%put i&ident=&&i&ident;
----- LOG -----
i4644968792486317489=1

put values to a file using functions without creating new variables

I am processing a dataset, the contents of which I do not know in advance. My target SAS instance is 9.3, and I cannot use SQL as that has certain 'reserved' names (such as "user") that cannot be used as column names.
The puzzle looks like this:
data _null_;
set some.dataset; file somefile;
/* no problem can even apply formats */
put name age;
/* how to do this without making new vars? */
put somefunc(name) max(age);
run;
I can't put var1=somefunc(name); put var1; as that may clash with a source variable named var1.
I'm guessing the answer is to make some macro function that will read the dataset header and return me a "safe" (non-clashing) variable, or an fcmp function in a format, but I thought I'd check with the community to see - is there some "old school" way to outPUT directly from a function, in a data step?
Temporary array?
34 data _null_;
35 set sashelp.class;
36 array _n[*] _numeric_;
37 array _f[3] _temporary_;
38 put _n_ #;
39 do _n_ = 1 to dim(_f);
40 _f[_n_] = log(_n[_n_]);
41 put _f[_n_]= #;
42 end;
43 put ;
44 run;
1 _f[1]=2.6390573296 _f[2]=4.2341065046 _f[3]=4.7229532216
2 _f[1]=2.5649493575 _f[2]=4.0342406382 _f[3]=4.4308167988
3 _f[1]=2.5649493575 _f[2]=4.1789920363 _f[3]=4.5849674787
4 _f[1]=2.6390573296 _f[2]=4.1399550735 _f[3]=4.6298627986
5 _f[1]=2.6390573296 _f[2]=4.1510399059 _f[3]=4.6298627986
6 _f[1]=2.4849066498 _f[2]=4.0483006237 _f[3]=4.4188406078
7 _f[1]=2.4849066498 _f[2]=4.091005661 _f[3]=4.4367515344
8 _f[1]=2.7080502011 _f[2]=4.1351665567 _f[3]=4.7229532216
9 _f[1]=2.5649493575 _f[2]=4.1351665567 _f[3]=4.4308167988
The PUT statement does not accept a function invocation as a valid item for output.
A DATA step does not do columnar functions as you indicated with max(age) (so it would be even less likely to use such a function in PUT ;-)
Avoid name collisions
My recommendation is to use a variable name that is highly unlikely to collide.
_temp_001 = somefunc(<var>);
_temp_002 = somefunc2(<var2>);
put _temp_001 _temp_002;
drop _temp_:;
or
%let tempvar = _%sysfunc(rand(uniform, 1e15),z15.);
&tempvar = somefunc(<var>);
put &tempvar;
drop &tempvar;
%symdel tempvar;
Repurpose
You can re-purpose any automatic variable that is not important to the running step. Some omni-present candidates include:
numeric variables:
_n_
_iorc_
_threadid_
_nthreads_
first.<any-name> (only tweak after first. logic associated with BY statement)
last.<any-name>
character variables:
_infile_ (requires an empty datalines;)
_hostname_
avoid
_file_
_error_
I think you would be pretty safe choosing some unlikely to collide names. An easy way to generate these and still make the code somewhat readable would be to just hash a string to create a valid SAS varname and use a macro reference to make the code readable. Something like this:
%macro get_low_collision_varname(iSeed=);
%local try cnt result;
%let cnt = 0;
%let result = ;
%do %while ("&result" eq "");
%let try = %sysfunc(md5(&iSeed&cnt),hex32.);
%if %sysfunc(anyalpha(%substr(&try,1,1))) gt 0 %then %do;
%let result = &try;
%end;
%let cnt = %eval(&cnt + 1);
%end;
&result
%mend;
The above code takes a seed string and just adds a number to the end of it. It iterates the number until it gets a valid SAS varname as output from the md5() function. You could even then test the target dataset to make sure the variable doesn't already exist. If it does, build that logic into the above function.
Test it:
%let my_var = %get_low_collision_varname(iSeed=this shouldnt collide);
%put &my_var;
data _null_;
set sashelp.class;
&my_var = 1;
put _all_;
run;
Results:
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=1
Name=Alice Sex=F Age=13 Height=56.5 Weight=84 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=2
This doesn't specifically answer the question of how to achieve it without creating new varnames, but it does give a practical workaround.
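The existence check mentioned above could be sketched as a function-style macro. The name var_exists and its parameters are hypothetical; the sketch relies only on the OPEN, VARNUM and CLOSE functions called through %sysfunc, and returns 1 when the variable is present and 0 otherwise (including when the data set cannot be opened).

```sas
%macro var_exists(ds, var);
  %local dsid rc pos;
  %let pos = 0;
  %let dsid = %sysfunc(open(&ds));                 /* 0 if the data set cannot be opened */
  %if &dsid %then %do;
    %let pos = %sysfunc(varnum(&dsid, &var));      /* 0 if the variable is not found */
    %let rc = %sysfunc(close(&dsid));
  %end;
  %eval(&pos > 0)
%mend var_exists;

%put exists=%var_exists(sashelp.class, age);
```

A generated name could be passed through this check and regenerated with a different seed whenever the result is 1.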

How to use arithmetic operators with SAS macro variables [duplicate]

I have a data set like this;
DATA work.faminc;
INPUT famid faminc1-faminc12 ;
CARDS;
1 3281 3413 3114 2500 2700 3500 3114 3319 3514 1282 2434 2818
2 4042 3084 3108 3150 3800 3100 1531 2914 3819 4124 4274 4471
3 6015 6123 6113 6100 6100 6200 6186 6132 3123 4231 6039 6215
;
RUN;
I can create a variable and do some stuff with it like,
%let N=12;
DATA faminc1b;
SET faminc ;
ARRAY Afaminc(12) faminc1-faminc12 ;
ARRAY Ataxinc(&N) taxinc1-taxinc&N ;
DO month = 1 TO &N;
Ataxinc(month) = Afaminc(month) * .10 ;
END;
RUN;
But I also want to divide every family income by the one after it.
The results should be faminc1/faminc2, faminc2/faminc3, faminc3/faminc4, and so on.
So the main problem is how to apply arithmetic operators (+, -, *, /) to the "N" macro variable which I have created.
When I simply tried the following, it didn't work:
%let N=12;
DATA faminc1b;
SET faminc ;
ARRAY Afaminc(12) faminc1-faminc12 ;
ARRAY Afamdiv(&N) famdiv1-famdiv&N ;
DO month = 1 TO &N+1;
Afamdiv(month) = faminc&N/faminc&N+1 ;
END;
RUN;
Thanks for the help.
I am not exactly sure what you want to achieve, so I can only answer your question about doing arithmetic on a macro variable. To get your sample working, put it in a separate macro; then you can use the %eval function on your macro variable to add 1.
As far as I can see, you must use month as your looping variable and not N. You also have to stop at 11, because there is no variable 13 to divide variable 12 by.
%let N=12;
%macro calc;
DATA faminc1b;
SET faminc ;
ARRAY Afaminc(12) faminc1-faminc12 ;
ARRAY Afamdiv(&N) famdiv1-famdiv&N ;
%DO month = 1 %TO %eval(&N-1);
Afamdiv(&month) = faminc&month/faminc%eval(&month+1) ;
%END;
RUN;
%mend;
%calc;
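A small sketch of the difference %eval makes; the %put statements are only for illustration. Without %eval the macro processor performs text substitution only, so faminc&N+1 in the question's code becomes faminc12+1, that is, the variable faminc12 plus 1.

```sas
%let N = 12;
%put Without eval: &N+1;        /* writes the text 12+1 */
%put With eval: %eval(&N+1);    /* writes 13 */
```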
You do not need to use the macro variable for anything other than defining the upper bound of your variable list.
Everything else you can do with normal SAS code. Use the DIM() function to find the upper bound of an array. Use the arrays in your calculations. I am not sure why you are hardcoding one upper bound and using the macro variable for the other, but if they can be different then you need to consider the length of both arrays to find the upper bound for your DO loop.
%let N=12;
DATA faminc1b;
SET faminc ;
ARRAY Afaminc faminc1-faminc12 ;
ARRAY Afamdiv famdiv1-famdiv&N ;
DO month = 1 TO min(dim(afaminc)-1,dim(afamdiv));
Afamdiv(month) = afaminc(month)/afaminc(month+1) ;
END;
RUN;

Find three most recent data year for each row

I have a data set with one row for each country and 100 columns (10 variables with 10 data years each).
For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive).
This is what I have so far, but I know it's wrong because of the nested loop: it puts the same value in recent_1, recent_2 and recent_3. However, I haven't figured out how to create recent_1, recent_2 and recent_3 without two loops.
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004 -- MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
%let rc = 1;
%do i = 2013 %to 2004 %by -1;
%do rc = 1 %to 3 %by 1;
%if MATERNAL_CARE_&i. ne . %then %do;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
%end;
%end;
run;
%mend;
%test();
You don't need to use a macro to do this - just some arrays:
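data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004-MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
* index the array by year so that mc[2013] is MATERNAL_CARE_2013;
array mc {2004:2013} MATERNAL_CARE_2004-MATERNAL_CARE_2013;
array recent {3} recent_1-recent_3;
rc = 0;
* walk back from the latest year and stop after three non-missing values;
do i = 2013 to 2004 by -1 while (rc < 3);
if mc[i] ne . then do;
rc = rc + 1;
recent[rc] = mc[i];
end;
end;
run;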
Maybe I don't get your request, but according to your description:
"For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive)" I created this sample dataset with dt1 and dt2 and 2 locations.
The output will be 2 datasets named DS1 and DS2 (in general, one dataset per variable starting with DT), each with up to 3 observations per city: DS1 for the first variable and DS2 for the second.
This is the sample dataset:
data sample_ds;
length city $10 dt1 dt2 8.;
infile datalines dlm=',';
input city $ dt1 dt2;
datalines;
MS,5,0
MS,3,9
MS,3,9
MS,2,0
MS,1,8
MS,1,7
CA,6,1
CA,6,.
CA,6,.
CA,2,8
CA,1,5
CA,0,4
;
This is the sample macro:
%macro help(ds=);
data vars(keep=dt:); set &ds; if _n_ not >0; run;
%let op = %sysfunc(open(vars));
%let nvrs = %sysfunc(attrn(&op,nvars));
%let cl = %sysfunc(close(&op));
%do idx=1 %to &nvrs.;
proc sort data=&ds(keep=city dt&idx.) out=ds&idx.(where=(dt&idx. ne .)) nodupkey; by city DESCENDING dt&idx.; run;
data ds&idx.; set ds&idx.;
retain cnt;
by city DESCENDING dt&idx.;
if first.city then cnt=0; else cnt=cnt+1;
run;
data ds&idx.(drop=cnt); set ds&idx.(where=(cnt<3)); rename dt&idx.=act&idx.; run;
%end;
%mend;
You will run this macro with:
%help(ds=sample_ds);
In the first statement of the macro I select the variables on which I want to iterate:
data vars(keep=dt:); set &ds; if _n_ not >0; run;
Work on this if you want to make this work for your code, or simply rename your variables as DT1 DT2...
Let me know if it is correct for you.
When writing macro code, always keep in mind what has to be done when. SAS processes your code stepwise.
Before your sas code is even compiled, your macro variables are resolved and your macro code is executed
Then the resulting SAS Base code is compiled
Finally the code is executed.
When you write %if MATERNAL_CARE_&i. ne . %then %do, this is macro code, interpreted before compilation.
At that time MATERNAL_CARE_&i. is not a variable but a text string containing a macro variable.
The first time you run through your %do i = 2013 %to 2004 %by -1, it is filled in as MATERNAL_CARE_2013, the second time as MATERNAL_CARE_2012, etc.
Then the macro %if statement is interpreted. It compares the text string MATERNAL_CARE_2013 with a dot; the two strings differ, so the condition is always TRUE, whatever the data contain,
and recent_&rc. = MATERNAL_CARE_&i. is generated for every combination of &i. and &rc. The last assignment generated for each recent_&rc. (the one for 2004) wins, which is why all three variables end up with the same value.
You can see that if you run your code with options mprint;
The solution:
options mprint;
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_: recent_:;
** The : acts as a wild card here **;
%do i = 2013 %to 2004 %by -1;
if MATERNAL_CARE_&i. ne . then do;
%do rc = 1 %to 3 %by 1;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
end;
%end;
run;
%mend;
%test();
Now, before compilation of if MATERNAL_CARE_&i. ne . then do, only the &i. is evaluated, and if MATERNAL_CARE_2013 ne . then do is passed to the compiler.
The compiler will see this as a test of whether the SAS variable MATERNAL_CARE_2013 is missing, and that is just what you wanted.
Remark:
It is not essential that I moved the if statement above the %do rc loop. It is just more efficient because the condition is then evaluated less often.
It is however essential that you close your %ifs and %dos with an %end and your ifs and dos with an end;
Remark:
You do not need %let rc = 1, because %do rc = 1 %to 3 already initialises &rc.
For completeness: SAS is compiled stepwise.
The next PROC or DATA step and its macro code are only considered once the previous one has executed.
That is why you can write macro variables from a DATA step or an SQL SELECT ... INTO that will influence the code you compile in your next step,
something you cannot do with, for instance, C++ precompilation.
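The point about steps being processed one at a time can be sketched as follows. The macro variable name n_obs is an assumption made for illustration, and sashelp.class is used only because it ships with SAS. The DATA step runs first and stores a value; the %put after it is resolved only once that step has finished, so it can see the value.

```sas
data _null_;
  set sashelp.class end=eof;
  /* at the last row, copy the row counter into a macro variable */
  if eof then call symputx('n_obs', _n_);
run;

/* resolved after the step above has executed */
%put sashelp.class has &n_obs rows;
```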
Thanks everyone. Found a hybrid solution from a few solutions posted.
data sample_ds;
infile datalines dlm=',';
input country $ maternal_2004 maternal_2005
maternal_2006 maternal_2007 maternal_2008 maternal_2009 maternal_2010 maternal_2011 maternal_2012 maternal_2013;
datalines;
MS,5,0,5,0,5,.,5,.,5,.
MW,3,9,5,0,5,0,5,.,5,0
WE,3,9,5,0,5,.,.,.,.,0
HU,2,0,5,.,5,.,5,0,5,0
MI,1,8,5,0,5,0,5,.,5,0
HJ,1,7,5,0,5,0,.,0,.,0
CJ,6,1,5,0,5,0,5,0,5,0
CN,6,1,.,5,0,5,0,5,0,5
CE,6,5,0,5,0,.,0,5,.,8
CT,2,5,0,5,0,5,0,5,0,9
CW,1,5,0,5,0,5,.,.,0,7
CH,0,5,0,5,0,.,0,.,0,5
;
%macro test(var);
data &var._recent;
set sample_ds;
keep country &var._1 &var._2 &var._3;
array mc {*} &var._2004-&var._2013;
array recent {*} &var._1-&var._25;
count=1;
do i = 10 to 1 by -1;
if mc[i] ne . then do;
recent[count] = mc[i];
count=count+1;
end;
end;
run;
%mend;