merge 6000 variables with the merge command line - sas

I've got the following issue and I'm not sure on how to do this.
I'm trying to merge 6000 variables through the code below
Please find below the piece of code I've written for two of the variables
data big_aat_1;
merge Aat_1(rename=(var14=var14_t0 var28=var28_t_0))
Aat_2(rename=(var14=var14_t_1 var28=var28_t_1))
Aat_3(rename=(var14=var14_t_2 var28=var28_t_2))
Aat_4(rename=(var14=var14_t_3 var28=var28_t_3))
Aat_5(rename=(var14=var14_t_4 var28=var28_t_4))
Aat_6(rename=(var14=var14_t_5 var28=var28_t_5));
by nouv_date;
run;
My aim is to try to automate my piece of code for the 6000 variables I have and keep the way I'm doing it e.g. with the merge.
The result will all the variables would be like the one below. The ...represent the rest of the variables
data big_aat_1;
merge Aat_1(rename=(var14=var14_t0 var28=var28_t_0 var37=var37_t_0 ...))
Aat_2(rename=(var14=var14_t_1 var28=var28_t_1 var37=var37_t_1 ...))
Aat_3(rename=(var14=var14_t_2 var28=var28_t_2 var37=var37_t_2 ...))
Aat_4(rename=(var14=var14_t_3 var28=var28_t_3 var37=var37_t_3 ...))
Aat_5(rename=(var14=var14_t_4 var28=var28_t_4 var37=var37_t_4 ...))
Aat_6(rename=(var14=var14_t_5 var28=var28_t_5 var37=var37_t_5 ...));
by nouv_date;
run;
There are 2 things I need to state
1) I have a dataset / table that contains all the distinct variable names (e.g. var14, var28 ...). It would be great if I can use it. The name of the dataset is dicoAg
2) I need to keep the merge for some reasons I cannot talk about here.
If you have any insight

I started creating test data sets (you obviously already have them):
%MACRO P;
%DO I=1 %TO 6;
data aat_&I;
%DO J=1 %TO 6000;
var&J=&J;
%END;
nouv_date=1;output;
run;
%END;
%MEND;
%P;
and then I used proc contents to have a list of the variables (you can skip this step and use dicoAg):
proc contents data=aat_1 varnum out=vars;run;
and then you have sas write the rename code for you:
data _NULL_;
set vars /*dicoAg*/(where=(NAME^="nouv_date")) end=fine;
file "MyPath\Rename.sas";
if _N_=1 then do;
put '%MACRO RENAME(J=); ';
put '(rename=( ';
end;
/*intead of NAME use the variable in dicoAg which contains all the variables' names*/
put ' ' NAME '=' NAME +(-1) '_&J';
if fine then do;
put ' )) ';
put '%MEND; ';
end;
run;
you include the code:
%include "MyPath\Rename.sas";
and at the end you write the macro to do the merge:
%MACRO P;
data big_aat_1;
merge
%DO D=1 %TO 6;
aat_&D. %RENAME(J=&D)
%END;
;
by nouv_date;
run;
%MEND;
%P;

Everyone,
Without going into full details, my man-a and I did that
data big_aat_1;
merge %do j=1 %to 6 ; Aat_&j(rename=(%do i=1 %to &&&&nvar&&pays&l ; &&&&var&&pays&l.._&i=&&&&var&&pays&l.._&i.._t%eval(&j-1) %end ; )) %end ; ;
by nouv_date;
run;
Not perfect nor superbly efficient but doing the trick.
Explanation :
&&&&nvar&&pays&lis the max number of variables
&&&&var&&pays&l.._&iis the variable
The results will give you something like this
merge Aat_1(rename=( var1=var1_t0 var31=var31_t0 var60=var60_t0 var90=var90_t0 var119=var119_t0 ...
Aat_6(rename=( var1=var1_t5 var31=var31_t5 var60=var60_t5 var90=var90_t5 var119=var119_t5...
Best.

Related

running sas program for multiple parameters in loop

Very new to sas, need to perform export procedure for many datasets called data1, data2, data3 ... data10.
Is there a way to operationalize this? I've tried the following without success
LIBNAME input '/home/.../input';
LIBNAME output '/home/.../output';
%macro anynumber(number);
proc export data=input.data&number file="/home/.../output/data&number..dta" dbms=dta replace;
run;
%mend;
do i = 1 to 10;
%anynumber(i)
end;
run;
CALL EXECUTE is the recommended option but a macro loop is also a good option here.
You could loop it within the same macro as well. All options are illustrated below.
Usually when I see this type of coding though, my first thought is that someone isn't familiar with BY group processing.
data demo;
do i=1 to 10;
*make sure this string matches macro call;
str = catt('%anynumber(', i, ');');
*write records to log to check;
put str;
call execute(str);
end;
run;
Another option is a macro loop itself.
%macro loop_export(numLoops=10);
%do i=1 %to &numLoops;
%anywhere(&i);
%end;
%mend;
%loop_export(numLoops=10);
Putting them together:
%macro anynumber(number);
%do i=1 %to &number;
proc export data=input.data&number file="/home/.../output/data&i..dta" dbms=dta
replace;
run;
%end;
%mend;
*will export all 10;
%anyNumber(10);

Do loop for creating new variables in SAS

I am trying to run this code
data swati;
input facility_id$ loan_desc : $50. sys_name :$50.;
cards;
fac_001 term_loan RM_platform
fac_001 business_loan IQ_platform
fac_002 business_loan BUSES_termloan
fac_002 business_loan RM_platform
fac_003 overdrafts RM_platform
fac_003 RCF IQ_platform
fac_003 term_loan BUSES_termloan
;
proc contents data=swati out=contents(keep=name varnum);
run;
proc sort data=contents;
by varnum;
run;
data contents;
set contents ;
where varnum in (2,3);
run;
data contents;
set contents;
summary=catx('_',name, 'summ');
run;
data _null_;
set contents;
call symput ("name" || put(_n_ , 10. -L), name);
call symput ("summ" || put (_n_ , 10. -L), summary);
run;
options mlogic symbolgen mprint;
%macro swati;
%do i = 1 %to 2;
proc sort data=swati;
by facility_id &&name&i.;
run;
data swati1;
set swati;
by facility_id &&name&i.;
length &&summ&i. $50.;
retain &&summ&i.;
if first.facility_id then do;
&&summ&i.="";
end;
if first.&&name&i. = last.&&name&i. then &&summ&i.=catx(',',&&name&i., &&summ&i.);
else if first.&&name&i. ne last.&&name&i. then &&summ&i.=&&name&i.;
run;
if last.facility_id ;
%end;
%mend;
%swati;
This code will create two new variables loan_desc_summ and sys_name_summ which has values of the all the loans_desc in one line and the sys_names in one line seprated by comma example (term_loan, business_loan), (RM_platform, IQ_platform) But if a customer has only one loan_desc the loan_summ should only have its value twice.
The problem while running the do loop is that after running this code, I am getting the dataset with only the sys_name_summ and not the loan_desc_summ. I want the dataset with all the five variables facility_id, loan_desc, sys_name, loan_desc_summ, sys_name_summ.
Could you please help me in finding out if there is a problem in the do loop??
Your loop is always starting with the same input dataset (swati) and generating a new dataset (SWATI1). So only the last time through the loop has any effect. Each loop would need to start with the output of the previous run.
You also need to fix your logic for eliminating the duplicates.
For example you could change the macro to:
%macro swati;
data swati1;
set swati;
run;
%do i = 1 %to 2;
proc sort data=swati1;
by facility_id &&name&i.;
run;
data swati1;
set swati1;
by facility_id &&name&i ;
length &&summ&i $500 ;
if first.facility_id then &&summ&i = ' ' ;
if first.&&name&i then catx(',',&&summ&i,&&name&i);
if last.facility_id ;
run;
%end;
%mend;
Also your program could be a lot smaller if you just used arrays.
data want ;
set have ;
by facility_id ;
array one loan_desc sys_name ;
array two $500 loan_desc_summ sys_name_summ ;
retain loan_desc_summ sys_name_summ ;
do i=1 to dim(one);
if first.facility_id then two(i)=one(i) ;
else if not findw(two(i),one(i),',','t') then two(i)=catx(',',two(i),one(i));
end;
if last.facility_id;
drop i loan_desc sys_name ;
run;
If you want to make it more flexible you can put the list of variable names into a macro variable.
%let varlist=loan_desc sys_name;
You could then generate the list of new names easily.
%let varlist2=%sysfunc(tranwrd(&varlist,%str( ),_summ%str( )))_summ ;
Then you can use the macro variables in the ARRAY, RETAIN and DROP statements.

Find three most recent data year for each row

I have a data set with one row for each country and 100 columns (10 variables with 10 data years each).
For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive).
This is what I have so far, but I know its wrong because of the nest loop, and its has same value for recent1 recent2 recent3 however I haven't figured out how to create recent1 recent2 recent3 without two loops.
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004 -- MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
%let rc = 1;
%do i = 2013 %to 2004 %by -1;
%do rc = 1 %to 3 %by 1;
%if MATERNAL_CARE_&i. ne . %then %do;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
%end;
%end; run; %mend; %test();
You don't need to use a macro to do this - just some arrays:
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004-MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
array mc {*} MATERNAL_CARE_2004-MATERNAL_CARE_2013;
array recent {*} recent1-recent3;
do i = 2013 to 2004 by -1;
do rc = 1 to 3 by 1;
if mc[i] ne . then do;
recent[rc] = mc[i];
end;
end;
run;
Maybe I don't get your request, but according to your description:
"For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive)" I created this sample dataset with dt1 and dt2 and 2 locations.
The output will be 2 datasets (and generally the number of the variables starting with DT) named DS1 and DS2 with 3 observations for each country, the first one for the first variable, the second one for the second variable.
This is the sample dataset:
data sample_ds;
length city $10 dt1 dt2 8.;
infile datalines dlm=',';
input city $ dt1 dt2;
datalines;
MS,5,0
MS,3,9
MS,3,9
MS,2,0
MS,1,8
MS,1,7
CA,6,1
CA,6,.
CA,6,.
CA,2,8
CA,1,5
CA,0,4
;
This is the sample macro:
%macro help(ds=);
data vars(keep=dt:); set &ds; if _n_ not >0; run;
%let op = %sysfunc(open(vars));
%let nvrs = %sysfunc(attrn(&op,nvars));
%let cl = %sysfunc(close(&op));
%do idx=1 %to &nvrs.;
proc sort data=&ds(keep=city dt&idx.) out=ds&idx.(where=(dt&idx. ne .)) nodupkey; by city DESCENDING dt&idx.; run;
data ds&idx.; set ds&idx.;
retain cnt;
by city DESCENDING dt&idx.;
if first.city then cnt=0; else cnt=cnt+1;
run;
data ds&idx.(drop=cnt); set ds&idx.(where=(cnt<3)); rename dt&idx.=act&idx.; run;
%end;
%mend;
You will run this macro with:
%help(ds=sample_ds);
In the first statement of the macro I select the variables on which I want to iterate:
data vars(keep=dt:); set &ds; if _n_ not >0; run;
Work on this if you want to make this work for your code, or simply rename your variables as DT1 DT2...
Let me know if it is correct for you.
When writing macro code, always keep in mind what has to be done when. SAS processes your code stepwise.
Before your sas code is even compiled, your macro variables are resolved and your macro code is executed
Then the resulting SAS Base code is compiled
Finally the code is executed.
When you write %if MATERNAL_CARE_&i. ne . %then %do, this is macro code interpreded before compilation.
At that time MATERNAL_CARE_&i. is not a variable but a text string containing a macro variable.
The first time you run trhough your %do i = 2013 %to 2004 by -1, it is filled in as MATERNAL_CARE_2013, the second as MATERNAL_CARE_2012., etc.
Then the macro %if statement is interpreted, and as the text string MATERNAL_CARE_1 is not equal to a dot, it is evaluated to FALSE
and recent_&rc. = MATERNAL_CARE_&i. is not included in the code to pass to your compiler.
You can see that if you run your code with option mprint;
The resolution;
options mprint;
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_: recent_:;
** The : acts as a wild card here **;
%do i = 2013 %to 2004 %by -1;
if MATERNAL_CARE_&i. ne . then do;
%do rc = 1 %to 3 %by 1;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
end;
%end;
run;
%mend;
%test();
Now, before compilation of if MATERNAL_CARE_&i. ne . then do, only the &i. is evalueated and if MATERNAL_CARE_2013 ne . then do is passed to the compiler.
The compiler will see this as a test if the SAS variable MATERNAL_CARE_1 has value missing, and that is just what you wanted;
Remark:
It is not essential that I moved the if statement above the ``. It is just more efficient because the condition is then evaluated less often.
It is however essential that you close your %ifs and %dos with an %end and your ifs and dos with an end;
Remark:
you do not need %let rc = 1, because %do rc = 1 to 3 already initialises &rc.;
For completeness SAS is compiled stepwise:
The next PROC or data step and its macro code are only considered when the preveous one is executed.
That is why you can write macro variables from a data step or sql select into that will influence the code you compile in your next step,
somehting you can not do for instance with C++ pre compilation;
Thanks everyone. Found a hybrid solution from a few solutions posted.
data sample_ds;
infile datalines dlm=',';
input country $ maternal_2004 maternal_2005
maternal_2006 maternal_2007 maternal_2008 maternal_2009 maternal_2010 maternal_2011 maternal_2012 maternal_2013;
datalines;
MS,5,0,5,0,5,.,5,.,5,.
MW,3,9,5,0,5,0,5,.,5,0
WE,3,9,5,0,5,.,.,.,.,0
HU,2,0,5,.,5,.,5,0,5,0
MI,1,8,5,0,5,0,5,.,5,0
HJ,1,7,5,0,5,0,.,0,.,0
CJ,6,1,5,0,5,0,5,0,5,0
CN,6,1,.,5,0,5,0,5,0,5
CE,6,5,0,5,0,.,0,5,.,8
CT,2,5,0,5,0,5,0,5,0,9
CW,1,5,0,5,0,5,.,.,0,7
CH,0,5,0,5,0,.,0,.,0,5
;
%macro test(var);
data &var._recent;
set sample_ds;
keep country &var._1 &var._2 &var._3;
array mc {*} &var._2004-&var._2013;
array recent {*} &var._1-&var._25;
count=1;
do i = 10 to 1 by -1;
if mc[i] ne . then do;
recent[count] = mc[i];
count=count+1;
end;
end;
run;
%mend;

Skip block of codes if macro variable is empty

I am creating a macro variable with the SAS code below. It's storing a list of data names where I need to replace certain values in specific variables.
proc sql noprint;
select distinct data_name
into :data_repl separated by ' '
from TP_attribute_matching
where Country="&Country_Name" and Replace_this ne ' ';
quit;
I would like to skip the following 2 blocks if data_repl is empty. These 2 blocks go through each data set and variables in that data set, and then replaces x with y.
/*Block 1*/
%do i=1 %to %_count_(word=&data_repl);
proc sql noprint;
select var_name,
Replace_this,
Replace_with
into :var_list_repl_&i. separated by ' ',
:repl_this_list_&i. separated by '#',
:repl_with_list_&i. separated by '#'
from TP_attribute_matching
where Replace_this ne ' ' and data_name="%scan(&data_repl,&i.)";
quit;
/* Block 2 */
%do i=1 %to %_count_(word=&data_repl);
data sasdata.%scan(&data_repl,&i);
set sasdata.%scan(&data_repl,&i);
%do j=1 %to %_count_(word=&&var_list_repl_&i.);
%let from=%scan("&&repl_this_list_&i.",&j,'#');
%let to=%scan("&&repl_with_list_&i.",&j,'#');
%scan(&&var_list_repl_&i.,&j)=translate(%scan(&&var_list_repl_&i.,&j),&to,&from);
%end;
run;
%end;
How shoould I do this? I was going through %SKIP and if then leave, but cannot figure this out yet.
%IF and %DO are macro statements that can only be used inside a macro:
%macro DoSomething;
%if "&data_repl" ne "" %then %do;
/*Block 1*/
%do i=1 %to %_count_(word=&data_repl);
proc sql noprint;
select var_name,
Replace_this,
Replace_with
into :var_list_repl_&i. separated by ' ',
:repl_this_list_&i. separated by '#',
:repl_with_list_&i. separated by '#'
from TP_attribute_matching
where Replace_this ne ' ' and data_name="%scan(&data_repl,&i.)";
quit;
/* Block 2 */
%do i=1 %to %_count_(word=&data_repl);
data sasdata.%scan(&data_repl,&i);
set sasdata.%scan(&data_repl,&i);
%do j=1 %to %_count_(word=&&var_list_repl_&i.);
%let from=%scan("&&repl_this_list_&i.",&j,'#');
%let to=%scan("&&repl_with_list_&i.",&j,'#');
%scan(&&var_list_repl_&i.,&j)=translate(%scan(&&var_list_repl_&i.,&j),&to,&from);
%end;
run;
%end;
%end;
%mend;
%DoSomething
EDIT:
Instead of checking the string, you can use count from PROC SQL (&SQLOBS macro var)
%let SQLOBS=0; /* reset SQLOBS */
%let data_repl=; /* initialize data_repl,
would not be defined in case when no rows returned */
proc sql noprint;
select distinct data_name
into :data_repl separated by ' '
from TP_attribute_matching
where Country="&Country_Name" and Replace_this ne ' '
and not missing(data_name);
quit;
%let my_count = &SQLOBS; /* keep the record count from last PROC SQL */
...
%if &my_count gt 0 %then %do;
...
...
%end;
If you already have a main macro, no need to define new (I'm not sure what you're asking now).
First off, this is yet another good example where list processing basics would simplify the code to where you don't need to worry about your actual question. Will elaborate later.
Second off, the way these loops are usually coded is something like
%do ... %while &macrovar ne ;
which checks for empty and doesn't execute the loop at all if it's empty to start with. &macrovar there would be the result of the scan. IE:
%let scan_result = %scan(&Data_repl.,1);
%do i = 1 %to %_count_... while &scan_result ne ; *perhaps minus one, not sure what %_count_() does exactly;
... code
%let scan_result=%scan(&data_Repl.,&i+1);
%end;
Going back to list processing, what you're ultimately doing is:
data &dataset.;
set &dataset.;
[for some set of &variables,&tos, &froms]
&variable. = translate(&variable.,&to.,&from.);
[/set of variables]
run;
So what you need is a couple of macros. Assuming you have a dataset with
<dataset> <varname> <to> <from>
You can call this pretty easily. Two ways:
Run it as a set of nested macros/calls. This is a bit messier, but might be a bit easier to understand.
%macro do_dataset(data=);
proc sql noprint;
select cats('%convert_Var(var=',varname,',to=',to,',from=',from,')')
into :convertlist separated by ' '
from dataset_with_conversions
where dataset="&data.";
quit;
data &data;
set &data;
&convertlist.;
run;
%mend do_dataset;
%macro convert_var(var=,to=,from=);
&var. = translate(&var.,"&to.","&from.");
%mend convert_var;
proc sql noprint;
select cats('%do_dataset(data=',dataset,')')
into :dslist separated by ' '
from dataset_with_conversions;
quit;
&dslist;
Second, you can do all of that in one datastep using call execute (rather than having two different steps). IE, do a by dataset statement, then for first.dataset execute data <dataset>; (filling in that) and for last.dataset execute run, and otherwise execute the translates.
More complicated, but one pass solution - depends on your comfort level which you prefer, they should generally work similarly.
if you want to skip something based on the parameter, if data_repl is set as null, you can add a check for the value, it will avoid error causing during the include statement, since at that time this will be null and which may cause error. E.g
if libary path is derived based on variable passed. which will lead to invalid library path during the include statement, We can use the skip statement.
%macro DoSomething(data_repl=);
%if "&data_repl" ne "" %then %do;
// your code goes here.
%end;
%mend;
%DoSomething

sas macro call macro

Hi I have a program to use one macro to call another one.
I have two month(jun12 and jul12) and each month has two parts(1 & 2), I want to do a loop which I construct a macro called"Loop", Inside it, I constructed a Array, and used Do comment do call a macro "try".
Seems like it doesn't work. Can someone help me with it? Thank you!
LIBNAME EC100006 "G:\sample";
%MACRO try(month=,part=);
...FROM EC100006.monthitsum&month.lag&part AS t1
%MEND try;
%Macro test;
ARRAY Mon(2) jun12 jul12;
%Do i=1 %to 2;
%Do j=1 %to 2
%try(month=Mon(i),part=j)
%End
%End
%Mend test;
%test
You can't have an array of macro variables.
The simplest way to repeatedly call a macro with a list of parameters is to make a dataset with those parameters and call it from the dataset, either with CALL EXECUTE or using PROC SQL to create a macro list of macro calls.
data call;
input month $ part;
datalines;
jun12 1
jul12 2
;;;;
run;
proc sql;
select cats('%try(month=',month,',part=',part,')') into :mcalllist
separated by ' '
from call;
quit;
&mcalllist;
That only works if you have less than 20,000 characters worth of calls or so - if it's more than that you need to try a different option (%include file or call execute).
So right now it's something like this
LIBNAME EC100006 "G:\sample";
%MACRO try(month=,part=);
...FROM EC100006.monthitsum&month.lag&part AS t1
%MEND try;
Data Array
ARRAY Mon{2} jun12 jul12;
RUN;
%Macro test;
%Do i=1 %to 2;
%Do j=1 %to 2
%try(month=Mon(i),part=j)
%End
%End
%Mend test;
%test