SAS Macro in macro - how to change the process? - sas

You wrote that I shouldn't define a %macro inside another %macro, so please help me with my process.
Background: I have one table that stores all the SAS processes that start each day, but different reports should run on different days, according to the field "Nr_week_day". If today is listed in this field, I add the process to the GENERAL_STOCK table so that it runs.
Below is my code, with comments describing what I'm trying to achieve.
In general this code works, but it has a macro defined inside a macro. Maybe you have a better idea of how to make it work.
This process is related to my other questions:
SAS include dynamic path
SAS Macro in macro
data CONTROL_FILES_BASE;
input Priority : 2.
ACTIVE: 1.
PROCES_NAME: $10.
Nr_week_day: $10.
; cards;
1 1 TEST_01 (1,3,6)
2 1 TEST_02 0
3 1 TEST_03 (4,5)
;
Data Kalendariusz;
infile cards dlm=',' dsd;
input ref_date: date9.
Nr_day_of_month
Nr_week_day
Number_date;
Format ref_date DDMMYY10.;
cards;
01NOV2018, 1, 4, 20181101
02NOV2018, 2, 5, 20181102
03NOV2018, 3, 6, 20181103
04NOV2018, 4, 7, 20181104
05NOV2018, 5, 1, 20181105
06NOV2018, 6, 2, 20181106
07NOV2018, 7, 3, 20181107
08NOV2018, 8, 4, 20181108
09NOV2018, 9, 5, 20181109
10NOV2018, 10, 6, 20181110
11NOV2018, 11, 7, 20181111
12NOV2018, 12, 1, 20181112
13NOV2018, 13, 2, 20181113
14NOV2018, 14, 3, 20181114
15NOV2018, 15, 4, 20181115
16NOV2018, 16, 5, 20181116
17NOV2018, 17, 6, 20181117
18NOV2018, 18, 7, 20181118
19NOV2018, 19, 1, 20181119
20NOV2018, 20, 2, 20181120
21NOV2018, 21, 3, 20181121
22NOV2018, 22, 4, 20181122
23NOV2018, 23, 5, 20181123
24NOV2018, 24, 6, 20181124
25NOV2018, 25, 7, 20181125
26NOV2018, 26, 1, 20181126
27NOV2018, 27, 2, 20181127
28NOV2018, 28, 3, 20181128
29NOV2018, 29, 4, 20181129
30NOV2018, 30, 5, 20181130
;
/*COMMENT: I TAKE TODAY IN VARIABLE*/
%LET EXTRACT_DATE_DT = date();
/*COMMENT: I CREATE EMPTY TABLE TO STOCK OF PROCESS*/
Proc sql;
Create table GENERAL_STOCK as
Select
*
FROM WORK.CONTROL_FILES_BASE
WHERE ACTIVE = 2
;quit;
/*START MAIN MACRO*/
%macro GENERATE_STOCK();
/*COMMENT: I check how many processes should be generated.*/
PROC SQL noprint;
Select
count(*) into :i
From work.CONTROL_FILES_BASE
WHERE Nr_week_day <> '0'
;quit;
%PUT &i;
/*COMMENT: I separated process which should be check*/
Proc sql;
Create table STOCK_2 as
Select
monotonic() as ROW_ID,
*
FROM work.CONTROL_FILES_BASE
WHERE Nr_week_day ne '0'
;quit;
/*MAIN LOOP - I take field NR_WEEK_DAY and will check that this number of day is today - row by row*/
%do ITER = 1 %To &i;
Proc sql;
Select
Nr_week_day into :SET_VAR
from STOCK_2
WHERE ROW_ID = &ITER
;quit;
%PUT &SET_VAR;
/*SET_VAR have value from Nr_week_day*/
%LET l_decision = 0;/*I set default value in variable*/
/*below code I found in forum - this macro reverse query - check whether (1,3,6) is included today - in table KALENDARIUSZ*/
%macro nos_obs(dsn=,where_stmt=);
proc sql;
select
count(*)
into :l_decision
from &dsn.
&where_stmt.
;quit;
%mend ;
%nos_obs(dsn=Kalendariusz,where_stmt=where Nr_week_day in &&SET_VAR. and Ref_date = &EXTRACT_DATE_DT.);
%PUT &l_decision;
/*IF ABOVE CODE RETURN 1 - means that the nr_week_day is today */
/*When l_decisions is 1 then process should add this row to general_stock. If 0 - should add nothing.*/
%if &l_decision = 1 %then
%do;
Proc sql;
Create table STOCK_2_INSERT (drop=ROW_ID) as
Select
*
FROM WORK.STOCK_2
WHERE ROW_ID = &ITER
;quit;
Proc sql;
insert into GENERAL_STOCK
select * from work.STOCK_2_INSERT
;quit;
/*I clear temp table*/
Proc sql;
delete FROM WORK.STOCK_2_INSERT
;quit;
%end;
%else %if &l_decision = 0 %then
%do;
%end;
%end;
%mend GENERATE_STOCK;
%GENERATE_STOCK();
/*AND I LOOK AT GENERAL TABLE*/
Proc sql;
Create table SHOW_GENERAL_STOCK as
Select
*
FROM WORK.GENERAL_STOCK
;quit;

As explained in the answers to your other question, it's a bad idea to define a macro within another macro definition. In your example, what that means is you can move the definition of the utility macro %nos_obs:
%macro nos_obs(dsn=,where_stmt=);
proc sql;
select
count(*)
into :l_decision
from &dsn.
&where_stmt.
;quit;
%mend ;
That block of code should not be inside the block:
%macro GENERATE_STOCK;
...
%mend GENERATE_STOCK;
You can still call %nos_obs from within %generate_stock. Just don't nest the macro definitions. So you end up with:
*define a macro;
%macro nos_obs(dsn=,where_stmt=);
...
%mend ;
*define a macro that does some stuff and invokes another macro;
%macro GENERATE_STOCK;
...
%nos_obs(dsn=...)
...
%mend GENERATE_STOCK;
%generate_stock
That's the general point about not nesting macro definitions. As for the big picture, it looks like you are writing a scheduler in SAS, like Linux cron or the Windows Task Scheduler, where you decide which programs to run based on the day of the week. Usually it is better to use a dedicated scheduler (cron, LSF, Windows Task Scheduler, etc.) rather than write your own. Better means easier, more maintainable, and more flexible: such tools let you manage dependencies, pause and restart jobs, and so on.
That said, if you do write your own scheduler in SAS (lots of people do, it's hard to resist the temptation sometimes), I think the code you have shown is much more complex than it needs to be.
You have a control dataset that lists the days on which each process should run:
data CONTROL_FILES_BASE;
input Priority : 2.
ACTIVE: 1.
PROCES_NAME: $10.
Nr_week_day: $10.
; cards;
1 1 TEST_01 (1,3,6)
2 1 TEST_02 0
3 1 TEST_03 (4,5)
;
If you want to determine which processes should run today, you just need to find out what day of the week it is today, and select the records accordingly. Something like:
data General_Stock ;
set CONTROL_FILES_BASE ;
where findc(Nr_week_day,put(weekday(today()),1.)) ;
run ;
When I'm writing this it's Saturday, so weekday(today()) returns 7 and the above selects 0 records, because there are no processes scheduled to run on Saturdays.
If you want a macro, because you want to test to see which processes your control dataset will trigger on different dates, you can write a little macro where you input the extract date. Something like:
%macro GENERATE_STOCK
(data=/*name of input control dataset*/
,out= /*name of output dataset*/
,ExtractDate=/*extract date is a SAS date or expression like today() */
);
data &out ;
set &data ;
where findc(Nr_week_day,put(weekday(&extractDate),1.)) ;
run ;
title1 "Printout of &out generated when ExtractDate=%superq(ExtractDate)" ;
proc print data=&out ;
run ;
title1 ;
%mend GENERATE_STOCK ;
Test like:
%generate_stock(data=control_files_base,out=wantToday ,extractdate=today())
%generate_stock(data=control_files_base,out=wantSunday ,extractdate="11Nov2018"d)
%generate_stock(data=control_files_base,out=wantMonday ,extractdate="12Nov2018"d)
%generate_stock(data=control_files_base,out=wantTuesday ,extractdate="13Nov2018"d)
%generate_stock(data=control_files_base,out=wantWednesday,extractdate="14Nov2018"d)
%generate_stock(data=control_files_base,out=wantThursday ,extractdate="15Nov2018"d)
%generate_stock(data=control_files_base,out=wantFriday ,extractdate="16Nov2018"d)
%generate_stock(data=control_files_base,out=wantSaturday ,extractdate="17Nov2018"d)
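One thing to watch: the Kalendariusz table in the question numbers the days with Monday as 1 (05NOV2018, a Monday, has Nr_week_day 1), whereas WEEKDAY() returns 1 for Sunday and 7 for Saturday. If the control table follows the calendar's numbering, a shifted version of the filter might look like the sketch below. The output dataset name General_Stock_Mon1 is made up for illustration.

```sas
/* Sketch: filter using Monday=1 numbering, as in the Kalendariusz table.  */
/* WEEKDAY() returns 1=Sunday ... 7=Saturday, so the result is shifted.    */
data General_Stock_Mon1 ;
  set CONTROL_FILES_BASE ;
  /* mod(weekday(d)+5,7)+1 maps Monday to 1, Tuesday to 2, ... Sunday to 7 */
  where findc(Nr_week_day, put(mod(weekday(today())+5,7)+1, 1.)) ;
run ;
```

The same expression could replace weekday(&extractDate) inside the macro if you want to test other dates.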

Related

Converting SAS custom user format to macro program

I have the following code that works properly in my SAS program to subset needed dates to shift hourly data but I need to convert to a macro so that I can call it for multiple data sets. I have very little experience in macro programming so any help would be appreciated.
%let yr_beg=2007;
%let yr_end=2020;
data DST_FMT(drop=year);
attrib hlo length=$1
start dst_start end format=mmddyy10.;
fmtname="dst";
type="N";
do year="&yr_beg" to "&yr_end";
start= nwkdom(2, 1, 3, year)+1;
end= nwkdom(1, 1, 11, year);
dst_start= start - 1;
label='*';
output;
end;
start=.;end=.;
hlo="O";
label='';
output;
run;
proc format cntlin=DST_FMT; run;
Here's how you can convert your current program to a macro that you can call.
This is not really needed, though; a %INCLUDE is usually more appropriate for something like this, since it typically only runs once.
You also do not need the quotes around the macro variables.
%macro generate_date_format(yr_beg= , yr_end= );
data DST_FMT(drop=year);
attrib hlo length=$1
start dst_start end format=mmddyy10.;
fmtname="dst";
type="N";
do year=&yr_beg to &yr_end;
start= nwkdom(2, 1, 3, year)+1;
end= nwkdom(1, 1, 11, year);
dst_start= start - 1;
label='*';
output;
end;
start=.;end=.;
hlo="O";
label='';
output;
run;
proc format cntlin=DST_FMT; run;
%mend;
Execute macro:
%generate_date_format(yr_beg = 2008, yr_end = 2021);
Macro tutorial references:
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
Sample macros from documentation:
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Appendix/ta-p/291716
Macro documentation:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/mcrolref/titlepage.htm
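To check what the generated format does, a quick test along these lines may help. It assumes the format-building code above has already been run for a range of years that includes 2015, so that the dst. format exists; the two test dates are arbitrary picks, one in July and one in January.

```sas
/* Sketch: apply the dst. format to two arbitrary dates and print the labels. */
data _null_;
  length summer winter $1;
  summer = put('15JUL2015'd, dst.);  /* between start and end for 2015 */
  winter = put('15JAN2015'd, dst.);  /* outside every range, so the OTHER row applies */
  put summer= winter=;
run;
```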

Find three most recent data year for each row

I have a data set with one row for each country and 100 columns (10 variables with 10 data years each).
For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive).
This is what I have so far, but I know it's wrong because of the nested loop: it gives recent_1, recent_2 and recent_3 the same value. However, I haven't figured out how to create recent_1, recent_2 and recent_3 without two loops.
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004 -- MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
%let rc = 1;
%do i = 2013 %to 2004 %by -1;
%do rc = 1 %to 3 %by 1;
%if MATERNAL_CARE_&i. ne . %then %do;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
%end;
%end; run; %mend; %test();
You don't need to use a macro to do this - just some arrays:
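data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004-MATERNAL_CARE_2013 recent_1-recent_3;
/* index the array by year so that mc[2013] is MATERNAL_CARE_2013 */
array mc {2004:2013} MATERNAL_CARE_2004-MATERNAL_CARE_2013;
array recent {3} recent_1-recent_3;
rc = 0; /* number of non-missing values found so far */
/* walk back from the latest year and stop once three values are found */
do i = 2013 to 2004 by -1 while (rc < 3);
if mc[i] ne . then do;
rc = rc + 1;
recent[rc] = mc[i];
end;
end;
run;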
Maybe I don't fully understand your request, but based on your description:
"For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive)" I created this sample dataset with two variables, dt1 and dt2, and 2 locations.
The output will be 2 datasets (in general, one per variable whose name starts with DT) named DS1 and DS2, each with up to 3 observations per location: the first dataset for the first variable, the second for the second variable.
This is the sample dataset:
data sample_ds;
length city $10 dt1 dt2 8.;
infile datalines dlm=',';
input city $ dt1 dt2;
datalines;
MS,5,0
MS,3,9
MS,3,9
MS,2,0
MS,1,8
MS,1,7
CA,6,1
CA,6,.
CA,6,.
CA,2,8
CA,1,5
CA,0,4
;
This is the sample macro:
%macro help(ds=);
data vars(keep=dt:); set &ds; if _n_ not >0; run;
%let op = %sysfunc(open(vars));
%let nvrs = %sysfunc(attrn(&op,nvars));
%let cl = %sysfunc(close(&op));
%do idx=1 %to &nvrs.;
proc sort data=&ds(keep=city dt&idx.) out=ds&idx.(where=(dt&idx. ne .)) nodupkey; by city DESCENDING dt&idx.; run;
data ds&idx.; set ds&idx.;
retain cnt;
by city DESCENDING dt&idx.;
if first.city then cnt=0; else cnt=cnt+1;
run;
data ds&idx.(drop=cnt); set ds&idx.(where=(cnt<3)); rename dt&idx.=act&idx.; run;
%end;
%mend;
You will run this macro with:
%help(ds=sample_ds);
In the first statement of the macro I select the variables on which I want to iterate:
data vars(keep=dt:); set &ds; if _n_ not >0; run;
Work on this if you want to make this work for your code, or simply rename your variables as DT1 DT2...
Let me know if it is correct for you.
When writing macro code, always keep in mind what happens when. SAS processes your code in stages:
Before your SAS code is even compiled, your macro variables are resolved and your macro code is executed.
Then the resulting Base SAS code is compiled.
Finally the code is executed.
When you write %if MATERNAL_CARE_&i. ne . %then %do, this is macro code, interpreted before compilation.
At that time MATERNAL_CARE_&i. is not a variable but a text string containing a macro variable reference.
The first time you run through your %do i = 2013 %to 2004 %by -1 loop, it resolves to MATERNAL_CARE_2013, the second time to MATERNAL_CARE_2012, etc.
Then the macro %if statement is evaluated: it compares the text string MATERNAL_CARE_2013 with a dot, and since those two strings are never equal, the ne condition is always TRUE,
so recent_&rc. = MATERNAL_CARE_&i. is passed to the compiler on every iteration, regardless of the data. The last assignment generated for each recent_&rc. wins, which is why all three end up with the same value.
You can see that if you run your code with options mprint;
The solution:
options mprint;
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_: recent_:;
** The : acts as a wild card here **;
%do i = 2013 %to 2004 %by -1;
if MATERNAL_CARE_&i. ne . then do;
%do rc = 1 %to 3 %by 1;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
end;
%end;
run;
%mend;
%test();
Now, before compilation of if MATERNAL_CARE_&i. ne . then do, only the &i. is resolved, and if MATERNAL_CARE_2013 ne . then do is passed to the compiler.
The compiler will see this as a test of whether the SAS variable MATERNAL_CARE_2013 is non-missing, and that is just what you wanted.
Remark:
It is not essential that I moved the if statement above the %do rc loop. It is just more efficient because the condition is then evaluated less often.
It is however essential that you close each %do with an %end and each do with an end;
Remark:
You do not need %let rc = 1, because %do rc = 1 %to 3 already initialises &rc.
For completeness: SAS compiles code step by step.
The next PROC or data step and its macro code are only considered after the previous one has executed.
That is why you can write macro variables from a data step or an SQL select into that influence the code compiled in the next step,
something you cannot do with, for instance, C++ precompilation.
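The difference in timing can be seen in a small experiment. The macro name timing_demo and the variable x below are invented for this sketch; with options mprint the log shows which statements actually reach the data step compiler.

```sas
options mprint;
%macro timing_demo;
data _null_;
  x = .;
  /* Macro-level test: compares the text x with the text . before the step */
  /* is compiled. The two strings differ, so the PUT is always generated.  */
  %if x ne . %then %do;
    put "macro-level branch was generated, x=" x;
  %end;
  /* Data step test: looks at the value of the variable x at run time. */
  if x ne . then put "data step test: x is not missing";
  else put "data step test: x is missing";
run;
%mend timing_demo;
%timing_demo
```

The first PUT statement appears in the MPRINT output even though x is missing, because the macro test never sees the value of x.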
Thanks everyone. Found a hybrid solution from a few solutions posted.
data sample_ds;
infile datalines dlm=',';
input country $ maternal_2004 maternal_2005
maternal_2006 maternal_2007 maternal_2008 maternal_2009 maternal_2010 maternal_2011 maternal_2012 maternal_2013;
datalines;
MS,5,0,5,0,5,.,5,.,5,.
MW,3,9,5,0,5,0,5,.,5,0
WE,3,9,5,0,5,.,.,.,.,0
HU,2,0,5,.,5,.,5,0,5,0
MI,1,8,5,0,5,0,5,.,5,0
HJ,1,7,5,0,5,0,.,0,.,0
CJ,6,1,5,0,5,0,5,0,5,0
CN,6,1,.,5,0,5,0,5,0,5
CE,6,5,0,5,0,.,0,5,.,8
CT,2,5,0,5,0,5,0,5,0,9
CW,1,5,0,5,0,5,.,.,0,7
CH,0,5,0,5,0,.,0,.,0,5
;
%macro test(var);
data &var._recent;
set sample_ds;
keep country &var._1 &var._2 &var._3;
array mc {*} &var._2004-&var._2013;
array recent {*} &var._1-&var._25;
count=1;
do i = 10 to 1 by -1;
if mc[i] ne . then do;
recent[count] = mc[i];
count=count+1;
end;
end;
run;
%mend;
%test(maternal)

How to scan a numeric variable

I have a table like this:
Lista_ID 1 4 7 10 ...
in total there are 100 numbers.
I want to pass each one of these numbers to a macro I created. I was trying to use 'scan' but read that it's only for character variables.
I got errors when I ran the code below.
Here is the code:
proc sql;
select ID INTO: LISTA_ID SEPARATED BY '*' from
WORK.AMOSTRA;
run;
PROC SQL;
SELECT COUNT(*) INTO: NR SEPARATED BY '*' FROM
WORK.AMOSTRA;
RUN;
%MACRO CICLO_teste();
%LET LIM_MSISDN = %EVAL(NR);
%LET I = %EVAL(1);
%DO %WHILE (&I<= &LIM_MSISDN);
%LET REF = %SCAN(LISTA_ID,&I,,'*');
DATA WORK.UP&REF;
SET WORK.BASE&REF;
FORMAT PERC_ACUM 9.3;
IF FIRST.ID_CLIENTE THEN PERC_ACUM=0;
PERC_ACUM+PERC;
RUN;
%LET I = %EVAL(&I+1);
%END;
%MEND;
%CICLO_TESTE;
The errors were:
VARIABLE PERC IS UNINITIALIZED and
VARIABLE FIRST.ID_CLIENTE IS UNINITIALIZED.
What I want is to run this macro for each one of the IDs in the list I showed above, which are referenced in work.base&ref and work.up&ref.
How can I do it? What am I doing wrong?
thanks!
Here's the CALL EXECUTE version.
%MACRO CICLO_teste(REF);
DATA WORK.UP&REF;
SET WORK.BASE&REF;
BY ID_CLIENTE;
FORMAT PERC_ACUM 9.3;
IF FIRST.ID_CLIENTE THEN PERC_ACUM=0;
PERC_ACUM+PERC;
RUN;
%MEND CICLO_teste;
DATA _NULL_;
SET amostra;
*CREATE YOUR MACRO CALL;
STR = CATT('%CICLO_TESTE(', ID, ')');
CALL EXECUTE(STR);
RUN;
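A common refinement, not part of the answer above, is to wrap the generated call in %nrstr so that the macro is not executed while the data step is still running but only afterwards, when the queued text is read back. For a macro as simple as this one it probably makes no difference; it tends to matter once a macro contains logic that depends on macro variables created by its own steps. A sketch, assuming a compiled %CICLO_teste(REF) macro and the amostra table from the question:

```sas
data _null_;
  set amostra;
  /* Single quotes keep the macro processor away from the text while this  */
  /* step compiles; %nrstr postpones the macro call until after the step.  */
  str = cats('%nrstr(%CICLO_teste(', id, '))');
  call execute(str);
run;
```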
First, note that resolving a SAS macro variable is essentially a text-based copy-paste operation. That is, all user-defined macro variables hold text. Therefore, %eval is unnecessary in this case.
Other miscellaneous corrections include:
Check the %scan() function for correct usage. The first argument should be a text string WITHOUT QUOTES, and a macro variable must be referenced with an ampersand (&LISTA_ID, not LISTA_ID).
run; is redundant in proc sql, since each SQL statement runs as soon as it is submitted. Use quit; to exit proc sql.
A semicolon is not required after a macro call (it sometimes causes unexpected problems).
Use %do ... %to for loops.
FIRST.ID_CLIENTE only exists when the data step has a BY ID_CLIENTE statement (and the input is sorted by that variable).
The code below should work.
data work.amostra;
input id;
cards;
1
4
7
10
;
run;
proc sql noprint;
select id into :lista_id separated by ' ' from work.amostra;
select count(*) into :nr separated by ' ' from work.amostra;
quit;
* check;
%put lista_id=&lista_id nr=&nr;
%macro ciclo_teste();
%local ref;
%do i = 1 %to &nr;
%let ref = %scan(&lista_id, &i);
%*check;
%put ref = &ref;
/* your task below */
/* data work.up&ref;*/
/* set work.base&ref;*/
/* format perc_acum 9.3;*/
/* if first.id_cliente then perc_acum=0;*/
/* perc_acum + perc;*/
/* run; */
%end;
%mend;
%ciclo_teste()
tested on SAS 9.4 win7 x64
Edited:
In fact I would recommend doing this to avoid scanning a long string which is inefficient.
%macro tester();
/* get the number of obs (a more efficient way) */
%local NN;
proc sql noprint;
select nobs into :NN
from dictionary.tables
where upcase(libname) = 'WORK'
and upcase(memname) = 'AMOSTRA';
quit;
/* assign &ref by random access */
%do i = 1 %to &NN;
data _null_;
a = &i;
set work.amostra point=a;
call symputx('ref',id,'L');
stop;
run;
%*check;
%put ref = &ref;
/* your task below */
%end;
%mend;
%tester()
Please let me know if you have further questions.
Wow that seems like a lot of work. Why not just do the following:
data work.amostra;
input id;
cards;
1
4
7
10
;
run;
%macro test001;
proc sql noprint;
select count(*) into: cnt
from amostra;
quit;
%let cnt = &cnt;
proc sql noprint;
select id into: x1 - :x&cnt
from amostra;
quit;
%do i = 1 %to &cnt;
%let x&i = &&x&i;
%put &&x&i;
%end;
%mend test001;
%test001;
Now the macro variables &x1 through &&x&cnt hold your values and you can process them however you like.
In general, if your list is small enough (macro variables are limited to 64K characters), you are better off passing the list in a single delimited macro variable instead of multiple macro variables. Remember that PROC SQL automatically stores the number of rows returned in the macro variable SQLOBS, so there is no need to run the query twice. Alternatively, you can use %sysfunc(countw()) to count the number of entries in your delimited list.
proc sql noprint ;
select id into :idlist separated by '|' from .... ;
%let nr=&sqlobs;
quit;
...
%do i=1 %to &nr ;
%let id=%scan(&idlist,&i,|);
data up&id ;
...
%end;
If you do generate multiple macro variables there is no need to set the upper bound in advance as SAS will only create the number of macro variables it needs based on the number of observations returned by the query.
select id into :idval1 - from ... ;
%let nr=&sqlobs;
If you are using an older version of SAS then you need to set an upper bound on the macro variable range.
select id into :idval1 - :idval99999 from ... ;
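Put together, the single-list approach with %sysfunc(countw()) might look like the sketch below. It assumes the work.amostra table with a numeric id column from the earlier answer; the macro name loop_ids is invented for the example, and the %put stands in for the real per-id work.

```sas
proc sql noprint;
  select id into :idlist separated by '|' from work.amostra;
quit;

%macro loop_ids;
  %local i id;
  /* countw counts the entries in the pipe-delimited list */
  %do i = 1 %to %sysfunc(countw(&idlist,%str(|)));
    %let id = %scan(&idlist,&i,|);
    %put id=&id;
    /* per-id processing, e.g. a data step reading work.base&id, goes here */
  %end;
%mend loop_ids;
%loop_ids
```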

Unable to match macro variable with dataset variable

The character variable in the dataset never matches the macro variable, so the %IF condition is never true. Please advise.
I am trying to match by month and, accordingly, create an array and put counts only into the columns for specific months. It does not work because the month macro variable never matches the dataset variable that holds the month.
/*create dummy data*/
data datefile;
input tran_date date9. cnt 3.;
datalines;
13feb2015 5
10feb2015 4
11feb2015 3
05feb2015 8
08feb2015 5
01jan2015 1
20dec2014 1
31jan2015 2
23dec2014 2
12jan2015 1
;
/*calculate month*/
data datefile11;
set datefile;
tran_mon=year(tran_date)*100+month(tran_date);
run;
/*select distinct month*/
proc sql;
create table datefile12 as select distinct(tran_mon)
from datefile11 order by tran_mon;
quit;
/*convert month from numeric to character*/
data datefile11(drop=tran_mon);
informat tran_mon2 $6.;
set datefile11;
tran_mon2=tran_mon;
run;
/*create macro variables through datastep*/
data datefile13;
set datefile12;
monum = cat('mnth',_N_);
run;
data _null_;
set datefile13;
call symput(monum,trim(left(tran_mon)));
run;
/*use array to make separate column for each month and
put split count for each month to each colunms*/
%macro c;
proc sql noprint;
select count(1) into :nrow from datefile13;
quit;
%let nrow = &nrow;
data datefile14;
set datefile11;
array mon{*} mon_1 - mon_&nrow;
%do i=1 %to &nrow;
%if tran_mon2 = &&mnth&i %then %do; %put tran_mon2;
mon_&i = cnt; %end;
%else %do; mon_&i = 0 ; %end;
%end;
run;
%mend c;
%c
Your macro %if %then %do check executes while the data step is still being compiled - by the time the data step has begun to execute, there is no further opportunity to use macro logic like that.
Try doing it the other way round - write your loop using if then do data step logic instead.
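A minimal sketch of that idea follows. It assumes the datefile and datefile13 tables and the macro variables mnth1, mnth2, ... created in the question already exist. The macro name c_fixed is invented here, and the sketch recomputes a numeric tran_mon from tran_date and compares it numerically, which is a change from the question's character variable tran_mon2.

```sas
%macro c_fixed;
  %local nrow i;
  proc sql noprint;
    select count(*) into :nrow from datefile13;
  quit;
  %let nrow = &nrow;   /* strips leading blanks */

  data datefile14;
    set datefile;
    tran_mon = year(tran_date)*100 + month(tran_date);
    array mon{*} mon_1 - mon_&nrow;
    %do i = 1 %to &nrow;
      /* the macro loop only generates text; the comparison runs in the data step */
      if tran_mon = &&mnth&i then mon_&i = cnt;
      else mon_&i = 0;
    %end;
  run;
%mend c_fixed;
%c_fixed
```

The %do loop writes one if/else pair per month, and the data step then evaluates each pair against every observation.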

Create SAS data set conditionally on other data sets

I have 6 identical SAS data sets. They only differ in terms of the values of the observations.
How can I create one output data, which finds the maximum value across all the 6 data sets for each cell?
The update statement seems a good candidate, but it cannot set a condition.
data1
v1 v2 v3
1 1 1
1 2 3
data2
v1 v2 v3
1 2 3
1 1 1
Result
v1 v2 v3
1 2 3
1 2 3
If need be the following could be automated by "PUT" statements or variable arrays.
***ASSUMES DATA SETS ARE SORTED BY ID;
Data test;
do until(last.id);
set a b c;
by id;
if v1 > updv1 then updv1 = v1;
if v2 > updv2 then updv2 = v2;
if v3 > updv3 then updv3 = v3;
end;
drop v1-v3;
rename updv1-updv3 = v1-v3;
run;
To provide a more complete solution to Rico's question (assuming 6 datasets, e.g. d1-d6), one could do it this way:
Data test;
array v(*) v1-v3;
array updv(*) updv1-updv3;
do until(last.id);
set d1-d6;
by id;
do i = 1 to dim(v);
if v(i) > updv(i) then updv(i) = v(i);
end;
end;
drop v1-v3;
rename updv1-updv3 = v1-v3;
run;
proc print;
var id v1-v3;
run;
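If the data sets have no ID variable and the rows simply correspond by position, as in the question's example, one possible alternative is one-to-one reading with several SET statements. The sketch below rebuilds the two example tables from the question; the temporary names b1-b3 are made up, and the same pattern would be repeated for the remaining data sets.

```sas
data data1;
  input v1 v2 v3;
  cards;
1 1 1
1 2 3
;
run;

data data2;
  input v1 v2 v3;
  cards;
1 2 3
1 1 1
;
run;

data result;
  /* two SET statements read the tables side by side, one row from each per iteration */
  set data1;
  set data2(rename=(v1=b1 v2=b2 v3=b3));
  v1 = max(v1, b1);
  v2 = max(v2, b2);
  v3 = max(v3, b3);
  drop b1-b3;
run;
```

For the two example tables this gives the rows 1 2 3 and 1 2 3, matching the Result shown in the question. The step stops as soon as the shortest input runs out of rows.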
See below. It might be too complex for a SAS beginner, but I hope the comments explain it a bit.
/* macro rename_cols_opt to generate cols_opt&n variables
- cols_opt&n contains generated code for dataset RENAME option for a given (&n) dataset
*/
%macro rename_cols_opt(n);
%global cols_opt&n max&n;
proc sql noprint;
select catt(name, '=', name, "&n") into: cols_opt&n separated by ' '
from dictionary.columns
where libname='WORK' and memname='DATA1'
and upcase(name) ne 'MY_ID_COLUMN'
;
quit;
%mend;
/* prepare macro variables = pre-generate the code */
%rename_cols_opt(1)
%rename_cols_opt(2)
%rename_cols_opt(3)
%rename_cols_opt(4)
%rename_cols_opt(5)
%rename_cols_opt(6)
/* create macro variable keep_list containing names of output variables to keep (based on DATA1 structure, the code expects those variables in other tables as well */
proc sql noprint;
select trim(name) into: keep_list separated by ' '
from dictionary.columns
where libname='WORK' and memname='DATA1'
;
quit;
%put &keep_list;
/* macro variable maxcode contains generated code for calculating all MAX values */
proc sql noprint;
select cat(trim(name), ' = max(of ', trim(name), ":)") into: maxcode separated by '; '
from dictionary.columns
where libname='WORK' and memname='DATA1'
and upcase(name) ne 'MY_ID_COLUMN'
;
quit;
%put "&maxcode";
data result1 / view =result1;
merge
data1 (in=a rename=(&cols_opt1))
data2 (in=b rename=(&cols_opt2))
data3 (in=b rename=(&cols_opt3))
data4 (in=b rename=(&cols_opt4))
data5 (in=b rename=(&cols_opt5))
data6 (in=b rename=(&cols_opt6))
;
by MY_ID_COLUMN;
&maxcode;
keep &keep_list;
run;
/* created a datastep view, now "describing" it to see the generated code */
data view=result1;
describe;
run;
Here's another attempt that scales to any number of datasets and variables. I've added an ID variable this time as well. Like the answer from @vasja, it uses some advanced techniques. The two solutions are in fact very similar; I've used CALL EXECUTE instead of a macro to create the view. My solution also requires the dataset names to be stored in a dataset.
/* create dataset of required dataset names */
data datasets;
input ds_name $;
cards;
data1
data2
;
run;
/* dummy data */
data data1;
input id v1 v2 v3;
cards;
10 1 1 1
20 1 2 3
;
run;
data data2;
input id v1 v2 v3;
cards;
10 1 2 3
20 1 1 1
;
run;
/* create dataset, macro list and count of variables names */
proc sql noprint;
create table variables as
select name as v_name from dictionary.columns
where libname='WORK' and upcase(memname)='DATA1' and upcase(name) ne 'ID';
select name, count(*) into :keepvar separated by ' ',
:numvar
from dictionary.columns
where libname='WORK' and upcase(memname)='DATA1' and upcase(name) ne 'ID';
quit;
/* create view that joins all datasets, renames variables and calculates maximum value per id */
data _null_;
set datasets end=last;
if _n_=1 then call execute('data data_all / view=data_all; merge');
call execute (trim(ds_name)|| '(rename=(');
do i=1 to &numvar.;
set variables point=i;
call execute(trim(v_name)||'='||catx('_',v_name,_n_));
end;
call execute('))');
if last then do;
call execute('; by id;');
do i=1 to &numvar.;
set variables point=i;
call execute(trim(v_name)||'='||'max(of '||trim(v_name)||':);');
end;
call execute('run;');
end;
run;
/* create dataset of maximum values per id per variable */
data result (keep=id &keepvar.);
set data_all;
run;