SAS: dynamic CALL SYMPUT with an unknown number of fields in the dataset

I have the following dataset:
data parm2;
input a b c d e $;
datalines;
1 2 3 4 A
;
run;
Problem 1: I would like to create a set of macro variables, one per field. Assume I do not know the number of fields or their names.
Problem 2: The fields are not all the same data type.
The desired operation is like the following:
data _null_;
set parm2;
call symput('a',a);
call symput('b',b);
call symput('c',c);
call symput('d',d);
call symput('e',e);
run;
%put &a;

If this is the structure of your data, I would transpose:
proc transpose data=parm2 out=parmt;
var _all_;
run;
Then reference the two columns to create all the macro variables and their corresponding values:
data _null_;
set parmt;
call symput(_name_,col1);
run;

After some research I found the following solution. It is not perfect, but it is worth sharing. Looking forward to @Reeza's answer.
data _null_;
set parm2;
array t(*) _numeric_; /*this deal with different data type*/
do i = 1 to dim(t);
call symput(vname(t[i]), t[i]);
end;
array t2(*) _character_;
do i = 1 to dim(t2);
call symput(vname(t2[i]), t2[i]);
end;
run;

Here's a CALL VNEXT solution with VVALUEX. It seems to work, assuming you don't have a variable that has the same name as an automatic variable. The solution is derived from this SAS Note: http://support.sas.com/kb/24/798.html
data parm2;
input a b c d e $;
datalines;
1 2 3 4 A
;
run;
data _null_;
set parm2;
length name $32;
*temporarily set name to not missing to start loop;
name='blank';
do while(name ne " ");
call vnext(name);
/* Omit automatic variables, and variables created in this step only */
if trim(name) not in('list','name','flag','i',' ','_ERROR_','_N_') then
call symput(name, vvaluex(name));
end;
run;
%put &a;
%put &b;
%put &c;
%put &d;
%put &e;

[Edited: some lines of code are commented out with *, as the OP does not require them.]
Use the PROC SQL dictionary table dictionary.columns to get the variable names contained in your dataset, filtering on memname and libname.
Then use a data step to store those names in macro variables. The variable names are held in the column called name, which is why name is passed as the value argument of call symputx. The macro variable Total holds the number of variables in your dataset.
You would then have variable1=a, variable2=b, and so on.
%macro definevar ( library, dataset);
proc sql;
create table Attribute as
select * from dictionary.columns
where memname = upcase("&dataset") and libname = upcase("&library");
quit;
data letmacro;
set Attribute end=end;
/* 'G' makes the macro variables global so they survive after the macro ends */
call symputx( cats('variable', _n_), name, 'G' );
* if end then call symputx ( 'Total', _n_, 'G' );
run;
/*
***** extra ********
data _null_;
set &dataset ;
%do i=1 to &total;
call symputx ( "var&i" !! left(_n_), &&variable&i );
%end;
run;
***** extra ********
*/
%mend definevar;
%definevar(work, parm2)
And I am looking forward to learning the CALL VNEXT solution by @Reeza.

Related

How to print two sets of variable series next to each other in SAS?

I have a SAS dataset where I keep 50 diagnoses codes and 50 diagnoses descriptions.
It looks something like this:
data diags;
set diag_list;
keep claim_id diagcode1-diagcode50 diagdesc1-diagdesc50;
run;
I need to print all of the variables but I need diagnosis description right next to corresponding diagnosis code. Something like this:
proc print data=diags;
var claim_id diagcode1 diagdesc1 diagcode2 diagdesc2 diagcode3 diagdesc3; *(and so on all the way to 50);
run;
Is there a way to do this (possibly using arrays) without having to type it all up?
Here's one approach then, using Macros. If you have other variables make sure to include them BEFORE the %loop_names(n=50) portion in the VAR statement.
*generate fake data to test/run solution;
data demo;
array diag(50);
array diagdesc(50);
do claim_id=1 to 100;
do i=1 to 50;
diag(i)=rand('normal');
diagdesc(i)=rand('uniform');
end;
output;
end;
run;
%macro loop_names(n=);
%do i=1 %to &n;
diag&i diagdesc&i.
%end;
%mend;
proc print data=demo;
var claim_ID %loop_names(n=20);
run;
Here is some example SAS code that uses actual ICD-10-CM codes and their descriptions, together with @Reeza's PROC PRINT approach:
%* Copy government provided Medicare code data zip file to local computer;
filename cms_cm url 'https://www.cms.gov/Medicare/Coding/ICD10/Downloads/2020-ICD-10-CM-Codes.zip' recfm=s;
filename zip_cm "%sysfunc(pathname(work))/2020-ICD-10-CM-Codes.zip" lrecl=200000000 recfm=n ;
%let rc = %sysfunc(fcopy(cms_cm, zip_cm));
%put %sysfunc(sysmsg());
%* Define fileref to the zip file member that contains ICD 10 CM codes and descriptions;
filename cm_codes zip "%sysfunc(pathname(zip_cm))" member="2020 Code Descriptions/icd10cm_codes_2020.txt";
%* input the codes and descriptions, there are 72,184 of them;
%* I cheated and looked at the data (more than once) in order
%* to determine the variable sizes needed;
data icd10cm_2020;
infile cm_codes lrecl=250 truncover;
attrib
code length=$7
desc length=$230
;
input
code 1-7 desc 9-230
;
run;
* simulate claims sample data with mostly upto 8 diagnoses, and
* at least one claim with 50 diagnoses;
data have;
call streaminit(123);
do claim_id = 1 to 10;
array codes(50) $7 code1-code50;
array descs(50) $230 desc1-desc50;
call missing(of code:, of desc:);
if mod(claim_id, 10) = 0
then top = 50;
else top = rand('uniform', 8);
do _n_ = 1 to top;
p = ceil(rand('uniform', n)); %* pick a random diagnosis code, 1 of 72,184;
set icd10cm_2020 nobs=n point=p; %* read the data for that random code;
codes(_n_) = code;
descs(_n_) = desc;
end;
output;
end;
stop;
drop top;
run;
%macro loop_names(n=);
%do i=1 %to &n;
code&i desc&i.
%end;
%mend;
ods _all_ close;
ods html;
proc print data=have;
var claim_id %loop_names(n=50);
run;

load values from datasets into arrays and use them in a datastep

I have 5 separate datasets (actually many more, but I want to keep the code short) named dk33, dk34, dk35, dk51, dk63. Each dataset contains a numeric field, surv_probs. I would like to load the values into 5 arrays and then use the arrays in a data step (result), but I need advice on the best way to do it.
I get the following error when I use the macro setarrays (code below):
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
ERROR: Illegal reference to the array dk33_arr.
Here is the main code.
%let var1 = dk33;
%let var2 = dk34;
%let var3 = dk35;
%let var4 = dk51;
%let var5 = dk63;
%let varN = 5;
/*put length of each column into macro variables */
%macro getlength;
%do i=1 %to &varN;
proc sql noprint;
select count(surv_probs)
into : &&var&i.._rows
from work.&&var&i;
quit;
%end;
%mend;
/*load values of column:surv_probs into macro variables*/
%macro readin;
%do i=1 %to &varN;
proc sql noprint;
select surv_probs
into: &&var&i.._list separated by ","
from &&var&i;
quit;
%end;
%mend;
data _null_;
call execute('%readin');
call execute('%getlength');
run;
/* create arrays*/
%macro setarrays;
%do i=1 %to 1;
j=1;
array &&var&i.._arr{&&&&&&var&i.._rows};
do while(scan("&&&&&&var&i.._list",j,",") ne "");
&&var&i.._arr = scan("&&&&&&var&i.._list",j,",");
j=j+1;
end;
%end;
%mend;
data result;
%setarrays
put dk33_arr(1);
* some other statements where I use the arrays*
run;
Answer to Tom's question:
* The macro getlength, when executed, creates 5 macro variables named dk33_rows, dk34_rows, dk35_rows, dk51_rows, dk63_rows.
* The macro readin, when executed, creates 5 macro variables dk33_list, dk34_list, dk35_list, dk51_list, dk63_list, each containing a comma-separated string of the values from the column, e.g. 0.99994,0.1999,0.1111
* The macro setarrays, when executed, creates 5 arrays dk33_arr, dk34_arr, ... holding the parsed values from the macro variables created by readin.
I find that "macro arrays" like VAR1, VAR2, ... are generally more trouble than they are worth. Either keep your list of dataset names in an actual dataset and generate code from that, or, if the list is short enough, put the list into a single macro variable and use %SCAN() to pull out the items as you need them.
Either way, it is also better to avoid writing macro code that needs more than three &'s. Build up the reference in multiple steps: build a macro variable that holds the name of the macro variable you want to reference, and then pull the value of that into another macro variable. It might take more lines of code, but you can more easily understand what is happening.
%let i=1 ;
%let mvarname=var&i;
%let dataset_name=&&&mvarname;
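The single-list approach mentioned above can be sketched as follows. This is only an illustration: the macro name count_rows and the macro variable dslist are hypothetical, the dataset names and the surv_probs column come from the question, and the TRIMMED keyword assumes SAS 9.3 or later.

```sas
/* Hypothetical sketch: one delimited list instead of VAR1, VAR2, ... */
%let dslist = dk33 dk34 dk35 dk51 dk63;

%macro count_rows(list);
  %local i ds;
  /* COUNTW gives the number of blank-delimited names in the list */
  %do i = 1 %to %sysfunc(countw(&list,%str( )));
    /* Pull out the i-th dataset name */
    %let ds = %scan(&list,&i,%str( ));
    proc sql noprint;
      /* Store the count in a macro variable such as dk33_rows */
      select count(surv_probs) into :&ds._rows trimmed
      from work.&ds;
    quit;
    /* Three ampersands: resolve the name first, then its value */
    %put NOTE: &ds has &&&ds._rows non-missing surv_probs values.;
  %end;
%mend count_rows;

%count_rows(&dslist)
```

Because the INTO variables are created inside the macro, they will normally be local to it; declare them with %GLOBAL first if they are needed afterwards.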
Before you begin using macro code (or other code generation techniques), make sure you know what code you are trying to generate. If you want to load a variable into a temporary array you can just use a DO loop. There is no need for macro code, or for copying values, or even counts, into macro variables. For example, instead of getting the count of the observations you could just make your temporary array larger than you expect to ever need.
data test1 ;
if _n_=1 then do;
do i=1 to nobs_dk33;
array dk33 (1000) _temporary_;
set dk33 nobs=nobs_dk33 ;
dk33(i)=surv_probs;
end;
do i=1 to nobs_dk34;
array dk34 (1000) _temporary_;
set dk34 nobs=nobs_dk34 ;
dk34(i)=surv_probs;
end;
end;
* What ever you are planning to do with the DK33 and DK34 arrays ;
run;
Or you could transpose the dataset first.
proc transpose data=dk33 out=dk33_t prefix=dk33_ ;
var surv_probs ;
run;
Then your later step is easier since you can just use a SET statement to read in the one observation that has all of the values.
data test;
if _n_=1 then do;
set dk33_t ;
array dk33 dk33_: ;
end;
....
run;

Do loop for creating new variables in SAS

I am trying to run this code
data swati;
input facility_id$ loan_desc : $50. sys_name :$50.;
cards;
fac_001 term_loan RM_platform
fac_001 business_loan IQ_platform
fac_002 business_loan BUSES_termloan
fac_002 business_loan RM_platform
fac_003 overdrafts RM_platform
fac_003 RCF IQ_platform
fac_003 term_loan BUSES_termloan
;
proc contents data=swati out=contents(keep=name varnum);
run;
proc sort data=contents;
by varnum;
run;
data contents;
set contents ;
where varnum in (2,3);
run;
data contents;
set contents;
summary=catx('_',name, 'summ');
run;
data _null_;
set contents;
call symput ("name" || put(_n_ , 10. -L), name);
call symput ("summ" || put (_n_ , 10. -L), summary);
run;
options mlogic symbolgen mprint;
%macro swati;
%do i = 1 %to 2;
proc sort data=swati;
by facility_id &&name&i.;
run;
data swati1;
set swati;
by facility_id &&name&i.;
length &&summ&i. $50.;
retain &&summ&i.;
if first.facility_id then do;
&&summ&i.="";
end;
if first.&&name&i. = last.&&name&i. then &&summ&i.=catx(',',&&name&i., &&summ&i.);
else if first.&&name&i. ne last.&&name&i. then &&summ&i.=&&name&i.;
run;
if last.facility_id ;
%end;
%mend;
%swati;
This code should create two new variables, loan_desc_summ and sys_name_summ, which hold all the loan_desc values on one line and all the sys_name values on one line, separated by commas, for example (term_loan, business_loan), (RM_platform, IQ_platform). But if a customer has only one distinct loan_desc, loan_desc_summ should contain that value only once, not twice.
The problem is that after running this code, I get a dataset with only sys_name_summ and not loan_desc_summ. I want a dataset with all five variables: facility_id, loan_desc, sys_name, loan_desc_summ, sys_name_summ.
Could you please help me find out whether there is a problem in the do loop?
Your loop is always starting with the same input dataset (swati) and generating a new dataset (SWATI1). So only the last time through the loop has any effect. Each loop would need to start with the output of the previous run.
You also need to fix your logic for eliminating the duplicates.
For example you could change the macro to:
%macro swati;
data swati1;
set swati;
run;
%do i = 1 %to 2;
proc sort data=swati1;
by facility_id &&name&i.;
run;
data swati1;
set swati1;
by facility_id &&name&i ;
length &&summ&i $500 ;
retain &&summ&i ;
if first.facility_id then &&summ&i = ' ' ;
if first.&&name&i then &&summ&i = catx(',',&&summ&i,&&name&i);
if last.facility_id ;
run;
%end;
%mend;
Also your program could be a lot smaller if you just used arrays.
data want ;
set have ;
by facility_id ;
array one loan_desc sys_name ;
array two $500 loan_desc_summ sys_name_summ ;
retain loan_desc_summ sys_name_summ ;
do i=1 to dim(one);
if first.facility_id then two(i)=one(i) ;
else if not findw(two(i),one(i),',','t') then two(i)=catx(',',two(i),one(i));
end;
if last.facility_id;
drop i loan_desc sys_name ;
run;
If you want to make it more flexible you can put the list of variable names into a macro variable.
%let varlist=loan_desc sys_name;
You could then generate the list of new names easily.
%let varlist2=%sysfunc(tranwrd(&varlist,%str( ),_summ%str( )))_summ ;
Then you can use the macro variables in the ARRAY, RETAIN and DROP statements.
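Putting those pieces together might look like the sketch below. It assumes the names in varlist are separated by exactly one blank (otherwise the TRANWRD trick produces stray _summ tokens), and it sorts the question's swati dataset into swati_sorted, which is a hypothetical name introduced here.

```sas
%let varlist=loan_desc sys_name;
%let varlist2=%sysfunc(tranwrd(&varlist,%str( ),_summ%str( )))_summ;

/* BY-group processing needs the input sorted by facility_id */
proc sort data=swati out=swati_sorted;
  by facility_id;
run;

data want;
  set swati_sorted;
  by facility_id;
  array one &varlist;          /* source variables */
  array two $500 &varlist2;    /* new summary variables */
  retain &varlist2;
  do i = 1 to dim(one);
    /* Start a new list at the first row of each facility */
    if first.facility_id then two(i) = one(i);
    /* Append only values that are not already in the list */
    else if not findw(two(i), one(i), ',', 't') then
      two(i) = catx(',', two(i), one(i));
  end;
  if last.facility_id;         /* keep one row per facility */
  drop i &varlist;
run;
```

Adding another variable then only requires changing the %let varlist line.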

How to scan a numeric variable

I have a table like this:
Lista_ID 1 4 7 10 ...
in total there are 100 numbers.
I want to pass each one of these numbers to a macro I created. I was trying to use 'scan' but read that it's only for character variables.
Here's the code (the errors I got when I ran it are shown after it):
proc sql;
select ID INTO: LISTA_ID SEPARATED BY '*' from
WORK.AMOSTRA;
run;
PROC SQL;
SELECT COUNT(*) INTO: NR SEPARATED BY '*' FROM
WORK.AMOSTRA;
RUN;
%MACRO CICLO_teste();
%LET LIM_MSISDN = %EVAL(NR);
%LET I = %EVAL(1);
%DO %WHILE (&I<= &LIM_MSISDN);
%LET REF = %SCAN(LISTA_ID,&I,,'*');
DATA WORK.UP&REF;
SET WORK.BASE&REF;
FORMAT PERC_ACUM 9.3;
IF FIRST.ID_CLIENTE THEN PERC_ACUM=0;
PERC_ACUM+PERC;
RUN;
%LET I = %EVAL(&I+1);
%END;
%MEND;
%CICLO_TESTE;
The errors were:
VARIABLE PERC IS UNINITIALIZED and
VARIABLE FIRST.ID_CLIENTE IS UNINITIALIZED.
What I want is to run this macro for each of the IDs in the list I showed before, which are referenced in work.base&ref and work.up&ref.
How can I do it? What am I doing wrong?
Thanks!
Here's the CALL EXECUTE version.
%MACRO CICLO_teste(REF);
DATA WORK.UP&REF;
SET WORK.BASE&REF;
BY ID_CLIENTE;
FORMAT PERC_ACUM 9.3;
IF FIRST.ID_CLIENTE THEN PERC_ACUM=0;
PERC_ACUM+PERC;
RUN;
%MEND CICLO_teste;
DATA _NULL_;
SET amostra;
*CREATE YOUR MACRO CALL;
STR = CATT('%CICLO_teste(', ID, ')');
CALL EXECUTE(STR);
RUN;
First, note that SAS macro variable resolution is intrinsically a text-based copy-and-paste action. That is, all user-defined macro variables hold text. Therefore, %eval is unnecessary in this case.
Other miscellaneous corrections:
Check the %scan() function for correct usage. The first argument should be a text string WITHOUT QUOTES, and a macro variable needs its & prefix (&lista_id, not LISTA_ID).
run; is redundant in proc sql, since each SQL statement runs as soon as it is submitted. Use quit; to exit proc sql.
A semicolon is not required after a macro call (and sometimes causes unexpected problems).
Use %do ... %to for loops.
The code below should work.
data work.amostra;
input id;
cards;
1
4
7
10
;
run;
proc sql noprint;
select id into :lista_id separated by ' ' from work.amostra;
select count(*) into :nr separated by ' ' from work.amostra;
quit;
* check;
%put lista_id=&lista_id nr=&nr;
%macro ciclo_teste();
%local ref;
%do i = 1 %to &nr;
%let ref = %scan(&lista_id, &i);
%*check;
%put ref = &ref;
/* your task below */
/* data work.up&ref;*/
/* set work.base&ref;*/
/* format perc_acum 9.3;*/
/* if first.id_cliente then perc_acum=0;*/
/* perc_acum + perc;*/
/* run; */
%end;
%mend;
%ciclo_teste()
Tested on SAS 9.4, Win7 x64.
Edited:
In fact, I would recommend doing the following instead, to avoid scanning a long string, which is inefficient.
%macro tester();
/* get the number of obs (a more efficient way) */
%local NN;
proc sql noprint;
select nobs into :NN
from dictionary.tables
where upcase(libname) = 'WORK'
and upcase(memname) = 'AMOSTRA';
quit;
/* assign &ref by random access */
%do i = 1 %to &NN;
data _null_;
a = &i;
set work.amostra point=a;
call symputx('ref',id,'L');
stop;
run;
%*check;
%put ref = &ref;
/* your task below */
%end;
%mend;
%tester()
Please let me know if you have further questions.
Wow that seems like a lot of work. Why not just do the following:
data work.amostra;
input id;
cards;
1
4
7
10
;
run;
%macro test001;
proc sql noprint;
select count(*) into: cnt
from amostra;
quit;
%let cnt = &cnt;
proc sql noprint;
select id into: x1 - :x&cnt
from amostra;
quit;
%do i = 1 %to &cnt;
%let x&i = &&x&i;
%put &&x&i;
%end;
%mend test001;
%test001;
Now the macro variables &x1 through &&x&cnt hold your values and you can process them however you like.
In general, if your list is small enough (macro variables are limited to 64K characters), you are better off passing the list in a single delimited macro variable instead of multiple macro variables. Remember that PROC SQL automatically stores the count in the macro variable SQLOBS, so there is no need to run the query twice. Or you can use %sysfunc(countw()) to count the number of entries in your delimited list.
proc sql noprint ;
select id into :idlist separated by '|' from .... ;
%let nr=&sqlobs;
quit;
...
%do i=1 %to &nr ;
%let id=%scan(&idlist,&i,|);
data up&id ;
...
%end;
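As a sketch of the %sysfunc(countw()) alternative mentioned above, the loop bound can be derived from the list itself rather than from SQLOBS. The macro name show_ids is hypothetical, work.amostra and id come from the question, and the sketch assumes the query returns at least one row so that idlist is not empty.

```sas
proc sql noprint;
  select id into :idlist separated by '|' from work.amostra;
quit;

%macro show_ids;   /* hypothetical macro name */
  %local i id n;
  /* Count the |-delimited entries in the list */
  %let n = %sysfunc(countw(&idlist,|));
  %do i = 1 %to &n;
    /* Pull out the i-th entry */
    %let id = %scan(&idlist,&i,|);
    %put NOTE: item &i of &n is &id;
    /* the per-ID data step would go here */
  %end;
%mend show_ids;
%show_ids
```

This keeps the list and its count from drifting apart if the list is later edited by hand.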
If you do generate multiple macro variables there is no need to set the upper bound in advance as SAS will only create the number of macro variables it needs based on the number of observations returned by the query.
select id into :idval1 - from ... ;
%let nr=&sqlobs;
If you are using an older version of SAS then you need to set an upper bound on the macro variable range.
select id into :idval1 - :idval99999 from ... ;

Macro returning a value

I created the following macro. Proc power returns the table pw_out containing the column Power. The data _null_ step assigns the value in column Power of pw_out to the macro variable tpw. I want the macro to return the value of tpw, so that in the main program, I can call it in a DATA step like:
data test;
set tmp;
pw_tmp=ttest_power(meanA=a, stdA=s1, nA=n1, meanB=a2, stdB=s2, nB=n2);
run;
Here is the code of the macro:
%macro ttest_power(meanA=, stdA=, nA=, meanB=, stdB=, nB=);
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw'=&power);
run;
&tpw
%mend ttest_power;
@itzy is correct in pointing out why your approach won't work. But there is a solution that maintains the spirit of your approach: you need to create a power-calculation function using PROC FCMP. In fact, AFAIK, to call a procedure from within a function in PROC FCMP, you need to wrap the call in a macro, so you are almost there.
Here is your macro, slightly modified (mostly to fix the symput statement):
%macro ttest_power;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw', power);
run;
%mend ttest_power;
Now we create a function that will call it:
proc fcmp outlib=work.funcs.test;
function ttest_power_fun(meanA, stdA, nA, meanB, stdB, nB);
rc = run_macro('ttest_power', meanA, stdA, nA, meanB, stdB, nB, tpw);
if rc = 0 then return(tpw);
else return(.);
endsub;
run;
And finally, we can try using this function in a data step:
options cmplib=work.funcs;
data test;
input a s1 n1 a2 s2 n2;
pw_tmp=ttest_power_fun(a, s1, n1, a2, s2, n2);
cards;
0 1 10 0 1 10
0 1 10 1 1 10
;
run;
proc print data=test;
run;
You can't do what you're trying to do this way. Macros in SAS are a little different than in a typical programming language: they aren't subroutines that you can call, but rather just code that generates other SAS code that gets executed. Since you can't run proc power inside of a data step, you can't run this macro from a data step either. (Just imagine copying all the code inside the macro into the data step: it wouldn't work. That's what a macro in SAS does.)
One way to do what you want would be to read each observation from tmp one at a time, and then run proc power. I would do something like this:
/* First count the observations */
data _null_;
call symputx('nobs',obs);
stop;
set tmp nobs=obs;
run;
/* Now read them one at a time in a macro and call proc power */
%macro power;
%do j=1 %to &nobs;
data _null_;
nrec = &j;
set tmp point=nrec;
call symputx('meanA',meanA);
call symputx('stdA',stdA);
call symputx('nA',nA);
call symputx('meanB',meanB);
call symputx('stdB',stdB);
call symputx('nB',nB);
stop;
run;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
proc append base=pw_out_all data=pw_out; run;
%end;
%mend;
%power;
By using proc append you can store the results of each round of output.
I haven't checked this code so it might have a bug, but this approach will work.
You can invoke a macro which calls procedures, etc. (like the example) from within a datastep using call execute(), but it can get a bit messy and difficult to debug.
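A minimal sketch of that call execute() route, under the same assumption as the loop above that tmp holds the columns meanA, stdA, nA, meanB, stdB and nB. The macro name run_power is hypothetical. Wrapping the macro name in %nrstr delays the macro's execution until after the driving data step has finished, which avoids the usual timing surprises.

```sas
%macro run_power(meanA=, stdA=, nA=, meanB=, stdB=, nB=);  /* hypothetical name */
  proc power;
    twosamplemeans test=diff_satt
      groupmeans   = &meanA | &meanB
      groupstddevs = &stdA | &stdB
      groupns      = (&nA &nB)
      power        = .;
    ods output Output=pw_out;
  run;
  /* Accumulate the result of each call */
  proc append base=pw_out_all data=pw_out; run;
%mend run_power;

data _null_;
  set tmp;
  /* Queue one macro call per observation; the calls run after this step ends */
  call execute(cats('%nrstr(%run_power)(meanA=', meanA,
                    ',stdA=', stdA, ',nA=', nA,
                    ',meanB=', meanB, ',stdB=', stdB,
                    ',nB=', nB, ')'));
run;
```

The results still end up in a dataset (pw_out_all) rather than being returned to the calling data step, so they would need to be merged back onto tmp afterwards.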