Create dummy variables from categorical variables more efficiently in SAS - sas

I have data with state-level variation in the U.S. Now I want to create many dummies to control for state fixed effects. In Stata this is easy, while in SAS it seems I have to create all the dummies manually. However, logit regression with fixed effects runs quite slowly in Stata. I wonder whether there is a more efficient way in SAS to create dummies from character variables (not numeric ones, for which I know a few methods), since I have many character variables that need to be turned into dummies.
Cheers,
Eva

proc logistic supports the class statement. Place your variables in the class statement and you can also specify the type of parameterization you'd like. The most common method is reference coding (param=ref).
proc logistic data=sashelp.heart;
class sex bp_status/param=ref;
model status = sex ageAtStart height weight bp_status;
run;
https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_sect006.htm
Not all procs support the class statement; in those cases you can use proc glmmod or a variety of other methods to create your dummy variables.
http://blogs.sas.com/content/iml/2016/02/22/create-dummy-variables-in-sas.html
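As a rough sketch of the proc glmmod route mentioned above, the step below writes a design matrix whose columns are the dummy variables. The output data set names GLMDesign and GLMParm are arbitrary choices for illustration, and the response variable on the model statement is there only because the syntax requires one.

```sas
/* Build dummy variables for SEX and BP_STATUS from SASHELP.HEART.     */
/* GLMDesign and GLMParm are illustrative names, not required ones.    */
proc glmmod data=sashelp.heart outdesign=GLMDesign outparm=GLMParm noprint;
   class sex bp_status;
   model cholesterol = sex bp_status;  /* response is carried along, not modelled */
run;

/* OUTPARM= maps each design column (Col1, Col2, ...) to an effect level */
proc print data=GLMParm;
run;

/* OUTDESIGN= holds the dummy columns themselves */
proc print data=GLMDesign(obs=5);
run;
```

The design uses GLM-style coding (an intercept column plus one column per level), so one level per variable usually has to be dropped before the columns are used in a procedure that does not handle the redundancy itself.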

If you absolutely need to manually create dummy variables you can use a macro like this one. You would need to call it for each variable.
%macro create_dummy(dataset=, var=);
%* Save Distinct Values and Dummy Variable Names;
proc sql noprint;
select distinct
&var,
tranwrd(tranwrd(trim(&var), " ", "_"), ".", "")
into
:value1-,
:name1-
from
&dataset
;
select
count(distinct(&var))
into
:total
from
&dataset
;
quit;
%* Create Dummy Variables;
data &dataset;
set &dataset;
%do i=1 %to &total;
if &var = "&&value&i" then &&name&i = 1; else &&name&i = 0;
%end;
run;
%mend create_dummy;
You can add a loop to the macro if you want to call it only once with a space-separated list of variables. Add a %do loop at the top, like this:
%macro create_dummy(dataset=, var=);
%do l = 1 %to %sysfunc(countw(&var));
%let var1 = %scan(&var, &l);
%* Save Distinct Values and Dummy Variable Names;
proc sql noprint;
select distinct
&var1,
tranwrd(tranwrd(trim(&var1), " ", "_"), ".", "")
into
:value1-,
:name1-
from
&dataset
;
select
count(distinct(&var1))
into
:total
from
&dataset
;
quit;
%* Create Dummy Variables;
data &dataset;
set &dataset;
%do i=1 %to &total;
if &var1 = "&&value&i" then &&name&i = 1; else &&name&i = 0;
%end;
run;
%end;
%mend create_dummy;

Related

PROC SQL within SAS Macro to list all variables of a data set - SELECT Statement causing error

I was trying to create a macro to output a list of all variables of a specific data set. In my macro, I am using PROC SQL. The code runs OK outside %macro, but inside %MACRO I get an error message saying the SELECT statement is not valid.
here is an example:
proc sql noprint;
select name into :vlist separated by ' '
from dictionary.columns
where memname = upcase("&dsn");
quit;
%put &vlist;
the above works perfectly;
but
%macro getvars(dsn);
%local vlist;
proc sql noprint;
select name into :vlist separated by ' '
from dictionary.columns
where memname = upcase("&dsn");
quit;
&vlist;
%mend;
the above doesn't work when I tried to do:
%let var_list = %getvars(dataset);
it returns:
ERROR 180-322: Statement is not valid or it is used out of proper order.
underlining the SELECT statement within the PROC SQL
SAS macros are not like functions in most programming languages: they don't return values, they are actually replaced by the content of the macro.
The solution is to make your macro variable global, outside the macro. Then you don't need to assign it to a new macro variable with %let.
%global vlist;
%macro getvars(dsn);
proc sql noprint;
select name into :vlist separated by ' '
from dictionary.columns
where memname = upcase("&dsn");
quit;
%mend;
%getvars(class) /* pass the member name only: memname holds no libref */
%put &=vlist;
[EDIT]
and then just use the list in your keep statement
data OUT (keep= &vlist. VAR_B1);
merge DATA_A (in=a) DATA_B (in=b) ;
run;
Seems like the only viable option for my use case is from the following SAS paper, under the section of "USING A MACRO LOOP"
https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/028-30.pdf
To clarify, my use case needs the list itself emitted directly, not stored in a macro variable.
e.g.
data OUT (keep= %getvars(DATA_A) VAR_B1);
merge DATA_A (in=a)
DATA_B (in=b)
;
run;
The PROC SQL won't work for me. So I think I need to move over to SAS I/O Functions in Macro Loop.
Below is from the SAS Paper:
%Macro GetVars(Dset) ;
%Local VarList ;
/* open dataset */
%Let FID = %SysFunc(Open(&Dset)) ;
/* If accessable, process contents of dataset */
%If &FID %Then %Do ;
%Do I=1 %To %SysFunc(ATTRN(&FID,NVARS)) ;
%Let VarList= &VarList %SysFunc(VarName(&FID,&I));
%End ;
/* close dataset when complete */
%Let FID = %SysFunc(Close(&FID)) ;
%End ;
&VarList
%Mend ;
A macro using %SYSFUNC(DOSUBL( can run any amount of SAS code (in a separate stream) when invoked at source code parse-time.
Example:
data have_A;
do index = 1 to 10;
x = index ** 2; y = x-1; z = x+1; p = x/2; q = sqrt(x); output;
end;
run;
data have_B(keep=B1);
do index = 1 to 10;
B1 + index; output;
end;
run;
%macro getvars(data);
%local rc lib mem names;
%let rc = %sysfunc(DOSUBL(%nrstr(
%let syslast = &data;
%let lib = %scan (&SYSLAST,1,.);
%let mem = %scan (&SYSLAST,2,.);
proc sql noprint;
select name into :names separated by ' ' from
dictionary.columns where
libname = "&lib." and
memname = "&mem."
;
quit;
)));
/* Emit variable name list */
&names.
%mend;
data OUT (keep=%getvars(HAVE_A) B1);
merge HAVE_A (in=a) /* 1:1 merge (no BY) */
HAVE_B (in=b)
;
run;
%let var_list = %getvars(dataset);
will resolve to:
%let var_list = proc sql noprint;
select name into :vlist separated by ' '
from dictionary.columns
where memname = upcase("dataset");
quit;
So it will store "proc sql noprint" in var_list, and the rest will fail because SQL statements are then used outside of proc sql.
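To make the text-substitution point concrete, here is a minimal sketch of a "function-style" macro. It works only because the macro contains nothing but macro statements and one line of emitted text, so the call can sit on the right-hand side of %let. The macro name add_prefix and its parameters are made up for illustration.

```sas
/* Hypothetical helper: emits each word of LIST with PREFIX prepended. */
/* It generates no PROC or DATA step code, only text.                  */
%macro add_prefix(list, prefix);
   %local i out;
   %do i = 1 %to %sysfunc(countw(&list, %str( )));
      %let out = &out &prefix.%scan(&list, &i, %str( ));
   %end;
   &out   /* no semicolon: this text is what the macro call is replaced by */
%mend add_prefix;

%let renamed = %add_prefix(a b c, x_);
%put &=renamed;
```

If the macro body contained a PROC SQL step instead, that step would be pasted into the %let statement as text, which is exactly the failure described above.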

How to scan a numeric variable

I have a table like this:
Lista_ID 1 4 7 10 ...
in total there are 100 numbers.
I want to pass each of these numbers to a macro I created. I was trying to use 'scan' but read that it's only for character variables.
Here is the code I ran (the errors it produced follow it):
proc sql;
select ID INTO: LISTA_ID SEPARATED BY '*' from
WORK.AMOSTRA;
run;
PROC SQL;
SELECT COUNT(*) INTO: NR SEPARATED BY '*' FROM
WORK.AMOSTRA;
RUN;
%MACRO CICLO_teste();
%LET LIM_MSISDN = %EVAL(NR);
%LET I = %EVAL(1);
%DO %WHILE (&I<= &LIM_MSISDN);
%LET REF = %SCAN(LISTA_ID,&I,,'*');
DATA WORK.UP&REF;
SET WORK.BASE&REF;
FORMAT PERC_ACUM 9.3;
IF FIRST.ID_CLIENTE THEN PERC_ACUM=0;
PERC_ACUM+PERC;
RUN;
%LET I = %EVAL(&I+1);
%END;
%MEND;
%CICLO_TESTE;
The errors were:
Variable PERC is uninitialized. and
Variable FIRST.ID_CLIENTE is uninitialized.
What I want is to run this macro for each of the IDs in the list I showed above, which are referenced in work.base&ref and work.up&ref.
How can I do it? What am I doing wrong?
Thanks!
Here's the CALL EXECUTE version.
%MACRO CICLO_teste(REF);
DATA WORK.UP&REF;
SET WORK.BASE&REF;
BY ID_CLIENTE;
FORMAT PERC_ACUM 9.3;
IF FIRST.ID_CLIENTE THEN PERC_ACUM=0;
PERC_ACUM+PERC;
RUN;
%MEND;
DATA _NULL_;
SET amostra;
*CREATE YOUR MACRO CALL;
STR = CATT('%CICLO_TESTE(', ID, ')');
CALL EXECUTE(STR);
RUN;
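A side note on the "FIRST.ID_CLIENTE is uninitialized" message: FIRST. and LAST. variables exist only when the DATA step has a BY statement, which the version above adds. The sketch below shows the same running-total pattern on SASHELP.CLASS, which stands in for the asker's tables; the output names class_sorted, cum and cum_weight are illustrative.

```sas
/* BY-group processing needs the input sorted (or indexed) by the BY variable */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

data work.cum;
   set work.class_sorted;
   by sex;                             /* creates FIRST.SEX and LAST.SEX      */
   if first.sex then cum_weight = 0;   /* reset at the start of each group    */
   cum_weight + weight;                /* sum statement: retained running sum */
run;
```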
First, note that SAS macro variable resolution is essentially a text-based copy-paste operation: all user-defined macro variables hold text. Therefore, %eval is unnecessary in this case.
Other miscellaneous corrections include:
Check the %scan() function for correct usage. The first argument should be a text string WITHOUT QUOTES, and macro variables must be referenced with an ampersand (&lista_id, &nr).
run is redundant in proc sql since each SQL statement executes as soon as it is submitted. Use quit; to exit proc sql.
A semicolon is not required after a macro call (it sometimes causes unexpected problems).
Use %do %to for loops.
The code below should work.
data work.amostra;
input id;
cards;
1
4
7
10
;
run;
proc sql noprint;
select id into :lista_id separated by ' ' from work.amostra;
select count(*) into :nr separated by ' ' from work.amostra;
quit;
* check;
%put lista_id=&lista_id nr=&nr;
%macro ciclo_teste();
%local ref;
%do i = 1 %to &nr;
%let ref = %scan(&lista_id, &i);
%*check;
%put ref = &ref;
/* your task below */
/* data work.up&ref;*/
/* set work.base&ref;*/
/* format perc_acum 9.3;*/
/* if first.id_cliente then perc_acum=0;*/
/* perc_acum + perc;*/
/* run; */
%end;
%mend;
%ciclo_teste()
tested on SAS 9.4 win7 x64
Edited:
In fact, I would recommend the following instead, to avoid repeatedly scanning a long string, which is inefficient.
%macro tester();
/* get the number of obs (a more efficient way) */
%local NN;
proc sql noprint;
select nobs into :NN
from dictionary.tables
where upcase(libname) = 'WORK'
and upcase(memname) = 'AMOSTRA';
quit;
/* assign &ref by random access */
%do i = 1 %to &NN;
data _null_;
a = &i;
set work.amostra point=a;
call symputx('ref',id,'L');
stop;
run;
%*check;
%put ref = &ref;
/* your task below */
%end;
%mend;
%tester()
Please let me know if you have further questions.
Wow that seems like a lot of work. Why not just do the following:
data work.amostra;
input id;
cards;
1
4
7
10
;
run;
%macro test001;
proc sql noprint;
select count(*) into: cnt
from amostra;
quit;
%let cnt = &cnt;
proc sql noprint;
select id into: x1 - :x&cnt
from amostra;
quit;
%do i = 1 %to &cnt;
%let x&i = &&x&i;
%put &&x&i;
%end;
%mend test001;
%test001;
now in variables &x1 - &&x&cnt you have your values and you can process them however you like.
In general, if your list is small enough (macro variables are limited to 64K characters), you are better off passing the list in a single delimited macro variable instead of multiple macro variables. Remember that PROC SQL automatically stores the row count in the macro variable SQLOBS, so there is no need to run the query twice. Alternatively, you can use %sysfunc(countw()) to count the number of entries in your delimited list.
proc sql noprint ;
select id into :idlist separated by '|' from .... ;
%let nr=&sqlobs;
quit;
...
%do i=1 %to &nr ;
%let id=%scan(&idlist,&i,|);
data up&id ;
...
%end;
If you do generate multiple macro variables there is no need to set the upper bound in advance as SAS will only create the number of macro variables it needs based on the number of observations returned by the query.
select id into :idval1 - from ... ;
%let nr=&sqlobs;
If you are using an older version of SAS, then you need to set an upper bound on the macro variable range.
select id into :idval1 - :idval99999 from ... ;
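For completeness, a small runnable sketch of the single delimited list combined with %sysfunc(countw()). It uses SASHELP.CLASS in place of the asker's table, and the names namelist and show_items are illustrative only.

```sas
/* Collect the values into one pipe-delimited macro variable */
proc sql noprint;
   select name into :namelist separated by '|'
   from sashelp.class;
quit;

/* Hypothetical macro: walks the list one item at a time */
%macro show_items;
   %local i n item;
   %let n = %sysfunc(countw(&namelist, |));   /* count entries in the list */
   %do i = 1 %to &n;
      %let item = %scan(&namelist, &i, |);
      %put NOTE: item &i is &item;
   %end;
%mend show_items;

%show_items
```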

Create a sequence of new column names

I have a hundred or so columns which I would like to rename in SAS using the following macro:
%macro rename1(oldvarlist, newvarlist);
%let k=1;
%let old = %scan(&oldvarlist, &k);
%let new = %scan(&newvarlist, &k);
%do %while(("&old" NE "") & ("&new" NE ""));
rename &old = &new;
%let k = %eval(&k + 1);
%let old = %scan(&oldvarlist, &k);
%let new = %scan(&newvarlist, &k);
%end;
%mend;
The columns are currently named C5, C7, C9, ..., C205 and I would like to rename them AR_0, AR_1, ..., AR_100.
With the macro above, how can I put these new names after the comma of the following code without writing each and every one of them?
%rename1(C5--C205, # new names here #);
This is a bit of a longer solution, but it's fairly dynamic and it's easy to see how things work. I'm assuming you'll use the rename statement in proc datasets. Otherwise you could just be lazy and use arrays to copy the values into new variables and then drop the old ones, though that isn't efficient.
proc sql;
create table oldvar as
select name, varnum
from sashelp.vcolumn
where upcase(libname)='SASHELP'
and upcase(memname)='CLASS'
order by varnum;
quit;
data rename;
set oldvar;
new_var=catx("_", "AR",varnum);
run;
proc sql noprint;
select catx("=", name, new_var) into :rename_list
separated by " "
from rename;
quit;
%put rename &rename_list;
proc datasets library=work;
modify my_dataset;
rename &rename_list;
run;quit;
This will first find the old columns, build new names of the form AR_#, and create a macro variable varlist that you can use:
proc sql noprint;
create table newvar as
select name
from sashelp.vcolumn
where libname="SASHELP" and memname="CLASS"
order by name;
quit;
data newvar;
set newvar;
name=compress("AR_"!!put(_n_,4.));
run;
proc sql noprint;
select name into :varlist separated by " "
from newvar;
quit;
Probably, something like this would do the job
%macro rename2(oldvarlist, newPrefix);
%let k=1;
%let old = %scan(&oldvarlist, &k);
%do %while(("&old" NE ""));
rename &old = &newPrefix.&k.;
%let k = %eval(&k + 1);
%let old = %scan(&oldvarlist, &k);
%end;
%mend;
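Since the old names follow a regular pattern (C5, C7, ...), another option is to generate the old=new pairs directly from the numbers. This is only a sketch: the macro name rename_pairs, its parameters, and the three-column test data set are assumptions made for illustration. For the real data the call would use last=205.

```sas
/* Small stand-in for the real table: columns C5, C7, C9 */
data work.my_dataset;
   array cols{3} C5 C7 C9 (1 2 3);
run;

/* Hypothetical helper: emits "C5 = AR_0 C7 = AR_1 ..." as plain text */
%macro rename_pairs(first=5, last=205, step=2, oldprefix=C, newprefix=AR_);
   %local i k;
   %let k = 0;
   %do i = &first %to &last %by &step;
      &oldprefix.&i = &newprefix.&k
      %let k = %eval(&k + 1);
   %end;
%mend rename_pairs;

proc datasets library=work nolist;
   modify my_dataset;
   rename %rename_pairs(last=9);   /* the semicolon ends the RENAME statement */
quit;
```

PROC DATASETS changes only the header of the data set, so the rename does not rewrite the data.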

sas how to use a list to store count distinct of all the variables in a table

I want to store the count distinct of each variable from a table in another. I wanted to use a loop for it, over the list of the variables. So first, I stored the variables names in "vars", doing this:
proc sql ;
select name
into :vars separated by ' '
from dictionary.columns
where libname eq 'HW' and
memname eq "ORDERS";
quit;
Then, I created another list with the result of the count distinct with the following code:
%macro g();
%let b=;
%do i = 1 %to 3;
%let a=%scan(&vars,&i);
proc sql;
select count(distinct &a)
into :gaby from hw.ORDERS;
quit;
%let b=&b &gaby;
%end;
%put &b;
%mend g;
%g();
After this, I wanted to add both to a table. I can add the vars variable, but not the b variable.
data a;
call symput('lista', symget('vars'));
call symput('lista1', symget('b'));
do i=1 to 3;
timept=i;
variable=scan("&vars",i);
dist=scan("&b",i);
output;
end;
run;
The table shows correctly the name of the variables but instead of showing the count distinct (that were stored in b) shows the letter "b".
Is there a way to perform this? also, is there a way to perform it easily?
Thanks!!!!!!!!!!
You're pretty close. I would just use a single SQL pass and create an output table directly. If you want it in a column form, then use PROC TRANSPOSE.
proc sql noprint;
select name
into :vars separated by ' '
from dictionary.columns
where libname eq 'SASHELP' and
memname eq "SHOES";
quit;
%put &vars;
%macro create_table();
proc sql noprint;
%local i n var;
%let n = %sysfunc(countw(&vars));
create table output as
select
%do i=1 %to %eval(&n-1);
%let var = %scan(&vars,&i);
count(distinct &var) as &var,
%end;
%let var = %scan(&vars,&n);
count(distinct &var) as &var
from sashelp.shoes;
quit;
%mend;
%create_table;
proc transpose data=output out=want(rename=(_NAME_=variable COL1=Dist));
run;
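A different technique from the macro above, offered as a possible shortcut: PROC FREQ with the NLEVELS option reports the number of distinct levels of every variable, and ODS OUTPUT can capture that table without any macro code. The output data set name want_levels is illustrative.

```sas
/* Capture the "Number of Variable Levels" table as a data set */
ods output nlevels=work.want_levels;

proc freq data=sashelp.shoes nlevels;
   tables _all_ / noprint;   /* suppress the individual frequency tables */
run;

/* One row per variable: the variable name and its level count */
proc print data=work.want_levels;
run;
```

Unlike count(distinct ...), the level counts can include missing values as a level, so the two approaches may differ for variables that contain missing values.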

performing tests over variables in SAS

I was wondering if it was possible to perform a ttest (proc ttest) over all variables in a dataset in SAS. Possibly through looping over the data?
Here's what I have currently but it's not running correctly:
data test;
set work.wisc;
array Avar(30) V1-V30;
do variable = 1 to 30;
proc ttest data = work.wisc;
class Diagnosis;
var Avar(variable);
end;
run;
Any help is much appreciated. Thanks!
Something like this may work. Referencing &&name&i. in the loop resolves to each variable name in turn. You may need to make some adjustments within the proc ttest step, as I'm not familiar with that procedure.
/* -- Get the names of the variables --*/
proc contents data = work.wisc out = names noprint; run;
/*--- Make macro vars needed ---*/
proc sql noprint;
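select
count(distinct name) into :name_count from names
where type = 1 and upcase(name) ne 'DIAGNOSIS'; /* numeric variables only, excluding the class variable */
select
distinct name into :name1 - :name9999 from names
where type = 1 and upcase(name) ne 'DIAGNOSIS';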
quit;
/*--- Strip spaces from name_count ---*/
%let name_count = &name_count.;
%put There are &name_count. variables in the data set;
/*--- Run the test for all variables ---*/
%macro testAll();
%do i = 1 %to &name_count.;
proc ttest data = work.wisc;
class Diagnosis;
var &&name&i.;
run;
%end;
%mend;
%testAll();
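If the goal is simply one t test per variable, no loop may be needed at all: the VAR statement of PROC TTEST accepts several variables and produces a separate analysis for each. The sketch below uses SASHELP.CLASS as stand-in data.

```sas
/* One PROC TTEST call, one two-sample t test per VAR variable */
proc ttest data=sashelp.class;
   class sex;                /* grouping variable with exactly two levels */
   var age height weight;    /* each variable is analysed separately      */
run;

/* For the data in the question, the analogous VAR statement would be: */
/*    var V1-V30;                                                      */
```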