Could someone please explain what the following lines of this code do? - sas

This is the code that I have with me:
%let data=sashelp.cars;
proc transpose data=&data(obs=0) out=names;
var _all_;
run;
proc sql;
select cats('_',_name_,'=missing(',_name_,');') into: stmts separated by ' ' from names;
quit;
data missing;
set &data;
_BIGN = 1;
/*%m_expand_varlist(data=&data,expr=cats('_',_name_,'=missing(',_name_,');'));*/
&stmts;
keep _:;
run;
proc summary data=missing;
var _numeric_;
output out=smry sum=;
run;
proc transpose data=smry(drop=_type_ _freq_) out=smry_;
run;
The goal of this code is to output the number of missing values for both character and numeric variables in the data set. The code accomplishes that objective but I have difficulty in understanding the purpose of certain lines in the code.
May I know what the following part of the code does?
select cats('_',_name_,'=missing(',_name_,');') into: stmts separated by ' ' from names;
I understand that the into part just stores the value into the macro variable stmts but what does "separated by ' ' from names" in the above line mean?
data missing;
set &data;
_BIGN = 1;
&stmts;
keep _:;
run;
And in the above portion of the code, what is the purpose of "keep _:"? What does the ":" do in that? And is the "_BIGN = 1" necessary?
Also, in the final output table called smry_, I get underscores before the names of the variables, but I don't need these underscores. What can I do to remove them? When I removed the underscore in "keep _:", the underscores in the smry_ table went away but I was left with only 10 rows instead of 15. Help would be appreciated. Thank you.

Before answering your questions, let me disclaim that this is not the way to go if you simply want a count of missing values for each numeric variable in your data.
However, this seems to be more of a practice assignment than an actual problem.
The SEPARATED BY clause inserts the specified string between the values stored in the macro variable when the query returns more than one row. In this case the names data set has 15 rows, so all 15 values are listed with a single space between them.
The colon operator in the keep statement tells the data step to keep only variables prefixed with an underscore.
_BIGN is not strictly necessary. However, it seems that the author of the code wants a simple count of observations in the final data set. That is all it does.
The underscores are added to each variable name when the macro variable is created. This is probably done to avoid conflicts between variable names (though a conflict is technically still possible). You can simply remove the underscores in the final data set.

As you know the "into" clause stores the values in the macro variable. The "separated by" leads to a list of values stored in the variable with spaces as delimiter here. If you don't use this you have only the value of the first row in your macro variable.
The ":" is a wildcard that means you keep all the variables starting with an underscore:
keep _:;
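To see both behaviours in isolation, here is a minimal sketch. It assumes sashelp.class is available; the macro variable names first_only and all_names and the data set name flags are made up for illustration.

```sas
proc sql noprint;
  /* Without SEPARATED BY only the first row's value is stored */
  select name into :first_only from sashelp.class;
  /* With SEPARATED BY every row is stored, joined by one space */
  select name into :all_names separated by ' ' from sashelp.class;
quit;

%put first_only=&first_only;
%put all_names=&all_names;

data flags;
  set sashelp.class;
  _age = missing(age);   /* 1 if age is missing, otherwise 0 */
  _sex = missing(sex);
  keep _: ;              /* name prefix list: keeps only _age and _sex */
run;
```

Comparing the two %put lines in the log shows the difference between the two INTO forms, and the flags data set contains only the variables whose names begin with an underscore.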

I managed to remove the underscores in the final data set using the following code:
data _smry_(drop = _name_);
set smry_;
name=compress(_name_, , 'kas');
run;
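One caveat with the step above: the 'kas' modifiers keep only letters and space characters, so underscores and digits inside a name are stripped as well (MPG_City would come out as MPGCity). A possible alternative that removes only the leading character is sketched below. It assumes every value of _name_ in smry_ starts with the added underscore; the output data set name smry_clean is made up.

```sas
data smry_clean(drop=_name_);
  set smry_;
  length name $32;
  name = substr(_name_, 2);   /* drop only the first character */
run;
```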

You could just rename the underscore variables back to original.
%let data=sashelp.heart;
proc transpose data=&data(obs=0) out=names;
var _all_;
run;
proc sql;
select cats('Label _',_name_,'=',quote(_label_),';') into: lb_stmts separated by ' ' from names;
select cats('_',_name_,'=missing(',_name_,');') into: stmts separated by ' ' from names;
select cats('_',_name_,'=',_name_) into: rn_stmts separated by ' ' from names;
quit;
options symbolgen;
data missingV / view=missingV;
set &data;
&stmts;
&lb_stmts;
rename &rn_stmts;
keep _:;
run;
options nosymbolgen;
ods output summary=summary;
proc means data=missingV Sum Mean N STACKODSOUTPUT;
var _numeric_;
run;
ods output close;
proc print width=minimum label;
label sum='#Missing' mean='%Missing';
format sum 8. mean percentn8.1;
run;

Related

Renaming all variables from a SAS Table

I have two SAS tables which are the same, only the column names aren't the same.
The first table D1 has 80 column names that have the following pattern X1000_a010_b020 and the second table D2 has 80 column names that have the following pattern X_1000_a0010_b0020. Please note that they are not in the same order.
I want to make sure that all the columns from D1 have the same names as in D2. In other words, I want to add the underscore after the X and add a 0 after all the a's and b's.
However, I don't know how to proceed. I would guess that RegEx would be the go-to but I am not familiar with it.
As a structure example, some time ago I was using the following code to replace spaces in a column name with an underscore. I would like to do the same but for the underscore after the X and the 0 after the a's and b's.
%macro rename_vars(table);
%local rename_list sqlobs;
proc sql noprint;
select catx('=',nliteral(name),translate(trim(name),'_',' '))
into :rename_list separated by ' '
from sashelp.vcolumn
where libname=%upcase("%scan(work.&table,-2,.)")
and memname=%upcase("%scan(&table,-1,.)")
and indexc(trim(name),' ')
;
quit;
%if &sqlobs %then %do ;
proc datasets lib=%scan(WORK.&table,-2);
modify %scan(&table,-1);
rename &rename_list;
run;
quit;
%end;
%mend rename_vars;
Your example code seems to show you have a plan for how to implement the renaming so let's just concentrate on generating the OLDNAME <-> NEWNAME pairs. You can generate a list of names in a particular dataset with PROC CONTENTS or querying DICTIONARY.COLUMNS with SQL code (or SASHELP.VCOLUMN with any tool). So let's assume you have a dataset named CONTENTS that contains a variable named NAME. So the goal is to create a new variable, which we can call NEWNAME.
So let's just translate the three transformations you say you need directly into individual actions. You can collapse the steps if you want, but there is no pressing need for efficiency in this operation.
data fixed_names;
set contents;
newname = tranwrd(upcase(name),'_A','_A0');
newname = tranwrd(newname,'_B','_B0');
newname = cats(char(newname,1),'_',substr(newname,2));
keep name newname;
run;
Now you could pull that list into a macro variable. A space-delimited list of old=new pairs is what the RENAME statement expects, so use SEPARATED BY to collect every row rather than just the first.
proc sql noprint;
select catx('=',name,newname) into :renames separated by ' '
from fixed_names
where newname ne upcase(name)
;
quit;
Or if the goal is to literally compare the two datasets you might want to generate one list of old names and a separate list of new names.
proc sql noprint;
select name,newname
into :oldlist separated by ' '
, :newlist separated by ' '
from fixed_names
;
quit;
Which you could then use with PROC COMPARE directly without any need to rename any variables.
proc compare data=DS1 compare=DS2 ;
var &oldlist;
with &newlist;
run;

SAS: How to calculate frequency for all character variables except some

I know I can have something like the following to calculate frequency for all Chars:
proc freq data=sashelp.class;
tables _char_;
run;
However, is there a way to exclude some variables? I want to do something like:
proc freq data=sashelp.class;
tables _char_ EXCEPT VAR1 VAR2;
run;
Thank you so much!
You can use the DROP= data set option, as shown below.
proc freq data=sashelp.cars(drop=origin make);
tables _char_;
run;
The drop example is the simplest, certainly, and probably best if that's exactly the request.
However, if it's slightly different - such as, you want to include (or exclude) all character variables matching a particular pattern, you can use macro variables constructed from dictionary.columns (or proc contents output dataset).
proc sql;
select name
into :freqlist separated by ' '
from dictionary.columns
where memname='YOURTABLE' and libname='YOURLIB'
and type='char' and name like 'PATTERN%'
;
quit;
Obviously filling in the various uppercase things as appropriate. Usually MEMNAME, LIBNAME, and NAME are stored upper case, though not always, so consider adding UPCASE() to them.
Then you can put &freqlist.; on the tables statement to get the list of columns that match your query.
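As an illustration of that approach applied to the original "all character variables except some" request, here is a minimal sketch. It assumes sashelp.cars as the input and excludes MAKE and MODEL purely as an example; swap in your own library, table, and exclusion list.

```sas
proc sql noprint;
  /* Collect every character variable except the ones to skip */
  select name into :freqlist separated by ' '
  from dictionary.columns
  where libname = 'SASHELP' and memname = 'CARS'
    and type = 'char'
    and upcase(name) not in ('MAKE', 'MODEL');
quit;

%put freqlist=&freqlist;

proc freq data=sashelp.cars;
  tables &freqlist;
run;
```

Using UPCASE() on name makes the exclusion independent of how the variable names are cased in the data set.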

SAS: Passing in Array Names

I'm working on code that will change the coding of several hundred variables stored as 1/0 or Y/N into numeric 1 or 0. Because this needs to be a flexible process, I am writing a macro to do so. The only issue I am having with the macro is that I am unable to pass the SAS column names to it. Thoughts?
%Macro Test(S,E);
%Array(A,&S.-&E.);
%MEnd;
data subset;
set dataset;
%Test(v1,v20)
run;
SAS supports variable lists. Macro parameters are just text strings. So as long as you use the macro variable value in a place where SAS supports variable lists, there is no problem passing a variable list to a macro. For example, here is a simplistic macro to make an array statement.
%macro array(name,varlist);
array &name &varlist ;
%mend;
Which you could then use in the middle of a data step like this.
data want;
set have ;
%array(binary,var1-var20 a--d male education);
do over binary; binary=binary in ('Y','1','T'); end;
run;
The difficult part is if you want to convert variables from character to numeric then you will need to rename them. This will make it difficult to use variable lists (x1-x5 or vara -- vard). You can solve that problem with a little extra logic to convert the variable lists into a list of individual names. For example you can use PROC TRANSPOSE to create a dataset with the variable names that match your list.
proc transpose data=&inds(obs=0) out=_names ;
var &varlist;
run;
You could then use this dataset to generate code or generate a list of the individual variable names.
proc sql noprint ;
select _name_ into :varlist2 separated by ' ' from _names;
quit;
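A list of all variable names is stored in the dictionary.columns table. You can query it and store the names as a space-separated list that you can then loop through (add a condition on libname if the same data set name exists in more than one library):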
proc sql noprint;
select name into: list_of_names
separated by " "
from dictionary.columns where memname = upcase("your_dataset");
quit;
%put &list_of_names.;
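Looping through such a list is usually done with %SCAN inside a macro. The following is only a sketch: the macro name print_names is hypothetical, and it assumes the list is non-empty and space-delimited, as produced by the query above.

```sas
%macro print_names(list);
  %local i name;
  /* COUNTW gives the number of space-delimited words in the list */
  %do i = 1 %to %sysfunc(countw(&list, %str( )));
    %let name = %scan(&list, &i, %str( ));
    %put Variable &i is &name;
  %end;
%mend print_names;

%print_names(&list_of_names)
```

Inside the %DO loop the %PUT can be replaced by whatever code should be generated for each variable, for example an assignment statement in a data step.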

SAS Proc SQL Trim not working?

I have a problem that seems pretty simple (probably is...) but I can't get it to work.
The variable 'name' in the dataset 'list' has a length of 20. I wish to conditionally select values into a macro variable, but often the desired value is less than the assigned length. This leaves trailing blanks at the end, which I cannot have as they disrupt future calls of the macro variable.
I've tried trim, compress, btrim, left(trim, and other solutions but nothing seems to give me what I want (which is 'Joe' with no blanks). This seems like it should be easier than it is..... Help.
data list;
length id 8 name $20;
input id name $;
cards;
1 reallylongname
2 Joe
;
run;
proc sql;
select trim(name) into :nameselected
from list
where id=2;
run;
%put ....&nameselected....;
Actually, there is an option, TRIMMED, to do what you want.
proc sql noprint;
select name into :nameselected TRIMMED
from list
where id=2;
quit;
Also, end PROC SQL with QUIT;, not RUN;.
It works if you specify a separator:
proc sql;
select trim(name) into :nameselected separated by ''
from list
where id=2;
quit;

SAS - Creating variables from macro variables

I have a SAS dataset which has 20 character variables, all of which are names (e.g. Adam, Bob, Cathy etc..)
I would like dynamic code to create variables called Adam_ref, Bob_ref etc., which will work even if there is a different dataset with different names (i.e. I don't want to manually define each variable).
So far my approach has been to use proc contents to get all variable names and then use a macro to create macro variables Adam_ref, Bob_ref etc..
How do I create actual variables within the dataset from here? Do I need a different approach?
proc contents data=work.names
out=contents noprint;
run;
proc sort data = contents; by varnum; run;
data contents1;
set contents;
Name_Ref = compress(Name||"_Ref");
call symput (NAME, NAME_Ref);
%put _user_;
run;
If you want to create an empty dataset that has variables named after values you have in macro variables, you could do something like this.
Save the values into macro variables that are named by some pattern, like v1, v2 ...
proc sql;
select compress(Name||"_Ref") into :v1-:v20 from contents;
quit;
If you don't know how many values there are, you have to count them first, I assumed there are only 20 of them.
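If counting first is inconvenient, one possible shortcut is an open-ended range in the INTO clause, which creates as many macro variables as there are rows. This sketch assumes SAS 9.3 or later (where the unbounded range form is available) and the contents data set from the question; the macro variable name nvars is made up.

```sas
proc sql noprint;
  /* v1, v2, ... are created for as many rows as the query returns */
  select compress(name||"_Ref") into :v1- from contents;
quit;

/* SQLOBS holds the number of rows processed by the last query */
%let nvars = &sqlobs;
%put Created &nvars macro variables, the first is &v1;
```

The value of nvars can then replace the hard-coded 20 as the upper bound of the %DO loops.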
Then, if all your variables are character variables of length 100, you create a dataset like this:
%macro create_dataset;
data want;
length %do i=1 %to 20; &&v&i $100 %end;
;
stop;
run;
%mend;
%create_dataset; run;
This is how you can do it if you have the values in macro variables; there is probably a better way to do it in general.
If you don't want to create an empty dataset but only change the variable names, you can do it like this:
proc sql;
select name into :v1-:v20 from contents;
quit;
%macro rename_dataset;
data new_names;
set have(rename=(%do i=1 %to 20; &&v&i = &&v&i.._ref %end;));
run;
%mend;
%rename_dataset; run;
You can use PROC TRANSPOSE with an ID statement.
This step creates an example dataset:
data names;
harry="sally";
dick="gordon";
joe="schmoe";
run;
This step is essentially a copy of your step above that produces a dataset of column names. I will reuse the dataset namerefs throughout.
proc contents data=names out=namerefs noprint;
run;
This step adds the "_Ref" suffix to the names defined before and drops everything else. The variable "name" comes from the column attributes of the dataset output by PROC CONTENTS.
data namerefs;
set namerefs (keep=name);
name=compress(name||"_Ref");
run;
This step produces an empty dataset with the desired columns. The variable "name" is again obtained by looking at column attributes. You might get a harmless warning in the GUI if you try to view the dataset, but you can otherwise use it as you wish and you can confirm that it has the desired output.
proc transpose out=namerefs(drop=_name_) data=namerefs;
id name;
run;
Here is another approach which requires less coding. It does not require running proc contents, does not require knowing the number of variables, nor creating a macro function. It also can be extended to do some additional things.
Step 1 is to use the built-in dictionary views to get the desired variable names. The appropriate view for this is dictionary.columns, which has an alias of sashelp.vcolumn. The dictionary libref can be used only in proc sql, while the sashelp alias can be used anywhere. I tend to use the sashelp alias since I work in Windows with DMS and can always interactively view the sashelp library.
proc sql;
select compress(Name||"_Ref") into :name_list
separated by ' '
from sashelp.vcolumn
where libname = 'WORK'
and memname = 'NAMES';
quit;
This produces a space-delimited macro variable with the desired names.
Step 2: to build the empty data set, this code will work:
Data New ;
length &name_list ;
run ;
You can avoid assuming lengths or create populated dataset with new variable names by using a slightly more complicated select statement.
For example
select compress(Name)||"_Ref $"||compress(put(length,best.))
into :name_list
separated by ' '
will generate a macro variable which retains the previous length for each variable. This will work with no changes to step 2 above.
To create populated data set for use with rename dataset option, replace the select statement as follows:
select compress(Name)||" = "||compress(Name)||"_Ref"
into :name_list
separated by ' '
Then replace the Step 2 code with the following:
Data New ;
set names (rename = ( &name_list)) ;
run ;