I'm working on code that will recode several hundred variables stored as 1/0 or Y/N to numeric 1 or 0. Because this needs to be a flexible process, I am writing a macro to do so. The only issue I am having is that I am unable to pass the SAS column names to the macro. Thoughts?
%Macro Test(S,E);
%Array(A,&S.-&E.);
%MEnd;
data subset;
set dataset;
%Test(v1,v20)
run;
SAS supports variable lists. Macro parameters are just text strings. So as long as you use the macro variable value in a place where SAS supports variable lists, there is no problem passing a variable list to a macro. For example, here is a simple macro that generates an ARRAY statement.
%macro array(name,varlist);
array &name &varlist ;
%mend;
Which you could then use in the middle of a data step like this.
data want;
set have ;
%array(binary,var1-var20 a--d male education);
do over binary; binary=binary in ('Y','1','T'); end;
run;
The difficult part is if you want to convert variables from character to numeric then you will need to rename them. This will make it difficult to use variable lists (x1-x5 or vara -- vard). You can solve that problem with a little extra logic to convert the variable lists into a list of individual names. For example you can use PROC TRANSPOSE to create a dataset with the variable names that match your list.
proc transpose data=&inds(obs=0) out=_names ;
var &varlist;
run;
You could then use this dataset to generate code or generate a list of the individual variable names.
proc sql noprint ;
select _name_ into :varlist2 separated by ' ' from _names;
quit;
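Putting the pieces above together, a minimal sketch of the character-to-numeric conversion could look like the following. The macro name char2num, its parameters, and the temporary _c1, _c2, ... names are hypothetical. It assumes every variable in the list is character, that all names are standard SAS names, that the list matches at least one variable, and that the input has no variables already named _c1, _c2, and so on.

```sas
/* Hypothetical macro; the name, parameters and the _c prefix are illustrative. */
%macro char2num(inds, outds, varlist);
  %local names n i;

  /* Expand the variable list into individual names. */
  proc transpose data=&inds(obs=0) out=_names;
    var &varlist;
  run;

  proc sql noprint;
    select _name_ into :names separated by ' ' from _names;
  quit;
  %let n=&sqlobs;

  data &outds;
    /* Rename each character variable to a temporary name _c1, _c2, ... */
    set &inds(rename=(
      %do i=1 %to &n;
        %scan(&names,&i,%str( )) = _c&i
      %end;
    ));
    /* Recreate each original name as a numeric 1/0 flag. */
    %do i=1 %to &n;
      %scan(&names,&i,%str( )) = upcase(_c&i) in ('Y','1','T');
    %end;
    drop _c1-_c&n;
  run;
%mend char2num;

%char2num(have, want, var1-var20)
```

The recoded variables end up at the end of the output dataset because they are created as new numeric variables, and the helper dataset _names is left in WORK.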
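A list of all variable names is stored in the dictionary.columns table. You can query it and store the names as a list that you can then loop through. Note that memname alone matches a dataset of that name in any library, so add a libname condition if the name is not unique: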
proc sql noprint;
select name into: list_of_names
separated by " "
from dictionary.columns where memname = upcase("your_dataset");
quit;
%put &list_of_names.;
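Looping through such a list can be sketched with a small macro. The macro name loop_names is hypothetical, and the sketch assumes a non-empty, space-delimited list of standard SAS names; the %PUT is a stand-in for whatever code should be generated per variable.

```sas
/* Hypothetical helper: visit each name in a space-delimited list. */
%macro loop_names(list);
  %local i name;
  %do i=1 %to %sysfunc(countw(&list,%str( )));
    %let name=%scan(&list,&i,%str( ));
    /* Replace this %PUT with the code to generate for each variable. */
    %put NOTE: variable &i is &name;
  %end;
%mend loop_names;

/* Literal list for illustration; &list_of_names would be passed the same way. */
%loop_names(Name Sex Age)
```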
I have a list of 15 similar variables that I want to loop through to recode certain values as null. I will be adding more variables to this list, which is why I went with a macro. I created a macro variable that stores the list of variables and another macro variable that stores the count of variables. I am having trouble defining the array. I got the code below to work, but it only goes through varlist1 and not the other 14 variables (varlist2-varlist15), which is what I told it to do, but I can't figure out how to expand it to the full list of variables without breaking the code. I searched through the forums and SAS articles but couldn't find an answer. I'm fairly new to arrays, so I'm sure it's something simple I don't understand. I appreciate any help, and let me know if I can post better next time. This is my first one. :)
this is my list in &varlist.
(
NEW_CREWSTATE1A
NEW_EquFailText
NEW_ExpMedNotList
NEW_NARRATIVEDISTRESS
NEW_NarrativeRecovery
NEW_NarrativeSearch
NEW_OthEquText
NEW_PersReqMedTxt
NEW_ProbEnct
NEW_Recommend
NEW_RescueSwimProbText
NEW_RescuerProb
NEW_SRUConfigTxt
NEW_equimalissue
new_MedDiffText
)
code:
proc sql noprint;
select count(name) into :numVar
from sashelp.vcolumn
where upcase(LIBname)="STAGING" and UPCASE(memname)="RESC_NARRATIVEMERGE" and UPCASE(name) like 'NEW_%';
quit;
%put &numVar;
proc sql noprint;
select distinct(name) into :varlist1-
from sashelp.vcolumn
where upcase(LIBname)="STAGING" and UPCASE(memname)="RESC_NARRATIVEMERGE" and UPCASE(name) like 'NEW_%';
quit;
data staging.RESC_NARRATIVEMERGE2;
set staging.RESC_NARRATIVEMERGE;
array narrative_array {*} &varlist1. ;
do i=1 to dim(narrative_array);
if strip(narrative_array{i})='N/A' then narrative_array{i}='';
if strip(narrative_array{i})='N/A.' then narrative_array{i}='';
if strip(narrative_array{i})='NA' then narrative_array{i}='';
if strip(narrative_array{i})='NONE' then narrative_array{i}='';
if strip(narrative_array{i})='NONE NOTED' then narrative_array{i}='';
if strip(narrative_array{i})='NONE EXPERIENCED' then narrative_array{i}='';
if strip(narrative_array{i})='NONE TO REPORT' then narrative_array{i}='';
if strip(narrative_array{i})='NONE.' then narrative_array{i}='';
if strip(narrative_array{i})='NOT APPLICABLE' then narrative_array{i}='';
end;
run;
The only thing you need to define an array is the actual list of variable names.
array narrative_array Avar Anothervar Someothervar ;
So just put the list of variable names into ONE macro variable.
proc sql noprint;
select nliteral(name)
into :varlist separated by ' '
from dictionary.columns
where libname="STAGING"
and memname="RESC_NARRATIVEMERGE"
and UPCASE(name) like 'NEW^_%' escape '^'
;
quit;
Note that there is no need to count them, but if you want the count then SQL will have already stored the count into the macro variable SQLOBS. Perhaps you can use the count to decide whether or not you need to define the array at all.
data staging.RESC_NARRATIVEMERGE2;
set staging.RESC_NARRATIVEMERGE;
%if &sqlobs %then %do;
array narrative_array &varlist ;
do index=1 to dim(narrative_array);
if left(compbl(narrative_array[index])) in
('N/A','N/A.','NA','NONE','NONE NOTED','NONE EXPERIENCED'
,'NONE TO REPORT','NONE.','NOT APPLICABLE')
then narrative_array[index]=' ';
end;
drop index ;
%end;
run;
Look at your variable name selection criteria
where upcase(LIBname)="STAGING"
and UPCASE(memname)="RESC_NARRATIVEMERGE"
and UPCASE(name) like 'NEW_%'
;
You are looking for variable names that start with NEW_. The DATA Step has a naming list syntax that selects variables based on prefix (<prefix>:), and that list can be used to specify the members of an array.
If there are no variables that match the specified list, the array will have zero elements and the log will show the message "WARNING: Defining an array with zero elements." A loop can be coded simply as 1 to DIM(<array-name>), and no iterations will occur because the DIM() result is 0.
data ...
set STAGING.RESC_NARRATIVEMERGE;
array NEW NEW_:;
As shown by @Tom, your wallpaper of tests to transform non-values to blanks can be changed to an IN list for better performance and human clarity.
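Combining the prefix list with the IN list, the whole step might be sketched as follows. It reuses the dataset names from the question and assumes every variable whose name starts with NEW_ is character, since all elements of an array must have the same type. The index variable name _i is illustrative.

```sas
data staging.RESC_NARRATIVEMERGE2;
  set staging.RESC_NARRATIVEMERGE;
  /* The prefix list picks up every variable whose name starts with NEW_. */
  array new new_: ;
  do _i = 1 to dim(new);
    if left(compbl(new[_i])) in
       ('N/A','N/A.','NA','NONE','NONE NOTED','NONE EXPERIENCED'
       ,'NONE TO REPORT','NONE.','NOT APPLICABLE')
      then new[_i] = ' ';
  end;
  drop _i;
run;
```

Because the prefix list is resolved when the step compiles, variables added to the dataset later are included without any macro code.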
Working in SAS here, and have a lot of column names that I'd like to drop a pattern from. This is pretty straightforward in R:
colnames(data) <- gsub('drop_pattern', '', colnames(data))
But is there an equivalently elegant SAS way?
You can use the RENAME statement in PROC DATASETS to modify the names of variables in a dataset without having to make a new dataset.
proc datasets lib=mylib nolist;
modify mydata ;
rename freddrop_patterndy = freddy samdrop_patternmy=sammy ;
run;
quit;
You can use any number of functions, including those that support regular expressions, to construct a new name from an old name. For example if you just want to remove some constant text then something like this could work:
new_name = transtrn(old_name,'drop_pattern',trimn(' '));
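For the regular expression route, a minimal sketch with PRXCHANGE is shown below. The variable names and the sample value are illustrative; a times argument of -1 asks for every match to be replaced.

```sas
data _null_;
  length old_name new_name $32;
  old_name = 'freddrop_patterndy';
  /* s/drop_pattern// deletes the pattern; -1 replaces all occurrences. */
  new_name = prxchange('s/drop_pattern//', -1, trim(old_name));
  put old_name= new_name=;
run;
```

The PUT statement writes both values to the log, so the effect of the pattern can be checked before it is used to build rename pairs.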
You can use a query against the metadata of the variable names to generate the oldname=newname pairs into a macro variable.
proc sql noprint ;
select catx('=',name,transtrn(name,'drop_pattern',trimn(' ')))
into :rename_list separated by ' '
from dictionary.columns
where libname='MYLIB' and memname='MYDATA' and index(name,'drop_pattern')
;
quit;
Then you can use the macro variable in your code. You will probably need to skip this step if there are no names that need to be changed.
%if &sqlobs %then %do ;
proc datasets lib=mylib nolist;
modify mydata ;
rename &rename_list ;
run;
quit;
%end;
Note if you have set the VALIDVARNAME option to ANY then you will need to use the NLITERAL() function when generating the oldname=newname pairs to handle names that might not follow normal naming rules.
select catx('=',nliteral(name),nliteral(transtrn(name,'drop_pattern',trimn(' '))))
A novice in SAS here. I am trying to rename variables in a data set by using the new values I have in a list. Since I have multiple files with over 100 variables that need to be renamed, I created the following macro and I am trying to pass the list with the new names. However, I am not sure how to pass the list of variables and loop through it properly in the macro. Right now I am getting an error in the %do loop that says: "ERROR: The %TO value of the %DO I loop is invalid."
Any guidance will be greatly appreciated.
The list of new variables comes from another macro and it is saved in &newvars.
The number of variables in the files are the same number in the list, and the order they should be replaced is the same.
%macro rename(lib,dsn,newname);
proc sql noprint;
select nvar into :num_vars from dictionary.tables
where libname="&LIB" and memname="&DSN";
select distinct(nliteral(name)) into:vars
from dictionary.columns
where libname="&LIB" and memname="&DSN";
quit;
run;
proc datasets library = &LIB;
modify &DSN;
rename
%do i = 1 %to &num_vars.;
&&vars&i == &&newname&i.
%end;
;
quit;
run;
%mend rename;
%rename(pga3,selRound,&newvars);
Thank you in advance.
You are getting that error message because the macro variable NUM_VARS is not being set because no observations met your first where condition.
The LIBNAME and MEMNAME fields in the metadata tables are always uppercase and you called the macro with lowercase names.
You can use the %upcase() macro function to fix that. While you are at it you can eliminate the first query as SQL will count the number of variables for you in the second query. Also if you want that query to generate multiple macro variables with numeric suffixes you need to modify the into clause to say that. The DISTINCT keyword is not needed as a dataset cannot have two variables with the same name.
select nliteral(name)
into :vars1 -
from dictionary.columns
where libname=%upcase("&LIB") and memname=%upcase("&DSN")
;
%let num_vars=&sqlobs;
You also should tell it what order to generate the names. Were the new names generated in expectation that the list would be in the order the variables exist in the dataset? If so use the VARNUM variable in the ORDER BY clause. If in alphabetical order then use NAME in the ORDER BY clause.
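As a sketch of the query with an explicit order, assuming the new names were prepared in dataset order, the ORDER BY clause can be added like this. The two %LET statements stand in for the macro parameters and use the values from the macro call in the question.

```sas
%let lib=pga3;       /* values taken from the macro call in the question */
%let dsn=selRound;

proc sql noprint;
  select nliteral(name)
    into :vars1 -
    from dictionary.columns
    where libname=%upcase("&lib") and memname=%upcase("&dsn")
    order by varnum   /* use ORDER BY name for alphabetical order instead */
  ;
quit;
%let num_vars=&sqlobs;
%put NOTE: &num_vars variables found;
```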
How are you passing in the new names?
Is it a space delimited list? If so your final step should look more like this:
proc datasets library = &LIB;
modify &DSN;
rename
%do i = 1 %to &num_vars.;
&&vars&i = %scan(&newname,&i,%str( ))
%end;
;
run; quit;
If NEWNAME has the base name to use for a series of variable names with numeric suffixes then you would want this:
&&vars&i = &newname&i
If instead you are passing into NEWNAME a base string to use to locate a series of macro variables with numeric suffixes then the syntax would be more like this.
&&vars&i = &&&newname&i
So if NEWNAME=XXX and I=1 then on the first pass of the macro processor that line will transform into
&vars1 = &XXX1
And on the second pass the values of VARS1 and XXX1 would be substituted.
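The two-pass resolution can be checked in isolation with a few %LET statements. The values oldvar and newvar are made up for illustration.

```sas
%let vars1=oldvar;
%let XXX1=newvar;
%let newname=XXX;
%let i=1;

/* Pass 1 turns this into:  &vars1 = &XXX1  */
/* Pass 2 substitutes the values of VARS1 and XXX1. */
%put &&vars&i = &&&newname&i;
```

The log should show oldvar = newvar.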
I have a table with postings by category (a number) that I transposed. I got a table with each column name as _number for example _16, _881, _853 etc. (they aren't in order).
I need to do the sum of all of them in a proc sql, but I don't want to create the variable in a data step, and I don't want to write all of the columns names either . I tried this but doesn't work:
proc sql;
select sum(_815-_16) as nnl
from craw.xxxx;
quit;
I tried going from the first number to the last, and also from the number in the first position to the one in the last position. It gives me a number that is not correct.
Any ideas?
Thanks!
You can't use variable lists in SQL, so _: and var1-var6 and var1--var8 don't work.
The easiest way to do this is a data step view.
proc sort data=sashelp.class out=class;
by sex;
run;
*Make transposed dataset with similar looking names;
proc transpose data=class out=transposed;
by sex;
id height;
var height;
run;
*Make view;
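data transpose_forsql/view=transpose_forsql;
set transposed(drop=_name_); *drop the character _NAME_ column so that _: matches only the numeric columns;
sumvar = sum(of _:); *this does not include _N_ because automatic variables are not part of variable lists;
run;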
proc sql;
select sum(sumvar) from transpose_Forsql;
quit;
I have no documentation to support this but from my experience, I believe SAS will assume that any sum() statement in SQL is the sql-aggregate statement, unless it has reason to believe otherwise.
The only way I can see for SAS to differentiate between the two is by the way arguments are passed into it. In the below example you can see that the internal sum() function has 3 arguments being passed in so SAS will treat this as the SAS sum() function (as the sql-aggregate statement only allows for a single argument). The result of the SAS function is then passed in as the single parameter to the sql-aggregate sum function:
proc sql noprint;
create table test as
select sex,
sum(sum(height,weight,0)) as sum_height_and_weight
from sashelp.class
group by 1
;
quit;
Result:
proc print data=test;
run;
              sum_height_
Obs    Sex    and_weight
  1     F       1356.3
  2     M       1728.6
Also note a trick I've used in the code by passing in 0 to the SAS function - this is an easy way to add an additional parameter without changing the intended result. Depending on your data, you may want to swap out the 0 for a null value (ie. .).
EDIT: To address the issue of unknown column names, you can create a macro variable that contains the list of column names you want to sum together:
proc sql noprint;
select name into :varlist separated by ','
from sashelp.vcolumn
where libname='SASHELP'
and memname='CLASS'
and upcase(name) like '%T' /* MATCHES HEIGHT AND WEIGHT */
;
quit;
%put &varlist;
Result:
Height,Weight
Note that you would need to change the above wildcard to match your scenario - ie. matching fields that begin with an underscore, instead of fields that end with the letter T. So your final SQL statement will look something like this:
proc sql noprint;
create table test as
select sex,
sum(sum(&varlist,0)) as sum_of_fields_ending_with_t
from sashelp.class
group by 1
;
quit;
This provides an alternate approach to Joe's answer - though I believe using the view as he suggests is a cleaner way to go.
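Adapted to the table in the question, the same idea might be sketched as follows. The libref CRAW and the placeholder table name xxxx come from the question. The filter on type = 'num' is an added assumption to keep character columns such as _NAME_ out of the list, and the sketch assumes at least one column matches.

```sas
proc sql noprint;
  select name into :varlist separated by ','
    from dictionary.columns
    where libname = 'CRAW'
      and memname = 'XXXX'
      and type = 'num'
      and name like '^_%' escape '^'   /* names that begin with an underscore */
  ;
quit;

proc sql;
  select sum(sum(&varlist, 0)) as nnl
    from craw.xxxx;
quit;
```

The ESCAPE clause is needed because an unescaped underscore in a LIKE pattern matches any single character.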
I have a SAS dataset which has 20 character variables, all of which are names (e.g. Adam, Bob, Cathy etc..)
I would like dynamic code to create variables called Adam_ref, Bob_ref etc. that will work even if there is a different dataset with different names (i.e. I don't want to manually define each variable).
So far my approach has been to use proc contents to get all variable names and then use a macro to create macro variables Adam_ref, Bob_ref etc..
How do I create actual variables within the dataset from here? Do I need a different approach?
proc contents data=work.names
out=contents noprint;
run;
proc sort data = contents; by varnum; run;
data contents1;
set contents;
Name_Ref = compress(Name||"_Ref");
call symput (NAME, NAME_Ref);
%put _user_;
run;
If you want to create an empty dataset that has variables named like some values you have in macro variables, you could do something like this.
Save the values into macro variables that are named by some pattern, like v1, v2 ...
proc sql;
select compress(Name||"_Ref") into :v1-:v20 from contents;
quit;
If you don't know how many values there are, you have to count them first, I assumed there are only 20 of them.
Then, if all your variables are character variables of length 100, you create a dataset like this:
%macro create_dataset;
data want;
length %do i=1 %to 20; &&v&i $100 %end;
;
stop;
run;
%mend;
%create_dataset; run;
This is how you can do it if you have the values in macro variables; there is probably a better way to do it in general.
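If the number of variables is not known in advance, one possible variation is to leave the upper bound of the macro variable range open and read the count from SQLOBS. This is a sketch, not part of the answer above: it assumes a SAS release that accepts the open-ended form :v1-, that the CONTENTS dataset from the question exists, and that it has at least one row.

```sas
proc sql noprint;
  /* Open-ended range: one macro variable per row, v1, v2, ... */
  select cats(name, '_Ref') into :v1- from contents;
quit;
%let nvars=&sqlobs;   /* number of rows returned by the query above */

%macro create_dataset;
  %local i;
  data want;
    /* One LENGTH statement listing every name, then a shared length. */
    length %do i=1 %to &nvars; &&v&i %end; $100;
    stop;
  run;
%mend create_dataset;
%create_dataset
```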
If you don't want to create an empty dataset but only change the variable names, you can do it like this:
proc sql;
select name into :v1-:v20 from contents;
quit;
%macro rename_dataset;
data new_names;
set have(rename=(%do i=1 %to 20; &&v&i = &&v&i.._ref %end;));
run;
%mend;
%rename_dataset; run;
You can use PROC TRANSPOSE with an ID statement.
This step creates an example dataset:
data names;
harry="sally";
dick="gordon";
joe="schmoe";
run;
This step is essentially a copy of your step above that produces a dataset of column names. I will reuse the dataset namerefs throughout.
proc contents data=names out=namerefs noprint;
run;
This step adds "_Ref" to the names defined before and drops everything else. The variable "name" comes from the column attributes of the dataset output by PROC CONTENTS.
data namerefs;
set namerefs (keep=name);
name=compress(name||"_Ref");
run;
This step produces an empty dataset with the desired columns. The variable "name" is again obtained by looking at column attributes. You might get a harmless warning in the GUI if you try to view the dataset, but you can otherwise use it as you wish and you can confirm that it has the desired output.
proc transpose out=namerefs(drop=_name_) data=namerefs;
id name;
run;
Here is another approach which requires less coding. It does not require running proc contents, does not require knowing the number of variables, nor creating a macro function. It also can be extended to do some additional things.
Step 1 is to use the built-in dictionary views to get the desired variable names. The appropriate view for this is dictionary.columns, which has the alias sashelp.vcolumn. The dictionary libref can be used only in proc sql, while the sashelp alias can be used anywhere. I tend to use the sashelp alias since I work in Windows with DMS and can always interactively view the sashelp library.
proc sql;
select compress(Name||"_Ref") into :name_list
separated by ' '
from sashelp.vcolumn
where libname = 'WORK'
and memname = 'NAMES';
quit;
This produces a space-delimited macro variable with the desired names.
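Step 2: To build the empty data set, this code will work. The LENGTH statement needs a length after the names, so a character length of 100 is assumed here, and STOP keeps the data set from getting one observation of missing values:
Data New ;
length &name_list $ 100 ;
stop ;
run ;
You can avoid assuming lengths, or create a populated data set with the new variable names, by using a slightly more complicated select statement.
For example
select compress(Name)||"_Ref $"||compress(put(length,best.))
into :name_list
separated by ' '
will generate a macro variable which retains the previous length for each variable (assuming they are all character). Because each name then carries its own length, the LENGTH statement in Step 2 reduces to length &name_list ; with no trailing $ 100.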
To create a populated data set using the RENAME= data set option, replace the select statement as follows:
select compress(Name)||" = "||compress(Name||"_Ref")
into :name_list
separated by ' '
Then replace the Step 2 code with the following:
Data New ;
set names (rename = ( &name_list)) ;
run ;