Deleting variables whose names contain a specific string - SAS

I'm just starting to learn SAS and wanted to see if anyone knew of a way to delete variables from a dataset if their names contain a certain word. I'm working with a dataset that contains a huge number of variables (100+) with the word 'Label' in them, and I am looking to drop these. Unfortunately the word Label comes at the end of the variable name, so I can't do a simple drop label:; (the colon shortcut only matches a prefix). Obviously I could list all the variables to drop individually, but I wanted to see if anyone knew of a simpler way to accomplish this task. Thanks for reading and for any help you have to offer.

Using the SASHELP.VCOLUMN view and PROC SQL to create a macro variable:
proc sql noprint;
select trim(compress(name))
into :drop_vars separated by ' '
from sashelp.vcolumn
where libname = upcase('lib1')
and
memname = upcase('table1')
and
upcase(name) like '%LABEL%'
;
quit;
%put &drop_vars.;
data table2;
set lib1.table1;
drop &drop_vars.;
run;
The PROC SQL step creates a list of all the variables in table1 in library lib1 that contain LABEL anywhere in the name, and stores it in the macro variable drop_vars. (UPCASE is applied to the name so the match is not case sensitive; library and member names are already stored in uppercase in the dictionary tables.)
The data step then uses the DROP statement with &drop_vars. to drop every variable in the list.
Note: check the output of the %put statement to make sure you do not drop variables you want to keep.
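Since the question says Label always comes at the end of the name, the match can also be anchored to the end, so that a variable such as Labelled is kept. The following is a minimal, self-contained sketch under that assumption: WORK.TABLE1 and its columns are made up for illustration, it reads DICTIONARY.COLUMNS (the PROC SQL-only table behind SASHELP.VCOLUMN), and it uses a PRXMATCH regular expression instead of LIKE so that the end anchor can allow for the trailing blanks in NAME.

```sas
/* Hypothetical demo table: two names end in Label, one only starts with it */
data work.table1;
    length id 8 Sex_Label $6 Age_Label $6 Labelled 8;
    id = 1;
    Sex_Label = 'Male';
    Age_Label = 'Adult';
    Labelled = 1;
run;

proc sql noprint;
    select name
        into :drop_vars separated by ' '
        from dictionary.columns
        where libname = 'WORK'
          and memname = 'TABLE1'
          /* LABEL at the very end (any case), allowing for NAME's blank padding */
          and prxmatch('/LABEL\s*$/i', name) > 0;
quit;
%put NOTE: Dropping &drop_vars.;

data work.table2;
    set work.table1;
    drop &drop_vars.;
run;
```

WORK.TABLE2 should keep only id and Labelled; with '%LABEL%' instead, Labelled would have been dropped too.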

What you need to do is come up with a dataset that contains the variable names, then create a macro variable containing those you want to drop. There are three (or more) options for the first part:
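dictionary.columns (only available inside PROC SQL)
sashelp.vcolumn (a view of the same table, usable from any step)
proc contents with the OUT= option, which writes the variable list to a dataset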
All three give the same result - a dataset of variable names (and other things), which you can then query.
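For the PROC CONTENTS route, a sketch might look like this: the OUT= option writes one row per variable, and the resulting data set can be queried like any other table. SASHELP.CLASS matches the example that follows; WORK.CLASS_VARS is an arbitrary name.

```sas
/* Write one row per variable of SASHELP.CLASS to WORK.CLASS_VARS */
proc contents data=sashelp.class out=work.class_vars(keep=name varnum) noprint;
run;

/* Query the variable list like any other table */
proc sql noprint;
    select name
        into :droplist separated by ' '
        from work.class_vars
        where upcase(name) like '%EIGH%'
        order by varnum;
quit;
%put &droplist.;
```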
So for example, using PROC SQL's SELECT INTO functionality to create a macro variable:
proc sql;
select name into :droplist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
and name like '%eigh%';
quit;
(replace eigh with Label for your needs; % is the wildcard here. LIKE is case sensitive, so upcase(name) like '%LABEL%' is safer.)
You then have a macro variable, &droplist, which you can use in a DROP statement:
data want;
set sashelp.class;
drop &droplist;
run;

Related

Determine which variables in a given sas dataset are sort variables

How can I determine which variables in a given SAS dataset make up its sort key (if the dataset is sorted)?
I prefer to use one of the sashelp datasets, rather than proc contents.
You can use the SASHELP views themselves to find the answer. Run this step:
proc print data=sashelp.vcolumn;
where libname='SASHELP'
and memname like 'V%'
and upcase(name) like '%SORT%'
;
run;
You will see that the SORTEDBY column in SASHELP.VCOLUMN indicates whether a variable is part of a dataset's sort key. Its value gives the variable's position in the BY statement used to sort the dataset (0 if the variable is not part of the key).
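To pull out the sort key of one specific dataset rather than browse the view, you can filter on SORTEDBY > 0 and order by it. In this sketch WORK.CLASS_SORTED is a hypothetical sorted copy of SASHELP.CLASS, created only so that there is a sort key to find; the result should list the BY variables in order.

```sas
/* Sort a copy of SASHELP.CLASS so that it carries a sort key */
proc sort data=sashelp.class out=work.class_sorted;
    by sex age;
run;

/* Keep only key variables and list them in BY-statement order */
proc sql noprint;
    select name
        into :sortkey separated by ' '
        from sashelp.vcolumn
        where libname = 'WORK'
          and memname = 'CLASS_SORTED'
          and sortedby > 0
        order by sortedby;
quit;
%put Sort key: &sortkey.;
```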

Set macro variable to frequency count from proc freq

Is it possible to create a macro variable that is set to the frequency variable produced by proc freq? I'm trying to create a variable that will equal the number of times each last name appears in a data set. For example, Smithe may appear 3 times while Jackson appears only 2 times. I want to capture that value and use it.
You can either use CALL SYMPUT in a subsequent data step (after writing the PROC FREQ results to a dataset with the OUT= option and/or ODS OUTPUT), or you can skip PROC FREQ and compute the frequencies in PROC SQL with SELECT INTO, which creates the macro variables directly.
proc sql;
select sex, count(1)
into :sex separated by ' ',
:count separated by ' '
from sashelp.class
group by sex;
quit;
This creates something approximating a pair of macro variable arrays (one with the values, one with the counts). If you want to use the name (or whatever the value is) as the macro variable name, use the first option (a follow-on data step with CALL SYMPUT), as that lets you name each macro variable.
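The CALL SYMPUT route might look like the following sketch. It uses SASHELP.CLASS and SEX as stand-ins for the last-name data, and the n_ prefix and WORK.SEX_FREQ are arbitrary choices. CALL SYMPUTX (the variant that strips blanks) creates one macro variable per row, named after the value, so this only works when each value yields a valid macro variable name; surnames with spaces or apostrophes would need cleaning first.

```sas
/* One row per SEX value, with the frequency in COUNT */
proc freq data=sashelp.class noprint;
    tables sex / out=work.sex_freq;
run;

/* Create a global macro variable per value, e.g. &n_F and &n_M */
data _null_;
    set work.sex_freq;
    call symputx(cats('n_', sex), count, 'G');
run;
%put F: &n_F. M: &n_M.;
```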

Rename Variable Regardless of its Name in SAS

Let's suppose we have the following dataset:
ID Stress_Level Heart_Rate
1 5 10
2 7 12
3 9 16
And the code one would use to rename a variable would be:
data test1;
set test0;
rename Stress_Level=A Heart_Rate=B;
run;
However, what I would like to do is rename the 2 columns without using their names. Is there an "internal" SAS command that addresses a variable by its column position? For instance, Stress_Level, which is the 2nd column, could be addressed as "COL2" or something similar. The code would then be:
data test1;
set test0;
rename COL2=A COL3=B;
run;
Where "COL2" would always refer to the second column in the dataset regardless of its name. Is there a direct or maybe an indirect way to achieve that?
I think the easiest way is to build up a rename statement string from the metadata table DICTIONARY.COLUMNS (the view of this is SASHELP.VCOLUMN). This holds the column names and position for all tables in active libraries.
I've taken advantage of the ASCII sequence (the BYTE function) to name the columns A, B, etc. Obviously you'd run into problems if more than 26 columns need renaming!
The offset works because varnum 2 + 63 = 65, and BYTE(65) is 'A'; you'll need to adjust the varnum+63 calculation if you want to start from a column other than 2.
proc sql noprint;
select cats(name,"=",byte(varnum+63)) into :newvars separated by ' '
from dictionary.columns
where libname = 'WORK' and memname='HAVE' and varnum>=2;
quit;
data want;
set have;
rename &newvars.;
run;
/* or */
/*
proc datasets lib=work nolist nodetails;
modify have;
rename &newvars.;
quit;
*/
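Putting the pieces together, here is a self-contained sketch that recreates the question's data as WORK.HAVE and uses the PROC DATASETS variant, so the rename happens in place. The offset of 63 relies on BYTE(65) being 'A' on ASCII systems; an EBCDIC host would need a different mapping, such as a SUBSTR lookup into a string of letters.

```sas
/* The example data from the question, stored as WORK.HAVE */
data have;
    input ID Stress_Level Heart_Rate;
    datalines;
1 5 10
2 7 12
3 9 16
;
run;

/* Column 2 -> BYTE(65) = 'A', column 3 -> BYTE(66) = 'B' */
proc sql noprint;
    select cats(name, '=', byte(varnum + 63))
        into :newvars separated by ' '
        from dictionary.columns
        where libname = 'WORK' and memname = 'HAVE' and varnum >= 2
        order by varnum;
quit;
%put &newvars.;

/* Rename in place: PROC DATASETS changes the descriptor, not the rows */
proc datasets lib=work nolist;
    modify have;
    rename &newvars.;
quit;
```

After it runs, WORK.HAVE should have the columns ID, A and B.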
There are a couple of ways you can do this.
The shortest approach is probably to use an array. The only drawbacks are that you need to know the types of the variables in advance and the name of the first variable.
If they are all numeric as in your example the following could be used:
data test1;
set test0;
array vars[*] _numeric_;
A = vars[2];
B = vars[3];
keep ID A B;
run;
You can only have one type of variable in an array, so it's slightly more complicated if the variables are not all numeric or all character. Additionally, you need to know the name of the first variable and of any other variables you wish to keep, so that the KEEP statement removes the original second and third variables instead of leaving duplicates of them.
A more robust approach is to use information from a dictionary table and a macro variable to write your rename statement:
proc sql;
/* Write the individual rename assignments */
select strip(name) || " = " || substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", varnum - 1, 1)
/* Store them in a macro variable and separate them by spaces */
into :vars separated by " "
/* Use a sas dictionary table to find metadata about the dataset */
from sashelp.vcolumn
where
libname = "WORK" and
memname = "TEST0" and
varnum between 2 and 3;
quit;
data test1;
set test0;
rename &vars.;
run;
SAS stores information about datasets in dictionary tables, which have views available in the sashelp library. Take a look at some of the sashelp.v* views to see what kind of information is available. The colon in PROC SQL's INTO :vars clause stores the selected values in a macro variable, which is then used in the RENAME statement.
I'd recommend the second approach as it is considerably more flexible and less dependent on the exact structure of your data. It also expands better when you have more than a couple of variables to rename.
Finally, if you want to make the changes to a dataset in place you may want to take a look at using proc datasets (in combination with the dictionary table approach) to do the renaming, as this can change the variable names without having to read and write every line of data.

Running All Variables Through a Function in SAS

I am new to SAS and need to sgplot 112 variables. The variable names are all very different and may change over time. How can I call each variable in the statement without having to list all of them?
Here is what I have done so far:
%macro graph(var);
proc sgplot data=monthly;
series x=date y=&var;
title "&var";
run;
%mend;
%graph(gdp);
%graph(lbr);
The above code can be a pain since I have to list 112 %graph() lines and then change the names in the future as the variable names change.
Thanks for the help in advance.
List processing is the concept you need to deal with something like this. In some cases you can also approach it with BY-group processing or, for graphs, with paneling (PROC SGPANEL).
Create a dataset, from whatever source is convenient, that contains the list of variables. This could be an Excel or text file, or it could be created from your data if there's a way to programmatically tell which variables you need.
Then you can use any of a number of methods to generate the %graph calls. For example:
proc sql noprint;
select cats('%graph(',var,')')
into :graphlist separated by ' '
from yourdata;
quit;
&graphlist
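If the variables to plot can be identified from the metadata, the list can be built directly from DICTIONARY.COLUMNS instead of a separate file. This sketch assumes every numeric column except DATE should be plotted; the MONTHLY data created here is a hypothetical stand-in, and the %graph macro is the question's, with &var. resolved in the SERIES and TITLE statements.

```sas
/* Hypothetical stand-in for the question's MONTHLY data set */
data monthly;
    format date monyy7.;
    do i = 0 to 11;
        date = intnx('month', '01JAN2020'd, i);
        gdp = 100 + i;
        lbr = 60 - i / 2;
        output;
    end;
    drop i;
run;

/* The question's macro, with &var. resolved in the plot and title */
%macro graph(var);
    proc sgplot data=monthly;
        series x=date y=&var.;
        title "&var.";
    run;
%mend graph;

/* One %graph() call per numeric column other than DATE */
proc sql noprint;
    select cats('%graph(', name, ')')
        into :graphlist separated by ' '
        from dictionary.columns
        where libname = 'WORK'
          and memname = 'MONTHLY'
          and type = 'num'
          and upcase(name) ne 'DATE'
        order by varnum;
quit;

/* Resolving the list runs every generated %graph() call */
&graphlist.
```

Because the list is rebuilt from the data each time, adding or renaming a column needs no code change.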
In your case, you could also restructure the data into a vertical dataset with one row per variable per date, which might make it easier to determine which variables are correct:
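data citiwk;
set sashelp.citiwk;
/* set the length up front; otherwise 'COM' fixes VAR at $3 and 'INDU' is truncated */
length var $8;
var='COM';
val=WSPCA;
output;
var='UTI';
val=WSPUA;
output;
var='INDU';
val=WSPIA;
output;
var='GOV';
val=WSPGLT;
output;
keep val var date;
run;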
proc sort data=citiwk;
by var date;
run;
proc sgplot data=citiwk;
by var;
series x=date y=val;
run;
While I hardcoded those four, you could easily create an array and use VNAME() to get the variable name or VLABEL() to get the variable label of each array element.
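The array version of the same reshaping could look like this sketch, which assumes the same four SASHELP.CITIWK series. VNAME() returns each array element's variable name, so VAR holds WSPCA, WSPUA and so on instead of hand-typed codes; WORK.CITIWK_LONG and the array name WSP are arbitrary choices.

```sas
/* Wide-to-long: one row per DATE per series */
data citiwk_long;
    set sashelp.citiwk;
    array wsp[*] WSPCA WSPUA WSPIA WSPGLT;
    length var $32;
    do i = 1 to dim(wsp);
        var = vname(wsp[i]);  /* name of the current array element */
        val = wsp[i];
        output;
    end;
    keep date var val;
run;

proc sort data=citiwk_long;
    by var date;
run;

/* One plot per series via BY-group processing */
proc sgplot data=citiwk_long;
    by var;
    series x=date y=val;
run;
```

Extending it to more series only means adding names to the ARRAY statement (or using a variable list such as WSP:).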

Subset a dataset to the variables that match a variable list

I'm dealing with a data problem in SAS.
I have one dataset with 1000 variables and 1000 records.
I also have a list of 100 variable names.
I'd like to subset the first dataset to the variables whose names appear in that list.
I tried a merge and PROC SQL, but couldn't work it out.
Could anyone help me out?
Thanks a lot
SAS keeps or drops variables with the conveniently named keywords 'keep' and 'drop'. PROC SQL can help you generate a list if you don't already have it in text format.
data want;
set have;
keep var1 var2 var3 var4;
run;
If you have the list of variables in dataset "vnames" with the variable "tokeep", you can do this:
proc sql;
select tokeep into :keeplist separated by ' ' from vnames;
quit;
data want;
set have;
keep &keeplist.;
run;
PROC SQL is taking the contents of 'tokeep' and instead of selecting them to a table or the screen, putting them in a space-delimited list inside a macro variable 'keeplist', which then is used as the arguments for the 'keep' statement.
It is also possible to output a list of all the variable names of a dataset as another dataset. This makes it much easier to decide which variables of the big dataset to use and which not: for example, join that list of variable names to your own list with a left (or right) join, then check that the number of matching rows equals the number of variables you want to keep.
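As a sketch of that idea, assuming the wanted names are in VNAMES.TOKEEP as above: PROC CONTENTS writes the real variable names to a data set, and an inner join (used here instead of a left or right join) keeps only the requested names that actually exist, with &sqlobs. giving the number of matches. The small HAVE and VNAMES tables are made up for illustration, and var9 is deliberately absent from HAVE.

```sas
/* Hypothetical wide data set with var1-var5 */
data have;
    array v[5] var1-var5 (1 2 3 4 5);
run;

/* Hypothetical list of wanted names; var9 does not exist in HAVE */
data vnames;
    length tokeep $32;
    input tokeep $;
    datalines;
var2
var4
var9
;
run;

/* Actual variable names of HAVE, one row each */
proc contents data=have out=have_vars(keep=name) noprint;
run;

/* Keep only names that are both requested and present */
proc sql noprint;
    select v.tokeep
        into :keeplist separated by ' '
        from vnames as v
             inner join have_vars as h
             on upcase(v.tokeep) = upcase(h.name)
        order by v.tokeep;
quit;
%put NOTE: &sqlobs. requested names found: &keeplist.;

data want;
    set have;
    keep &keeplist.;
run;
```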