How to filter columns in SAS - sas

Suppose you have a data set such as the following:
FSA1 SSA1 SBW1
1 2 3
Is there a way in a data step to filter columns that do not contain 'SA'? I don't want to use a drop or a keep statement as the real dataset has hundreds of variables.

Something like this:
proc sql;
select name into: dropnames
separated by " "
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
having name contains 'He';
quit;
data class;
set sashelp.class;
drop &dropnames;
run;

Related

Variable in a list datastep SAS

Suppose that I have the following list
proc sql;
select name into: list separated by ' ' from dataset;
quit;
and I want to keep the elements of this list in my dataset2. The datastep
data dataset2;
set dataset2;
if name in &list.;
run;
does not work. How should I modify the first proc sql so that the datastep makes sense? Thanks in advance!
This assumes you're looking to filter values within a column, not the number of columns.
If the value is numeric this is what you need:
data dataset3;
set dataset2;
if name not in (&list.);
run;
If the value is character then this is what you need:
proc sql;
select quote(name) into: list separated by ' ' from dataset;
quit;
data dataset3;
set dataset2;
if name not in (&list.);
run;

sas: how to create a variable containing a list of different variables in 2 datasets

I'm kinda new to SAS.
I have 2 datasets: set1 and set2.
I'd like to get a list of variables that's in set2 but not in set1.
I know I can easily see them by doing proc compare and then listvar,
however, i wish to copy&paste the whole list of different variables instead of copying one by one from the report generated.
i want either a macro variable containing a list of all different variables separated by space, or printing out all variables in plain texts that I can easily copy everything.
proc contents data=set1 out=cols1;
proc contents data=set2 out=cols2;
data common;
merge cols1 (in=a) cols2 (in=b);
by name;
if not a and b;
keep name;
run;
proc sql;
select name into :commoncols separated by ','
from work.common;
quit;
Get the list of variable names and then compare the lists.
Conceptually the simplest way see what is in a dataset is to use proc contents.
proc contents data=set1 noprint out=content1 ; run;
proc contents data=set2 noprint out=content2 ; run;
Now you just need to find the names that are in one and not the other.
An easy way is with PROC SQL set operations.
proc sql ;
create table in1_not_in2 as
select name from content1
where upcase(name) not in (select upcase(name) from content2)
;
create table in2_not_in1 as
select name from content2
where upcase(name) not in (select upcase(name) from content1)
;
quit;
You could also push the lists into macro variables instead of datasets.
proc sql noprint ;
select name from content1
into :in1_not_in2 separated by ' '
where upcase(name) not in (select upcase(name) from content2)
;
select name from content2
into :in2_not_in1 separated by ' '
where upcase(name) not in (select upcase(name) from content1)
;
quit;
Then you could use the macro variables to generate other code.
data both;
set set1(drop=&in1_not_in2) set2(drop=&in2_not_in1) ;
run;

replacing field name suffixes in bulk

I have a dataset where I have several variables with suffixes that correspond to given dates. I want to replace the suffixes with the dates to make my output tables more user friendly.
Here is a sample of my code
the fields in my sales dataset are
product number_of_sales_1 number_of_sales_2 number_of_sales_3 revenue_1 revenue_2 revenue_3 tax_1 tax_2 tax_3
The suffixes 1,2,3 correspond to dates which are held in a second dataset with the following format
dates
id date
1 01Apr
2 01May
3 01Jun
I want to bulk replace the suffixes with the dates so my fields in sales become
product number_of_sales_01Apr number_of_sales_01May number_of_sales_01Jun revenue_01Apr revenue_01May revenue_01Jun tax_01Apr tax_01May tax_01Jun
Both the number of dates and the numberof metrics in sales are dynamic so I can't just hardcode in the the code.
I assume your datasets look like below:
data sales;
product="abc";number_of_sales_1=1;number_of_sales_2=2;number_of_sales_3=3;
revenue_1=1000;revenue_2=2000;revenue_3=3000;tax_1=100;tax_2=200;tax_3=300;
run;
data dates;
id=1;date="01Apr";output;id=2;date="01May";output;id=3;date="01Jun";output;
run;
1st Step - Finding out the dates variables which needs to be renamed
proc contents data=sales out=sales_temp(keep=name) noprint; run;
data sales_temp1;
length check_date_vars $1. id 8.;
set sales_temp;
check_date_vars=compress(substr(name,length(name)));
temp=notdigit(check_date_vars);
if temp=0 then id=check_date_vars;
run;
2nd step - Merging the above dataset with the datset which contains the formats, to create a mapping between old names and new names and creating macro variables out of it
proc sort data=sales_temp1; by id; run;
proc sort data=dates; by id; run;
data sales_temp_date;
merge sales_temp1(in=a) dates(in=b);
by id;
if a and b;
new_name=substr(name,1,length(name)-1)||date;
run;
proc sql noprint;
select count(*) into :num_vars separated by " " from sales_temp_date;
quit;
proc sql noprint;
select name into:old_name1 - :old_name&num_vars. from sales_temp_date;
select new_name into:new_name1 - :new_name&num_vars. from sales_temp_date;
quit;
3rd Step - Renaming the variables
%macro rename();
proc datasets library=work nolist;
modify sales;
rename
%do i=1 %to &num_vars.;
&&old_name&i.= &&new_name&i.
%end;
;
run;
%mend;
%rename;

finding max of many columns using proc sql statement

I am trying to write a PROC SQL query in SAS to determine maximum of many columns starting with a particular letter (say RF*). The existing proc means statement which i have goes like this.
proc means data = input_table nway noprint missing;
var age x y z RF: ST: ;
class a b c;
output out = output_table (drop = _type_ _freq_) max=;
run;
Where the columns RF: refers to all columns starting with RF and likewise for ST. I was wondering if there is something similar in PROC SQL, which i can use?
Thanks!
Dynamic SQL is indeed the way to go with this, if you must use SQL. The good news is that you can do it all in one proc sql call using only one macro variable, e.g.:
proc sql noprint;
select catx(' ','max(',name,') as',name) into :MAX_LIST separated by ','
from dictionary.columns
where libname = 'SASHELP'
and memname = 'CLASS'
and type = 'num'
/*eq: is not available in proc sql in my version of SAS, but we can use substr to match partial variable names*/
and upcase(substr(name,1,1)) in ('A','W') /*Match all numeric vars that have names starting with A or W*/
;
create table want as select SEX, &MAX_LIST
from sashelp.class
group by SEX;
quit;

join tables from a library

hi am trying to append the data-sets from a library which contain a specific column variable in them.for example i want to append those data-sets which contain the name column in them from myfile library.
below is my sample code--->
libname myfile'\c:data';
proc sql noprint ;
select distinct catx(".",libname,memname) into :DataList separated by " "
from dictionary.columns
where libname = upcase(myfile) and upcase(name);
quit;
Assuming that the type of the variable is consistent across all datasets something as simple as SET will work:
Data want;
Set &datalist;
Run;