I have a column name from a .xls file (imported using the EXCELCS engine) that is dynamic and changes from day to day.
How can I reference and rename that dynamic column within SAS without knowing beforehand what it will be called?
This depends somewhat on how the name changes. If it is entirely unpredictable (you cannot write code to figure it out, or to eliminate the other known columns), your simplest option may be to use GETNAMES=NO, and then set the names yourself.
If it is in some way predictable (such as it is "MYDYNAMIC_XXXX" where XXXX changes in some fashion), you can probably figure it out from dictionary.columns. (Modify libname/memname/etc. as appropriate; memname is the dataset name.)
proc sql;
select name into :dynname
from dictionary.columns
where libname='WORK' and memname='MYDATASET'
and name like 'MYDYNAMIC_%';
quit;
Alternately, you could use a NOT IN (...) clause to eliminate the known column names, if those are the names you do know in advance.
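A minimal sketch of the NOT IN approach, followed by the rename the question asks for. The known column names (ID, DATE, AMOUNT) and the target name FIXED_NAME are hypothetical, and the sketch assumes exactly one column is left over after the known ones are excluded.

```sas
/* Hypothetical known columns: ID, DATE, AMOUNT.
   LIBNAME and MEMNAME are stored in upper case; NAME keeps its original
   case, so compare on upcase(name). */
proc sql noprint;
  select nliteral(name) into :dynname trimmed
  from dictionary.columns
  where libname = 'WORK'
    and memname = 'MYDATASET'
    and upcase(name) not in ('ID', 'DATE', 'AMOUNT');
quit;

/* Rename the dynamic column in place; FIXED_NAME is an illustrative target. */
proc datasets lib=work nolist;
  modify mydataset;
  rename &dynname = fixed_name;
quit;
```

NLITERAL only changes the value when the imported name is not a standard SAS name (for example, when it contains spaces); in that case VALIDVARNAME=ANY needs to be in effect for the name literal to be accepted.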
Finally, if it is in one consistent location, easier than using GETNAMES=NO may be to query dictionary.columns based on the variable number (where varnum=5 for example if it is the fifth variable number).
Expanding on Joe's last comment - is the column in the same position, just called something different?
If so, you can use the dictionary.columns table, selecting the specific column number, and storing the corresponding column name in a macro variable.
For example, if your column is the 5th column in the Excel file/dataset...
/* Pull column name */
proc sql ;
select name into :DYNVAR
from dictionary.columns
where libname = 'SASHELP'
and memname = 'CLASS'
and varnum = 5 ;
quit ;
/* Then to reference the column simply substitute it for &DYNVAR */
data want ;
set sashelp.class (keep=&DYNVAR) ;
run ;
You could then extend this to multiple columns if necessary...
/* Pull column name */
proc sql ;
select name into :DYNVARS separated by ' '
from dictionary.columns
where libname = 'SASHELP'
and memname = 'CLASS'
and varnum in (1,4,5) ;
quit ;
/* Then to reference the columns simply substitute it for &DYNVARS */
data want ;
set sashelp.class (keep=&DYNVARS) ;
run ;
Related
I have two SAS tables which are the same, only the column names aren't the same.
The first table D1 has 80 column names that have the following pattern X1000_a010_b020 and the second table D2 has 80 column names that have the following pattern X_1000_a0010_b0020. Please note that they are not in the same order.
I want to make sure that all the columns from D1 have the same names as in D2. In other words, I want to add the underscore after the X and add a 0 after all the a's and b's.
However, I don't know how to proceed. I would guess that RegEx would be the go-to, but I am not familiar with it.
As a structure example, some time ago I was using the following code to replace spaces in a column name with an underscore. I would like to do the same, but for the underscore after the X and the 0 after the a's and b's.
%macro rename_vars(table);
%local rename_list sqlobs;
proc sql noprint;
select catx('=',nliteral(name),translate(trim(name),'_',' '))
into :rename_list separated by ' '
from sashelp.vcolumn
where libname=%upcase("%scan(work.&table,-2,.)")
and memname=%upcase("%scan(&table,-1,.)")
and indexc(trim(name),' ')
;
quit;
%if &sqlobs %then %do ;
proc datasets lib=%scan(WORK.&table,-2);
modify %scan(&table,-1);
rename &rename_list;
run;
quit;
%end;
%mend rename_vars;
Your example code seems to show you have a plan for how to implement the renaming so let's just concentrate on generating the OLDNAME <-> NEWNAME pairs. You can generate a list of names in a particular dataset with PROC CONTENTS or querying DICTIONARY.COLUMNS with SQL code (or SASHELP.VCOLUMN with any tool). So let's assume you have a dataset named CONTENTS that contains a variable named NAME. So the goal is to create a new variable, which we can call NEWNAME.
So let's just translate the three transformations you say you need directly into individual actions. You can collapse the steps if you want, but there is no pressing need for efficiency in this operation.
data fixed_names;
set contents;
newname = tranwrd(upcase(name),'_A','_A0');
newname = tranwrd(newname,'_B','_B0');
newname = cats(char(newname,1),'_',substr(newname,2));
keep name newname;
run;
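Since the question mentions regular expressions, here is a hedged sketch of the same transformation using the PRX functions, including one way to build the CONTENTS dataset assumed above. WORK.D1 as the source table and the output name FIXED_NAMES_RX are assumptions for illustration.

```sas
/* Build the CONTENTS dataset: one row per variable of WORK.D1 (assumed location). */
proc contents data=work.d1 noprint out=contents(keep=name);
run;

data fixed_names_rx;
  set contents;
  length newname $32;
  retain rx;
  /* Compile the pattern once; \s* absorbs the trailing blanks SAS pads onto NAME. */
  if _n_ = 1 then rx = prxparse('/^X(\d+)_a(\d+)_b(\d+)\s*$/i');
  if prxmatch(rx, name) then
    /* Rebuild the name from the three captured digit groups. */
    newname = cats('X_', prxposn(rx, 1, name),
                   '_a0', prxposn(rx, 2, name),
                   '_b0', prxposn(rx, 3, name));
  else newname = name;  /* leave non-matching names unchanged */
  drop rx;
run;
```

With this pattern, X1000_a010_b020 becomes X_1000_a0010_b0020. Unlike the TRANWRD version it does not upcase the names, which makes no difference to SAS since variable names are case-insensitive.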
Now you can pull that list into a macro variable. A space-delimited list of old=new pairs is what the RENAME statement needs.
proc sql noprint;
select catx('=',name,newname) into :renames separated by ' '
from fixed_names
where newname ne upcase(name)
;
quit;
Or if the goal is to literally compare the two datasets you might want to generate one list of old names and a separate list of new names.
proc sql noprint;
select name,newname
into :oldlist separated by ' '
, :newlist separated by ' '
from fixed_names
;
quit;
Which you could then use with PROC COMPARE directly without any need to rename any variables.
proc compare data=DS1 compare=DS2 ;
var &oldlist;
with &newlist;
run;
I'm kinda new to SAS.
I have 2 datasets: set1 and set2.
I'd like to get a list of variables that's in set2 but not in set1.
I know I can easily see them by doing proc compare and then listvar;
however, I wish to copy and paste the whole list of differing variables instead of copying them one by one from the generated report.
I want either a macro variable containing a space-separated list of all the differing variables, or a plain-text printout of the variables from which I can easily copy everything.
proc contents data=set1 out=cols1;
proc contents data=set2 out=cols2;
data common;
merge cols1 (in=a) cols2 (in=b);
by name;
if not a and b;
keep name;
run;
proc sql;
select name into :commoncols separated by ','
from work.common;
quit;
Get the list of variable names and then compare the lists.
Conceptually the simplest way to see what is in a dataset is to use proc contents.
proc contents data=set1 noprint out=content1 ; run;
proc contents data=set2 noprint out=content2 ; run;
Now you just need to find the names that are in one and not the other.
An easy way is with PROC SQL, using a NOT IN subquery.
proc sql ;
create table in1_not_in2 as
select name from content1
where upcase(name) not in (select upcase(name) from content2)
;
create table in2_not_in1 as
select name from content2
where upcase(name) not in (select upcase(name) from content1)
;
quit;
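The same comparison can also be written with a true SQL set operator. This is a sketch assuming the CONTENT1 and CONTENT2 datasets created above; the output table name is made up for illustration.

```sas
/* Names present in SET2 but not in SET1, using the EXCEPT set operator. */
/* Names are upcased on both sides so the comparison ignores case.       */
proc sql;
  create table in2_not_in1_except as
  select upcase(name) as name from content2
  except
  select upcase(name) as name from content1;
quit;
```

EXCEPT also removes duplicate rows, which does not matter here because each variable name appears only once per dataset.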
You could also push the lists into macro variables instead of datasets.
proc sql noprint ;
select name
into :in1_not_in2 separated by ' '
from content1
where upcase(name) not in (select upcase(name) from content2)
;
select name
into :in2_not_in1 separated by ' '
from content2
where upcase(name) not in (select upcase(name) from content1)
;
quit;
Then you could use the macro variables to generate other code.
data both;
set set1(drop=&in1_not_in2) set2(drop=&in2_not_in1) ;
run;
I have a SAS dataset DATA which contains 100 variables. Unfortunately, this dataset doesn't contain the real name of each variable; the variables are just named VAR1 - VAR100. I have a separate file, Name, which lists the name of each variable (one name per cell). I do not want to rename them one by one, so the following code is not an option.
data lib.test (rename = (var1 = truename1 var2 = truename2 ...));
set lib.test;
run;
Following Reeze's suggestion, I tried to implement the following solution: http://stackoverflow.com/questions/29006915/rename-variable-regardless-of-its-name-in-sas.
proc sql;
/* Write the individual rename assignments */
select strip(name) || " = " || substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", varnum , 1)
/* Store them in a macro variable and separate them by spaces */
into :vars separated by " "
/* Use a sas dictionary table to find metadata about the dataset */
from sashelp.vcolumn
where libname = "LIB"
and memname = "TEST"
and 1 <= varnum <= 100;
quit;
proc datasets lib=lib nolist nodetails;
modify test;
rename &vars.;
quit;
Now, instead of using a, b, c, d ... to rename my variables, I want to use the names in the dataset Name as the new names. The dataset Name looks like the following (I can transpose it if that makes it easier to use). The order of the names is the same as the variable sequence in dataset lib.test. How can I change the code above to achieve this?
Name
name1
anc
sjsjd
mdmd
You can convert your NAME dataset to have both the old and new names and then use that to generate the rename pairs.
data name_pairs;
set name ;
old_name = cats('VAR',_n_);
run;
proc sql noprint ;
select catx('=',old_name,name)
into :vars separated by ' '
from name_pairs
;
quit;
You can then use the macro variable VARS in your rename statement.
I hope someone can help. I have a large dataset imported to SAS with thousands of variables. I want to create a new dataset by extracting variables that have a specific keyword in their name. For example, the following variables are in my dataset:
AAYAN_KK_Equity_Ask
AAYAN_KK_Equity_Bid
AAYAN_KK_Equity_Close
AAYAN_KK_Equity_Date
AAYAN_KK_Equity_Volume
AAYANRE_KK_Equity_Ask
AAYANRE_KK_Equity_Bid
AAYANRE_KK_Equity_Close
AAYANRE_KK_Equity_Date
I want to extract variables that end with _Ask and _Bid without knowing the rest of the variable's name. Is there a way to do that? I want to try using a do loop but don't know how to instruct SAS to compare each variable's last part of the name with _Ask or _Bid.
Afterwards, I want to create a new variable for each set, named with the full name of the variable except for the last part (which is _Ask or _Bid). Can I do that using an assignment statement?
You probably want to query sashelp.vcolumn, which holds the metadata about the columns in your data sets. Assuming your data is in the library WORK and called TABLE, the following creates a list of the variables that end in _ASK.
proc sql;
select name into :varlist separated by " "
from sashelp.vcolumn
where libname="WORK" and memname="TABLE" and upcase(name) like '%_ASK';
quit;
*To rename the variables with MID generate a rename statement;
proc sql;
select catx("=", name, tranwrd(upcase(name), "_ASK", "_MID"))
into :rename_list separated by " "
from sashelp.vcolumn
where libname="WORK" and memname="TABLE" and upcase(name) like '%_ASK';
quit;
%put &rename_list;
data want_ask;
set work.table
(keep = &varlist);
rename &rename_list;
run;
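The question asks for both the _Ask and the _Bid variables, so here is a sketch that selects both suffixes at once. One detail worth noting: in a LIKE pattern the underscore is itself a wildcard for any single character, so this version escapes it. The macro variable name ASKBID_LIST and the output dataset WANT_ASK_BID are illustrative.

```sas
proc sql noprint;
  /* '^' is declared as the escape character, so '^_' matches a literal underscore. */
  select name into :askbid_list separated by ' '
  from dictionary.columns
  where libname = 'WORK'
    and memname = 'TABLE'
    and (upcase(name) like '%^_ASK' escape '^'
         or upcase(name) like '%^_BID' escape '^');
quit;

/* Keep only the matching columns. */
data want_ask_bid;
  set work.table (keep = &askbid_list);
run;
```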
I have a table with postings by category (a number) that I transposed. I got a table with each column named _number, for example _16, _881, _853 etc. (they aren't in order).
I need to compute the sum of all of them in a proc sql, but I don't want to create the variable in a data step, and I don't want to write out all of the column names either. I tried this but it doesn't work:
proc sql;
select sum(_815-_16) as nnl
from craw.xxxx;
quit;
I tried going from the first number to the last, and also from the number in the first position to the one in the last position. Either way it gives me a number that is not correct.
Any ideas?
Thanks!
You can't use variable lists in SQL, so _: and var1-var6 and var1--var8 don't work.
The easiest way to do this is a data step view.
proc sort data=sashelp.class out=class;
by sex;
run;
*Make transposed dataset with similar looking names;
proc transpose data=class out=transposed;
by sex;
id height;
var height;
run;
*Make view;
data transpose_forsql/view=transpose_forsql;
set transposed;
sumvar = sum(of _:); *I confirmed this does not include _N_ for some reason - not sure why!;
run;
proc sql;
select sum(sumvar) from transpose_Forsql;
quit;
I have no documentation to support this but from my experience, I believe SAS will assume that any sum() statement in SQL is the sql-aggregate statement, unless it has reason to believe otherwise.
The only way I can see for SAS to differentiate between the two is by the way arguments are passed into it. In the below example you can see that the internal sum() function has 3 arguments being passed in so SAS will treat this as the SAS sum() function (as the sql-aggregate statement only allows for a single argument). The result of the SAS function is then passed in as the single parameter to the sql-aggregate sum function:
proc sql noprint;
create table test as
select sex,
sum(sum(height,weight,0)) as sum_height_and_weight
from sashelp.class
group by 1
;
quit;
Result:
proc print data=test;
run;
             sum_height_
Obs    Sex    and_weight
  1     F       1356.3
  2     M       1728.6
Also note a trick I've used in the code by passing in 0 to the SAS function - this is an easy way to add an additional parameter without changing the intended result. Depending on your data, you may want to swap out the 0 for a missing value (i.e. .).
EDIT: To address the issue of unknown column names, you can create a macro variable that contains the list of column names you want to sum together:
proc sql noprint;
select name into :varlist separated by ','
from sashelp.vcolumn
where libname='SASHELP'
and memname='CLASS'
and upcase(name) like '%T' /* MATCHES HEIGHT AND WEIGHT */
;
quit;
%put &varlist;
Result:
Height,Weight
Note that you would need to change the above wildcard to match your scenario - ie. matching fields that begin with an underscore, instead of fields that end with the letter T. So your final SQL statement will look something like this:
proc sql noprint;
create table test as
select sex,
sum(sum(&varlist,0)) as sum_of_fields_ending_with_t
from sashelp.class
group by 1
;
quit;
This provides an alternate approach to Joe's answer - though I believe using the view as he suggests is a cleaner way to go.
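Adapted to the original question, the macro-variable approach might look like the sketch below. It assumes the transposed table is craw.xxxx (the placeholder name from the question) and that every numeric column whose name starts with an underscore should be summed.

```sas
proc sql noprint;
  /* '^_' matches a literal leading underscore; a bare '_' would be a wildcard. */
  select name into :varlist separated by ','
  from dictionary.columns
  where libname = 'CRAW'
    and memname = 'XXXX'
    and type = 'num'
    and name like '^_%' escape '^';
quit;

proc sql;
  /* Inner SUM is the row-wise SAS function; outer SUM is the SQL aggregate. */
  select sum(sum(&varlist, 0)) as nnl
  from craw.xxxx;
quit;
```

If no column matches, VARLIST is never created and the second step fails, so it may be worth checking &sqlobs after the first step.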