I have some results that came from a relational database in a SAS data set. All of the variable names start with numbers, so I can't rename them or access them in a data step. Is there any way to rename them or access them without getting the data out of the RDBMS again?
Setting options validvarname=any; will allow you to access them, and perhaps even use the dataset as it is. You can write an "illegal" variable name as a name literal, "variable name"n (the name in quotes, followed by an n), which SAS treats as an ordinary variable name (much like quoting "variable name" in Oracle).
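As a minimal sketch of the name-literal syntax, assuming the dataset is WORK.DATASETNAME and has a column called 1994Q4 (the dataset name, the column name and the new name Q4_1994 are all placeholders):

```sas
options validvarname=any;

data work.renamed;
    /* '1994Q4'n is a name literal that refers to the variable named 1994Q4 */
    set work.datasetname(rename=('1994Q4'n = Q4_1994));
run;
```

Once the variable has a valid name, the rest of the program can refer to it without the quotes.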
If you want to make them easier to use, you can do something like
proc sql;
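/* nliteral() writes each illegal name as a name literal (e.g. "1994Q4"n); this needs validvarname=any */
select catx(' ','rename',nliteral(name),'=',cats('_',name,';')) into :renamelist separated by ' '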
from dictionary.columns
where libname='WORK' and memname='DATASETNAME'; *perhaps AND ANYDIGIT(substr(name,1,1)) as well;
quit;
proc datasets lib=work;
modify datasetname;
&renamelist;
quit;
You could also try setting options validvarname=v7; before you connect to the RDBMS. Depending on the situation, SAS may then convert the names to valid ones for you, if that option isn't already in effect.
The answer given by Joe has some helpful information, but I actually discovered that SAS has a (somewhat automatic) method for handling this. When you query data from an RDBMS, SAS will actually replace any column names starting with numbers with an underscore for the first character. So 1994Q4 becomes _994Q4. Thus, you can simply access the data that way.
SAS will, however, preserve the original name from the RDBMS as the variable label, so it will display as 1994Q4 (or whatever) in table view mode.
I'm working with data that seem to be split into nearly arbitrary sets from year to year. What I would like to do is to be able to start by concatenating all of the .sas7bdat files in a single library. How would I go about this?
Alternatively, if I know all of the possible names that files in the library might be assigned (but many are potentially missing from any given library), how can I get SAS to ignore missing files? For instance, say that I know all of the .sas7bdat files in my library have one of the names "set01", "set02", "set03" or "set04". If a particular library ("L") is missing one of these, then the data step:
DATA temp;
SET L.set01 L.set02 L.set03 L.set04;
RUN;
will produce an error. Assuming that I know that at least one of these exists, is there an option that will tell SAS to ignore the missing ones?
(I understand that these are two totally different questions, but either would solve my immediate problem.)
In SAS there is an easy way to pick up every dataset whose name starts with a common prefix: put a colon after the prefix in the SET statement:
data temp;
set L.set0: ; /*It will search for all datasets that start with set0 and will set only those which are available*/
run;
Does it answer your query?
Second approach
libname L "Y:\Test Data";
proc sql;
select strip("L."||memname) into :DSNAME separated by ' '
from dictionary.tables
where libname='L';
quit;
/* Main final DS*/
data want;
set &DSNAME;
run;
This extracts all dataset names in library L and stores them in the macro variable DSNAME, for example L.set01 L.oth02 and so on, so the datasets do not need to share a common name.
How can I produce a table that has this kind of info for multiple variables:
VARIABLE COUNT PERCENT
U 51 94.4444
Y 3 5.5556
This is what SAS spits out into the listing output for all variables when I run this program:
ods output nlevels=nlevels1 OneWayFreqs=freq1 ;
proc freq data=sample nlevels ;
tables _character_ / out=outfreq1;
run;
The outfreq1 table holds the info for just the last variable in the data set (the table shown above), not for all of the variables.
In the nlevels1 table there is info of how many categories each variable has but no frequency data.
What I want though is to output the frequency info for all the variables.
Does anybody know a way to do this without a macro/loop?
You basically have two options, which are sort-of-similar in the kinds of problems you'll have with them: use PROC TABULATE, which more naturally deals with multiple table output, or use the onewayfreqs output that you already call for.
The problem with doing that is that variables may be of different types, so it doesn't have one column with all of that information - it has a pair of columns for each variable, which obviously gets a bit ... messy. Even if your variables are all the same type, SAS can't assume that as a general rule, so it won't produce a nice neat thing for you.
What you can do, though, particularly if you are able to use the formatted values (either due to wanting to, or due to them being identical!), is coalesce them into one result.
For example, given your freq1 dataset from the above:
data freq1_out;
set freq1;
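value = coalescec(of f_:); /* F_ variables are character, so use coalescec rather than coalesce */
keep table value frequency percent;
run;
That combines the F_ variables into one variable (only one of them is ever populated on a given row). Because the F_ variables hold the formatted values as character strings, the character version of the function, coalescec, is the one to use. If you can't use the F_ variables and need the original ones, you will have to build your own variable list, using a macro variable list (or some other method, or just typing the names out), to pass to coalesce or coalescec.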
Finally, you could probably use PROC SQL to produce a fairly similar table, although I probably wouldn't do it without using the macro language. UNION ALL is a handy tool here; basically you have separate subqueries for each variable with a group by that variable, so
proc sql;
create table my_freqs as
select 'HEIGHT' as var, height, count(1) as count
from sashelp.class
group by 1,height
union all
select 'WEIGHT' as var, weight, count(1) as count
from sashelp.class
group by 1,weight
union all
select 'AGE' as var, age, count(1) as count
from sashelp.class
group by 1,age
;
quit;
That of course can be trivially macrotized to something like
proc sql;
create table my_freqs as
%freq(table=sashelp.class,var=height)
union all
%freq(table=sashelp.class,var=weight)
union all
%freq(table=sashelp.class,var=age)
;
quit;
or taken even further with either list processing or a macro loop.
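The %freq macro is not defined in the answer above. A minimal sketch of what it might look like, assuming it only needs to generate one SELECT per variable (the output column names var, value and count, and the length of 32, are choices made here for illustration):

```sas
/* Hypothetical definition: emits one SELECT with no trailing semicolon, */
/* so the calls can be chained with UNION ALL inside PROC SQL.           */
%macro freq(table=, var=);
    select "%upcase(&var)" as var length=32,
           &var as value,
           count(*) as count
    from &table
    group by &var
%mend freq;

proc sql;
    create table my_freqs as
    %freq(table=sashelp.class, var=height)
    union all
    %freq(table=sashelp.class, var=weight)
    union all
    %freq(table=sashelp.class, var=age)
    ;
quit;
```

As in the hand-written query, this only works as is when the variables being stacked are all of the same type, since they share the single value column.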
I have a dataset (liste_institution) that contains the names of all the variables that I want to "define" in my PROC REPORT statement. Here is my code, which works when I call my macro non-dynamically (%create_institution(815);). If I use the data step with call execute (commented out in my code), it does not work. The reason seems to be that with call execute the generated code is not interpreted inside the PROC REPORT, which is why it gives me an error.
proc report data = ventes_all_inst4
missing split = "*" nowd
style(header)=[font_weight=bold background = #339966 foreground = white]
style(column)=[cellwidth=15cm];
%macro create_institution(institution);
define TOTAL_&institution. / display "TOTAL*($)" style(column)=[cellwidth=4cm];
%mend;
/* Give error when I use this data step */
/*data _null_;
set liste_institution;
call execute('%create_institution(' || INS || ');');
run;*/
%create_institution(815);
run;
Is there an easy way to dynamically create define statements in a PROC REPORT from a dataset that contains the column names?
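Basically, you have a misunderstanding of how macros and timing work. You need to build the list of macro calls before the proc report. You can't use call execute for this, because the data step that contains it ends the proc report, and the code that call execute generates only runs after that data step has finished. You need to create a macro variable instead.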
Easiest way to do it is like so:
proc sql;
select cats('%create_institution(',ins,')')
into :inslist separated by ' '
from liste_institution
;
quit;
which makes &inslist, which now holds the list of institutions (each wrapped in the macro call). Put &inslist inside the proc report, where the define statements belong.
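Put together, the whole thing might look like the following sketch. It reuses the names from the question; the only changes assumed here are moving the macro definition ahead of the proc report and adding noprint to the proc sql:

```sas
/* Define the macro before it is needed */
%macro create_institution(institution);
    define TOTAL_&institution. / display "TOTAL*($)" style(column)=[cellwidth=4cm];
%mend create_institution;

/* Build one macro call per row of liste_institution */
proc sql noprint;
    select cats('%create_institution(', ins, ')')
        into :inslist separated by ' '
        from liste_institution;
quit;

proc report data=ventes_all_inst4 missing split="*" nowd
    style(header)=[font_weight=bold background=#339966 foreground=white]
    style(column)=[cellwidth=15cm];
    /* Resolves to the macro calls, each of which emits a define statement */
    &inslist
run;
```

The single quotes in the cats() call keep the macro from running inside the proc sql; the calls only execute when &inslist resolves inside the proc report.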
You also may be able to use across variables to allow this to be easier; what you'd have is one row per ins, with a single variable with that value (which defines the column name) and another single variable with the value that goes in the data table portion. Then SAS will automatically create columns for each across value. Across variables are one of the things that makes proc report extremely powerful.
Let's suppose we have the following dataset:
ID Stress_Level Heart_Rate
1 5 10
2 7 12
3 9 16
And the code one would use to rename a variable would be:
data test1;
set test0;
rename Stress_Level=A Heart_Rate=B;
run;
However, what I would like to do is to rename the 2 columns without using their names. Is there an "internal" SAS command that addresses a variable by its column position? So for instance Stress_Level, which is the 2nd column, could be addressed as "COL2" or something similar. Thus the code would be:
data test1;
set test0;
rename COL2=A COL3=B;
run;
Where "COL2" would always refer to the second column in the dataset regardless of its name. Is there a direct or maybe an indirect way to achieve that?
I think the easiest way is to build up a rename statement string from the metadata table DICTIONARY.COLUMNS (the view of this is SASHELP.VCOLUMN). This holds the column names and position for all tables in active libraries.
I've taken advantage of the ASCII sequence (the byte function) to rename the columns A, B etc, obviously you'd run into problems if there are more than 26 columns to be renamed in the table!
You'll also need to tweak the varnum+63 calculation if you wanted to start from a different column than 2.
proc sql noprint;
select cats(name,"=",byte(varnum+63)) into :newvars separated by ' '
from dictionary.columns
where libname = 'WORK' and memname='HAVE' and varnum>=2;
quit;
data want;
set have;
rename &newvars.;
run;
/* or */
/*
proc datasets lib=work nolist nodetails;
modify have;
rename &newvars.;
quit;
*/
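The varnum+63 offset works because 'A' is character 65 in ASCII and the first renamed column is number 2. One way to avoid hard-coding it, sketched here with a hypothetical macro variable first for the starting column (assuming an ASCII system):

```sas
%let first = 3; /* hypothetical: position of the first column to rename */

proc sql noprint;
    /* rank('A') is 65 in ASCII, so column &first maps to A, the next to B, ... */
    select cats(name, "=", byte(varnum - &first + rank('A')))
        into :newvars separated by ' '
        from dictionary.columns
        where libname = 'WORK' and memname = 'HAVE' and varnum >= &first;
quit;

%put &newvars;
```

The 26-column limit mentioned above still applies.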
There are a couple of ways you can do this.
The shortest approach is probably to use an array. The only drawbacks are that you need to know the types of the variables in advance and the name of the first variable.
If they are all numeric as in your example the following could be used:
data test1;
set test0;
array vars[*] _numeric_;
A = vars[2];
B = vars[3];
keep ID A B;
run;
You can only have one type of variable in an array, so it's slightly more complicated if they are not all numeric or all character. Additionally you will need to know the name of the first variable and any other variables that you wish to keep if you don't want to have the duplicates of the second and third variables.
A more robust approach is to use information from a dictionary table and a macro variable to write your rename statement:
proc sql;
/* Write the individual rename assignments */
select strip(name) || " = " || substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", varnum - 1, 1)
/* Store them in a macro variable and separate them by spaces */
into :vars separated by " "
/* Use a sas dictionary table to find metadata about the dataset */
from sashelp.vcolumn
where
libname = "WORK" and
memname = "TEST0" and
2 <= varnum <= 3;
quit;
data test1;
set test0;
rename &vars.;
run;
SAS stores information about datasets in dictionary tables, which have views available in the sashelp library. Take a look at some of the sashelp.v* views to see what kind of information is available. The colon in proc sql's INTO clause is used to store values in a macro variable, which can then be used in the rename statement.
I'd recommend the second approach as it is considerably more flexible and less dependent on the exact structure of your data. It also expands better when you have more than a couple of variables to rename.
Finally, if you want to make the changes to a dataset in place you may want to take a look at using proc datasets (in combination with the dictionary table approach) to do the renaming, as this can change the variable names without having to read and write every line of data.
I'm just starting to learn SAS and wanted to see if anyone knew of a way to delete certain variables from a dataset if they contained a certain word. I'm working with a dataset that contains a huge amount of variables (100+) with the word 'Label' in them and am looking to drop these. Unfortunately the word label comes at the end of the variable name, so I can't do a simple drop label:; Obviously I could individually list all the variables to drop, but I just wanted to see if anyone out there knew of a simpler way to accomplish this task. Thanks for reading and for any help you have to offer up.
Using the vcolumn view and proc sql to create a macro variable:
proc sql noprint;
select trim(compress(name))
into :drop_vars separated by ' '
from sashelp.vcolumn
where libname = upcase('lib1')
and
memname = upcase('table1')
and
upcase(name) like '%LABEL%'
;
quit;
%put &drop_vars.;
data table2;
set lib1.table1;
drop &drop_vars.;
run;
The proc sql creates a list of all the variables in table1 in library 'lib1' that contain label anywhere in the name, and puts it into the macro variable called drop_vars. (upcase is used to reduce the possibility of case causing an issue.)
The data step then uses the drop statement and the drop_vars variable to drop all variables in the list.
Note: Make sure you check the output of the %put statement to ensure you do not drop variables you want to keep
What you need to do is come up with a dataset that contains the variable names, then create a macro variable containing those you want to drop. There are three (or more) options for the first part:
dictionary.columns
sashelp.vcolumn
proc contents output to a dataset
All three give the same result - a dataset of variable names (and other things), which you can then query.
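For the third option, a minimal sketch using sashelp.class; the output dataset name work.varlist is just a placeholder:

```sas
/* Write the variable metadata to a dataset instead of the listing */
proc contents data=sashelp.class
              out=work.varlist(keep=name type varnum)
              noprint;
run;

proc print data=work.varlist;
run;
```

The resulting dataset has one row per variable and can be queried in the same way as the dictionary table.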
So for example, using PROC SQL's SELECT INTO functionality to create a macro variable:
proc sql;
select name into :droplist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
and name like '%eigh%';
quit;
(replace eigh with Label for your needs; % is wildcard here)
and then you have a macro variable &droplist, which you can then use in a drop statement.
data want;
set sashelp.class;
drop &droplist;
run;