sas7bdat with spaces in var names - sas

I've received several SAS dataset files with the .sas7bdat extension. I'm using SAS 9.3 on windows, and the creator of these files was evidently using a different environment and/or software. Many of the files have var names that include spaces and other invalid characters. Even running a proc contents raises an error like this:
ERROR: The value Person ID is not a valid SAS name.
Oddly, SAS Enterprise Guide opens and displays the file without complaining.
How do I efficiently re-name all my bad var names so that I can start running actual programs with these files?

In addition to option validvarname=ANY; you also need to reference the variable names as name literals: 'Person ID'n (either single or double quotes are fine).
If you want to efficiently rename all of them, you can do something like this:
options validvarname=any;
data have;
'Hello Var'n = 1;
'Another Var'n = 2;
x=3;
run;
data badvarnames;
set sashelp.vcolumn;
where libname='WORK' and memname='HAVE' and not nvalid(name,'v7');
validname = translate(trim(name),'_',' ');
name = cats("'",name,"'n");
run;
proc sql;
select catx(' ','rename',name,'=',validname,';') into :renamelist
separated by ' ' from badvarnames;
quit;
proc datasets lib=work;
modify have;
&renamelist;
quit;
proc contents data=have;
run;
Depending on other details (such as the possibility that this might create overlapping variable names), you might need to adjust the code for those concerns.

I don't think it is a different OS/SAS version. I think that you may need to submit the following
options validvarname = ANY;

Related

Can I use wildcards in dataset names for PROC CONTENTS?

On the SAS server we have a library that contains thousands of datasets. I want to catalog the contents of a subset of these, all of which have names that begin with "prov". Can I use a wildcard to specify this?
I tried:
PROC CONTENTS DATA=library.prov*;
RUN;
But that just produces a log with this error message:
ERROR: File LIBRARY.PROV.DATA does not exist.
I also tried library.prov%, and that gave the same error.
There are over 100 datasets that start with "prov" so I really don't want to have to do them one at a time. Any ideas?
Depending on what information you want that the CONTENTS procedure produces you could just use the DICTIONARY metadata views.
proc sql ;
create table want as
select *
from dictionary.columns
where libname = 'LIBREF'
and memname like 'PROV%'
;
quit;
Use a WHERE data set option.
proc contents data=sashelp._all_ noprint out=class(where=(memname like 'CLASS%'));
run;
When you specify the keyword _ALL_ in the PROC CONTENTS statement, the step displays a list of all the SAS files that are in the specified SAS library.
Example :
PROC CONTENTS DATA=libref._ALL_ NODS;
RUN;
But to open only the datasets that begin with prov you can use the SQL and add CONTAINS to WHERE e.g:
proc sql ;
create table mytables as
select *
from dictionary.tables
where libname = 'WORK'
order by memname ;
quit ;
Now just run:
PROC CONTENTS DATA mytables;
RUN;
I may be using a different version of SAS check if you have the library SASHELP if so try this based on my note in your comment on the previous response you may see that this works out for you:
proc sql outobs=100;
create table see as
select distinct libname,memname,crdate,modate from sashelp.vtable
where libname='LIBRARY' and memname like 'PROV%'
order by memname;
quit;

Assigning Headers from one file to multiple data files

I have a list of ~100 files. The first file contains header information for the other 98 data files. The information should be in table format, however each table is a different size (with regards to column and row number).
My goal is to import these files such that the column headers from the first file are correctly assigned.
Additional information:
I am told this list of files was generated using SAS (however I am not familiar with the file format) Furthermore, the "CIMPORT" command does not work on these files.
The files are "|" delineated
Thank you very much for any help.
This was a fun issue. I came up with following way:
First lets load up some data.
proc import datafile = "\\Datadrive\mydata.csv"
out=w_headers;
delimiter=";";
guessingrows=32767;
run;
proc import datafile = "\\Datadrive\no_headers.csv"
out=no_headers;
delimiter=";";
guessingrows=32767;
run;
Then I extract the names of the columns and variable number to a dataset.
proc contents data=w_headers out=meta(keep=NAME VARNUM) noprint ; run ;
Then I create commands to renaming the columns without names to have proper names based on the existing. ones.
data meta;
set meta;
cmd = cats('VAR',VARNUM,'=', name);
run;
Here comes the kicker, I put the the commends to a variable. Next the variable is fed to proc datasets for renaming the columns.
proc sql noprint;
select cmd into :cmd_list separated by ' ' from meta;
quit;
proc datasets library = work nolist;
modify no_headers;
rename &cmd_list;
quit;
At this point my two datasets have identical column names. the method is a bit tricky, but works. I'm sure there is another way, but this was fun one. :)

How do I import a census table into SAS when the variable name has special characters in it?

I am using SAS Enterprise Guide, importing American Community Survey tables from the census into a script to work with them. Here is an example of a raw census csv I'm importing into SAS Enterprise Guide:
within my data step, when I use the command
County=Geo.display-label;
I get this error:
In base SAS, I was using
County=Geo_display_label;
While that worked in base SAS, when I tried that in Enterprise Guide, I got this error:
What is a way to get the raw data's variable name Geo.display-label to read into SAS Enterprise Guide correctly?
To see the impact of the VALIDVARNAME option on the names that PROC IMPORT generates when the column headers are not valid SAS names lets make a little test CSV file.
filename csv temp ;
data _null_;
file csv ;
put 'GEO.id,GEO.id2,GEO.display-label';
put 'id1,id2,geography';
run;
If we run PROC IMPORT to convert that into a SAS datasets when VALIDVARNAME option is set to ANY then it will use the column headers exactly, including the illegal characters like period and hyphen. To reference the variables with those illegal characters we will need to use name literals.
options validvarname=any;
proc import datafile=csv replace out=test1 dbms=dlm;
delimiter=',';
run;
proc contents data=test1; run;
proc freq data=test1;
tables 'GEO.display-label'n ;
run;
But if we set the option to V7 instead then it will convert the illegal characters into underscores.
options validvarname=v7;
proc import datafile=csv replace out=test2 dbms=dlm;
delimiter=',';
run;
proc contents data=test2; run;
proc freq data=test2;
tables geo_display_label ;
run;
County = 'geo.display-label'n;
if you set OPTIONS VALIDVARNAME=V7; in EG you will get the same names as batch sas.

Running All Variables Through a Function in SAS

I am new to SAS and need to sgplot 112 variables. The variable names are all very different and may change over time. How can I call each variable in the statement without having to list all of them?
Here is what I have done so far:
%macro graph(var);
proc sgplot data=monthly;
series x=date y=var;
title 'var';
run;
%mend;
%graph(gdp);
%graph(lbr);
The above code can be a pain since I have to list 112 %graph() lines and then change the names in the future as the variable names change.
Thanks for the help in advance.
List processing is the concept you need to deal with something like this. You can also use BY group processing or in the case of graphing Paneling in some cases to approach this issue.
Create a dataset from a source convenient to you that contains the list of variables. This could be an excel or text file, or it could be created from your data if there's a way to programmatically tell which variables you need.
Then you can use any of a number of methods to produce this:
proc sql;
select cats('%graph(',var,')')
into: graphlist separated by ' '
from yourdata;
quit;
&graphlist
For example.
In your case, you could also generate a vertical dataset with one row per variable, which might be easier to determine which variables are correct:
data citiwk;
set sashelp.citiwk;
var='COM';
val=WSPCA;
output;
var='UTI';
val=WSPUA;
output;
var='INDU';
val=WSPIA;
output;
val=WSPGLT;
var='GOV';
output;
keep val var date;
run;
proc sort data=citiwk;
by var date;
run;
proc sgplot data=citiwk;
by var;
series x=date y=val;
run;
While I hardcoded those four, you could easily create an array and use VNAME() to get the variable name or VLABEL() to get the variable label of each array element.

Deleting variable names containing specific string

I'm just starting to learn SAS and wanted to see if anyone knew of a way to delete certain variables from a dataset if they contained a certain word. I'm working with a dataset that contains a huge amount of variables (100+) with the word 'Label' in them and am looking to drop these. Unfortunately the word label comes at the end of the variable name, so I can't do a simple drop label:; Obviously I could individually list all the variables to drop, but I just wanted to see if anyone out there knew of a simpler way to accomplish this task. Thanks for reading and for any help you have to offer up.
Using a the vcolumn table and proc sql to create a macro variable a macro variable:
proc sql noprint;
select trim(compress(name))
into :drop_vars separated by ' '
from sashelp.vcolumn
where libname = upcase('lib1')
and
memname = upcase('table1')
and
upcase(name) like '%LABEL%'
;
quit;
%put &drop_vars.;
data table2;
set table1;
drop &drop_vars.;
run;
the proc sql will create a list of all the variables from table1 in library 'lib1' containing label anywhere in the name and put it into the macro variable called drop_vars. (upcase is used to reduce possibility of case causing an issue)
The data step then uses the drop statement and the drop_vars variable to drop all variables in the list.
Note: Make sure you check the output of the %put statement to ensure you do not drop variables you want to keep
What you need to do is come up with a dataset that contains the variable names, then create a macro variable containing those you want to drop. There are three (or more) options for the first part:
dictionary.columns
sashelp.vcolumn
proc contents output to a dataset
All three give the same result - a dataset of variable names (and other things), which you can then query.
So for example, using PROC SQL's SELECT INTO functionality to create a macro variable:
proc sql;
select name into :droplist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='CLASS'
and name like '%eigh%';
quit;
(replace eigh with Label for your needs; % is wildcard here)
and then you have a macro variable &droplist, which you can then use in a drop statement.
data want;
set sashelp.class;
drop &droplist;
run;