How do I import a census table into SAS when the variable name has special characters in it?

I am using SAS Enterprise Guide, importing American Community Survey tables from the census into a script to work with them. Here is an example of a raw census csv I'm importing into SAS Enterprise Guide:
Within my data step, when I use the statement
County=Geo.display-label;
I get this error:
In base SAS, I was using
County=Geo_display_label;
While that worked in base SAS, when I tried that in Enterprise Guide, I got this error:
What is a way to get the raw data's variable name Geo.display-label to read into SAS Enterprise Guide correctly?

To see the impact of the VALIDVARNAME option on the names that PROC IMPORT generates when the column headers are not valid SAS names, let's make a little test CSV file.
filename csv temp ;
data _null_;
file csv ;
put 'GEO.id,GEO.id2,GEO.display-label';
put 'id1,id2,geography';
run;
If we run PROC IMPORT to convert that into a SAS dataset when the VALIDVARNAME option is set to ANY, it will use the column headers exactly, including illegal characters such as the period and hyphen. To reference variables with those illegal characters we need to use name literals.
options validvarname=any;
proc import datafile=csv replace out=test1 dbms=dlm;
delimiter=',';
run;
proc contents data=test1; run;
proc freq data=test1;
tables 'GEO.display-label'n ;
run;
But if we set the option to V7 instead then it will convert the illegal characters into underscores.
options validvarname=v7;
proc import datafile=csv replace out=test2 dbms=dlm;
delimiter=',';
run;
proc contents data=test2; run;
proc freq data=test2;
tables geo_display_label ;
run;

With VALIDVARNAME=ANY in effect, reference the variable with a name literal:
County = 'geo.display-label'n;
Alternatively, if you set OPTIONS VALIDVARNAME=V7; in EG, you will get the same names as in batch SAS.
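A minimal sketch of the name-literal approach, assuming a dataset that already carries the raw header as its variable name. The dataset names acs_raw and acs_clean and the sample value are made up for illustration; they are not from the question. The idea is to rename the awkward variable once so the rest of the program can use an ordinary SAS name.

```sas
/* Allow non-standard variable names so the name literal below is legal */
options validvarname=any;

/* Hypothetical stand-in for the imported census table */
data acs_raw;
  length 'GEO.display-label'n $40;
  'GEO.display-label'n = 'Example County, State';
run;

/* Rename once on input, then use the plain name County from here on */
data acs_clean;
  set acs_raw(rename=('GEO.display-label'n = County));
run;

proc print data=acs_clean;
run;
```

Renaming at the SET statement keeps the name literal confined to a single line, which may be easier to maintain than repeating it in every step.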

Related

SAS PROC IMPORT Multiple SAV Files- Force SPSS Value Labels to Create UNIQUE SAS Format Names

Sometimes if I import multiple SAV files into the SAS work library, one variable imported later on overwrites the display text (i.e., the format) of an earlier imported variable with a similar name.
I've determined that this is because the later dataset's variable produces a format name for the custom format (from SPSS Values Labels) that is identical to format name from the earlier variable, even though the variables have different definitions in the Value Labels attributes in the SAV files.
Is there a way to force SAS to not re-use the same format names by automatically checking at PROC IMPORT whether a format name already exists in the work library format library before auto-naming a new custom format? Or is there any other way of preventing this from happening?
Here is my code as well as an example of the variable names, format names, etc.
proc import out=Dataset1 datafile="S:\folder\Dataset1.SAV"
dbms=SAV replace;
run;
proc import out=DatasetA datafile="S:\folder\DatasetA.SAV"
dbms=SAV replace;
run;
Dataset1 contains variable Question_1. The original SPSS Value Labels are 1=Yes 2=No. When this dataset is imported, SAS automatically generates the format name QUESTION. for Question_1. When only Dataset1 is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_1 in Dataset1.SAV.
DatasetA contains variable Question_A with SPSS Value Labels 1=Agree 2=Unsure 3=Disagree. When this dataset is imported after Dataset1, SAS automatically generates the Format Name QUESTION. for Question_A, even though the work library already contains a format named QUESTION.. Therefore, this overwrites the definition of format QUESTION. that was generated when Dataset1 was imported. Once DatasetA is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_A in DatasetA.SAV
Therefore, when Dataset1 and DatasetA are both imported, variables Question_1 and Question_A both have the format name QUESTION. assigned to them, and the definition of the format QUESTION. in the SAS work library corresponds to the SPSS Value Labels in DatasetA.SAV, not Dataset1.SAV. As a result, Question_1 will display as 1=Agree 2=Unsure, even though the variable values actually mean 1=Yes 2=No.
Ideally, I would like these two variables to get distinct custom format names automatically at their import step. Is there any way to make this happen? Alternatively, is there any other way to prevent this type of overwriting from occurring?
Thank you.
The way to prevent literal overwriting is to point to a different format catalog for each SPSS file that is being read using the FMTLIB= optional statement.
proc import out=dataset1 replace
datafile="S:\folder\Dataset1.SAV" dbms=SAV
;
fmtlib=work.fmtcat1;
run;
proc import out=dataset2 replace
datafile="S:\folder\Dataset2.SAV" dbms=SAV
;
fmtlib=work.fmtcat2;
run;
You can then work later to rename the conflicting formats (and change the attached format in the dataset to use the new name).
So if the member name and format name are short enough you should be able to generate a unique new name by appending the two (add something in between to avoid conflict). So something like this will rename the formats, change the format name attached to the variables and rebuild the formats into the WORK.FORMATS catalog.
%macro sav_import(file,memname);
%if 0=%length(&memname) %then %let memname=%scan(&file,-2,\./);
proc import datafile=%sysfunc(quote(&file)) dbms=sav
out=&memname replace
;
fmtlib=work.&memname ;
run;
proc format lib=work.&memname cntlout=formats;
run;
data formats ;
set formats end=eof;
by fmtname type notsorted;
oldname=fmtname;
fmtname=catx('_',"&memname",oldname);
run;
proc contents data=&memname noprint out=contents;
run;
proc sql noprint;
select distinct catx(' ',c.name,cats(f.fmtname,'.'))
into :fmtlist separated by ' '
from contents c inner join formats f
on c.format = f.oldname
;
quit;
proc datasets nolist lib=work;
modify &memname;
format &fmtlist ;
run;
quit;
proc format lib=work.formats cntlin=formats;
run;
%mend sav_import;
%sav_import(S:\folder\Dataset1.SAV);
%sav_import(S:\folder\Dataset2.SAV);

How to find out all Excel worksheets' names in SAS without using pcfiles or ExcelCS libnames?

I have been trying to import a large Excel file consisting of 20 worksheets into SAS. I am using the following macro for proc import:
%macro excel_imp(outds, worksheet);
proc import
out=&outds
datafile= "Z:\temp\sample"
dbms=XLSX replace;
sheet="&worksheet";
getnames=yes;
run;
%mend excel_imp;
%excel_imp(Ds1,Worksheet1);
%excel_imp(Ds2,Worksheet2);
The above code runs fine, but I have to call the macro 20 times with separate worksheet names.
I would like automated code that identifies the worksheet names and then calls the macro above. I don't have pcfiles/ExcelCS in my SAS EG; I am using 9.4.
Appreciate any help! Thanks.
Since XLSX clearly works, why not use the XLSX libname engine? It exposes every worksheet as a member of the library, so you can copy them all at once.
libname demo xlsx 'path to xlsx file';
proc copy in=demo out=work;
run;
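If only the worksheet names are needed, for example to drive the macro from the question, they can be listed from the same library. This is a sketch under the assumption that the XLSX engine reports each worksheet as a member in DICTIONARY.TABLES, the way other libraries do; the path is the same placeholder used above.

```sas
/* 'path to xlsx file' is a placeholder, as in the answer above */
libname demo xlsx 'path to xlsx file';

/* Each worksheet should appear as one member of the DEMO library */
proc sql;
  select memname
    from dictionary.tables
    where libname = 'DEMO';
quit;

/* Release the workbook when finished */
libname demo clear;
```

Note that the libref is stored in upper case in DICTIONARY.TABLES, which is why the WHERE clause compares against 'DEMO'.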

Jupyter notebook display SAS output word-wrapper

I have a table in sas format (.sas7bdat) and would like to output it in Jupyter notebook.
proc print data=dataBoxE.my_data (firstobs=2 obs=12);
run;
The output table is jammed together since it has 100+ columns. How should I setup the environment within my notebook?
Moreover, is there a way to save the log file instead of opening it right away in the output cell? Thanks.
In SAS you can change where the log is written using proc printto.
When using proc printto, don't forget to reset the location to the default at the end of your code. Example:
proc printto log='c:\em\log1.log';
run;
/* Your code here */
proc printto;
run;
If you don't need all 100+ columns, select only the ones you want using the VAR statement in proc print:
proc print data=exprev;
var country price sale_type;
run;
If you want all 100+ columns, export them to CSV using proc export and view the file in any spreadsheet reader to avoid crashing your browser.
proc export data=sashelp.class
outfile='c:\myfiles\Femalelist.csv'
dbms=csv
replace;
run;

Exporting SAS data into SPSS with value labels

I have a simple data table in SAS, where I have the results from a survey I sent to my friends:
DATA Questionnaire;
INPUT make $ Question_Score ;
CARDS;
Ned 1
Shadowmoon 2
Heisenberg 1
Athelstan 4
Arnold 5
;
RUN;
What I want to do, using SAS, is to export this table into SPSS (.sav), and also have the value labels for the Question_Score, like shown in the picture below:
I then proceed to create a format in SAS (in hope this would do it):
PROC FORMAT;
VALUE Question_Score_frmt
1="Totally Agree"
2="Agree"
3="Neutral"
4="Disagree"
5="Totally Disagree"
;
run;
PROC FREQ DATA=Questionnaire;
FORMAT Question_Score Question_Score_frmt.
;
TABLES Question_Score;
RUN;
and finally export the table to a .sav file using the fmtlib option:
proc export data=Questionnaire outfile="D:\Questionnaire.sav"
dbms=spss replace;
fmtlib=work.Q1frmt;
quit;
Only to disappoint myself seeing that it didn't work.
Any ideas on how to do this?
You didn't apply the format to the dataset, unfortunately, you applied it to the proc freq. You would need to use PROC DATASETS or a data step to apply it to the dataset.
proc datasets lib=work;
modify questionnaire;
format Question_Score Question_Score_frmt.;
run;
quit;
Then exporting will include the format, if SAS considers it compatible with SPSS's value label rules. Note that SAS's understanding of SPSS's rules is quite old (based, I think, on SPSS version 9), so it fairly often still won't work, unfortunately.
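The data step alternative mentioned above can be sketched as follows. The format name qscore and the variable name name are hypothetical choices for this illustration, not from the question; the point is only that the FORMAT statement sits inside the data step, so the format is stored with the dataset rather than applied for a single procedure.

```sas
proc format;
  value qscore  /* hypothetical format name */
    1 = "Totally Agree"
    2 = "Agree"
    3 = "Neutral"
    4 = "Disagree"
    5 = "Totally Disagree"
  ;
run;

data questionnaire;
  length name $12;                /* long enough for the sample names */
  input name $ question_score;
  format question_score qscore.;  /* attached permanently to the dataset */
  datalines;
Ned 1
Shadowmoon 2
Heisenberg 1
Athelstan 4
Arnold 5
;
run;

/* Confirm the format is stored on the variable before exporting */
proc contents data=questionnaire;
run;
```

Once PROC CONTENTS shows the format on question_score, the export step can be run against this dataset; whether the labels arrive in SPSS still depends on the compatibility caveat above.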

output data to excel with sheet name contains space

data _null_;
call symputx('ts','a b');
run;
proc export data=have
outfile='path\file.xlsx';
sheet="&ts.";
run;
But this creates a sheet named a_b (the original space is replaced by _).
Why does this happen?
That's related to how SAS's proc export works. Behind the scenes it creates a libname and then creates a dataset. Under normal rules (validmemname=compat), dataset names may not contain spaces. There is an option (validmemname=extend) that tells SAS to allow spaces, in which case you use a name literal to access the dataset, namely "a b"n (the n tells SAS it's a name), but it seems proc export (and libname itself) doesn't honor that.
However, in the present day, there is a workaround for this: You can use dbms=xlsx in the export if you are on SAS 9.4 TS1M1 or later. This uses a different engine than the default excel (which uses Microsoft's JET engine), and it permits spaces easily.
Just use the DBMS=XLSX option and you can include spaces in your sheet names.
proc export data=sashelp.class
outfile='class.xlsx'
dbms=xlsx
;
sheet="A B";
run;