Proc Import from SPSS - sas

I'm relatively new to SAS. I'm using PROC IMPORT on an SPSS (.sav) data file, and it runs fine, but I noticed that it brings in only the SPSS value labels rather than the underlying numeric values. For example, in the Gender column 1='male' and 2='female', and in the SAS data set 'male' and 'female' show up rather than 1 or 2.
Any insight would be appreciated. Current code...
proc import datafile = "C:\Data\workload_20130314.sav"
out=library.workload_20130314
dbms = sav
replace;
run;

You probably have the underlying values there; they're probably just displayed through a format. Try opening the dataset and viewing the column properties of one of the columns you're looking at; it probably has a format attached with a name like Q49F. that displays the labels in place of the numbers. The variable is still numeric and still works with PROC MEANS or any other procedure that expects a numeric variable.
You can run, I think, something like
proc datasets;
modify my_dataset;
format _all_;
quit;
to remove the formats from every variable, so the stored numeric values are displayed. You can also do that on a case-by-case basis.
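As a minimal sketch of the case-by-case route, assuming the dataset from the question (library.workload_20130314) and its Gender variable: naming a variable in a FORMAT statement without giving a format removes the format from that variable only. The second step leaves the stored dataset alone and only suppresses formats for one PROC PRINT, which is a quick way to confirm the numeric codes are there.

```sas
/* Remove the format from Gender only; other variables keep theirs */
proc datasets lib=library nolist;
  modify workload_20130314;
  format Gender;
quit;

/* Or leave the dataset untouched and just view the raw values once */
proc print data=library.workload_20130314 (obs=10);
  format _all_;
run;
```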

Related

SAS PROC IMPORT Multiple SAV Files- Force SPSS Value Labels to Create UNIQUE SAS Format Names

Sometimes if I import multiple SAV files into the SAS work library, one variable imported later on overwrites the display text (i.e., the format) of an earlier imported variable with a similar name.
I've determined that this happens because the later dataset's variable produces a custom format name (from the SPSS Value Labels) that is identical to the format name from the earlier variable, even though the two variables have different Value Labels definitions in their SAV files.
Is there a way to force SAS to not re-use the same format names by automatically checking at PROC IMPORT whether a format name already exists in the work library format library before auto-naming a new custom format? Or is there any other way of preventing this from happening?
Here is my code as well as an example of the variable names, format names, etc.
proc import out=Dataset1 datafile="S:\folder\Dataset1.SAV"
dbms=SAV replace;
run;
proc import out=DatasetA datafile="S:\folder\DatasetA.SAV"
dbms=SAV replace;
run;
Dataset1 contains variable Question_1. The original SPSS Values Labels are 1=Yes 2=No. When this dataset is imported, SAS automatically generates the Format Name QUESTION., for Question_1. When only Dataset1 is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_1 in Dataset1.SAV
DatasetA contains variable Question_A with SPSS Value Labels 1=Agree 2=Unsure 3=Disagree. When this dataset is imported after Dataset1, SAS automatically generates the Format Name QUESTION. for Question_A, even though the work library already contains a format named QUESTION.. Therefore, this overwrites the definition of format QUESTION. that was generated when Dataset1 was imported. Once DatasetA is imported, the definition of format QUESTION. corresponds to the SPSS Value Labels for Question_A in DatasetA.SAV
Therefore, when Dataset1 and DatasetA are both imported, variables Question_1 and Question_A both have the format QUESTION. assigned to them, and the definition of QUESTION. in the SAS work library corresponds to the SPSS Value Labels in DatasetA.SAV, not Dataset1.SAV. As a result, Question_1 displays as 1=Agree 2=Unsure, even though its values actually mean 1=Yes 2=No.
Ideally, I would like these two variables to get distinct custom format names automatically at the import step. Is there any way to make this happen? Alternatively, is there any other way to prevent this type of overwriting?
Thank you.
The way to prevent literal overwriting is to point to a different format catalog for each SPSS file that is being read using the FMTLIB= optional statement.
proc import out=dataset1 replace
datafile="S:\folder\Dataset1.SAV" dbms=SAV
;
fmtlib=work.fmtcat1;
run;
proc import out=dataset2 replace
datafile="S:\folder\Dataset2.SAV" dbms=SAV
;
fmtlib=work.fmtcat2;
run;
You can then work later to rename the conflicting formats (and change the attached format in the dataset to use the new name).
If the member name and format name are short enough (the combined name has to fit within the 32-character limit on format names), you should be able to generate a unique new name by joining the two, with a separator in between to avoid conflicts. Something like this will rename the formats, change the format name attached to the variables, and rebuild the formats into the WORK.FORMATS catalog.
%macro sav_import(file,memname);
%if 0=%length(&memname) %then %let memname=%scan(&file,-2,\./);
proc import datafile=%sysfunc(quote(&file)) dbms=sav
out=&memname replace
;
fmtlib=work.&memname ;
run;
proc format lib=work.&memname cntlout=formats;
run;
data formats ;
set formats end=eof;
by fmtname type notsorted;
oldname=fmtname;
fmtname=catx('_',"&memname",oldname);
run;
proc contents data=&memname noprint out=contents;
run;
proc sql noprint;
select distinct catx(' ',c.name,cats(f.fmtname,'.'))
into :fmtlist separated by ' '
from contents c inner join formats f
on c.format = f.oldname
;
quit;
proc datasets nolist lib=work;
modify &memname;
format &fmtlist ;
run;
quit;
proc format lib=work.formats cntlin=formats;
run;
%mend sav_import;
%sav_import(S:\folder\Dataset1.SAV);
%sav_import(S:\folder\Dataset2.SAV);
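To confirm the result, something like the following could be run after the macro calls. This is only a sketch and assumes the macro above ran as written, so that the renamed formats were rebuilt into WORK.FORMATS and the first dataset is WORK.Dataset1. The FMTLIB option on PROC FORMAT prints the definition of each format in the catalog.

```sas
/* List the definitions of the renamed formats in WORK.FORMATS */
proc format lib=work.formats fmtlib;
run;

/* Check which format is now attached to each variable */
proc contents data=work.Dataset1;
run;
```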

SAS: how to check if a column, in an imported excel sheet, contains a string?

I have imported a dataset from an Excel sheet, and I want to delete some observations. Say I have a variable which tells me whether a student has passed or not (with strings "Passed" and "Failed"). I want to delete all the students who have failed from the dataset.
I do know that usually I would be able to do so with an if statement. However, I don't know how to access the temporary dataset. Do I have to open it after importing, and then check with an if statement?
This is how I have tried:
proc import datafile="C:\Users\User\Desktop\testresults.xlsx"
DBMS=XLSX;
if Status = "failed" then delete
run;
I know this won't work as the "if" condition only works when the data resides in PDV.
Is it possible to delete after importing instead of while importing?
Use a where clause on the output data set:
proc import file="my.xlsx"
out=work.myxlsx(where=(status^="failed"))
dbms=xlsx
replace;
run;
A WHERE= data set option on OUT= filters the output dataset from PROC IMPORT, as DomPazz shows.
Alternately, you can use a data step.
proc import datafile="C:\Users\User\Desktop\testresults.xlsx" out=have DBMS=XLSX;
run;
data want;
set have;
if Status = "failed" then delete;
run;
That of course would work whether you did it immediately after importing (or in the same submit) or some time later.
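One thing to watch in either version: SAS string comparisons are case-sensitive, and the question describes the values as "Passed" and "Failed", while the code compares against "failed". A minimal sketch of a case-insensitive variant of the data step, assuming the variable is named Status and the imported dataset is called have, as above:

```sas
data want;
  set have;
  /* UPCASE makes the test match "Failed", "failed" or "FAILED";
     STRIP drops leading and trailing blanks before comparing */
  if upcase(strip(Status)) = "FAILED" then delete;
run;
```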

Exporting SAS data into SPSS with value labels

I have a simple data table in SAS, where I have the results from a survey I sent to my friends:
DATA Questionnaire;
INPUT make $ Question_Score ;
CARDS;
Ned 1
Shadowmoon 2
Heisenberg 1
Athelstan 4
Arnold 5
;
RUN;
What I want to do, using SAS, is to export this table to SPSS (.sav) and also have value labels attached to Question_Score in the SPSS file.
I then proceed to create a format in SAS (in hope this would do it):
PROC FORMAT;
VALUE Question_Score_frmt
1="Totally Agree"
2="Agree"
3="Neutral"
4="Disagree"
5="Totally Disagree"
;
run;
PROC FREQ DATA=Questionnaire;
FORMAT Question_Score Question_Score_frmt.
;
TABLES Question_Score;
RUN;
and finally export the table to a .sav file using the fmtlib option:
proc export data=Questionnaire outfile="D:\Questionnaire.sav"
dbms=spss replace;
fmtlib=work.Q1frmt;
quit;
Only to disappoint myself seeing that it didn't work.
Any ideas on how to do this?
You didn't apply the format to the dataset, unfortunately, you applied it to the proc freq. You would need to use PROC DATASETS or a data step to apply it to the dataset.
proc datasets lib=work;
modify questionnaire;
format Question_Score Question_Score_frmt.;
run;
quit;
Then exporting will include the format, if it's compatible in SAS's opinion with SPSS's value label rules. I will note that SAS's understanding of SPSS's rules is quite old, based on I think SPSS version 9, and so it's fairly often that it won't work still, unfortunately.
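Putting the pieces together, a sketch of the full export might look like the following. It assumes the format was created by the PROC FORMAT step in the question, so it lives in the default WORK.FORMATS catalog, and FMTLIB= is pointed there rather than at work.Q1frmt. It also uses DBMS=SAV, which is the identifier I know for SPSS files. The output path is the one from the question, and whether the labels survive still depends on the compatibility caveat above.

```sas
/* Attach the format to the variable in the dataset itself */
proc datasets lib=work nolist;
  modify Questionnaire;
  format Question_Score Question_Score_frmt.;
quit;

/* Export; FMTLIB= names the catalog that holds the format definitions */
proc export data=work.Questionnaire
    outfile="D:\Questionnaire.sav"
    dbms=sav replace;
  fmtlib=work.formats;
run;
```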

How do I output a SAS data set looking exactly like my result in PROC TABULATE?

So I am a complete beginner in SAS and it seems that I am missing something that is very obvious since I cannot figure this out. Hopefully someone could help me on this.
I have disorganized data in a .csv file from which I need to compute some things, but the first step before any of that is to organize my data into a workable data set in SAS. So first, I run a DATA step to import my .csv file. Then, I run a PROC TABULATE to make it look exactly how I want it to, so that I can compute additional variables, as follows:
PROC TABULATE DATA = Work.Temp OUT = Work.Final;
However, the outputted data set Work.Final looks completely different from what I was able to create in PROC TABULATE. Basically, I was able to get the data into the form I want using PROC TABULATE, and I want my outputted SAS data set to look exactly in this form. Instead, the data set Work.Final is again a disorganized mess.
Any thoughts?
Try using ODS output to write a CSV file from your Proc tabulate.
ODS CSV FILE="C:\Final.CSV";
PROC Tabulate data=work.temp;
class bla bla bla;
table etc etc;
RUN;
ODS CSV CLOSE;
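Since the CLASS and TABLE statements above are placeholders, here is a filled-in sketch of the same pattern. It uses the SASHELP.CLASS sample dataset that ships with SAS instead of the asker's data, and the variables and table layout are chosen only for illustration. The output path is the one from the answer.

```sas
/* Route the next procedure's output to a CSV file */
ods csv file="C:\Final.csv";

proc tabulate data=sashelp.class;
  class sex;                          /* grouping variable, one row per value */
  var height weight;                  /* analysis variables */
  table sex, (height weight)*mean;    /* rows: sex; columns: mean of each variable */
run;

/* Close the destination so the file is written out */
ods csv close;
```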

IGNORE DATA IN SAS IMPORT FROM EXCEL

I have no working knowledge of SAS, but I have an excel file that I need to import and work with. In the excel file there are about 100 rows (observations) and 7 columns (quantities). In some cases, a particular observation may not have any data in one column. I need to completely ignore that observation when reading my data into SAS. I'm wondering what the commands for this would be.
An obvious cheap solution would be to delete the rows in the excel file with missing data, but I want to do this with SAS commands, because I want to learn some SAS.
Thanks!
Import the data however you want, for example with the IMPORT procedure, as Stig Eide mentioned.
proc import
datafile = 'C:\...\file.xlsx'
dbms = xlsx
out = xldata
replace;
/* mixed = YES; is accepted only with dbms = excel, so it is left out for dbms = xlsx */
getnames = YES;
run;
Explanation:
The DBMS= option specifies how SAS will try to read the data. If your file is an Excel 2007+ file, i.e. xlsx, then you can use DBMS=XLSX as shown here. If your file is older, e.g. xls rather than xlsx, try DBMS=EXCEL.
The OUT= option names the output dataset.
If a single level name is specified, the dataset is written to the WORK library. That's the temporary library that's unique to each SAS session. It gets deleted when the session ends.
To create a permanent dataset, specify a two level name, like mylib.xldata, where mylib refers to a SAS library reference (libref) created with a LIBNAME statement.
REPLACE overwrites the output dataset if it already exists, so the step can be rerun.
MIXED=YES tells SAS that a column may hold mixed types, in which case it is read as character. It applies to DBMS=EXCEL only, not to DBMS=XLSX.
GETNAMES=YES will name your SAS dataset variables based on the column names in Excel.
If I understand you correctly, you want to remove every observation in the dataset that has a missing value in any of the seven columns. There are fancier ways to do this, but I recommend a simple approach like this:
data xldata;
set xldata;
where cmiss(col1, col2, ..., col7) = 0;
run;
The CMISS function counts the number of missing values in the variables you specify at each observation, regardless of the data type. Since we're using WHERE CMISS()=0, the resulting dataset will contain only the records with no missing data for any of the seven columns.
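A small self-contained sketch of the same filter, using made-up inline data rather than the Excel file. The dataset and variable names here (demo, complete, name, score) are hypothetical. CMISS accepts character and numeric arguments together, which is why it suits a sheet with mixed column types.

```sas
/* Toy data: the second row has a missing score, the third a missing name */
data demo;
  infile datalines dsd truncover;
  input name $ score;
  datalines;
Ann,90
Bob,
,75
;
run;

/* Keep only the rows where neither variable is missing */
data complete;
  set demo;
  where cmiss(name, score) = 0;
run;

proc print data=complete;
run;
```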
When in doubt, try browsing the SAS online documentation. It's very thorough.
If you have "SAS/ACCESS Interface to PC Files" licensed (hint: proc setinit) you can import the Excel file with this code. The where option lets you select which rows you want to keep, in this example you will keep the rows where the column "name" is not blank:
proc import
DATAFILE="your file.xlsx"
DBMS=XLSX
OUT=resulttabel(where=(name ne ""))
REPLACE;
RUN;