Use DATA step vs PROC IMPORT

In SAS, when creating a SAS data set from a raw data file (e.g., a CSV), we can use either a DATA step with an INFILE statement or PROC IMPORT.
What are the advantages and disadvantages of each over the other?

PROC IMPORT makes assumptions about the lengths of character variables and the types of variables based on reading a number of rows from the CSV, which is controlled by the GUESSINGROWS option. If you issue a RECALL command in interactive SAS after running PROC IMPORT, you get the DATA step code that PROC IMPORT generated to do the actual work. It generates FORMAT and INFORMAT statements that may or may not be exactly what you want.
I often use PROC IMPORT as a DATA step code generator: recall the code, then modify it to suit what I want.
You can also add other processing logic to extend the functionality of the step beyond simply reading the source data into a data set. Creating new variables as transformations of one or more of the columns in the CSV springs to mind.
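A minimal sketch of that workflow (the file path and dataset name are made up): raise the number of scanned rows so the guesses are based on the whole file, then recall the generated DATA step.
proc import datafile='C:\data\example.csv'
    out=work.example
    dbms=csv
    replace;
  guessingrows=max;   /* scan the whole file before guessing types and lengths */
run;
/* In interactive SAS, press F4 (the RECALL command) to pull the
   generated DATA step into the editor for editing. */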

I generally agree this is too broad a question. That said:
PROC IMPORT is slower than a DATA STEP. This is because PROC IMPORT looks at the file and then writes and executes a DATA STEP.
A DATA STEP requires you to know the name, position, and attributes (type, length, etc) for each variable.
If I need to read a file once, I just use PROC IMPORT.
If I need to read a file multiple times, I don't care about speed, and the file format might change, then I use PROC IMPORT.
If I am in a production system where speed matters and I want an ERROR if the format changes, I run PROC IMPORT once, take the DATA STEP it writes for me, and put that into my code.
If PROC IMPORT fails to guess my columns correctly, I use PROC IMPORT, modify the DATA STEP it produces, and then use that.
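To illustrate those last two points, here is a rough sketch of the kind of DATA step PROC IMPORT writes for you (the path, variable names, and informats below are illustrative, not what the procedure would actually emit for your file); this is the code you would recall, adjust, and keep in production:
data work.example;
  infile 'C:\data\example.csv' dsd firstobs=2 truncover;
  informat id best32. name $20. visit_date yymmdd10.;
  format visit_date date9.;
  input id name $ visit_date;
run;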

SAS program optimization to use less workspace

I had one main program in SAS, and within it two other SAS programs are called.
These two SAS programs create formats using PROC FORMAT with CNTLIN= from large data sets. The formats are temporary, meaning they reside in the workspace, and they are used in the SAS program to assign formats to some variables.
In the main SAS program almost 15 large data sets are created in the WORK library.
Some PROC SQL joins and DATA step merges are happening.
We have index creation on data sets using PROC DATASETS.
We also used PROC SORT.
Wherever possible we used WHERE instead of IF.
It had the MPRINT, MLOGIC, and SYMBOLGEN options enabled.
And some small logic-wise performance tuning is done.
Most of the data set creation is done in the WORK library. If we clear the whole workspace, the previously created formats are lost. We don't want to lose the formats until the end of the job because they are used across the entire SAS program.
It is taking 1 TB of SAS workspace to accomplish this job, so I wanted to reduce this usage.
Can someone please suggest what optimizations we can do to use less space as well as memory?
Write the format catalogs to a different folder.
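A sketch of that idea (the paths and dataset names are hypothetical): write the catalog to a permanent library with the LIBRARY= option of PROC FORMAT, put that library on the format search path, and WORK can then be cleaned mid-job without losing the formats.
libname fmtlib 'C:\myproject\formats';          /* permanent home for the catalog */
proc format cntlin=work.fmtdata library=fmtlib; /* write catalog outside WORK */
run;
options fmtsearch=(fmtlib work);                /* make SAS find formats in FMTLIB */
/* large intermediate WORK datasets can now be deleted without losing formats */
proc datasets lib=work nolist;
  delete temp1 temp2;
quit;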

Difference between INFILE and PROC IMPORT

What is the difference between INFILE statement and PROC IMPORT statement and which one is better to use?
PROC IMPORT is a procedure. Like PROC MEANS or PROC PRINT.
INFILE is a data step statement. Like INPUT, PUT, IF, etc.
You can use PROC IMPORT to convert data in various forms into SAS datasets. It can read from database files, spreadsheets, and text files. It is really only for text files that a comparison with writing your own DATA step (with an INFILE statement as one of the many statements in that step) makes sense.
If you use PROC IMPORT to read a delimited file then the procedure will make decisions about how to interpret the file. It will guess how many variables there are and what type to use for them. It will guess if any informats need to be used to properly read the text into data. It will convert the header line to variable names. Depending on the data it can sometimes work well and sometimes do really dumb things.
If you write your own DATA step then you have full control over what variables are created and how to read them. This might take a little more code or effort than letting a procedure guess for you. But if you have to read many similar files then using a data step will let you have the control needed to make sure that the individual datasets are created consistently.
Adding to Tom's answer on the differences between PROC IMPORT and INFILE: with a DATA step you can also point INFILE at a fileref defined with a FILENAME statement (see the sketch below). And PROC IMPORT has no statements for naming the variables the way a DATA step does; the names, lengths, types, and other attributes are assigned automatically.
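A minimal sketch of the fileref point (the fileref, path, and layout are made up): a FILENAME statement defines the fileref and the DATA step reads through it, with every attribute under your control.
filename rawcsv 'C:\data\input.csv';    /* fileref; INFILE reads through it */
data work.mydata;
  infile rawcsv dsd firstobs=2 truncover;
  length id 8 name $ 40;                /* you pick the names, types, lengths */
  input id name $;
run;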

How to overcome missing values when importing dates from Excel to SAS 9.4

I am trying to import an Excel sheet into SAS 9.4. Interestingly, I can import most of the data without any problem, but for the date column there are lots of random missing values (there are no missing dates in my Excel file). Can anyone tell me how to improve my code, please?
proc import out= sheet
    datafile = 'D:\Date.xlsx'
    dbms = excelcs replace;
  sheet = "abc";
  SCANTEXT=YES;
  USEDATE=YES;
  SCANTIME=NO;
run;
All the dates look like this: 21/06/2010, 22/06/2010.
Change your DBMS to XLSX and USEDATE to No. Then you'll import the field as a text field.
You can then use an input() function to create a new date variable.
Not ideal, but easily accomplished.
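A sketch of that conversion (the variable names are hypothetical), using the DDMMYY10. informat since the text looks like 21/06/2010:
data sheet2;
  set sheet;
  date_num = input(date_char, ddmmyy10.);  /* text '21/06/2010' -> SAS date value */
  format date_num ddmmyy10.;               /* display as dd/mm/yyyy */
run;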
More than likely, your problem is that the automatic conversion is considering those mm/dd/yyyy, but of course they are actually dd/mm/yyyy.
One possible solution is to use the SASDATEFMT option, documented here:
proc import datafile="myfile.xlsx" out=dataset dbms=excel replace;
  dbdsopts="sasdatefmt=(varname=DDMMYY10.)";
run;
That sets the SAS format, but is also alleged by the documentation to affect the informat used to convert it.
It's also possible, though, that your data is actually mixed character/numeric (as it would be if the values were entered by hand into Excel, in a sheet that was expecting mm/dd/yy while the values were actually dd/mm/yy). In that case, the simplest answer is to either change your registry to tell Microsoft to scan the whole column (see this SAS tech support note for an example), or to simply convert all of the values to character (or at least the first couple) and then add a MIXED=YES; line to your PROC IMPORT step.
(The registry setting may not have an effect if you're using the PC Files Server, which you may be, given the EXCELCS DBMS above. In that case, ignore that particular suggestion.)
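For the MIXED=YES route, a minimal sketch, assuming the PC Files EXCEL engine (where the MIXED statement is supported) and the sheet name from the question:
proc import out=sheet
    datafile='D:\Date.xlsx'
    dbms=excel replace;
  sheet="abc";
  mixed=yes;    /* read mixed numeric/character columns as character */
run;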

Ignore data in SAS import from Excel

I have no working knowledge of SAS, but I have an Excel file that I need to import and work with. In the Excel file there are about 100 rows (observations) and 7 columns (quantities). In some cases, a particular observation may not have any data in one column. I need to completely ignore that observation when reading my data into SAS. I'm wondering what the commands for this would be.
An obvious cheap solution would be to delete the rows with missing data in the Excel file, but I want to do this with SAS commands, because I want to learn some SAS.
Thanks!
Import the data however you want, for example with the IMPORT procedure, as Stig Eide mentioned.
proc import
    datafile = 'C:\...\file.xlsx'
    dbms = xlsx
    out = xldata
    replace;
  mixed = YES;
  getnames = YES;
run;
Explanation:
The DBMS= option specifies how SAS will try to read the data. If your file is an Excel 2007+ file, i.e. xlsx, then you can use DBMS=XLSX as shown here. If your file is older, e.g. xls rather than xlsx, try DBMS=EXCEL.
The OUT= option names the output dataset.
If a single level name is specified, the dataset is written to the WORK library. That's the temporary library that's unique to each SAS session. It gets deleted when the session ends.
To create a permanent dataset, specify a two-level name, like mylib.xldata, where mylib refers to a SAS library reference (libref) created with a LIBNAME statement (see the sketch after this list).
REPLACE replaces the dataset created the first time you run this step.
MIXED=YES tells SAS that the data may be of mixed types.
GETNAMES=YES will name your SAS dataset variables based on the column names in Excel.
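For the two-level name mentioned above, a minimal sketch (the path and libref are hypothetical):
libname mylib 'C:\myproject\data';   /* libref for a permanent library */
proc import
    datafile = 'C:\myproject\file.xlsx'
    dbms = xlsx
    out = mylib.xldata               /* two-level name -> permanent dataset */
    replace;
  getnames = YES;
run;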
If I understand you correctly, you want to remove every observation in the dataset that has a missing value in any of the seven columns. There are fancier ways to do this, but I recommend a simple approach like this:
data xldata;
  set xldata;
  where cmiss(col1, col2, ..., col7) = 0;
run;
The CMISS function counts the number of missing values in the variables you specify at each observation, regardless of the data type. Since we're using WHERE CMISS()=0, the resulting dataset will contain only the records with no missing data for any of the seven columns.
When in doubt, try browsing the SAS online documentation. It's very thorough.
If you have "SAS/ACCESS Interface to PC Files" licensed (hint: PROC SETINIT), you can import the Excel file with this code. The WHERE= dataset option lets you select which rows to keep; in this example you keep the rows where the column "name" is not blank:
proc import
    DATAFILE="your file.xlsx"
    DBMS=XLSX
    OUT=resulttabel(where=(name ne ""))
    REPLACE;
  MIXED=YES;
run;

Export SAS data to SPSS, date and datetime

I have data inside SAS.
I want to store the datafile to SPSS format (*.sav)
I use the following program:
PROC export Data=SASdataToStoreInSPSS
    FILE="Path\Filename_%sysfunc(today(),date9.).sav"
    dbms=sav replace;
RUN;
This works great, except when I open the file in SPSS the dates are strangely formatted.
For example:
156405 08:51:00
Should be
3-Jan-2011 08:51
I can manually change the data formats in SPSS. So the values are correct date values, except they are not automatically formatted in a readable format.
I tried changing the format in SAS to DATETIME20. or DATETIME23.3 before saving, but this does not help.
I want this to work without having to open SPSS and run a Syntax there.
The SPSS files that SAS spits out have to be directly mailed to other users of the data.
I think this is either a bug in SAS's export, or an issue with SPSS where some default changed. What's happening is that SAS is storing it as an SPSS date, but with width 16, which is not long enough to hold the complete datetime. I don't think you can use DBDSOPTS with DBMS=SPSS, so I don't know that there is a good workaround short of importing the file into SPSS.
You could do that automatically, though, using the SPSS Production Facility; I've written an import script before and asked SAS to run spssprod with the batch file. That's an irritating workaround, but it might be the easiest, unless SAS Tech Support can help you (and certainly try that; they usually have only a few hours' turnaround for initial contact at least).
SAS mentioned that it has to do with the SPSS driver they use. Apparently it is not an easy fix, so they forwarded the issue to second-line tech support.
The workaround you will need is to split the dates into two columns: one with the date and one with the time.
data SPSS2;
  set SPSS;
  date = put(datepart(DatumSPSS), date9.);
  time = put(timepart(DatumSPSS), time8.);
run;
Or you can tell the end user how to change the format of the date in SPSS.
For an automated approach, try this .NET app. You need SPSS, but SAS is not required, to convert a large collection of SAS files automatically. The linked page covers the manual process with code samples, as well as an application download.