What is the difference between the INFILE statement and PROC IMPORT, and which one is better to use?
PROC IMPORT is a procedure. Like PROC MEANS or PROC PRINT.
INFILE is a data step statement. Like INPUT, PUT, IF, etc.
You can use PROC IMPORT to convert data in various forms into SAS datasets. It can read from database files, spreadsheets and text files. Only for text files does it really make sense to compare it with writing your own DATA step (in which INFILE is just one of many statements).
If you use PROC IMPORT to read a delimited file, then the procedure will make decisions about how to interpret the file. It will guess how many variables there are and what type to use for them. It will guess whether any informats are needed to read the text into data properly. It will convert the header line to variable names. Depending on the data, it sometimes works well and sometimes does really dumb things.
If you write your own DATA step then you have full control over what variables are created and how to read them. This might take a little more code or effort than letting a procedure guess for you. But if you have to read many similar files then using a data step will let you have the control needed to make sure that the individual datasets are created consistently.
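A minimal sketch of the DATA step approach, assuming a comma-delimited file with three hypothetical columns (id, name, visit_date, with dates written as yyyy-mm-dd). Inline DATALINES stand in for the file so the step runs on its own; with a real file you would put the quoted path on the INFILE statement instead of `datalines` and add `firstobs=2` if the file has a header row.

```sas
/* Every attribute is declared explicitly, so each file read with this */
/* step yields a dataset with the same variables, types and lengths.   */
data work.visits;
    infile datalines dlm=',' dsd truncover;
    length id 8 name $ 20 visit_date 8;
    informat visit_date yymmdd10.;   /* how the text is read        */
    format visit_date date9.;        /* how the value is displayed  */
    input id name visit_date;
datalines;
1,Ann,2010-06-21
2,Bob,2010-06-22
3,Cara Lee,2010-06-23
;
run;
```

Because nothing is guessed, a file whose layout differs from what the step expects shows up as notes or errors in the log rather than as a silently different dataset.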
To add to what Tom said about the difference between PROC IMPORT and INFILE: INFILE works with a fileref, and PROC IMPORT does not provide statements for naming variables the way a DATA step does. Variable names, lengths, types and other attributes are assigned automatically.
Related
In SAS, when creating a SAS data set from a raw data file (CSV), we can either use a DATA step with an INFILE statement or PROC IMPORT.
What are the advantages and disadvantages of each over the other?
PROC IMPORT makes assumptions about the types of variables and the lengths of character variables based on reading a number of rows in the CSV; how many rows it reads is controlled by the GUESSINGROWS= option. If you issue a RECALL command in interactive SAS after running PROC IMPORT, you get the DATA step code that PROC IMPORT generated to do the actual work. It generates FORMAT and INFORMAT statements that may or may not be exactly what you want.
I often use proc import as a data step code generator, recall the code, and then modify it to suit what I want.
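A sketch of that workflow, assuming a small CSV that is first written to a temporary fileref so the example is self-contained. GUESSINGROWS= raises the number of rows PROC IMPORT scans before deciding types and lengths; for delimited files, the DATA step it generates is also written to the SAS log, where it can be copied and edited.

```sas
/* Write a small CSV to a temporary file (stands in for your real file). */
filename demo temp;

data _null_;
    file demo;
    put 'id,name,score';
    put '1,Ann,90';
    put '2,Bartholomew,85';
run;

/* Scan many rows before guessing types and lengths. */
proc import datafile=demo out=work.imported dbms=csv replace;
    getnames=yes;
    guessingrows=32767;
run;

/* Check what was guessed; the generated DATA step is in the log. */
proc contents data=work.imported;
run;
```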
You can also add other processing logic to extend the functionality of the step beyond simply reading the source data into a data set. Creating new variables as transformations of one or more of the columns in the CSV springs to mind.
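For example, a derived variable can be computed in the same step that reads the file. This sketch assumes two hypothetical numeric columns, height_cm and weight_kg, and uses inline DATALINES in place of the CSV.

```sas
data work.people;
    infile datalines dlm=',' dsd truncover;
    input id height_cm weight_kg;
    /* New variable computed from two of the input columns. */
    bmi = round(weight_kg / (height_cm / 100)**2, 0.1);
datalines;
1,180,81
2,165,60
;
run;
```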
I generally agree this is too broad a question. That said:
PROC IMPORT is slower than a DATA STEP. This is because PROC IMPORT looks at the file and then writes and executes a DATA STEP.
A DATA STEP requires you to know the name, position, and attributes (type, length, etc) for each variable.
If I need to read a file once, I just use PROC IMPORT.
If I need to read a file multiple times, I don't care about speed, and the file format might change, then I use PROC IMPORT.
If I am in a production system where speed matters and I want an ERROR if the format changes, then I use PROC IMPORT. But I take the DATA STEP it writes for me and put that into my code.
If PROC IMPORT fails to guess my columns correctly, I use PROC IMPORT, modify the DATA STEP it produces, and then use that.
I am trying to import an Excel sheet into SAS 9.4. Interestingly, I can import most of the data without any problem, but the date column has lots of random missing values (there are no missing dates in my Excel file). Can anyone tell me how to improve my code, please?
proc import out= sheet
datafile = 'D:\Date.xlsx'
dbms = excelcs replace;
sheet = "abc" ;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=NO;
run;
All the dates look like this: 21/06/2010, 22/06/2010.
Change your DBMS to XLSX and USEDATE to No. Then you'll import the field as a text field.
You can then use an input() function to create a new date variable.
Not ideal, but easily accomplished.
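A sketch of that conversion, assuming the dates were imported as a character variable here called date_text (a hypothetical name; use whatever PROC IMPORT actually produced). The DDMMYY10. informat reads day-first values such as 21/06/2010.

```sas
/* Stand-in for the imported sheet: dates held as text. */
data work.sheet;
    length date_text $ 10;
    input date_text;
datalines;
21/06/2010
22/06/2010
;
run;

data work.sheet_dates;
    set work.sheet;
    /* Convert day/month/year text into a SAS date value. */
    date_num = input(date_text, ddmmyy10.);
    format date_num date9.;
run;
```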
More than likely, your problem is that the automatic conversion is treating those values as mm/dd/yyyy, when of course they are actually dd/mm/yyyy.
One possible solution is to use the SASDATEFMT option of the DBDSOPTS statement (see the SAS documentation for PROC IMPORT):
proc import file="myfile.xlsx" out=dataset dbms=excel replace;
dbdsopts="sasdatefmt=(varname=DDMMYY10.)";
run;
That sets the SAS format, but the documentation claims it also affects the informat used to convert the value.
It's also possible, though, that your data is actually mixed character/numeric (as it would be if the dates were typed by hand into an Excel installation that expected mm/dd/yy, when they were actually dd/mm/yy). In that case, the simplest answer is either to change the registry setting (TypeGuessRows) so that the Microsoft driver scans the whole column (a SAS technical support note describes this, for example), or to convert all of the values to character (or at least the first few) and then add a MIXED=YES; statement to your PROC IMPORT step.
(The registry setting may not have an effect if you're using the PC Files Server, which you may be, given the EXCELCS DBMS above. In that case, ignore that particular suggestion.)
I have no working knowledge of SAS, but I have an Excel file that I need to import and work with. The file has about 100 rows (observations) and 7 columns (quantities). In some cases, an observation has no data in one of the columns, and I need to ignore that observation completely when reading my data into SAS. I'm wondering what the commands for this would be.
An obvious cheap solution would be to delete the rows with missing data in the Excel file, but I want to do this with SAS commands, because I want to learn some SAS.
Thanks!
Import the data however you want, for example with the IMPORT procedure, as Stig Eide mentioned.
proc import
datafile = 'C:\...\file.xlsx'
dbms = xlsx
out = xldata
replace;
mixed = YES;
getnames = YES;
run;
Explanation:
The DBMS= option specifies how SAS will try to read the data. If your file is an Excel 2007+ file, i.e. xlsx, then you can use DBMS=XLSX as shown here. If your file is older, e.g. xls rather than xlsx, try DBMS=EXCEL.
The OUT= option names the output dataset.
If a single level name is specified, the dataset is written to the WORK library. That's the temporary library that's unique to each SAS session. It gets deleted when the session ends.
To create a permanent dataset, specify a two level name, like mylib.xldata, where mylib refers to a SAS library reference (libref) created with a LIBNAME statement.
REPLACE overwrites the output dataset if it already exists, for example from a previous run of this step.
MIXED=YES tells SAS that the data may be of mixed types.
GETNAMES=YES will name your SAS dataset variables based on the column names in Excel.
If I understand you correctly, you want to remove every observation in the dataset that has a missing value in any of the seven columns. There are fancier ways to do this, but I recommend a simple approach like this:
data xldata;
set xldata;
where cmiss(col1, col2, col3, col4, col5, col6, col7) = 0; /* use your seven actual column names */
run;
The CMISS function counts the number of missing values in the variables you specify at each observation, regardless of the data type. Since we're using WHERE CMISS()=0, the resulting dataset will contain only the records with no missing data for any of the seven columns.
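One of the fancier variants: when every variable should be checked, a subsetting IF with an OF variable list avoids typing each name. The OF list syntax is not accepted in a WHERE statement, which is why this version uses IF. The small input dataset is made up for illustration.

```sas
/* Made-up data with one missing numeric and one missing character value. */
data work.xldata;
    infile datalines dlm=',' dsd truncover;
    input col1 col2 col3 $;
datalines;
1,2,a
3,,b
5,6,
7,8,c
;
run;

data work.complete;
    set work.xldata;
    /* Keep the row only if none of its variables is missing. */
    if cmiss(of _all_) = 0;
run;
```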
When in doubt, try browsing the SAS online documentation. It's very thorough.
If you have "SAS/ACCESS Interface to PC Files" licensed (hint: PROC SETINIT), you can import the Excel file with this code. The WHERE= dataset option lets you select which rows to keep; in this example you keep the rows where the column "name" is not blank:
proc import
DATAFILE="your file.xlsx"
DBMS=XLSX
OUT=resulttabel(where=(name ne ""))
REPLACE;
MIXED=YES;
RUN;
Is it possible to read only specific variables from a text file into a SAS dataset using PROC IMPORT? My text file is very big: about 1,000 observations and more than 42,000 variables. I tried reading it into SAS with PROC IMPORT but failed, possibly because of its size. Since I really don't need all of the variables, I decided to read only specific variables (columns) from this big text file to reduce the amount of data read into SAS. Is there a way to do this with a DATA step? Any suggestions or help would be appreciated.
Thank you very much,
If you're talking about a delimited text file, you can read in specific variables, provided they are consecutive starting from the first variable. You can't use PROC IMPORT to do this, however; you'd have to write the DATA step yourself, though you could conceivably have PROC IMPORT help you write it.
For example, if you have 10 variables and only want the first 3, then you can read it in like this:
data want;
infile "mydata.txt" dlm=',' lrecl=255 missover dsd;
input x $ y $ z $;
run;
However, you must read in all variables from the first variable to the last variable you are interested in, even if you aren't interested in all of the intermediate variables. You can't "skip" them with a delimited text file.
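You can, however, read the unwanted fields into a throwaway variable and drop it, so they never reach the output dataset. A sketch, assuming a comma-delimited file where only the 1st and 4th fields (hypothetically named x and w) are needed; DATALINES stand in for the file. For a real file with tens of thousands of variables per line you would also need a much larger LRECL= on the INFILE statement than the 255 used above.

```sas
data want;
    infile datalines dlm=',' dsd truncover;
    length x $ 8 w $ 8 _skip $ 1;
    /* Fields 2 and 3 are both read into _skip, overwriting each other. */
    input x _skip _skip w;
    drop _skip;
datalines;
a,b,c,d
e,f,g,h
;
run;
```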
If you have a fixed width text file, you can read in whatever columns you want (but you can't use PROC IMPORT to read in a fixed width text file).
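A sketch of column input on fixed-width data, assuming (hypothetically) that an ID sits in columns 1-3 and a score in columns 10-12; the rest of each line is simply never read.

```sas
data scores;
    infile datalines truncover;
    /* Column input: read only the column ranges you name. */
    input id 1-3 score 10-12;
datalines;
001xxxxxx095yyyy
002xxxxxx087yyyy
;
run;
```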
I need to import data from European Social Survey databank to SAS.
I'm not very good at using SAS so I just naively tried importing the text file one gets but it stores it all in one variable.
Can someone maybe help me with what to do? Since there doesn't seem to be a guide on their webpage I reckon it has to be pretty easy.
It's free to register (and takes 5 secs) and I need all possible data for Denmark.
Edit: When downloading what they call a SAS file, what I get is a huge PROC FORMAT and the same text file as one gets by choosing text.
The data in the text file isn't comma separated and the first row does not contain variable names.
Download it in SAS format. Save the text file in a location you can remember, and open the SAS file. It's not just one big proc format; it's a big proc format followed by a datastep with input code. It was probably created by SPSS (it fits the pattern of an SPSS saved .sas file anyhow). Look for:
DATA OUT.ESS1_4e01_0_F1;
Or something like that (that's what it is when I downloaded it). It's probably about 3/4 of the way down the page. You just need to change the code:
INFILE 'ESS1_4e01_0_F1.txt';
or similar, so that it points to the directory where you placed the text file. Create a LIBNAME for OUT that points to wherever you want to save this permanently, and do that at the start of the .sas file, replacing the top 3 lines like so.
Originally:
LIBNAME LIBRARY '';
LIBNAME OUT '';
PROC FORMAT LIBRARY=LIBRARY ;
Change these to:
libname out "c:\mystuff\"; *but probably not c:\mystuff :);
options fmtsearch=(out);
proc format lib=out;
Then run the entire thing.
This is the best solution if you want the formatted values (value labels) and variable labels. If you don't care about that, then it might be easier to deal with the CSV like Bob shows.
But the website says you can download SAS format; why don't you?
If everything goes into one column, you need to specify a delimiter.
data temp;
length ...;
infile 'file.csv' dlm=',';
input ...;
run;
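The same template filled in with hypothetical variable names (country, year, value) so that it runs on its own; DATALINES replace 'file.csv'. Without DLM=, list input splits fields on blanks, so a comma-separated line containing no blanks is read as a single field.

```sas
data temp;
    length country $ 20 year 8 value 8;
    /* DSD also treats two adjacent commas as a missing value. */
    infile datalines dlm=',' dsd truncover;
    input country year value;
datalines;
DK,2002,5.1
DK,2004,
;
run;
```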
As Dirk says, the web site says you can download a SAS dataset directly. However, if there's some reason you don't want to do that, choose a comma separated file (CSV) and use PROC IMPORT. Here is an example:
proc import out=some_data
datafile='c:\path\somedata.csv'
dbms=csv replace;
getnames=yes;
run;
Of course, this assumes the first row contains column names.