SAS PROC Import vs DATA step with INFILE - sas

My PROC IMPORT step fails with an "import unsuccessful" error when I read a '~'-delimited file that contains an address field. PROC IMPORT automatically treats the 5-byte zip code as a numeric field, and once in a while I get bad records with invalid zip codes such as VXR1#. When one of these is encountered, I get the "import unsuccessful" error and the SAS job fails.
PROC IMPORT generates a DATA step with an INFILE statement behind the scenes, so I tried writing that DATA step myself with INFORMAT and FORMAT statements and changed ZIP to a character variable. But now I face a different issue: with the DATA step and the INFORMAT and FORMAT keywords, the lengths mismatch and the data gets shifted into the wrong variables. Could someone help me figure out a solution?
The DATA step and the PROC IMPORT step I used are included below for reference:
data WORK.TRADER_STATS ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile '/sascode/test/TRADER_STATS.csv' delimiter = '~' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat TRADER_id best32. ;
informat dealer_ids $60. ;
informat dealer_name $27. ;
informat dealer_city $15. ;
informat dealer_st $2. ;
informat dealer_zip $5. ;
informat SNO best32. ;
informat start_dt yymmdd10. ;
informat end_dt yymmdd10. ;
input
TRADER_id
dealer_ids $
dealer_name $
dealer_city $
dealer_st $
dealer_zip
sno
start_dt
end_dt
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
proc import file="/sascode/test/TRADER_STATS_BY_DAY.csv" out=WORK.TRADER_STATS_BY_DAY
dbms=dlm replace;
delimiter='~';
;run;

Try using the colon (:) format modifier. It tells SAS to use the informat supplied but to stop reading the value for that variable when the next delimiter is encountered, which should fix the problem of data getting moved to different locations.
data WORK.TRADER_STATS ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile '/sascode/test/TRADER_STATS.csv' delimiter = '~' MISSOVER DSD lrecl=32767 firstobs=2 ;
input TRADER_id : best32.
dealer_ids : $60.
dealer_name : $27.
dealer_city : $15.
dealer_st : $2.
dealer_zip : $5.
sno : best32.
start_dt : yymmdd10.
end_dt : yymmdd10.;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
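To see the effect of the colon modifier without the real file, here is a minimal sketch that reads inline data instead. The data set name colon_demo and the two records are made up for illustration (only the bad zip value VXR1# comes from the question), and only four of the columns are included.

```sas
/* Hypothetical demo: two '~'-delimited records, one with a non-numeric zip */
data work.colon_demo;
  infile datalines dlm='~' dsd truncover;
  /* The colon stops each read at the next delimiter instead of   */
  /* consuming the full informat width, so fields stay aligned.    */
  input trader_id   : best32.
        dealer_name : $27.
        dealer_zip  : $5.      /* character, so VXR1# is stored as text */
        start_dt    : yymmdd10.;
  format start_dt yymmdd10.;
datalines;
101~Acme Motors~30301~2020-01-15
102~Bay Autos~VXR1#~2020-02-01
;
run;

proc print data=work.colon_demo;
run;
```

Because dealer_zip is read with a character informat, the bad value no longer raises an invalid-data error; it can be checked or cleaned in a later step.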

Related

How to generate new import code following a structure change in an upstream flat file without headers?

I have an existing process that imports data from a flat file with no headers. There are hundreds of columns. The provider of the file has added several hundred more columns at different points within the existing columns. I have a list of the old and new column names and SAS code that properly sets the data types for the old columns but not the new ones. I'd rather not have to go through my existing import code and manually write column headers and data formats but I'm not sure how to use these parts to get new import code for the new headers.
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
informat oldcol1 best32.;
informat oldcol2 mmddyy10.;
informat oldcolN $60.;
format oldcol1 best32.;
format oldcol2 mmddyy10.;
format oldcolN $60.;
input
oldcol1
oldcol2
oldcolN $;
run;
I have the header information in an Excel file right now.
old K010H K010I K010J K020A
new K010H K010I K010J K010L K010M K010N K020A
Based on your description, I presume you either know or will find out the informats for the new columns as well. If that is the case, why not auto-generate the code that reads the file?
Since you have the header information, suppose you can rearrange it into the following format and save it as a CSV:
var infmt
K010H best32.
K010I mmddyy10.
K010J $60.
K010L best32.
K010M mmddyy10.
K010N $60.
K020A best32.
Then something like this would automatically generate the code and read the data for you:
proc import datafile="cols.csv" out=cols replace;
run;
proc sql;
select var into :cols separated by ' ' from cols ;
select infmt into :infmts separated by ' ' from cols ;
quit;
%macro gen_code;
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) ~= %str());
%let col = %scan(&cols, &ii, %str( ));
%let infmt = %scan(&infmts, &ii, %str( ));
informat &col &infmt ;
%let ii = %eval(&ii + 1);
%end;
input
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) NE %str());
%let col = %scan(&cols, &ii, %str( ));
&col
%let ii = %eval(&ii + 1);
%end;
;
run;
%mend;
%gen_code;
In the future, you can just modify your header CSV file and the rest will be taken care of by the code itself.
If you have a machine readable data dictionary then you can generate the code from that. Otherwise you will need to just edit your data step. While you are at it you can clean it up so that it is easier to maintain.
The first thing is to use LENGTH or ATTRIB to define the variables, instead of forcing SAS to guess. Second, only attach informats or formats to variables that need them. For example, there is no need to attach informats to normal strings or numbers, and no need to attach a $xx format to character variables. Do you really need to attach the BEST32. format to numbers, instead of letting SAS display numeric variables without an attached format using the default BEST12. format?
Third, if you define the variables in the order they appear, then you can use a positional variable list in the INPUT statement. Then you only have to change the INPUT statement if the first or last variable changes.
So for your example you might create a data step like this instead.
data raw_file;
infile "flatfile.csv" dlm="|" truncover dsd firstobs=1;
length
oldcol1 8
oldcol2 8
oldcolN $60
;
informat oldcol2 mmddyy10.;
format oldcol2 mmddyy10.;
input oldcol1 -- oldcolN ;
run;
Then adding new variables is as simple as inserting them into the right place in the LENGTH statement and, when needed, adding them to the INFORMAT and/or FORMAT statements. If you don't know what the variables contain, define them as character strings, look at the resulting values, and decide later whether you need to define them differently.
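As a sketch of what that looks like for the column names in the question: the types below are assumptions borrowed from the example informat list in the other answer (K010I and K010M as dates, K010J and K010N as 60-character strings, the rest numeric), not something known about the real file.

```sas
data raw_file;
  infile "flatfile.csv" dlm="|" truncover dsd firstobs=1;
  /* Variables are defined in file order; the new columns K010L, K010M  */
  /* and K010N are simply inserted between K010J and K020A.             */
  length
    K010H 8
    K010I 8
    K010J $60
    K010L 8      /* new */
    K010M 8      /* new */
    K010N $60    /* new */
    K020A 8
  ;
  /* Only the date columns need an informat and a format (assumed types) */
  informat K010I K010M mmddyy10.;
  format   K010I K010M mmddyy10.;
  /* Positional list: unchanged because the first and last columns did not change */
  input K010H -- K020A;
run;
```

The INPUT statement is the same as it would have been before the new columns were added, which is the point of defining the variables up front.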

SAS: Using macro to import multiple text files

I am using SAS to import hundreds of csv files.
There are 300 city data-sets with the following naming convention (Shanghai001-Shanghai100, London001-London100, Newyork001-Newyork100).
My current import code is
data shanghai001;
infile 'H:\shanghai001.csv'
delimiter = ',' DSD lrecl=32767 firstobs=2;
informat Date_L_ DATE11.;
informat Time_L_ time18.3;
informat Type $10.;
format Date_L_ DATE11.;
format Time_L_ time18.3;
format Type $10.;
input
Date_L_
Time_L_
Type $
;
run;
This code works, but how can I use a macro to import all 300 data sets?
Can anyone tell me?
You could use a macro to do this, but you do not need to. It would probably be much easier to read all of the CSV files into a single data set. You could add variables like FNAME and VERSION to tell which observations came from which source file.
data all_data;
length fname $100 path $200 version 8 ;
length Date_L_ Time_L_ 8 Type $10;
informat Date_L_ date11. Time_L_ time18.3 ;
format version z3. Date_L_ date11. Time_L_ time12.3 ;
do fname='shanghai','london','newyork';
do version=1 to 100 ;
path = catx('\','H:',cats(fname,put(version,z3.),'.csv'));
if not fileexist(path) then do;
put 'ERROR: File not found. ' path=:$quote.;
continue;
end;
infile csvfile filevar=path dsd truncover end=eof;
if not eof then input ;
do while (not eof);
input Date_L_ Time_L_ Type ;
output;
end;
end;
end;
run;
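If separate data sets per file are still wanted, a macro loop is one way to do it. The following is a minimal sketch, not the answerer's code: the macro name import_city and its parameters are made up, and it assumes every file sits in H:\ and follows the three-digit naming shown in the question.

```sas
%macro import_city(city, n=100);
  %local i num;
  %do i = 1 %to &n;
    /* Zero-pad the counter: 1 -> 001 */
    %let num = %sysfunc(putn(&i, z3.));
    data &city.&num;
      /* The doubled period is needed: the first ends the macro   */
      /* variable reference, the second is the literal file dot.  */
      infile "H:\&city.&num..csv" dlm=',' dsd truncover lrecl=32767 firstobs=2;
      informat Date_L_ date11. Time_L_ time18.3 Type $10.;
      format   Date_L_ date11. Time_L_ time18.3;
      input Date_L_ Time_L_ Type;
    run;
  %end;
%mend import_city;

%import_city(shanghai)
%import_city(london)
%import_city(newyork)
```

This produces 300 data sets (shanghai001 through newyork100), which is usually harder to work with afterwards than the single combined data set.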

Importing in SAS using infile

filename Source 'C:\Source.txt';
Data Example;
Infile Source;
Input Var1 Var2;
Run;
Is there a way to import all the variables from Source.txt without the "Input Var1 Var2" line? If there are many variables, listing them all is too time-consuming, so I was wondering whether there is a way to bypass that.
Thanks
Maybe you can use PROC IMPORT?
For a CSV I use this, and I don't have to define every variable:
proc import datafile="&CSVFILE"
out=myCsvData
dbms=dlm
replace;
delimiter=';';
getnames=yes;
run;
It depends on what you have in your txt file. Try different delimiters.
If you are looking for a solution based on the INFILE statement, the following reference code should help.
data _null_;
set sashelp.class;
file '/tester/sashelp_class.txt' dsd dlm='09'x;
put name age sex weight height;
run;
/* Version #1 : When data has mixed data(numeric and character) */
data reading_data_w_format;
infile '/tester/sashelp_class.txt' dsd dlm='09'x;
format name $10. age 8. gender $1. weight height 8.2;
input (name--height) (:);
run;
proc print data=reading_data_w_format;run;
proc contents data=reading_data_w_format;run;
/* Version #2 : When all data can be read as character.
I know this version doesn't make much sense, but it's still an option */
data reading_data_wo_format;
infile '/tester/sashelp_class.txt' dsd dlm='09'x;
input (var1-var5) (:$8.); /* Length would be max length of value in all the columns */
run;
proc print data=reading_data_wo_format;run;
proc contents data=reading_data_wo_format;run;
I'd suggest writing down the informats for the variables to be read, so that you can be sure the file matches your specification. PROC IMPORT first scans the data from the first row up to the GUESSINGROWS value (there is no need to set it too high if each column is of consistent length) and, based on the lengths and types it finds, picks an informat and length it considers suitable for each variable in the file.
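For comparison, a minimal PROC IMPORT sketch for the tab-delimited file written in the first step above. The output name class_guess and the value 100 are arbitrary choices for illustration; GETNAMES=NO is used because that file was written without a header row.

```sas
proc import datafile='/tester/sashelp_class.txt'
            out=work.class_guess
            dbms=tab
            replace;
  getnames=no;       /* no header row, so variables get default names */
  guessingrows=100;  /* number of rows scanned to guess type and length */
run;

proc contents data=work.class_guess;
run;
```

PROC CONTENTS shows the types and lengths that were guessed, which can then be compared against what you expected from the file.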

SAS: Taking Date data in DD-MMM-YYYY format from a csv file in a date format in a permanent data set

I would like to import data from a CSV file into a permanent data set. The file has a date column in "dd-mmm-yyyy" format, for example "22-FEB-1990", and I want it to be stored as a date in the data set too. I have tried many formats and informats, but I am not getting anything in the column.
Here is the code I wrote (although some statements are commented out, I have tested every combination of formats and informats I could think of):
libname asgn1 "C:\Users\*****\abc";
data asgn1.Car_sales_1_1;
infile "C:\Users\********\Car_sales.csv" dsd dlm="," FIRSTOBS=2 ;
input Manufacturer $ Model $ Fuel_efficiency Latest_Launch;
* format Latest_Launch mmddyy10.;
* informat Latest_Launch mmddyy10.;
run;
Please help...
Change your informat to date11. (dd-mmm-yyyy).
SAS Informats by Category > http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001239776.htm
I tried the following code and got exactly the result I wanted. Thanks @Chris J
libname asgn1 "C:\Users\*****\abc";
data asgn1.Car_sales_1_1;
infile "C:\Users\********\Car_sales.csv" dsd dlm="," FIRSTOBS=2 ;
input Manufacturer $ Model $ Fuel_efficiency Latest_Launch;
informat Latest_Launch date11.;
format Latest_Launch ddmmyy10.;
run;
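The same idea can be checked on inline data before touching the real file. This is only a sketch: the data set name date_demo and the model name are placeholders, while the date value is the one quoted in the question. The INFORMAT statement is placed before INPUT here simply so the attribute is visibly set before the variable is read.

```sas
data date_demo;
  infile datalines dsd dlm=',' truncover;
  informat Latest_Launch date11.;   /* how the text is read  */
  format   Latest_Launch date11.;   /* how the date is shown */
  input Model : $20. Latest_Launch;
datalines;
SomeModel,22-FEB-1990
;
run;

proc print data=date_demo;
run;
```

Latest_Launch is stored as a numeric SAS date, so it can be compared or reformatted later (for example with ddmmyy10., as in the code above).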

format date columns when "?" appears in same column

I have several columns with dates. Some entries contain "?" and the others contain dates in the MMDDYY10. format.
I compare the dates at a later point and have code that works for that, but the missing entries and the "?" entries cause errors to occur and observations to be created.
Here is my import code:
data WORK.esn_service ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'C:\Documents and Settings\richardg\Desktop\Sirius\esn_service.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat DEACTIVATION_DATE best10. ;
informat DEACTIVATION_REASON $35. ;
informat REACTIVATION_DATE best10. ;
format DEACTIVATION_DATE mmddyy10. ;
format DEACTIVATION_REASON $35. ;
format REACTIVATION_DATE mmddyy10. ;
input
DEACTIVATION_DATE
DEACTIVATION_REASON $
REACTIVATION_DATE
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
The two date columns are causing the error. I need to compare the dates later, so I can't just pick a random date to replace the problem cells.
You have an issue with how you are importing the data. Once a column is a character variable, it's stuck that way - you have to create a new column to change it. Either change how you import it to bring it in as numeric, or create a new column for each to force it to be numeric.
If your data has MMDDYY10 in it already (in the CSV file), then you need to use INFORMAT. INFORMAT controls how SAS reads in the data.
data WORK.esn_service ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'C:\Documents and Settings\richardg\Desktop\Sirius\esn_service.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat DEACTIVATION_DATE mmddyy10. ;
informat DEACTIVATION_REASON $35. ;
informat REACTIVATION_DATE mmddyy10. ;
format DEACTIVATION_DATE mmddyy10. ;
format DEACTIVATION_REASON $35. ;
format REACTIVATION_DATE mmddyy10. ;
input
DEACTIVATION_DATE
DEACTIVATION_REASON $
REACTIVATION_DATE
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
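If the "?" entries should simply become missing values without raising errors, one option is to read the date columns as text first and convert them with the INPUT function and the ?? modifier, which suppresses the invalid-data note and does not set _ERROR_. This is a sketch on made-up inline records; the names esn_service_demo, deact_raw and react_raw are hypothetical.

```sas
data work.esn_service_demo;
  infile datalines dlm=',' dsd truncover;
  length deact_raw react_raw $10 DEACTIVATION_REASON $35;
  /* Read the date columns as plain text first */
  input deact_raw DEACTIVATION_REASON react_raw;
  /* ?? turns anything that is not a valid date, such as "?", into missing */
  DEACTIVATION_DATE = input(deact_raw, ?? mmddyy10.);
  REACTIVATION_DATE = input(react_raw, ?? mmddyy10.);
  format DEACTIVATION_DATE REACTIVATION_DATE mmddyy10.;
  drop deact_raw react_raw;
datalines;
01/15/2012,Non-payment,02/01/2012
?,Unknown,?
;
run;
```

Missing dates then compare as missing in later steps, so no placeholder date has to be invented for the problem cells.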