format date columns when "?" appears in same column

I have several date columns. Some entries contain "?" and others contain dates in MMDDYY10. format.
I compare the dates at a later point and have code that works for that, but the missing entries and the "?" entries cause errors to occur and observations to be created.
Here is my import code:
data WORK.esn_service ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'C:\Documents and Settings\richardg\Desktop\Sirius\esn_service.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat DEACTIVATION_DATE best10. ;
informat DEACTIVATION_REASON $35. ;
informat REACTIVATION_DATE best10. ;
format DEACTIVATION_DATE mmddyy10. ;
format DEACTIVATION_REASON $35. ;
format REACTIVATION_DATE mmddyy10. ;
input
DEACTIVATION_DATE
DEACTIVATION_REASON $
REACTIVATION_DATE
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
The two date columns are causing the errors. I need to compare the dates later, so I can't just pick an arbitrary date to replace the problem cells.

You have an issue with how you are importing the data. Once a column is a character variable it stays character; you have to create a new column to change its type. Either change how you import it so that it comes in as numeric, or create a new numeric column for each one.
If the CSV file already holds the dates in MMDDYY10. form (for example 01/15/2010), then you need to read them with the MMDDYY10. informat; the BEST10. informat cannot read the slashes. INFORMAT controls how SAS reads the raw data, while FORMAT only controls how the stored value is displayed.
data WORK.esn_service ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'C:\Documents and Settings\richardg\Desktop\Sirius\esn_service.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat DEACTIVATION_DATE mmddyy10. ;
informat DEACTIVATION_REASON $35. ;
informat REACTIVATION_DATE mmddyy10. ;
format DEACTIVATION_DATE mmddyy10. ;
format DEACTIVATION_REASON $35. ;
format REACTIVATION_DATE mmddyy10. ;
input
DEACTIVATION_DATE
DEACTIVATION_REASON $
REACTIVATION_DATE
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
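The MMDDYY10. informat on its own still treats "?" as invalid data, so SAS writes a NOTE and sets _ERROR_ for those observations. One possible approach, not part of the original answer, is to read each date as text and convert it with the INPUT function and the ?? modifier, which quietly turns "?" and empty fields into missing values. This is a sketch: the records are invented and read from DATALINES instead of the real CSV file, and the names deact_raw, react_raw and days_inactive are hypothetical.

```sas
/* Sketch: invented records; swap DATALINES for the real INFILE statement */
data work.esn_service;
  infile datalines dsd truncover;
  length deact_raw $10 DEACTIVATION_REASON $35 react_raw $10;
  input deact_raw DEACTIVATION_REASON react_raw;

  /* ?? suppresses the invalid-data note and leaves _ERROR_ at 0,   */
  /* so "?" and empty fields both become numeric missing values      */
  DEACTIVATION_DATE = input(deact_raw, ?? mmddyy10.);
  REACTIVATION_DATE = input(react_raw, ?? mmddyy10.);
  format DEACTIVATION_DATE REACTIVATION_DATE mmddyy10.;

  /* A missing value sorts below every date, so only compare when both exist */
  if n(DEACTIVATION_DATE, REACTIVATION_DATE) = 2 then
    days_inactive = REACTIVATION_DATE - DEACTIVATION_DATE;

  drop deact_raw react_raw;
datalines;
01/15/2010,Non-payment,02/20/2010
?,Customer request,?
03/01/2011,Moved,
;
run;
```

Because the "?" rows end up as ordinary missing values, later date comparisons only need a missing-value check rather than a placeholder date.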

Related

How to generate new import code following a structure change in an upstream flat file without headers?

I have an existing process that imports data from a flat file with no headers. There are hundreds of columns. The provider of the file has added several hundred more columns at various points among the existing columns. I have a list of the old and new column names and SAS code that correctly sets the data types for the old columns but not the new ones. I'd rather not go through my existing import code and manually write the column names and data formats, but I'm not sure how to use these pieces to generate new import code for the new headers.
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
informat oldcol1 best32.;
informat oldcol2 mmddyy10.;
informat oldcolN $60.;
format oldcol1 best32.;
format oldcol2 mmddyy10.;
format oldcolN $60.;
input
oldcol1
oldcol2
oldcolN $;
run;
I have the header information in an Excel file right now.
old K010H K010I K010J K020A
new K010H K010I K010J K010L K010M K010N K020A

Based on your description, I presume you either know or will find out the informats for the new columns too. If that is the case, why not auto-generate the code that reads the file?
Since you have the header information, assume you can rearrange it into the following layout and save it as a CSV file:
var infmt
K010H best32.
K010I mmddyy10.
K010J $60.
K010L best32.
K010M mmddyy10.
K010N $60.
K020A best32.
Then something like this would automatically generate the code and read the data for you:
proc import datafile="cols.csv" out=cols replace;
run;
/* Build space-separated lists of the column names and their informats */
proc sql noprint;
select var into :cols separated by ' ' from cols ;
select infmt into :infmts separated by ' ' from cols ;
quit;
%macro gen_code;
%local ii col infmt;
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
/* Generate one INFORMAT statement per column */
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) NE %str());
%let col = %scan(&cols, &ii, %str( ));
%let infmt = %scan(&infmts, &ii, %str( ));
informat &col &infmt ;
%let ii = %eval(&ii + 1);
%end;
/* Generate the INPUT statement listing every column in file order */
input
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) NE %str());
%let col = %scan(&cols, &ii, %str( ));
&col
%let ii = %eval(&ii + 1);
%end;
;
run;
%mend;
%gen_code;
From then on, you only need to edit the header CSV file and the code takes care of the rest.
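If the macro loop feels heavy, a shorter alternative (not from the original answer) is to let PROC SQL build the whole INPUT list at once, pairing each variable with its informat through the : modifier. This sketch creates the COLS data set with DATALINES instead of PROC IMPORT and reads one invented pipe-delimited record; the macro variable names input_list and date_vars are hypothetical.

```sas
/* Hypothetical stand-in for the imported cols.csv */
data cols;
  input var :$32. infmt :$32.;
datalines;
K010H best32.
K010I mmddyy10.
K010J $60.
K010L best32.
K010M mmddyy10.
K010N $60.
K020A best32.
;
run;

/* Relies on a simple one-table query returning rows in data set order */
proc sql noprint;
  /* e.g. "K010H : best32. K010I : mmddyy10. ..." */
  select catx(' ', var, ':', infmt) into :input_list separated by ' ' from cols;
  /* columns read as dates, so they can also get a display format */
  select var into :date_vars separated by ' ' from cols
    where upcase(infmt) like 'MMDDYY%';
quit;

data raw_file;
  infile datalines dlm='|' dsd truncover;
  input &input_list;
  format &date_vars mmddyy10.;
datalines;
1|01/15/2016|abc|2|02/01/2016|xyz|3
;
run;
```

The : modifier also sets the length of character columns from the informat width, so no separate INFORMAT statements are needed.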

If you have a machine readable data dictionary then you can generate the code from that. Otherwise you will need to just edit your data step. While you are at it you can clean it up so that it is easier to maintain.
First, use LENGTH or ATTRIB to define the variables instead of forcing SAS to guess. Second, only attach informats or formats to variables that need them. For example, there is no need to attach informats to normal strings or numbers, and no need to attach a $xx. format to character variables. Do you really need to attach the BEST32. format to numbers, instead of letting SAS display numeric variables that have no format attached with the default BEST12. format?
Third, if you define the variables in the order they appear in the file, you can use a positional variable list in the INPUT statement. Then you only have to change the INPUT statement when the first or last variable changes.
So for your example you might create a data step like this instead.
data raw_file;
infile "flatfile.csv" dlm="|" truncover dsd firstobs=1;
length
oldcol1 8
oldcol2 8
oldcolN $60
;
informat oldcol2 mmddyy10.;
format oldcol2 mmddyy10.;
input oldcol1 -- oldcolN ;
run;
Then adding new variables is as simple as inserting them into the right place in the LENGTH statement and, when needed, adding them to the INFORMAT and/or FORMAT statements. If you don't know what the variables contain, make them character strings, look at the resulting values, and decide later whether you need to define them differently.
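For the column layout in the question, that might look like the sketch below. The types given to the old columns (K010H and K020A numeric, K010I a date, K010J text) are assumptions for illustration, the three new columns K010L to K010N are read as character until their contents are known, and the single data line is invented.

```sas
/* Assumed types for the old columns; new columns read as text for now */
data raw_file;
  infile datalines dlm='|' dsd truncover;
  length
    K010H 8
    K010I 8
    K010J $60
    K010L $60   /* new */
    K010M $60   /* new */
    K010N $60   /* new */
    K020A 8
  ;
  informat K010I mmddyy10.;
  format K010I mmddyy10.;
  /* Positional list: every variable defined from K010H through K020A */
  input K010H -- K020A;
datalines;
1|01/15/2016|abc|x|y|z|3
;
run;
```

Only the LENGTH statement grew; the INPUT statement is unchanged because K010H and K020A are still the first and last columns.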

SAS Date Informat with Milliseconds

I am trying to create a SAS informat for the following date format:
"yyyy-mm-dd hh:ii:ss.SSS UTC", example: "2016-01-14 10:31:01.456 UTC"
I've gotten close using the following format code:
PROC FORMAT;
PICTURE MyDate other='%0Y-%0m-%0d %0H:%0M:%0s UTC' (datatype=datetime);
RUN;
Unfortunately, when I try to use this as an INFORMAT I get the error "can't find format MyDate", because it has only been defined as an output format, not as an informat.
I could try to create an informat from a data set generated with this format, but because of the milliseconds it would only create values that map to times with .000 in the milliseconds part. For example:
DATA MyInDate ;
RETAIN FMTNAME "MyInputDate" type "I" ;
do label = "1jan2016:00:00:00"dt to
"2feb2016:00:00:00"dt by 1;
start = trim(left(put(label,MyDate.)));
output ;
end ;
RUN;
PROC FORMAT CNTLIN=MyInDate;
RUN;
Even if I were able to enumerate a data set with milliseconds, it would be prohibitively large, since my dates can span years.
Is it possible to truncate my input data BEFORE passing it to the informat? I don't care about the milliseconds or the UTC qualifier. I can't change the input data.
EDIT: Using anydtdtm. as an informat results in empty values without error messages. Here is the data step making use of this informat:
DATA WORK.ImportDatTest;
LENGTH
'Event Time'n 8
;
FORMAT
'Event Time'n DATETIME25.
;
INFORMAT
'Event Time'n anydtdtm.
;
INFILE DATALINES DLM=','
;
INPUT
'Event Time'n : ANYDTDTM.
;
DATALINES;
2016-01-11 17:23:34.834 UTC
2016-01-11 17:23:34.834 UTC
2016-01-11 17:23:34.834 UTC
;
RUN;

Unfortunately, there is no way to create a picture informat in SAS currently. You would need to convert your data to a format SAS has a built-in informat for, or use a function or similar to format the data.
However, your data is already in such a format, so you shouldn't need to create an informat.
data test;
x="2015-10-05 10:12:24.333 UTC";
y=input(x,anydtdtm.);
put y= datetime17.;
run;
You can also truncate data while using an informat: the width you specify on the informat controls how many characters it reads.
Here's an example using input from datalines:
data test;
infile datalines dlm=',' dsd;
input y :anydtdtm32.;
put y= datetime22.3;
datalines;
2015-10-05 10:12:24.333 UTC
2014-03-01 08:08:05.435 UTC
2013-01-01 23:02:05.445 UTC
;;;
run;
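To answer the truncation question directly, another option (not part of the original answer) is to cut the string into its date and time pieces with SUBSTR, read each piece with a standard informat, and combine them with DHMS. This assumes every value has the fixed layout yyyy-mm-dd hh:mm:ss.SSS UTC shown in the question; the variable names d, t and dt are hypothetical.

```sas
data test;
  x = "2016-01-14 10:31:01.456 UTC";
  /* Columns 1-10 hold the date; columns 12-23 hold the time with milliseconds */
  d  = input(substr(x, 1, 10), yymmdd10.);
  t  = input(substr(x, 12, 12), time12.);
  /* DHMS(date, 0, 0, seconds) builds a datetime from a date and a time value */
  dt = dhms(d, 0, 0, t);
  put dt= datetime25.3;
run;
```

TIME12. keeps the milliseconds; wrap t in INT() if they are not needed. The " UTC" suffix is simply never read.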

SAS: Using macro to import multiple text files

I am using SAS to import hundreds of CSV files.
There are 300 city files with the following naming convention: Shanghai001-Shanghai100, London001-London100, Newyork001-Newyork100.
My current import code is
data shanghai001;
infile 'H:\shanghai001.csv'
delimiter = ',' DSD lrecl=32767 firstobs=2;
informat Date_L_ DATE11.;
informat Time_L_ time18.3;
informat Type $10.;
format Date_L_ DATE11.;
format Time_L_ time18.3;
format Type $10.;
input
Date_L_
Time_L_
Type $
;
run;
This code works, but how can I use a macro to import all 300 files?
Can anyone help?

You could use a macro to do this, but you do not need to. It would probably be much easier to read all of the CSV files into a single data set. You could add variables like FNAME and VERSION to tell which observations came from which source file.
data all_data;
length fname $100 path $200 version 8 ;
length Date_L_ Time_L_ 8 Type $10;
informat Date_L_ date11. Time_L_ time18.3 ;
format version z3. Date_L_ date11. Time_L_ time12.3 ;
do fname='shanghai','london','newyork';
do version=1 to 100 ;
path = catx('\','H:',cats(fname,put(version,z3.),'.csv'));
if not fileexist(path) then do;
put 'ERROR: File not found. ' path=:$quote.;
continue;
end;
infile csvfile filevar=path dsd truncover end=eof;
if not eof then input ;
do while (not eof);
input Date_L_ Time_L_ Type ;
output;
end;
end;
end;
/* Every file has been read; without STOP the DATA step would start over */
stop;
run;
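Since the question asks for a macro, here is a minimal sketch of one, assuming every file has the same header row and layout as shanghai001.csv. The macro name import_cities and its parameters are hypothetical. It generates one DATA step per file and skips files that do not exist.

```sas
/* Hypothetical macro: creates SHANGHAI001 from shanghai001.csv, and so on */
%macro import_cities(dir=H:\, cities=shanghai london newyork, n=100);
  %local i j city file;
  %do i = 1 %to %sysfunc(countw(&cities, %str( )));
    %let city = %scan(&cities, &i, %str( ));
    %do j = 1 %to &n;
      /* e.g. shanghai + 001 */
      %let file = &city%sysfunc(putn(&j, z3.));
      %if %sysfunc(fileexist(&dir&file..csv)) %then %do;
        data &file;
          infile "&dir&file..csv" dsd truncover firstobs=2;
          informat Date_L_ date11. Time_L_ time18.3 Type $10.;
          format Date_L_ date11. Time_L_ time18.3;
          input Date_L_ Time_L_ Type;
        run;
      %end;
      %else %put WARNING: File not found: &dir&file..csv;
    %end;
  %end;
%mend import_cities;

%import_cities(dir=H:\)
```

This produces 300 separate data sets; if they will be analysed together, the single data set approach is usually easier to work with.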

SAS PROC Import vs DATA step with INFILE

My PROC IMPORT step throws an "import unsuccessful" error when I try to read a '~'-delimited file containing an address field. In the CSV file, the 5-digit ZIP code is automatically treated as a numeric field, and once in a while I get bad records with invalid ZIP codes such as VXR1#. When one of these is encountered I get the "import unsuccessful" error and the SAS job fails.
PROC IMPORT generates a DATA step with INFILE behind the scenes, so I tried a DATA step with INFILE, INFORMAT and FORMAT statements and changed the informat of ZIP to character. But now I face a different issue: with the INFORMAT and FORMAT statements the lengths do not match and the data gets shifted into the wrong columns. Could someone help me figure out a solution?
Here are the DATA step and the PROC IMPORT step I used, for reference:
data WORK.TRADER_STATS ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile '/sascode/test/TRADER_STATS.csv' delimiter = '~' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat TRADER_id best32. ;
informat dealer_ids $60. ;
informat dealer_name $27. ;
informat dealer_city $15. ;
informat dealer_st $2. ;
informat dealer_zip $5. ;
informat SNO best32. ;
informat start_dt yymmdd10. ;
informat end_dt yymmdd10. ;
input
TRADER_id
dealer_ids $
dealer_name $
dealer_city $
dealer_st $
dealer_zip
sno
start_dt
end_dt
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
proc import file="/sascode/test/TRADER_STATS_BY_DAY.csv" out=WORK.TRADER_STATS_BY_DAY
dbms=dlm replace;
delimiter='~';
;run;

Try using the : (colon) format modifier. It tells SAS to use the supplied informat but to stop reading the value for that variable when it reaches a delimiter, which should fix the problem of data being shifted into the wrong columns.
data WORK.TRADER_STATS ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile '/sascode/test/TRADER_STATS.csv' delimiter = '~' MISSOVER DSD lrecl=32767 firstobs=2 ;
input TRADER_id : best32.
dealer_ids : $60.
dealer_name : $27.
dealer_city : $15.
dealer_st : $2.
dealer_zip : $5.
sno : best32.
start_dt : yymmdd10.
end_dt : yymmdd10.;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
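A quick way to check the idea is to read a couple of tilde-delimited records from DATALINES, one of them with a bad ZIP code like the VXR1# value in the question. The records and the zip_ok flag are invented for illustration. Because DEALER_ZIP is read with a character informat, the bad value is kept as text instead of breaking the step, and NOTDIGIT can flag it afterwards.

```sas
data work.trader_stats;
  infile datalines dlm='~' dsd truncover;
  input TRADER_id : best32.
        dealer_ids : $60.
        dealer_name : $27.
        dealer_city : $15.
        dealer_st : $2.
        dealer_zip : $5.
        sno : best32.
        start_dt : yymmdd10.
        end_dt : yymmdd10.;
  format start_dt end_dt yymmdd10.;
  /* NOTDIGIT returns 0 when the trimmed ZIP contains only digits */
  zip_ok = (notdigit(strip(dealer_zip)) = 0 and not missing(dealer_zip));
datalines;
101~A1,A2~Main Street Motors~Springfield~IL~62701~1~2016-01-01~2016-01-31
102~B7~Corner Cars~Dayton~OH~VXR1#~2~2016-02-01~2016-02-29
;
run;
```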

dataset using datalines infile with two dates variables

My DATALINES contain two variables, date1 and date2, with the informats ddmmyy10. and mmddyy10. respectively.
data date;
input date1 ddmmyy10. date2 mmddyy10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
11/01/2015 01/11/2015
12/01/2015 01/12/2015
13/01/2015 01/13/2015
;
run;
I tried adding the following line, but it still does not work:
infile datalines delimiter=' ';

The : (colon) format modifier enables you to use list input but also to specify an informat after a variable name, whether character or numeric. SAS reads until it encounters a blank column, the defined length of the variable (character only), or the end of the data line, whichever comes first. (See "Format Modifier" in the SAS INPUT statement documentation.)
data date;
input date1: ddmmyy10. date2: mmddyy10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
11/01/2015 01/11/2015
12/01/2015 01/12/2015
13/01/2015 01/13/2015
;
run;

The code does not work because you are reading nonstandard data without the colon input modifier or an INFORMAT statement. Without the colon, date1 ddmmyy10. is formatted input: SAS reads exactly 10 columns, so date2 mmddyy10. starts at the blank in column 11 and reads columns 11-20, which cuts off the last digit of the year.
To read nonstandard data (values with commas, dollar signs, dates, and so on) or character values longer than 8 bytes with list input, you need either an INFORMAT statement or the colon modifier on the INPUT statement.
1) Assign informat for the input variables using INFORMAT statement or ATTRIB statement
data date;
informat date1 ddmmyy10. date2 mmddyy10.;
input date1 date2;
format date1-date2 yymmdd10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
;
run;
2) Use Colon (:) input modifier
data date;
input date1: ddmmyy10. date2: mmddyy10.;
format date1-date2 yymmdd10.;
datalines;
09/01/2015 01/09/2015
10/01/2015 01/10/2015
;
run;
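A third option, not mentioned in the answers above, keeps formatted input but moves the input pointer past the blank with +1 before reading the second date. This is a sketch that only works when every value has exactly 10 characters and there is a single blank between them, as in the sample data.

```sas
data date;
  /* Formatted input: read 10 columns, skip 1 column (+1), read 10 more */
  input date1 ddmmyy10. +1 date2 mmddyy10.;
  format date1-date2 yymmdd10.;
datalines;
09/01/2015 01/09/2015
13/01/2015 01/13/2015
;
run;
```

Both columns describe the same days in different orders, so date1 and date2 should be equal on every row.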
