Why do I keep getting an invalid data error? I verified the numbers are in the correct columns and that the dates are structured in 'input' just as it is in 'datalines'.
data ThreeDates;
input @1 Date1 mmddyy10.
@12 Date2 mmddyy10.
@23 Date3 date9.;
format Date1
Date2
Date3 mmddyy10.;
datalines;
01/03/1950 01/03/1960 03Jan1970
05/15/2000 05/15/2002 15May2003
10/10/1998 11/12/2000 25Dec2005
;
run;
NOTE: Invalid data for Date1 in line 185 1-10.
NOTE: Invalid data for Date2 in line 185 12-21.
NOTE: Invalid data for Date3 in line 185 23-31.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9-
185 01/03/1950 01/03/1960 03Jan1970
Date1=. Date2=. Date3=. _ERROR_=1 _N_=1
NOTE: Invalid data for Date1 in line 186 1-10.
NOTE: Invalid data for Date2 in line 186 12-21.
NOTE: Invalid data for Date3 in line 186 23-31.
186 05/15/2000 05/15/2002 15May2003
Date1=. Date2=. Date3=. _ERROR_=1 _N_=2
NOTE: Invalid data for Date1 in line 187 1-10.
NOTE: Invalid data for Date2 in line 187 12-21.
NOTE: Invalid data for Date3 in line 187 23-31.
187 10/10/1998 11/12/2000 25Dec2005
Date1=. Date2=. Date3=. _ERROR_=1 _N_=3
NOTE: The data set WORK.THREEDATES has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Is it due to the space in front of each dataline? Make sure you also put the semicolon in column 1. Also, use datalines4 always as a habit.
The issue is clearly shown in the SAS log.
NOTE: Invalid data for Date1 in line 185 1-10.
NOTE: Invalid data for Date2 in line 185 12-21.
NOTE: Invalid data for Date3 in line 185 23-31.
RULE: ----+----1----+----2----+----3----+----4----+----5
185 01/03/1950 01/03/1960 03Jan1970
The RULE line with the dashes, plus signs, and digits is there to help you see exactly what was in the columns listed in the NOTE.
SAS only recognizes dates back to 1582, so trying to create dates in the year 195, 196 or 197 like on the first data line is not going to work.
You have misaligned your fixed column input with where the data is on the line. Probably because of the extra space you have added in front of every line of data, including the line with the semicolon that marks the end of the data (and also marks the end of the data step.)
Either remove the leading spaces or adjust the cursor movement to match the actual locations.
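As a minimal sketch of the second option, assuming every data line starts with exactly one leading blank (the data set name ThreeDatesShifted is made up for illustration), each column pointer simply moves one position to the right:

```sas
data ThreeDatesShifted;
  /* Assumption: each data line has exactly one leading blank,   */
  /* so the values start in columns 2, 13 and 24 instead of      */
  /* columns 1, 12 and 23.                                       */
  input @2  Date1 mmddyy10.
        @13 Date2 mmddyy10.
        @24 Date3 date9.;
  format Date1-Date3 mmddyy10.;
datalines;
 01/03/1950 01/03/1960 03Jan1970
 05/15/2000 05/15/2002 15May2003
 10/10/1998 11/12/2000 25Dec2005
;
run;
```

If the number of leading blanks varies from line to line, fixed pointers like these will not help, and list mode input is the safer choice.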
For this data you could also just use LIST MODE input instead of FORMATTED MODE and then the number of leading spaces or number of spaces between the values wouldn't matter. Adding the colon modifier in front of any in-line informats in the INPUT statement will ensure the INPUT still operates in LIST MODE.
Note that in list mode the width specified for the informat is ignored, the whole next "word" on the line is used whatever its length.
Use a single period to indicate a missing value (numeric or character) when using LIST MODE input. The period makes sure that the remaining values are assigned to the right variables.
data ThreeDates;
input date1 :mmddyy. date2 :mmddyy. date3 :date.;
format date1-date3 yymmdd10.;
datalines;
01/03/1950 01/03/1960 03Jan1970
05/15/2000 05/15/2002 15May2003
10/10/1998 11/12/2000 25Dec2005
;
proc print;
run;
Results
You should get in the habit of starting both the DATALINES statement and the terminating semicolon line in the first column to avoid getting column numbers confused.
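To illustrate the point about missing values in list mode, here is a small sketch with a period standing in for an absent date. The data set name MissingDemo and the modified data lines are assumptions made for this example only:

```sas
data MissingDemo;
  /* List mode with the colon modifier: each value is the next   */
  /* blank-delimited word, and a lone period is read as missing. */
  input date1 :mmddyy. date2 :mmddyy. date3 :date.;
  format date1-date3 yymmdd10.;
datalines;
01/03/1950 . 03Jan1970
05/15/2000 05/15/2002 .
;
run;

proc print data=MissingDemo;
run;
```

Without the period on the first line, 03Jan1970 would be the second word on the line and SAS would try to read it into date2 with the MMDDYY informat.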
Related
I'm new to SAS. I'm trying to read a txt file where the same variables are listed in multiple columns.
The first variable is the date. The second one is time, and the last one is Blood Glucose. Thanks a lot for your kindness and help.
Sincerely
Wilson
The data can be read using a list input statement with the : (format modifier) and @@ (line hold) features specified.
glucose-readings.txt (data file)
01jan16 14:46 89 03jan16 11:27 103 04jan16 09:40 99
05jan16 09:46 105 11jan16 10:58 108 13jan16 10:32 109
14jan16 10:49 90 18jan16 09:32 110 25jan16 10:37 100
Sample program
data want;
infile "c:\temp\glucose-readings.txt";
input
datepart :date9.
timepart :time5.
glucose
@@;
datetime = dhms(datepart,0,0,timepart);
format
datepart date9.
timepart time5.
datetime datetime19.
glucose 3.
;
run;
proc print; run;
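For trying this out without an external file, a self-contained variant might look like the sketch below. It assumes the same layout as the sample file but reads two of its lines in-stream with DATALINES; the data set name want_inline is made up for this example.

```sas
data want_inline;
  /* The trailing @@ holds the current line across DATA step     */
  /* iterations, so each line yields three observations.         */
  input datepart :date9. timepart :time5. glucose @@;
  /* Combine the date and the time (seconds since midnight). */
  datetime = dhms(datepart, 0, 0, timepart);
  format datepart date9. timepart time5. datetime datetime19.;
datalines;
01jan16 14:46 89 03jan16 11:27 103 04jan16 09:40 99
05jan16 09:46 105 11jan16 10:58 108 13jan16 10:32 109
;
run;

proc print data=want_inline;
run;
```

With two data lines of three readings each, the resulting data set should contain six observations.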
From the documentation INPUT Statement: List
:
... For a numeric variable, this format modifier reads the value from the next non-blank column until the pointer reaches the next blank column or the end of the data line, whichever comes first.
...
@@
holds an input record for the execution of the next INPUT statement across iterations of the DATA step. This line-hold specifier is called double trailing @.
...
Tip The double trailing @ is useful when each input line contains values for several observations.
Be sure to read the documentation; that is where you will find detailed explanations and useful examples.
I have a datafile which uses blank space as delimiter. I want to write a data step to read this file into sas.
The fields are not separated by single blanks. In most cases they are separated by more than 10 blank spaces. I have checked using Notepad++ and the delimiters are not tabs.
137 3.35 Afghanistan 2009-07-08
154 2.43 Albania 2009-07-22
101 1.22 Antigua and Barbuda 2009-06-24
155 4.13 Federated States of Micronesia 2009-07-22
I have tried writing informat statements for these and have been unsuccessful.
Here's what I have done so far
input casedt1id :$3. contntid :4 country :&$32. casedt1 yymmdd10.
This reads only the first field properly and the rest get missing values.
The question is to write an informat statement to read this data ?
thanks for the help.
regards
jana
You can use the @ symbol to control where the pointer reads from on the line. It looks like you have a fixed starting column for each variable.
data want;
input @1 casedt1id :$3. @14 contntid :4. @28 country :&$32. @61 casedt1 :yymmdd10.;
format casedt1 yymmdd10.;
datalines;
137 3.35 Afghanistan 2009-07-08
154 2.43 Albania 2009-07-22
101 1.22 Antigua and Barbuda 2009-06-24
155 4.13 Federated States of Micronesia 2009-07-22
;
That looks like fixed column data to me. The problem then is using INFORMATs with fixed column data. This should work
input casedt1id $ 1-3 contntid 4-27 country $28-60 casedt1 yymmdd10.;
format casedt1 yymmdd10.;
The trick is to make sure the pointer is in the right place when it tries to read the formatted text. In the statement above that is done by telling it to read through column 60 for COUNTRY, so the pointer is at column 61 when it is time to read the date. You could also use + or @ to move the pointer.
... @61 casedt1 yymmdd10. ...
If you are reading from a variable length file (most files now are variable length) then make sure to add the TRUNCOVER option to the INFILE statement just in case the date is missing or written using fewer than 10 characters.
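A rough sketch of the TRUNCOVER advice follows. To keep it self-contained it first writes a two-record test file to a temporary fileref; the fileref name cases and the shortened second record are assumptions for illustration, not part of the original data.

```sas
filename cases temp;   /* hypothetical temporary fileref for this sketch */

data _null_;
  file cases;
  /* First record is complete; second record ends before column 61. */
  put @1 '137' @14 '3.35' @28 'Afghanistan' @61 '2009-07-08';
  put @1 '154' @14 '2.43' @28 'Albania';
run;

data want;
  /* TRUNCOVER keeps INPUT on the current record when it is short, */
  /* so the missing date becomes a missing value instead of        */
  /* pulling data from the next line.                              */
  infile cases truncover;
  input casedt1id $ 1-3 contntid 4-27 country $ 28-60 @61 casedt1 yymmdd10.;
  format casedt1 yymmdd10.;
run;

proc print data=want;
run;
```

Here the second observation should keep its own id, value, and country, with casedt1 set to missing.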
This is the story
This is the input file
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
this is the import statement that I am using
data datetimedata;
infile fileref dlm=',';
input lastname$ datechkin mmddyy10. datechkout mmddyy10. room_rate equip_cost;
run;
the below is the log which shows success
NOTE: The infile FILEREF is:
Filename=\\VBOXSVR\win_7\SAS\DATA\datetime\datetimedata.csv,
RECFM=V,LRECL=256,File Size (bytes)=688,
Last Modified=13Jun2015:12:08:36,
Create Time=13Jun2015:09:13:09
NOTE: 17 records were read from the infile FILEREF.
The minimum record length was 34.
The maximum record length was 40.
NOTE: The data set WORK.DATETIMEDATA has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
I have published only 3 observation here.
Now when I print the sas dataset everything works fine except the room_rate variable.
The output should be 3-digit numbers, but I am getting only the last digit.
Where am I going wrong?
You're mixing input types. When you use list input, you can't specify informats. You either need to specify them using modified list input (add a colon to the informat) or use an informat statement earlier. The following works.
data datetimedata;
infile datalines dlm=',';
input lastname$ datechkin :mmddyy10. datechkout :mmddyy10. room_rate equip_cost;
datalines;
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
;;;;
run;
proc print data=datetimedata;
run;
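The other option mentioned above, declaring the informats in a separate INFORMAT statement so that the INPUT statement stays in plain list mode, could be sketched like this. The data set name datetimedata2 and the added FORMAT statement are assumptions for this example:

```sas
data datetimedata2;
  infile datalines dlm=',';
  /* Attach the informats up front; INPUT then lists only names. */
  /* Note: variables named here enter the data set before        */
  /* lastname, so the column order differs from the version above. */
  informat datechkin datechkout mmddyy10.;
  format datechkin datechkout mmddyy10.;
  input lastname $ datechkin datechkout room_rate equip_cost;
datalines;
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
;
run;

proc print data=datetimedata2;
run;
```

Because every variable is read in list mode, each value runs up to the next comma, and room_rate is no longer cut short by a fixed-width date read.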
I am trying to make character informat from the range values given in a dataset.
Dataset : Grade
Start End Label Fmtname Type
0 20 A $grad I
21 40 B $grad I
41 60 C $grad I
61 80 D $grad I
81 100 E $grad I
And here is the code i wrote to create the informat
proc format cntlin = grade;
run;
And now the code to create a temp dataset using the new informat
data temp;
input grade : $grad. @@ ;
datalines;
21 30 0 45 10
;
The output i wanted was a dataset Temp with values :
Grade
A
B
A
..
Whereas the dataset Temp has values :
Grade
21
30
0
...
SAS Log Entry :
1146 proc format cntlin = grade;
NOTE: Informat $GRAD has been output.
1147 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: There were 5 observations read from the data set WORK.GRADE.
1148
1149
1150 data temp;
1151 input grade : $grad. @@ ;
1152
1153 datalines;
NOTE: SAS went to a new line when INPUT statement reached past the end of a
line.
NOTE: The data set WORK.TEMP has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
I am not able to understand why informat is not working. Can anyone please
explain where i am making my mistake.
INFORMATs convert characters to characters or numbers. So you can't use START/END ranges the way you are doing, since ranges like that only work with numbers.
See the following:
proc format;
invalue $grade
'0'-'20'="A"
'21'-'40'="B"
'41'-'60'="C"
'61'-'80'="D"
'81'-'100'="E";
quit;
proc format;
invalue $grade
'21'='A';
quit;
The latter works, the former gives you an error. So, you could write a dataset with all 101 values (each on a line with START), or just write a format and do it in a second step (read in as a number and then PUT to the format).
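The second-step approach (read the score as a number, then PUT it through a format) might look like the following sketch. The format name gradefmt and the variable name score are made up for this example:

```sas
proc format;
  /* Numeric format: ranges work here because the values compared */
  /* are numbers, not character strings.                          */
  value gradefmt
    0  - 20  = 'A'
    21 - 40  = 'B'
    41 - 60  = 'C'
    61 - 80  = 'D'
    81 - 100 = 'E';
run;

data temp;
  input score @@;
  /* PUT applies the format and returns the label as character. */
  grade = put(score, gradefmt.);
datalines;
21 30 0 45 10
;
run;

proc print data=temp;
run;
```

With the ranges from the Grade dataset, the five scores map to B, B, A, C, and A.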
I have data in csv format with a certain timestamp field in this format:
' 2009-07-30 20:50:19'
How can I read that into a SAS dataset? I've been trying this, but to no avail.
data filecontents;
infile "C:\es.txt" dlm=',' MISSOVER DSD firstobs=2 lrecl=32767 ;
input START_TIME :ANYDTDTM.
FORMAT START_TIME datetime.
Thanks.
Seems fine to me. The below code works on my machine (9.3 TSM2). What happens for you? Are you just missing a semicolon after the input statement (your example code is)?
data test;
infile "c:\temp\test.csv" dlm=',' missover;
input
dtvar :YMDDTTM.
var1 $
var2 $;
format dtvar DATETIME19.;
put dtvar= DATETIME19.;
run;
result:
608 data test;
609 infile "c:\temp\test.csv" dlm=',' missover;
610 input
611 dtvar :YMDDTTM.
612 var1 $
613 var2 $;
614 format dtvar DATETIME19.;
615 put dtvar= DATETIME19.;
616 run;
NOTE: The infile "c:\temp\test.csv" is:
Filename=c:\temp\test.csv,
RECFM=V,LRECL=256,File Size (bytes)=31,
Last Modified=20Nov2012:20:20:51,
Create Time=20Nov2012:20:17:51
dtvar=30JUL2009:20:50:19
NOTE: 1 record was read from the infile "c:\temp\test.csv".
The minimum record length was 29.
The maximum record length was 29.
NOTE: The data set WORK.TEST has 1 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
For what it's worth, YMDDTTMw.d is the specific informat for that (ANYDTDTM. will work as well of course).