I have two lines of observations to read in SAS.
It is a comma-delimited data set.
My code is as below:
DATA SASweek1.industry;
INFILE "&Dirdata.Assignment1_Q6_data.txt" DLM="," DSD termstr=crlf TRUNCOVER;
LENGTH Company $ 15;
INPUT Company $ State $ Expense COMMA9. ;
FORMAT Expense DOLLAR9.;
*INFORMAT Expense DOLLAR10.;
RUN; * not ready;
The raw data set looks like this:
The first observation reads correctly, but its last "0" ends up at the
first position of the second line's value, which becomes "0Lee's..".
Any suggestions would be highly appreciated!!
It is just doing what you told it to do. You told it to read exactly 9 characters.
Normally you should not use formatted input mode with delimited data. You prevent that by either adding the : (colon) prefix in front of the informat specification in the INPUT statement or removing the informat specification completely and using an INFORMAT statement to let SAS know what informat to use.
But your data is NOT properly delimited because the last field contains the delimiter, but the value is not enclosed in quotes. So the commas make it look like two values instead of one. The real solution is to fix the process that created the file to create a valid delimited file. It needs to quote the values with commas in them, or remove the commas from the numbers, or use a delimiter character that does not appear in the data.
Fortunately since it is the last field on the line you CAN use formatted input to read just that field. Since you are using the TRUNCOVER option just set the width of the informat in the INPUT statement to the maximum.
DATA SASweek1.industry;
INFILE "&Dirdata.Assignment1_Q6_data.txt" DLM="," DSD termstr=crlf TRUNCOVER;
LENGTH Company $15 State $15 Expense 8;
INPUT Company State Expense COMMA32. ;
FORMAT Expense DOLLAR9.;
RUN;
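For comparison, here is a minimal sketch of what the read could look like if the file were regenerated with the comma-bearing values quoted, as suggested above. The data set name and the two data lines are made up for illustration and are not from the original file. With DSD, a comma inside a quoted value is treated as data, so plain list input with the colon modifier should be enough.

```sas
/* Hypothetical example: the amounts are quoted in the file, so DSD      */
/* does not treat the commas inside them as delimiters.                  */
data work.industry_demo;
  infile datalines dlm=',' dsd truncover;
  length Company $15 State $15;
  /* The colon modifier applies the COMMA informat but still stops at    */
  /* the next delimiter instead of reading a fixed number of columns.    */
  input Company State Expense :comma12.;
  format Expense dollar12.;
datalines;
Acme Tools,TX,"12,500"
Bolt Supply,NY,"1,250,000"
;
run;
```

If you cannot change how the file is produced, the formatted-input workaround above remains the practical option.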
When I run some simple code to create a variable using the strip function, it properly strips leading and trailing blanks. However, when I call the program that contains this code with a %include, it doesn't work.
data aniorder;
input animalname $ seq taxonomy $20.;
datalines;
DOG 1 800
ELEPHANT 2 0
FISH 3 0
;
run;
data aniorder2;
set aniorder;
ani = strip(animalname);
keep ani seq taxonomy;
run;
When running %include, I don't get an error message, but the variable "ani" on the aniorder2 dataset still has the leading and trailing blanks. This doesn't happen when I just run the code above.
Anyone have any idea what's happening with %include here?
Variables will ALWAYS have trailing blanks. SAS character variables are fixed length and padded with blanks. The STRIP() function call will definitely remove leading blanks, but there should not be any leading blanks given the informat used in your input statement.
Perhaps your source file has the wrong end of line characters? For example if you made the file on a PC it would normally have CR+LF at the end of each line. If you then read that file on Unix the CR would become part of the data for the line. And since your posted example has leading spaces on the data lines (why are those there?) perhaps you saved the file using TAB characters to replace some of the blank characters?
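A small sketch of the padding behavior, using a made-up variable name (padded) and literal. It compares the storage length with the trimmed length and shows the padding reappearing when the variable is concatenated.

```sas
data _null_;
  length padded $10;
  padded = strip('   DOG   ');          /* stored as DOG followed by 7 pad blanks   */
  stored_len  = lengthc(padded);        /* storage length including padding: 10     */
  trimmed_len = lengthn(padded);        /* length ignoring trailing blanks: 3       */
  with_pad    = '[' || padded || ']';   /* plain concatenation keeps the padding    */
  without_pad = cats('[', padded, ']'); /* CATS strips each argument before joining */
  put stored_len= trimmed_len= with_pad= without_pad=;
run;
```

So STRIP() only matters at the moment the value is used in an expression. Once the result is assigned to a variable it is padded again.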
Try adjusting your program. You could try adding the / TERMSTR=CRLF option to the %INCLUDE statement and see if that eliminates the trailing "blanks". You could try adding an INFILE statement so that you can include the EXPANDTABS and see if that eliminates the leading "blanks". You could also add the TRUNCOVER option since you are reading more characters than are on the lines. Not sure if it is needed since normally in-line data is padded automatically to a multiple of 80 characters.
data aniorder;
infile datalines expandtabs truncover ;
input animalname $ seq taxonomy $20.;
datalines;
DOG 1 800
ELEPHANT 2 0
FISH 3 0
;
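To check whether the "blanks" are really blanks, one option is to print the value in hexadecimal. This is only a diagnostic sketch. It assumes the aniorder2 data set from the question exists and that ani is 8 bytes long, which is what list input with a bare $ would give.

```sas
data _null_;
  set aniorder2;
  /* Two hex digits per byte. On an ASCII system 20 is a blank,      */
  /* 09 is a tab and 0D is a carriage return.                        */
  hexval = put(ani, $hex16.);
  put ani= hexval=;
run;
```

If 09 or 0D shows up in the log, the characters are tabs or carriage returns rather than spaces, and STRIP() will not remove them.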
I would like to read the following in-stream data lines:
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
I used the code below, but it read the AAA values into the team variable instead of div. Also, where should I place the & (ampersand) modifier to read character values with embedded blanks?
data scores2;
infile datalines dlm=",";
input name : $10. score1-score3 team $20. div $;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
Notice that I have used : before team as well (you already used the colon modifier for the other variables, so I'm not sure why you left it out here). As I mentioned in your other query, use the : (colon) format modifier (see: tilde, dlm and colon format modifier in list input), which tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Because you had not used this modifier, SAS was trying to read 20 characters even though there was a delimiter in between.
Tested:
data scores2;
infile datalines dlm=",";
input name : $10.
score1-score3
team : $20.
div : $3.;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
Another way to do this that's often a bit easier to read is to use the informat statement.
data scores2;
infile datalines dlm=",";
informat name $10.
team $20.
div $4.;
input name $ score1-score3 team $ div $;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
That accomplishes the same thing as using the colon (input name :$10.) but organizes it a bit more cleanly.
And just to be clear, embedded blanks are irrelevant in comma-delimited input; '20'x (i.e., space) is just another character when it's not the delimiter. What the ampersand does is addressed in this article. More specifically, if space is the delimiter, it lets you require two consecutive delimiters to end a field. Example:
data scores2;
infile datalines dlm=" ";
informat name $10.
team $20.
div $4.;
input name $ score1-score3 team & $ div $;
datalines;
Smith 12 22 46 Green Hornets  AAA
FriedmanLi 23 19 25 High Volts  AAA
Jones 09 17 54 Las Vegas  AA
;
run;
Note the double space after all of the team names - that's required by the &. But this is only because the delimiter is a space (which is the default, so it would also be needed if you removed the dlm=" ").
In the following code
data temp2;
input id 1 @3 date mmddyy11.;
cards;
1 11/12/1980
2 10/20/1996
3 12/21/1999
;
run;
What do the 1 and @3 symbols mean? I presume 1 means that id is the first character in the data. I know that @3 means that the date variable starts at the third character, but why is it in front of date whereas the 1 is after id?
Because that's a badly written input statement. You can specify input in a number of ways, and that mixes a few different ways to do things which happen to be allowed to mix (mostly). Read the SAS documentation on input for more information.
Some common styles that you can use:
input @1 id $5.; *Formatted input. Allows specification of start position and informat, more useful if using date or other informat that is not just normal character/number.;
input id str $ otherstr $ date :date9.; *List input. This is for delimited text (like a CSV), still lets you specify informat.;
input @'ID:' id $5.; *A special case of formatted input. Allows you to parse files that include the variable name, useful for old style files and some xml/json/etc. type files.;
input x 1-10 y 11-20; *Column input. Not used very commonly as it is less flexible than start/informat style.;
There are other options (such as named input) that are not very frequently used in my experience.
In your specific example, the first variable is read with column input [id 1 says 'read a 1-character numeric from position 1 into id'], and the second variable is read with formatted input [@3 date mmddyy11. says 'read an 11-character date from positions 3-13 into a numeric variable, using the date informat to translate it to a number']. It also suggests that whoever gave you that code isn't very familiar with SAS, since mmddyy10. is the correct informat - the 11th character cannot be helpful.
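For what it's worth, here is one way the same step could be written in a single style, as a sketch rather than the only correct form. It uses formatted input for both variables and the 10-character date informat. The FORMAT statement is an addition of mine so the dates print readably.

```sas
data temp2;
  /* @1 and @3 are column pointers: move to that column, then read. */
  input @1 id 1. @3 date mmddyy10.;
  format date mmddyy10.;   /* display the SAS date value as mm/dd/yyyy */
datalines;
1 11/12/1980
2 10/20/1996
3 12/21/1999
;
run;
```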
I’m working with some raw data that has fixed column widths, but has all its records written into a single line (blame the data vendor, not me :-) ). I know how to use fixed column widths in the INPUT statement, and how to use @@ to read more than one observation per line, but I am having trouble when I try to do both.
As an example, here’s some code where the data has fixed column widths, but there is one line per record. This code works fine:
DATA test_1;
INPUT alpha $ 1-5 beta $ 6-10 gamma 11-15 ;
DATALINES;
a    f    1
ab   fg   12
abc  fgh  123
abcd fghi 1234
abcdefghij12345
;
RUN;
Now here’s the code for what I’m really trying to do: all the data is in one line, and I try to use the @@ notation:
DATA test_2;
INPUT alpha $ 1-5 beta $ 6-10 gamma 11-15 @@;
DATALINES;
a    f    1    ab   fg   12   abc  fgh  123  abcd fghi 1234 abcdefghij12345
;
RUN;
This fails because it just keeps reading the first 15 characters, holding that record, and re-reading from the start. Based on my understanding of the semantics of the @@ notation, I can definitely understand why this would be happening.
Is there any way I can accomplish reading fixed column data from a single line; that is, make test_2 have the same content as test_1? Perhaps through some combination of symbols in the INPUT statement, or maybe resorting to another method (with file I/O functions, PROC IMPORT, etc.)?
Have you tried specifying variable lengths using informats?
For example:
DATA test_2;
INPUT alpha $5. beta $5. gamma 5.0 @@;
DATALINES;
a    f    1    ab   fg   12   abc  fgh  123  abcd fghi 1234 abcdefghij12345
;
RUN;
From the SAS documentation:
Formatted input causes the pointer to move like that of column input
to read a variable value. The pointer moves the length that is
specified in the informat and stops at the next column.
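If you would rather control the position yourself, another possible approach is to hold the line with a single trailing @ and loop over it with an explicit column pointer. This is only a sketch under a few assumptions: every record is exactly 15 characters wide, the data set name test_3 and the loop variable start are made up for the example, and the data line is the same one used above.

```sas
data test_3;
  infile datalines truncover;
  input @;                                   /* load the line and hold it      */
  /* Step through the held line in 15-character blocks. _INFILE_ holds  */
  /* the current record and LENGTHN ignores its trailing blanks.        */
  do start = 1 to lengthn(_infile_) by 15;
    input @start alpha $5. beta $5. gamma 5. @;
    output;                                  /* one observation per block      */
  end;
  drop start;
datalines;
a    f    1    ab   fg   12   abc  fgh  123  abcd fghi 1234 abcdefghij12345
;
run;
```

The @@ version is shorter. The loop mainly helps when you need to stop at a known record count or skip part of the line.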