I would like to read the following in-stream data lines:
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
I used the code below, but it read the AAA values into the team variable rather than into div. Also, where should I place the & (ampersand) modifier to read character values with embedded blanks?
data scores2;
infile datalines dlm=",";
input name : $10. score1-score3 team $20. div $;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
Notice that I have also used : before team (you already used the colon modifier for name, so I am not sure why you left it out here). As I mentioned in your other question, the colon modifier (see tilde, dlm and colon format modifier in list input) tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Because you had not used it, SAS tried to read 20 characters for team, even though there was a delimiter in between.
Tested
data scores2;
infile datalines dlm=",";
input name : $10.
score1-score3
team : $20.
div : $3.;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
Another way to do this that's often a bit easier to read is to use the informat statement.
data scores2;
infile datalines dlm=",";
informat name $10.
team $20.
div $4.;
input name $ score1-score3 team $ div $;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
That accomplishes the same thing as using the colon (input name :$10.) but organizes it a bit more cleanly.
And just to be clear, embedded blanks are irrelevant in comma-delimited input; '20'x (i.e., a space) is just another character when it is not the delimiter. What the ampersand does is addressed in this article; more specifically, if space is the delimiter, it lets you require two consecutive delimiters to end a field. Example:
data scores2;
infile datalines dlm=" ";
informat name $10.
team $20.
div $4.;
input name $ score1-score3 team & $ div $;
datalines;
Smith 12 22 46 Green Hornets  AAA
FriedmanLi 23 19 25 High Volts  AAA
Jones 09 17 54 Las Vegas  AA
;
run;
Note the double space after each team name; that is required by the &. This matters only because the delimiter is a space (which is the default, so the double space would still be needed if you removed the dlm=' ').
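The & modifier can also be combined with an in-line informat, in which case it behaves like the colon modifier except that a single blank no longer ends the value. A minimal sketch, assuming the same space-delimited layout with two blanks after each team name (the data set name scores3 is only for illustration):

```sas
data scores3;
  /* Blank is the default delimiter, so no INFILE statement is needed. */
  /* team & $20. : read up to 20 characters, stopping only at two      */
  /* consecutive blanks, so single embedded blanks stay in the value.  */
  input name :$10. score1-score3 team & $20. div :$4.;
  datalines;
Smith 12 22 46 Green Hornets  AAA
FriedmanLi 23 19 25 High Volts  AAA
Jones 09 17 54 Las Vegas  AA
;
run;
```

If the two blanks after a team name are missing, the division code is read as part of team and SAS goes looking for div on the next line.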
I have a number of text entries (municipalities) from which I need to remove the s at the end.
Data test;
input city $;
datalines;
arjepogs
askers
Londons
;
run;
data cities;
set test;
if prxmatch("/^(.*?)s$/",city)
then city=prxchange("s/^(.*?)s$/$1/",-1,city);
run;
Strangely enough, my s's are only removed from my first entry.
What am I doing wrong?
You defined CITY as length $8. The s in Londons is in the 7th position of the string, not the LAST position, because the value is padded with a trailing blank. Use the TRIM() function to remove the trailing spaces from the value of the variable.
data have;
input city $20.;
datalines;
arjepogs
Kent
askers
Londons
;
data want;
set have;
length new_city $20 ;
new_city=prxchange("s/^(.*?)s$/$1/",-1,trim(city));
run;
Result
Obs    city        new_city
1      arjepogs    arjepog
2      Kent        Kent
3      askers      asker
4      Londons     London
You could also just change the REGEX to account for the trailing spaces.
new_city=prxchange("s/^(.*?)s\ *$/$1/",-1,city);
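To see the trailing blank for yourself, a small check along these lines can help. It is only a sketch: the variable names stored, visible, m_raw and m_trim are made up for the illustration.

```sas
data _null_;
  length city $8;
  city = 'Londons';                        /* stored as 'Londons' plus one blank */
  stored  = lengthc(city);                 /* 8: storage length of the variable  */
  visible = length(city);                  /* 7: position of last non-blank char */
  m_raw   = prxmatch('/s$/', city);        /* 0: the padded value ends in a blank */
  m_trim  = prxmatch('/s$/', trim(city));  /* 7: after TRIM the s is last         */
  put stored= visible= m_raw= m_trim=;
run;
```

The only value that fills all 8 bytes is arjepogs, which is why it was the only one the original pattern matched.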
Here is another solution using only SAS string functions and no regex. Note that in this case there is no need to trim the variable:
data cities;
set test;
if substr(city,length(city)) eq "s" then
city=substr(city,1,length(city)-1);
run;
I am trying to input dates using datalines but it is not working:
data demographic;
input Subj @5 DOB mmddyy6. @16 Gender $ Name $;
format dob ddmmyy10.;
datalines;
001 10/15/1960 M Friedman
002 08/01/1955 M Stern
003 12/25/1988 F McGoldrick
005 05/28/1949 F Chien
;
run;
What seems to be the problem?
When you include the informat on the INPUT statement, SAS expects exactly that informat and width. In this case, you have specified the wrong width for the date informat: the values are 10 characters long and you are trying to read only 6. If you create an INFORMAT statement and remove the specifications from the INPUT statement it will work fine.
data demographic;
informat subj $3. dob mmddyy10. gender $1. name $16.;
input Subj DOB Gender Name ;
format dob ddmmyy10.;
datalines;
001 10/15/1960 M Friedman
002 08/01/1955 M Stern
003 12/25/1988 F McGoldrick
005 05/28/1949 F Chien
;
run;
You told SAS to start reading DOB at column 5 and read only 6 characters. But your data values for DOB include slashes and century digits on the year, so they will take up to 10 characters.
If your input stream is fixed format then you could just change the width on your informat to 10 instead of 6. So if your data field positions are fixed your input statement might look like this. (Personally I like to format dates using YMD order to avoid confusion when working with people in both the US (MDY) and EU (DMY).)
data demographic;
input Subj 1-3 @5 DOB MMDDYY10. Gender $16 Name $18-27;
format dob YYMMDD10.;
datalines;
001 10/15/1960 M Friedman
002 08/01/1955 M Stern
003 12/25/1988 F McGoldrick
005 05/28/1949 F Chien
;
Now if your input data are not in fixed column locations (and any missing values are represented by .) then you could read the data using list mode input instead. Use the : modifier before any informats on the INPUT statement to ensure that you are using list mode and not formatted mode. When SAS reads a value in list mode it ignores the width on the informat and uses the length of the current word instead. If you want the INPUT statement, as the first place your variables appear, to define each variable's type and length, then you can use $xx. informats there; just make sure to include the : modifier. SAS will ignore the width when reading the data, but because this is the first reference to the variable it will use the width of the informat to help it guess how you wanted the variable defined.
data demographic;
input Subj DOB :MMDDYY. Gender :$1. Name :$10.;
format dob YYMMDD10.;
datalines;
001 10/15/1960 M Friedman
002 8/1/1955 M Stern
003 . F McGoldrick
005 5/28/1949 F Chien
;
Or instead you could define the variables explicitly using LENGTH or ATTRIB statements before using them in the INPUT statement. In that case you might want to just use an INFORMAT statement to tell SAS how to read DOB instead of including the informat in the INPUT statement. That can make the INPUT statement easier to write since you could use variable lists.
data demographic;
length Subj 8 DOB 8 Gender $1 Name $10 ;
informat dob mmddyy.;
format dob YYMMDD10.;
input subj -- name ;
datalines;
001 10/15/1960 M Friedman
002 8/1/1955 M Stern
003 . F McGoldrick
005 5/28/1949 F Chien
;
I am having trouble reading in inconsistent comma separated data. Here is a sample of what the data looks like:
JefferyThomas,"200","2,500","12,344",100,"999","865,100",800
GeorgeMontgomery,"50","700",200,"2,500","2,500","8,000","950"
I have never dealt with data that has some numbers in quotes and others not. If it were just one or the other, it would obviously not be difficult to read in. But because some numbers are in quotes and others are not, I am having trouble reading in all of the data. This is what I have tried so far:
Data test;
INFILE ......"data.csv" dlm="," dsd missover;
length Name $16;
input Name $ Score1 Score2 Score3 Score4 Score5 Score6 Score7;
All this returns is missing values except for the numbers that aren't within quotes.
You also need to tell SAS to read the numbers with the COMMA informat, because values such as 2,500 contain embedded commas.
Data test;
INFILE cards dlm="," dsd missover;
length Name $16;
informat score1-score7 comma16.;
input (_all_)(:);
cards;
JefferyThomas,"200","2,500","12,344",100,"999","865,100",800
GeorgeMontgomery,"50","700",200,"2,500","2,500","8,000","950"
;;;;
run;
proc print;
run;
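To check the informat in isolation, the INPUT function can be applied to a single quoted-and-unquoted value. This is only a sketch; the variable names plain and fixed are made up, and the ?? modifier is there just to suppress the invalid-data note.

```sas
data _null_;
  plain = input('2,500', ?? 16.);     /* missing: the standard numeric informat rejects the comma */
  fixed = input('2,500', comma16.);   /* 2500: COMMAw.d strips the embedded comma                 */
  put plain= fixed=;
run;
```

The DSD option takes care of removing the quotes and keeping the quoted commas as data; the COMMA informat then turns the remaining text into a number.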
In the following program all data is read correctly:
data test ;
infile datalines ;
input make 10$ mpg @@ ; /* should I use make : 10$ . . */
datalines ;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;
run ;
proc print ;
run ;
The above code works fine; however, my teacher says that I must use the colon : and that the correct answer is input make : 10$ mpg @@ ;
I don't understand why. As far as I know, : is useful if we have trailing spaces at the beginning of a record line; otherwise why should we use it here?
The colon tells SAS to use the following informat. Without the colon SAS would ignore that part (it doesn't do anything). SAS by default uses an informat (and resultant length) of $8. if you don't specify it otherwise.
You are always better off specifying the informat: a 2-character value stored in a default 8-byte character variable wastes storage space and processing time, although it does not alter the value (as long as you keep the trailing spaces in mind).
You can also specify the informat ahead of time:
data test;
infile datalines;
informat make $10.;
input
make $ mpg @@;
datalines;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;;;;
run;
proc print data=test;
run ;
I find that usually easier to read, although using :$10. in stream is acceptable as well.
The : modifier on the INPUT statement says to read the text using LIST MODE even when there is an in-line informat specification. In list mode the INPUT statement reads the next word on the line into the variable. Without the : modifier, FORMATTED MODE is used, which reads exactly the number of characters specified by the in-line informat, even if that means stopping before the end of the current word or reading past the delimiter into the next word on the line.
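A small side-by-side sketch of the two modes, reading the same columns twice. The data set name compare and the variable names make_fmt and make_list are made up for the illustration, and the data lines are assumed to start in column 1.

```sas
data compare;
  infile datalines truncover;
  input @1 make_fmt  $10.    /* formatted mode: takes columns 1-10, blanks included */
        @1 make_list :$10.   /* list mode: stops at the first blank                 */
        mpg;                 /* read from wherever make_list stopped                */
  datalines;
Ford 20
Oldsmobile 20
;
run;

proc print data=compare;
run;
```

For the first line, make_fmt should come out as "Ford 20" while make_list is "Ford"; for Oldsmobile, which happens to be exactly 10 characters, both modes give the same value.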
But there are other problems with your INPUT statement. As currently written it will generate an error.
2112 input make 10$ mpg @@ ;
                  -
                  22
ERROR 22-322: Expecting a name.
SAS takes the number 10 in the INPUT statement to mean that you want to read MAKE using COLUMN MODE input, that is, as a single-digit number from the 10th column of the line. The $ modifier after the column number then generates an error because there is no variable directly in front of it to modify. If you want to specify an informat you need to include the period as part of the specification. If you want a character informat instead of a numeric one, the name of the informat should start with the $ character.
So your INPUT statement with an in-line informat specification for MAKE would look like:
input make :$10. mpg @@ ;
The other way to make sure the MAKE is defined long enough to hold 10 characters is to define the variable before referencing it in the INPUT statement. Then SAS does not have to guess how you want it defined by how you are using it in the INPUT statement. Once the variable is known there is no need to include any extra characters in the INPUT statement.
data test ;
length make $10 mpg 8;
input make mpg @@ ;
datalines ;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;
I am reading a period '.' as a character variable's value but it is reading it as a blank value.
data output1;
input @1 a $1. @2 b $1. @3 c $1.;
datalines;
!..
1.3
;
run;
Output      Required
------      --------
A B C       A B C
!           ! . .
1   3       1 . 3
Please help me read a period as a period.
The result is determined by the informat used: $1. in your code requests the $w. informat. ($1. is first of all an informat specification; setting the length of the variable is a side effect.)
Use the $CHARw. informat for the desired result.
data output1;
input @1 a $char1. @2 b $char1. @3 c $char1.;
datalines;
!..
1.3
;
run;
From documentation:
$w. informat
The $w. informat trims leading blanks and left aligns the values before storing the text. In addition, if a field contains only blanks and a single period, $w. converts the period to a blank because it interprets the period as a missing value. The $w. informat treats two or more periods in a field as character data.
$CHARw. informat
The $CHARw. informat does not trim leading and trailing blanks or convert a single period in the input data field to a blank before storing values.
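The difference can be checked on a single value with the INPUT function. A minimal sketch, assuming the informats behave in the function as the documentation above describes for the INPUT statement (the variable names a and b are made up):

```sas
data _null_;
  length a b $1;
  a = input('.', $1.);       /* $w.     : a lone period is treated as missing, so a is blank */
  b = input('.', $char1.);   /* $CHARw. : the period is kept as character data               */
  put a= b=;
run;
```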
I don't immediately see why it does not work.
But if you are not interested in figuring out why it does not work, but just want something that does: read it in as 1 variable of length $3. Then in a next step; split it using substr.
E.g.,
data output1;
length tmp $3;
input tmp;
datalines;
!..
1.3
;
run;
data output2 (drop=tmp);
length a $1;
length b $1;
length c $1;
set output1;
a=substr(tmp,1,1);
b=substr(tmp,2,1);
c=substr(tmp,3,1);
run;