I have a number of text entries (municipalities) from which I need to remove the s at the end.
Data test;
input city $;
datalines;
arjepogs
askers
Londons
;
run;
data cities;
set test;
if prxmatch("/^(.*?)s$/",city)
then city=prxchange("s/^(.*?)s$/$1/",-1,city);
run;
Strangely enough, my s's are only removed from my first entry.
What am I doing wrong?
You defined CITY as length $8. The s in Londons is in the 7th position of the string. Not the LAST position of the string. Use the TRIM() function to remove the trailing spaces from the value of the variable.
data have;
input city $20.;
datalines;
arjepogs
Kent
askers
Londons
;
data want;
set have;
length new_city $20 ;
new_city=prxchange("s/^(.*?)s$/$1/",-1,trim(city));
run;
Result
Obs city new_city
1 arjepogs arjepog
2 Kent Kent
3 askers asker
4 Londons London
You could also just change the REGEX to account for the trailing spaces.
new_city=prxchange("s/^(.*?)s\ *$/$1/",-1,city);
Here is another solution using only SAS string functions and no regex. Note that in this case there is no need to trim the variable:
data cities;
set test;
if substr(city,length(city)) eq "s" then
city=substr(city,1,length(city)-1);
run;
I have a value and I need to find what are the variables in a dataset has this value. All are character variables and search value is also character variable.
Example:
Value to be searched:123456
Output: Var1 and Var3
All variables are character
this is sample dataset: in reality there are hundreds of variables to be searched
Thanks in advance
You'd want to use WHICHC in conjunction with an array, and then VNAME to find out the name of the variable.
data have;
input x $ y $ z $;
datalines;
123456 234567 345678
234567 345678 123456
345678 234567 456789
;;;;
run;
data want;
set have;
array vars x y z;
pos = whichc('123456',of vars[*]);
if pos>0 then varname = vname(vars[pos]);
run;
WHICHC returns 0 if not found. Then you could use PROC FREQ on this dataset to find the list of all values that have this.
This particular approach only works if it can only exist once per row; if it can exist in two variables on the same row, you would have to loop over the array and search it iteratively.
I would like to read following instream datalines
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
I employed while it read AAA items to team variables but not as div. And how should I place &(ampersand to read character with embedded blanks?)
data scores2;
infile datalines dlm=",";
input name : $10. score1-score3 team $20. div $;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
Notice I have used : before team also ( well you have already used colon operator : for other variables , not sure why did you miss over here) As I have already mentioned in your other query, use : colon operator (tilde, dlm and colon format modifier in list input) which would tell SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Here as you had not used this operator , that is why SAS was trying to read 20 chars, even though
there was a delimiter in between.
Tested
data scores2;
infile datalines dlm=",";
input name : $10.
score1-score3
team : $20.
div : $3.;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
Another way to do this that's often a bit easier to read is to use the informat statement.
data scores2;
infile datalines dlm=",";
informat name $10.
team $20.
div $4.;
input name $ score1-score3 team $ div $;
datalines;
Smith,12,22,46,Green Hornets,AAA
FriedmanLi,23,19,25,High Volts,AAA
Jones,09,17,54,Las Vegas,AA
;
run;
That accomplishes the same thing as using the colon (input name :$10.) but organizes it a bit more cleanly.
And just to be clear, embedded blanks are irrelevant in comma delimited input; '20'x (ie, space) is just another character when it's not the delimiter. What ampersand will do is addressed in this article, and more specifically, if space is the delmiiter it allows you to require two consecutive delimiters to end a field. Example:
data scores2;
infile datalines dlm=" ";
informat name $10.
team $20.
div $4.;
input name $ score1-score3 team & $ div $;
datalines;
Smith 12 22 46 Green Hornets AAA
FriedmanLi 23 19 25 High Volts AAA
Jones 09 17 54 Las Vegas AA
;
run;
Note the double space after all of the team names - that's required by the &. But this is only because delimiter is space (which is default, so if you removed the dlm=' ' it would also be needed.)
In the following program all data is read correctly
data test ;
infile datalines ;
input make 10$ mpg ## ; /* should I use make : 10$ . . */
datalines ;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;
run ;
proc print ;
run ;
The above code works fine, however my teacher says that I must use colon : and the correct answer is input make : 10$ mpg ## ;
I dont understand why . As far as I know : is useful if we have trailing spaces at the begining of a record line , otherwise why should we use it here ?
The colon tells SAS to use the following informat. Without the colon SAS would ignore that part (it doesn't do anything). SAS by default uses an informat (and resultant length) of $8. if you don't specify it otherwise.
You are always better off specifying the informat, as a character of 2 length stored in the default 8 length character variable would be wasting storage space and processing time, but it won't alter the value (assuming you know to be aware of the trailing spaces).
You can also specify the informat ahead of time:
data test;
infile datalines;
informat make $10.;
input
make $ mpg ##;
datalines;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;;;;
run;
proc print data=test;
run ;
I find that usually easier to read, although using :$10. in stream is acceptable as well.
The : modifier on the INPUT statement says to read the text using LIST MODE even when there is an in-line format specification. In list mode the input statement reads the next word on the line into the variable. Without the : modifier the INPUT statement FORMATTED MDOE will be used which will read exactly the number of character specified by the in-line informat. Even if this could cause it to stop before the end of the current word on the line or read past the delimiter into the next word on the line.
But there are other problems with your INPUT statement. As currently written it will generate an error.
2112 input make 10$ mpg ## ;
-
22
ERROR 22-322: Expecting a name.
The number 10 in the INPUT statement is taken to mean you want to read MAKE using COLUMN MODE input. So you want to read the single digit number from the 10th character on the line. Then the $ modifier after the column number is generating an error because there is no variable directly in front of it to modify. If you want specify an informat you need to include the period as part of the specification. If you want to specify a character informat instead of a numeric informat then the name of the informat should start with the $ character.
So your INPUT statement with an in-line informat specification for MAKE would look like:
input make :$10. mpg ## ;
The other way to make sure the MAKE is defined long enough to hold 10 characters is to define the variable before referencing it in the INPUT statement. Then SAS does not have to guess how you want it defined by how you are using it in the INPUT statement. Once the variable is known there is no need to include any extra characters in the INPUT statement.
data test ;
length make $10 mpg 8;
input make mpg ## ;
datalines ;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;
I am reading a period '.' as a character variable's value but it is reading it as a blank value.
data output1;
input #1 a $1. #2 b $1. #3 c $1.;
datalines;
!..
1.3
;
run;
Output Required
------ --------
A B C A B C
! ! . .
1 3 1 . 3
Please help me in reading a period as such.
The output is determined by the informat used ($w. informat in your case, requested by $1. in your code, so $1. is first of all informat definition, lenght definition of variable is a side product of this).
Use $char. informat for desired result.
data output1;
input #1 a $char1. #2 b $char1. #3 c $char1.;
datalines;
!..
1.3
;
run;
From documentation:
$w Informat
The $w. informat trims leading blanks and left aligns the values before storing the text. In addition, if a field contains only blanks and a single period, $w. converts the period to a blank because it interprets the period as a missing value. The $w. informat treats two or more periods in a field as character data.
$CHARw. informat
The $CHARw. informat does not trim leading and trailing blanks or convert a single period in the input data field to a blank before storing values.
I don't immediately see why it does not work.
But if you are not interested in figuring out why it does not work, but just want something that does: read it in as 1 variable of length $3. Then in a next step; split it using substr.
E.g.,
data output1;
length tmp $3;
input tmp;
datalines;
!..
1.3
;
run;
data output2 (drop=tmp);
length a $1;
length b $1;
length c $1;
set output1;
a=substr(tmp,1,1);
b=substr(tmp,2,1);
c=substr(tmp,3,1);
run;