SAS string function not working when program is called with %include - sas

When I run some simple code to create a variable using the strip function, it properly strips leading and trailing blanks. However, when I call the program the code is in with a %include. It doesn't work.
data aniorder;
input animalname $ seq taxonomy $20.;
datalines;
DOG 1 800
ELEPHANT 2 0
FISH 3 0
;
run;
data aniorder2;
set aniorder;
ani = strip(animalname);
keep ani seq taxonomy;
run;
When running %include, I don't get an error message, but the variable "ani" on the aniorder2 dataset still has the leading and trailing blanks. This doesn't happen when I just run the code above.
Anyone have any idea what's happening with %include here?

Variables will ALWAYS have trailing blanks. SAS character variables are fixed length and padded with blanks. But the STRIP() function call will definitely remove leading blanks, but there should not be any leading blanks given the informat used in your input statement.
Perhaps your source file has the wrong end of line characters? For example if you made the file on a PC it would normally have CR+LF at the end of each line. If you then read that file on Unix the CR would become part of the data for the line. And since your posted example as leading spaces on the data lines (why are those there?) perhaps you saved the file using TAB characters to replace some of the blank characters?
Try adjusting your program. You could try adding the / TERMSTR=CRLF option to the %INCLUDE statement and see if that eliminates the trailing "blanks". You could try adding an INFILE statement so that you can include the EXPANDTABS and see if that eliminates the leading "blanks". You could also add the TRUNCOVER option since you are reading more characters than are on the lines. Not sure if it is needed since normally in-line data is padded automatically to a multiple of 80 characters.
data aniorder;
infile datalines expandtabs truncover ;
input animalname $ seq taxonomy $20.;
datalines;
DOG 1 800
ELEPHANT 2 0
FISH 3 0
;

Related

Read in SAS with two lines end and start at different positions

I have two lines of observations to read in SAS.
It is a comma-delimited data set.
My code is as below:
DATA SASweek1.industry;
INFILE "&Dirdata.Assignment1_Q6_data.txt" DLM="," DSD termstr=crlf TRUNCOVER;
LENGTH Company $ 15;
INPUT Company $ State $ Expense COMMA9. ;
FORMAT Expense DOLLAR9.;
*INFORMAT Expense DOLLAR10.;
RUN; * not ready;
The raw data set looks like this:
I can print out the first line of observations well,
but the last "0" will go to the first position of the second
line, becoming "0Lee's..".
Any suggestions would be highly appreciated!!
It is just doing what you told it to do. You told it to read exactly 9 characters.
Normally you should not use formatted input mode with delimited data. You prevent that by either adding the : (colon) prefix in front of the informat specification in the INPUT statement or removing the informat specification completely and using an INFORMAT statement to let SAS know what informat to use.
But your data is NOT properly delimited because the last field contains the delimiter, but the value is not enclosed in quotes. So the commas make it look like two values instead of one. The real solution is to fix the process that created the file to create a valid delimited file. It needs to quote the values with commas in them, or remove the commas from the numbers, or use a delimiter character that does not appear in the data.
Fortunately since it is the last field on the line you CAN use formatted input to read just that field. Since you are using the TRUNCOVER option just set the width of the informat in the INPUT statement to the maximum.
DATA SASweek1.industry;
INFILE "&Dirdata.Assignment1_Q6_data.txt" DLM="," DSD termstr=crlf TRUNCOVER;
LENGTH Company $15 State $15 Expense 8;
INPUT Company State Expense COMMA32. ;
FORMAT Expense DOLLAR9.;
RUN;

Rearrange text on SAS

I can not find the way to reverse text strings.
For example I want to reverse these:
MMMM121231M34 to become 43M132121MMMM
MM1M11M1 to become 1M11M1MM
1111213111 to become 1113121111
Judging from your examples, what you mean by 'rearrange' is actually 'reverse'.
In that case, you've got the very handy reverse() function in SAS.
Used in context:
data test;
length text $32;
infile datalines;
input text $;
result=reverse(strip(text));
datalines;
MMMM121231M34
MM1M11M1
1111213111
;
run;
EDIT on #Joe's request: in the particular example above, I create the test dataset by setting a length of 32 characters for the text variable. Therefore, when reading the values from datalines, these are padded with blanks up to that total of 32 characters. Hence, when reversing that value, the result has that many blanks at the start, followed by the actual value you are looking for. By adding the strip function, you remove the excess blanks from the value of text before reversing, keeping only the "real" value in the result.

SAS print ASCII value of special character

I am using the notalnum function in SAS. The input is a db field. Now, the function is returning a value that tells me there is a special character at the end of every string.
It is not a space character, because I have used COMPRESS function on the input field.
How can I print the ACII value of the special character at the end of each string?
The $HEX. format is the easiest way to see what they are:
data have;
var="Something With A Special Char"||'0D'x;
run;
data _null_;
set have;
rul=repeat('1 2 3 4 5 6 7 8 9 0 ',3); *so we can easily see what char is what;
put rul=;
put var= $HEX.;
run;
You can also use the c option on compress (var=compress(var,,'c');) to compress out control characters (which are often the ones you're going to run into in these situations).
Finally - 'A0'x is a good one to add to the list, the non-breaking space, if your data comes from the web.
If you want to see the position of the character within the ascii table you can use the rank() function, e.g.:
data _null_;
string = 'abc123';
do i = 1 to length(string);
asc = rank(substr(string,i,1));
put i= asc=;
end;
run;
Gives:
i=1 asc=97
i=2 asc=98
i=3 asc=99
i=4 asc=49
i=5 asc=50
i=6 asc=51
Joe's solution is very elegant, but seeing as my hex->decimal conversion skills are pretty poor I tend to do it this way.

What does an ampersand ("&") do in a put statement?

I'm familiar with the :, and ~ modifiers in SAS put and input statements. The behaviour of & in an input statement is also fairly well documented. But what does & do in a put statement?
It seems to have a similar effect to :, triggering modified list output rather than formatted output, but I can't find any documentation of this behaviour.
E.g.
data _null_;
set sashelp.class;
file 'c:\temp\output.csv' dlm=',';
put Name Sex Age & 4. Height Weight;
run;
Quoting from the on-line documentation in the section of SAS 9.4 under INPUT Statement, List
&
indicates that a character value can have one or more single embedded blanks. This format modifier reads the value from the next non-blank column until the pointer reaches two consecutive blanks, the defined length of the variable, or the end of the input line, whichever comes first.
Restriction:
The & modifier must follow the variable name and $ sign that it affects.
Tip:
If you specify an informat after the & modifier, the terminating condition for the format modifier remains two blanks.
Here is an example from the example section:
Example Reading Character Data That Contains Embedded Blanks
The INPUT statement in this DATA step uses the & format modifier with list input to read character values that contain embedded blanks.
data list;
infile file-specification;
input name $ & score;
run;
It can read these input data records:
----+----1----+----2----+----3----+
Joseph 11 Joergensen red
Mitchel 13 Mc Allister blue
Su Ellen 14 Fischer-Simon green
The & modifier follows the variable that it affects in the INPUT statement. Because this format modifier follows NAME, at least two blanks must separate the NAME field from the SCORE field in the input data records.
You can also specify an informat with a format modifier, as shown here:
input name $ & +3 lastname & $15. team $;
In addition, this INPUT statement reads the same data to demonstrate that you are not required to read all the values in an input record. The +3 column pointer control moves the pointer past the score value in order to read the value for LASTNAME and TEAM.

Is it necessary to use ":" for string longer than 8 characters?

In the following program all data is read correctly
data test ;
infile datalines ;
input make 10$ mpg ## ; /* should I use make : 10$ . . */
datalines ;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;
run ;
proc print ;
run ;
The above code works fine, however my teacher says that I must use colon : and the correct answer is input make : 10$ mpg ## ;
I dont understand why . As far as I know : is useful if we have trailing spaces at the begining of a record line , otherwise why should we use it here ?
The colon tells SAS to use the following informat. Without the colon SAS would ignore that part (it doesn't do anything). SAS by default uses an informat (and resultant length) of $8. if you don't specify it otherwise.
You are always better off specifying the informat, as a character of 2 length stored in the default 8 length character variable would be wasting storage space and processing time, but it won't alter the value (assuming you know to be aware of the trailing spaces).
You can also specify the informat ahead of time:
data test;
infile datalines;
informat make $10.;
input
make $ mpg ##;
datalines;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;;;;
run;
proc print data=test;
run ;
I find that usually easier to read, although using :$10. in stream is acceptable as well.
The : modifier on the INPUT statement says to read the text using LIST MODE even when there is an in-line format specification. In list mode the input statement reads the next word on the line into the variable. Without the : modifier the INPUT statement FORMATTED MDOE will be used which will read exactly the number of character specified by the in-line informat. Even if this could cause it to stop before the end of the current word on the line or read past the delimiter into the next word on the line.
But there are other problems with your INPUT statement. As currently written it will generate an error.
2112 input make 10$ mpg ## ;
-
22
ERROR 22-322: Expecting a name.
The number 10 in the INPUT statement is taken to mean you want to read MAKE using COLUMN MODE input. So you want to read the single digit number from the 10th character on the line. Then the $ modifier after the column number is generating an error because there is no variable directly in front of it to modify. If you want specify an informat you need to include the period as part of the specification. If you want to specify a character informat instead of a numeric informat then the name of the informat should start with the $ character.
So your INPUT statement with an in-line informat specification for MAKE would look like:
input make :$10. mpg ## ;
The other way to make sure the MAKE is defined long enough to hold 10 characters is to define the variable before referencing it in the INPUT statement. Then SAS does not have to guess how you want it defined by how you are using it in the INPUT statement. Once the variable is known there is no need to include any extra characters in the INPUT statement.
data test ;
length make $10 mpg 8;
input make mpg ## ;
datalines ;
Ford 20 Honda 29 Oldsmobile 20 Cadillac 17
Toyota 24 Chevrolet 17
;