Where to specify input @ in SAS?

In the following code
data temp2;
input id 1 @3 date mmddyy11.;
cards;
1 11/12/1980
2 10/20/1996
3 12/21/1999
;
run;
what do the 1 and @3 symbols mean? I presume 1 means that id is the first character in the data. I know that @3 means that the date variable starts at the third character, but why is it in front of date whereas 1 is after id?

Because that's a badly written input statement. You can specify input in a number of ways, and that statement mixes a few different styles which happen to be allowed to mix (mostly). Read the SAS documentation on input for more information.
Some common styles that you can use:
input @1 id $5.; *Formatted input. Allows specification of start position and informat, more useful if using date or other informat that is not just normal character/number.;
input id str $ otherstr $ date :date9.; *List input. This is for delimited text (like a CSV), still lets you specify informat.;
input @'ID:' id $5.; *A special case of formatted input. Allows you to parse files that include the variable name, useful for old style files and some xml/json/etc. type files;
input x 1-10 y 11-20; *Column input. Not used very commonly as it's less flexible than start/informat style.;
There are other options (such as named input) that are not very frequently used in my experience.
In your specific example, the first variable is read in with column input [id 1 says 'read a 1 character numeric from position 1 into id'] and then the second variable is read with formatted input [@3 date mmddyy11. says 'Read an 11 character date variable from position 3[-13] into a numeric using the date informat to translate it to a number.'] It also suggests that whoever gave you that code isn't very familiar with SAS, since mmddyy10. is the correct informat; the 11th character cannot be helpful.
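As a minimal sketch of what a more consistent version might look like, the step below reads both variables with formatted input, using the column pointer @ and the 10-wide mmddyy10. informat suggested above. The FORMAT statement and the PROC PRINT step are additions for illustration only and are not part of the original code.

```sas
/* Sketch: the question's data read with formatted input throughout. */
data temp2;
  /* @1 and @3 move the column pointer before each variable is read */
  input @1 id 1. @3 date mmddyy10.;
  /* Assumption: a display format so the date prints as a date, not a number */
  format date mmddyy10.;
cards;
1 11/12/1980
2 10/20/1996
3 12/21/1999
;
run;

proc print data=temp2;
run;
```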

Related

Why is $ not needed between the names of the last two columns

I can't get why there isn't $ between Age and Weight like there is for others
And why is the missing value denoted by a blank in row 2 but by a period (.) in row 4?
Please help with such questions, I'm new to SAS and it's so miserable when I asked my colleagues, and nobody was willing to answer...
The $ after name and gender indicates that those variables are character variables. The other variables are numeric variables. Those are the only 2 data types in SAS.
Missing values for character variables are blanks while missing values for numeric variables are shown with a period.
The $ modifier in an INPUT statement lets SAS know that it should treat the preceding variable as CHARACTER instead of numeric. It is only needed when you have not already told SAS that the variable is character, because the default assumption is that a variable is numeric.
The normal character informat, $, which is what your code used because it did not specify any other informat for NAME or SEX, will convert a single period into blanks, which is how SAS stores an empty character value. If you want SAS to input a single period as an actual single period you would need to use the $CHAR informat instead. So if you modified the input statement like this:
input name $ sex :$char1. age weight ;
then the value of SEX would be the one byte string '.' (since the width of the informat gave SAS better information about what length to use to define the variable) instead of the current 8 character string of blanks ' '.
The normal way to display a missing numeric variable is as a single period. You can have it display some other single character instead by changing the setting of the MISSING option. For example, to have the missing values of AGE print as blanks as well, you could add this OPTIONS statement before you print it.
options missing=' ';
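The original data was only available as an image, so the following is a sketch on made-up records (the dataset name class and the values are hypothetical). It combines the modified INPUT statement with the MISSING option so that both effects can be inspected in the printed output.

```sas
/* Hypothetical data: the second record has periods for SEX and AGE. */
data class;
  /* NAME uses the default $ informat; SEX uses $char1. to keep a literal period */
  input name $ sex :$char1. age weight;
datalines;
Alice F 13 84
Bob . . 99
;
run;

options missing=' ';   /* print missing numeric values as a blank */
proc print data=class;
run;
options missing='.';   /* restore the default afterwards */
```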

SAS variable value does not match the formatting specified

I have a SAS dataset. When I 'View Columns', I find a column with Type=text, length=3, informat = $3., format=$3.
The value stored in this variable is 10.
But based on the attributes, should it not be stored as 010?
The attributes say you have a character variable that can hold 3 bytes (normal character encodings use one byte per character). You could store '010' in that variable or '10 ' or even ' 10'. You could also store 'ABC' or 'abc'. It is just a character variable. Note that SAS always stores fixed length character fields so shorter values are padded with spaces.
It also has optional FORMAT metadata saying that when displaying the value SAS should use the $3. format. Similarly, it has optional metadata that says that when reading text SAS should use the $3. informat to convert incoming text into the value to be stored.
This metadata is NOT needed because SAS already knows how to read and display character data. If you did store values with leading spaces you might want to attach the $CHAR3. format instead so that the leading spaces are preserved when writing the value.
As the variable is just text, it will just store what it is assigned. For example:
data have;
length var1 $ 3;
informat var1 $3.;
format var1 $3.;
input var1;
datalines;
10
010
;
The fact that it has a format of $3. will not cause it to be prefixed with a leading 0, as you will see from the documentation of the $w. format, where that is not mentioned. Also, the value could later be changed to 'ab'; in both cases the value is padded with a trailing space to make up the length of 3.
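To see the trailing padding for yourself, a small sketch like the one below compares the LENGTH function (which ignores trailing blanks) with LENGTHC (which reports the full stored length). The variable names n_trim and n_full are just illustrative.

```sas
data _null_;
  length var1 $ 3;
  var1 = '10';              /* stored as '10' plus one trailing blank */
  n_trim = length(var1);    /* position of the last non-blank character */
  n_full = lengthc(var1);   /* full storage length, padding included */
  put var1= n_trim= n_full=;
run;
```

With a length of 3 and the value '10', n_trim should come out as 2 and n_full as 3.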

Rearrange text on SAS

I can not find the way to reverse text strings.
For example I want to reverse these:
MMMM121231M34 to become 43M132121MMMM
MM1M11M1 to become 1M11M1MM
1111213111 to become 1113121111
Judging from your examples, what you mean by 'rearrange' is actually 'reverse'.
In that case, you've got the very handy reverse() function in SAS.
Used in context:
data test;
length text $32;
infile datalines;
input text $;
result=reverse(strip(text));
datalines;
MMMM121231M34
MM1M11M1
1111213111
;
run;
EDIT on @Joe's request: in the particular example above, I create the test dataset by setting a length of 32 characters for the text variable. Therefore, when reading the values from datalines, these are padded with blanks up to that total of 32 characters. Hence, when reversing that value, the result has that many blanks at the start, followed by the actual value you are looking for. By adding the strip function, you remove the excess blanks from the value of text before reversing, keeping only the "real" value in the result.
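A quick way to check the effect of the padding is to compare where the first non-blank character lands with and without strip. This is only a sketch; the variable names padded, clean, pos_padded and pos_clean are made up for the illustration.

```sas
data _null_;
  length text $32;
  text = 'MM1M11M1';                 /* 8 characters plus 24 trailing blanks */
  padded = reverse(text);            /* trailing blanks become leading blanks */
  clean  = reverse(strip(text));     /* blanks removed before reversing */
  pos_padded = verify(padded, ' ');  /* position of first non-blank character */
  pos_clean  = verify(clean, ' ');
  put pos_padded= pos_clean=;
run;
```

For an 8-character value in a 32-character variable, pos_padded should be 25 and pos_clean should be 1.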

How to convert numbers in a character variable to Numeric in sas

Can anyone help me to resolve this?
I have a very large raw dataset with a character variable that contains text strings along with numbers and dates stored as character values. Now I want to process the dataset and create a new numeric variable, populated only when the text in the actual variable is either a number or a date value, and missing otherwise.
RAWDATA:
ACTUAL_VARIABLE                           NEW_NUM_VARIABLE (Expected Values)
----------------------------------------  ----------------------------------
ODed on pills threw them all up - 2006
Y
1                                         1
5                                         5
ODed on pills
6                                         6
Less than once a week
N
N
2006-11-12                                2006-11-12
Many Thanks in Advance
The easy way to do it (if you know the specific date format) is to use the input function.
If put(input(var,??yymmdd10.),yymmdd10.)=var then it's a date!
Else if input(var,best.) ne . then it's a number.
Otherwise it's a character string.
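The three checks above could be written as a DATA step along the following lines. This is a sketch, not the commenter's exact code: the dataset name classified, the variable kind, the TRUNCOVER option, the best32. width and the ?? modifier on the numeric check (to suppress invalid-data notes in the log) are all assumptions added here.

```sas
data classified;
  infile datalines truncover;        /* tolerate records shorter than 50 characters */
  input actual_variable $50.;
  length kind $9;
  /* Round trip: a value is a date only if reading and rewriting it reproduces the text */
  if put(input(actual_variable, ?? yymmdd10.), yymmdd10.) = actual_variable
    then kind = 'date';
  /* ?? suppresses the invalid-data notes for non-numeric text */
  else if input(actual_variable, ?? best32.) ne .
    then kind = 'number';
  else kind = 'character';
datalines;
ODed on pills
Y
5
2006-11-12
;
run;

proc print data=classified;
run;
```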
This isn't as straightforward as it first looks, so I understand why it would be difficult to search for an answer. Just extracting a number is pretty easy, but when dates are included it becomes a bit more complicated (particularly when the format entered could change, e.g. yyyy-mm-dd, dd-mm-yyyy, dd/mm/yy etc).
One thing to note first. If you want to store the new values as a numeric field then you can't show a mix of numbers and dates. Dates are stored as numbers and formatted to show the date, but you can't apply a format at row level. Therefore I would suggest creating 2 new columns, 1 for numbers and 1 for dates.
My preferred approach is to use the anyalpha function to exclude any records with an alphabetic character, followed by the anypunct function to identify whether a punctuation character exists (this should identify dates rather than just numbers). The anydtdte informat is then used to extract the date. This is a very useful informat as it reads dates stored in different ways (as per my note above).
There are clearly some caveats with this method.
If any numbers contain decimals then my method would incorrectly treat these as dates, therefore only integers will be assigned correctly.
It won't pick up dates that contain the month as words, e.g. 15-May-2015, as the anyalpha function would exclude them. They will need to contain numbers only, separated by any punctuation character.
Here's my code.
/* create initial dataset */
data have;
input actual_variable $ 50.;
datalines;
ODed on pills threw them all up - 2006
Y
1
5
ODed on pills
6
Less than once a week
N
N
2006-11-12
;
run;
/* extract dates and numbers */
data want;
set have;
if not anyalpha(actual_variable) then do; /* exclude records with an alphabetic character */
if anypunct(actual_variable) then new_date_variable = input(actual_variable,anydtdte10.); /* if a punctuation character exists then read in as a date */
else new_num_variable = input(actual_variable,best12.); /* else read in as a number */
end;
format new_date_variable yymmdd10.; /* show date field in required format */
run;

What does an ampersand ("&") do in a put statement?

I'm familiar with the : and ~ modifiers in SAS put and input statements. The behaviour of & in an input statement is also fairly well documented. But what does & do in a put statement?
It seems to have a similar effect to :, triggering modified list output rather than formatted output, but I can't find any documentation of this behaviour.
E.g.
data _null_;
set sashelp.class;
file 'c:\temp\output.csv' dlm=',';
put Name Sex Age & 4. Height Weight;
run;
Quoting from the online SAS 9.4 documentation, in the section on the INPUT Statement, List:
&
indicates that a character value can have one or more single embedded blanks. This format modifier reads the value from the next non-blank column until the pointer reaches two consecutive blanks, the defined length of the variable, or the end of the input line, whichever comes first.
Restriction:
The & modifier must follow the variable name and $ sign that it affects.
Tip:
If you specify an informat after the & modifier, the terminating condition for the format modifier remains two blanks.
Here is an example from the example section:
Example Reading Character Data That Contains Embedded Blanks
The INPUT statement in this DATA step uses the & format modifier with list input to read character values that contain embedded blanks.
data list;
infile file-specification;
input name $ & score;
run;
It can read these input data records:
----+----1----+----2----+----3----+
Joseph  11  Joergensen  red
Mitchel  13  Mc Allister  blue
Su Ellen  14  Fischer-Simon  green
The & modifier follows the variable that it affects in the INPUT statement. Because this format modifier follows NAME, at least two blanks must separate the NAME field from the SCORE field in the input data records.
You can also specify an informat with a format modifier, as shown here:
input name $ & +3 lastname & $15. team $;
In addition, this INPUT statement reads the same data to demonstrate that you are not required to read all the values in an input record. The +3 column pointer control moves the pointer past the score value in order to read the value for LASTNAME and TEAM.
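To try the documented INPUT behaviour without an external file, the first example can be adapted to read in-stream data, as in the sketch below. It assumes the fields are separated by exactly two blanks, which is what the & modifier needs in order to tell where NAME ends; the use of datalines and the PROC PRINT step are additions for illustration.

```sas
data list;
  infile datalines;
  /* & lets NAME contain single embedded blanks; two blanks end the value */
  input name $ & score;
datalines;
Joseph  11  Joergensen  red
Mitchel  13  Mc Allister  blue
Su Ellen  14  Fischer-Simon  green
;
run;

proc print data=list;
run;
```

Note that this only exercises & in an INPUT statement; it does not settle what & does in a PUT statement, which is what the question asks.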