Substring the last digits of a numeric variable in SAS - sas

I have a numeric variable in SAS and I am struggling to extract the last digits of it. I tried using substr but it only handles char variables. The variable I have sometimes has 3 or 4 digits.
Example
1234
237
754
9000
In these cases I need to extract
34
37
54
00
And store them as a new numeric variable. I tried the code below in a proc sql statement but it returns an error. Can someone help me?
Var2 = input(substr(put(var1), 1, length(put(var1))-1), 8.)

SUBSTRN() works with numeric variables, but it doesn't work well in this case because there's no easy way to specify only the last two characters. The MOD() function works well here, because you're essentially finding the remainder after dividing by 100. Since it looks like it's a character you want, you need to use PUT() to convert the result to character as well, with the Z2. format to keep leading zeros (so a remainder of 0 is written as 00).
want = put(mod(value, 100), z2.);
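A minimal sketch of both variants, assuming var1 holds non-negative integers. The dataset names have and want and the variables last2_num and last2_char are made up for illustration. If a numeric result is what you need, MOD() alone is enough, and the Z2. format only changes how the value is displayed.

```sas
data have;
  input var1;
  datalines;
1234
237
754
9000
;
run;

data want;
  set have;
  /* numeric remainder after dividing by 100: 34, 37, 54, 0 */
  last2_num = mod(var1, 100);
  /* character version with a leading zero: '34', '37', '54', '00' */
  last2_char = put(last2_num, z2.);
  /* show the numeric variable with two digits as well (0 prints as 00) */
  format last2_num z2.;
run;
```

Note that MOD() keeps the sign of its first argument, so negative inputs would need ABS() first.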

One more way is to use prxchange, as shown below:
data have;
input val1;
val2= input(prxchange('s/^(.*)(\d{2})$/$2/', -1, trim(put(val1,8.))),best.);
format val2 z2.;
datalines;
1234
237
754
9000
40
;

You can also use the reverse and substr functions as follows, and then apply the z2. format.
input(reverse(substr(reverse(strip(put(var1,best.))), 1,2)), 8.);

Related

Joining two data sets on a variable with different character length

I'm trying to join two data sets on a variable with different character lengths with the following code, but neither works and I'm not sure why.
FROM A AS ROLLACT
LEFT JOIN MALT.CUST AS ACCOUNT
/* ON (ROLLACT.ACCTNO, BEST.) = INPUT( ACCOUNT.ACCT_NO,BEST.) */
ON INPUT (ROLLACT.ACCTNO, 30.) = INPUT( ACCOUNT.ACCT_NO,30.)
In this case ROLLACT.ACCTNO is a character variable with length 30 and ACCT_NO is a character variable with length 19.
So I'm confused why I can't convert both to a specific length (using Input(30.)) with:
ON INPUT (ROLLACT.ACCTNO, 30.) = INPUT( ACCOUNT.ACCT_NO,30.)
I'm also trying to convert both into numeric with:
ON (ROLLACT.ACCTNO, BEST.) = INPUT( ACCOUNT.ACCT_NO,BEST.)
Does anyone have suggestions about how to do this within the Proc Sql step?
You do not need to do anything special to compare character strings of different lengths. SAS will ignore the trailing spaces. Obviously if the actual value of the longer variable has more than 19 characters it will never match the value that is limited to 19 characters.
The INPUT() function does not change the length. It is used to convert strings into values. If you use a numeric informat, as in your examples, then the result is a number. But you cannot convert a 30-digit string exactly into a number. SAS stores numbers as 8-byte floating point values, so the maximum number of decimal digits of precision is 15.
A simple substr does the trick: ON SUBSTR(ROLLACT.ACCTNO, 1, 19) = ACCOUNT.ACCT_NO
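To illustrate the point about trailing spaces, here is a small sketch with two toy tables. The tables a and cust (standing in for MALT.CUST) and their values are invented for the example; only the variable names and lengths follow the question.

```sas
data a;
  length acctno $30;   /* longer character variable */
  input acctno;
  datalines;
1001
1002
;
run;

data cust;
  length acct_no $19;  /* shorter character variable */
  input acct_no;
  datalines;
1001
1003
;
run;

proc sql;
  /* plain equality: the shorter value is padded with blanks for the comparison */
  select rollact.acctno, account.acct_no
  from a as rollact
  left join cust as account
    on rollact.acctno = account.acct_no;
quit;
```

Here 1001 should match without any INPUT() or SUBSTR() call, while 1002 comes back with a missing acct_no.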

Formatting character 'decimals' (comma delimiter) AND character 'integers' to numeric 'decimals' (point delimiter)

This is somewhat related to my other question recently.
Setup I am reading in character variables of the sort 1 or 2,0 or 10,0 or 2,5. I want to convert them to numerics using a decimal point instead of a comma.
So ideally I would like to get the following result:
1 -> 1
2,0 -> 2
10,0 -> 10
2,5 -> 2.5
My code
data _null_;
test='5,0';
result=input(test,comma10.1);
put 'this should be:' result;
run;
does this for all character variables which are of the type 'xy,z' but fails for 'xy' with no comma separation at all. Here I would get
xy -> x,y
I was thinking to add an if/else to check whether the character string has length of 1 or bigger. So something like
data _null_;
test='5';
if length(test)=1 then result=input(test, comma10.);
else result=input(test, comma10.1);
put 'this should be:' result;
run;
But the problem here would be that
10 -> 1
Problems with values like 10,00 (which is supposed to be 10) becoming 100 could probably be resolved by substituting the ',' with '.', but the characters with no decimal delimiter remain a problem.
Is there any clever solution to this?
My solution which is a bit hacky (and basically only uses the fact that the comma introduces a length>2 - problems with e.g. 123 would still arise):
data _null_;
t='5,5';
test=tranwrd(t, ',', '.');
if length(test)=1 or length(test)=2 then result=input(test, comma10.);
else result=input(test, comma10.1);
put 'this should be:' result;
run;
Sounds like your text strings were created in a place where the normal meaning of comma and period in numbers is reversed. So instead of using a period for decimal point and comma for thousand grouping they have reversed the meaning.
For that type of strings SAS has the COMMAX informat.
Normally you do NOT want to add a decimal specification to your informat. The decimal part of the informat is only used when the source string does not have an explicit decimal point. Basically it is telling SAS to divide values without an explicit decimal point by 10 to the power of the number of decimal places in the informat specification. It is designed to read data where the decimal point was purposely not written in order to save space.
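A quick way to see the implied-decimal behaviour described above is the sketch below. The variable names no_point and with_point are only for illustration.

```sas
data _null_;
  /* no decimal point in the text, so the value is divided by 10**2: 1.23 */
  no_point = input('123', 8.2);
  /* explicit decimal point in the text, so the .2 is ignored: 12.3 */
  with_point = input('12.3', 8.2);
  put no_point= with_point=;
run;
```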
Pretty much all the COMMA informat does is strip the string of commas and dollar signs and then read it using the normal numeric informat.
The COMMAX informat is the one that will understand the reversed meaning of the commas and periods. So it pretty much eliminates the periods and then converts the commas to periods and then reads it using the normal numeric informat.
Try a little test of your own.
data check;
input #1 string $32. #1 num ??32. #1 comma ??comma32. #1 commax ??commax32.
#1 d2num ??32.2 #1 d2comma ??comma32.2 #1 d2commax ??commax32.2
;
cards;
123
123.4
123,4
1,234.5
1.234,5
;
proc print;
run;
As it turns out (found it here), the COMMAXw.d informat does the trick without any hassle. The code would then be:
data _null_;
test='0,5';
result = input(test, COMMAX10.);
put 'this should be:' result;
run;
I find it a bit counterintuitive, but it works.
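As a sanity check, the four values from the question can be run through the same informat in one step. This is just a sketch; the loop and the length of test are my own choices.

```sas
data _null_;
  length test $10;
  /* the sample strings from the question */
  do test = '1', '2,0', '10,0', '2,5';
    /* no decimal specification: the comma is read as the decimal separator */
    result = input(test, commax10.);
    put test= result=;
  end;
run;
```

The log should show 1, 2, 10 and 2.5, with no special handling for the strings that have no comma.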

split single variable value in two

I have dataset a:
data q7;
input trt$;
cards;
a150
b250
c300
400
abc180
;
run;
We have to create dataset b like this
trt dose
a150 150mg
b250 250mg
c300 300mg
400 400mg
abc180 180mg
A new dose variable is added, and mg is written after each numeric value.
Here is my solution. Basically, use the compress function to keep (hence the 'k') only the digits from the trt variable. From there, it is just a case of concatenating mg to the digits.
data want;
set q7;
dose = cats(compress(trt,'0123456789','k'),'mg');
run;
The compress function's default behaviour is to return a character string with the specified characters removed from the original string.
So compress(trt,'0123456789') would have removed all digits from the trt variable.
However, compress comes with a battery of modifiers that let the user alter the default behaviour.
In your case, we wanted to keep the digits regardless of the number of preceding letters, so I used the k modifier to keep, rather than remove, the listed characters, in this case 0123456789.
For a full list of modifiers please read the following link
cats is one of the many functions SAS has to concatenate strings, so passing the compress result as the 1st string and mg as the 2nd string concatenates both to produce your desired result.
Hope it helps.
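To make the difference between the default behaviour and the k modifier concrete, here is a small sketch on a single value. The variable names letters and digits are only for illustration.

```sas
data _null_;
  trt = 'abc180';
  /* default: the listed characters are removed, leaving 'abc' */
  letters = compress(trt, '0123456789');
  /* k modifier: the listed characters are kept, leaving '180' */
  digits = compress(trt, '0123456789', 'k');
  /* cats strips blanks and concatenates, giving '180mg' */
  dose = cats(digits, 'mg');
  put letters= digits= dose=;
run;
```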

need to substring out numeric data from a character field in SAS

I have a data set with 2 variables: a subject id number and a result. The result is a character variable. It was read in from an excel spreadsheet. Most results are numbers, but some of the results have a letter after them which was serving as a footnote in the excel file. I need to get rid of the letters after the numbers so I can convert the data to numeric for analysis. How can I do this? Below is some code to create an example dataset of the structure that I'm talking about.
data test;
input id result $ ;
datalines;
1 13
2 15
3 20
4 25c
5 75
6 99c
7 89b
8 10a
9 100
10 67
;
run;
Have a look at the compress and input functions.
num = input(compress(result, , "dk"), best.);
input converts character to numeric, interpreting the data using the informat you provide (best. here).
compress can be used to strip certain characters from a string. Here it is used with the d modifier, which adds all numeric digits to the list of characters, and the k modifier, which requests that the listed characters be kept rather than removed.
You may have to tweak the compress arguments a bit to deal with more complicated cases such as decimal points.
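For example, if some results could contain a decimal point (an assumption, since the sample data has none), one possible tweak is to list the period explicitly so it is kept along with the digits:

```sas
data want;
  set test;
  /* keep digits (d modifier) plus the period listed in the 2nd argument */
  num = input(compress(result, '.', 'kd'), 32.);
run;
```

With this, a value like 25c still becomes 25, and a hypothetical 12.5c would become 12.5. A stray period used as punctuation would also be kept, so check the data first.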

SAS Converting Characters/Number to Numbers

I am looking for a way to convert the characters into numbers in SAS so that I can use the max function. Also, it would be helpful if the characters were removed and only the numbers were kept. Below is a list of data for a column in a SAS table.
Column UNK
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718
After stripping out the number values from the column, I would then group the data and use the max function in SAS, which should only generate the value 20150718.
To avoid any confusion, my question is: is there a way to strip out the non-numeric values and then convert the column into a numeric column so I can use the max function?
Thanks.
Sure!
var_num = input(compress(var_char,,'kd'),yymmdd8.);
Compress removes or keeps characters from a list. 'kd' says to 'keep digits'.
You then input using the appropriate informat; yymmdd8. looks right based on the data you provide. Then apply a format, format var_num yymmddn8.; or similar, so it looks like a date visually (even if it's really a number underneath).
As pointed out, this won't work if there are other numeric digits in the values; you need to look at your data and identify how those appear and clean them out separately. You could use a regular expression for example to identify things that have 8 consecutive digits, starting with a 20; but ultimately it is a data analysis issue to handle these as your data require.
To get the first sequence of 8 digits in a row starting with a 1 or a 2 as a numeric value, you can use the following:
data want;
set have;
pos = prxmatch("/[12]\d{7}/", character_string);
if pos > 0 then number = input(substr(character_string, pos, 8), 8.);
else number = .;
drop pos;
run;
The prxmatch expression finds the starting position of the sequence, and the substr expression extracts the sequence, then the input function converts it to a numeric.
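Putting this together with the sample values from the question, a sketch of the full flow might look like the following. The dataset names have and extracted and the column name unk are assumptions based on the question's "Column UNK".

```sas
data have;
  input unk :$30.;
  datalines;
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718
;
run;

data extracted;
  set have;
  /* start of the first run of 8 digits beginning with 1 or 2 */
  pos = prxmatch('/[12]\d{7}/', unk);
  /* number stays missing when no such run is found */
  if pos > 0 then number = input(substr(unk, pos, 8), 8.);
  drop pos;
run;

proc sql;
  /* the largest extracted value; 20150718 for the sample rows */
  select max(number) as max_number format=8.
  from extracted;
quit;
```

The leading 123 in the fourth row is skipped because it is not followed by enough digits to make a run of 8.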
(Edited to incorporate Joe's feedback)