Joining two data sets on a variable with different character length - sas

I'm trying to join two data sets on a variable that has a different character length in each, using the code below, but neither attempt works and I'm not sure why.
FROM A AS ROLLACT
LEFT JOIN MALT.CUST AS ACCOUNT
/* ON (ROLLACT.ACCTNO, BEST.) = INPUT( ACCOUNT.ACCT_NO,BEST.) */
ON INPUT (ROLLACT.ACCTNO, 30.) = INPUT( ACCOUNT.ACCT_NO,30.)
In this case ROLLACT.ACCTNO is a character variable with length 30 and ACCT_NO is a character variable with length 19.
So I'm confused why I can't convert both to a specific length (using Input(30.)) with:
ON INPUT (ROLLACT.ACCTNO, 30.) = INPUT( ACCOUNT.ACCT_NO,30.)
I'm also trying to convert both into numeric with:
ON (ROLLACT.ACCTNO, BEST.) = INPUT( ACCOUNT.ACCT_NO,BEST.)
Does anyone have suggestions about how to do this within the Proc Sql step?

You do not need to do anything special to compare character strings of different lengths. SAS will ignore the trailing spaces. Obviously if the actual value of the longer variable has more than 19 characters it will never match the value that is limited to 19 characters.
The INPUT() function does not change the length. It is used to convert strings into values. If you use a numeric informat, as in your examples, then the result is a number. But you cannot convert a 30 digit string exactly into a number. SAS stores numbers as 8 byte floating point values, so the maximum number of decimal digits of precision is 15.
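To see both points outside SAS, here is a minimal sketch in Python (not SAS; the account numbers and the helper name keys_match are made up for illustration). It mimics a comparison that ignores trailing blanks, and it shows two different 19 digit strings collapsing to the same 8 byte floating point value.

```python
def keys_match(left, right):
    """Compare two keys while ignoring trailing blanks (hypothetical helper)."""
    return left.rstrip(" ") == right.rstrip(" ")


# Made-up account numbers: one padded to 30 characters, two stored in 19.
acct_30 = "1234567890123456789".ljust(30)
acct_19 = "1234567890123456789"
other_19 = "1234567890123456788"

# Character comparison: padding does not matter, a different digit does.
text_match = keys_match(acct_30, acct_19)
text_mismatch = keys_match(acct_30, other_19)

# Numeric comparison: an 8 byte float cannot hold 19 exact digits,
# so two different account numbers become the same number.
numeric_collision = float(acct_19) == float(other_19)

print(text_match, text_mismatch, numeric_collision)
```

The character comparison keeps the two accounts apart, while converting to a number would treat them as equal, which is why joining on the character values directly is the safer route.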

A simple SUBSTR does the trick: ON (SUBSTR(ROLLACT.ACCTNO, 1, 19)) = ACCOUNT.ACCT_NO

What does the format '(a,2(2x,3i4))' mean?

I am working with some code in Fortran where the format of writing out to some file is '(a,2(2x,3i4))'.
How do you break each part of this down to understand what it means?
a = a character string, written at the length of the value being output
2( ... ) = repeat the group inside the parentheses two times
2x = skip two columns
3i4 = an integer in a field 4 columns wide, right-justified (padded on the left with blanks), repeated three times
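Putting the pieces together, the format expects one string followed by six integers. Here is a rough emulation in Python (not Fortran; the function name format_record and the sample values are assumptions for illustration) that builds the same line piece by piece.

```python
def format_record(label, values):
    """Emulate the edit descriptors (a,2(2x,3i4)) for one string and six integers."""
    if len(values) != 6:
        raise ValueError("expected six integers: two groups of three")
    parts = [label]                      # a: the string at its own length
    for start in (0, 3):                 # 2( ... ): the group is used twice
        parts.append("  ")               # 2x: two blank columns
        for v in values[start:start + 3]:
            # i4: right-justified in 4 columns. Unlike Fortran, Python widens
            # the field instead of printing asterisks when the value is too big.
            parts.append(f"{v:4d}")
    return "".join(parts)


line = format_record("pt", [1, 2, 3, 40, 50, 60])
print(repr(line))
```

Each group contributes 2 + 3*4 = 14 columns, so the line is the length of the string plus 28 columns.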

Substring the last digits of a numeric variable in SAS

I have a numeric variable in SAS and I am struggling to extract the last digits of it. I tried using substr but it only handles char variables. The variable I have sometimes has 3 or 4 digits.
Example
1234
237
754
9000
In these cases I need to extract
34
37
54
00
And store them as a new numeric variable. I tried the code below in a proc sql statement but it returns an error. Can someone help me?
Var2 = input(substr(put(var1), 1, length(put(var1))-1), 8.)
SUBSTRN() works with numeric variables, but it doesn't work well in this case because there's no easy way to specify only the last two characters. The MOD() function fits better, because you're essentially finding the remainder after dividing by 100. Since it looks like you want a character value, you also need PUT() to convert the result to character, with the Z2. format so that leading zeroes are kept (for example 00 for 9000).
want = put(mod(value, 100), z2.);
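The arithmetic is easy to check outside SAS. A minimal sketch in Python (not SAS; last_two_digits is a made-up name, and the values are assumed to be non-negative integers):

```python
def last_two_digits(value):
    """Remainder after dividing by 100, zero padded to two characters."""
    return f"{value % 100:02d}"


results = [last_two_digits(v) for v in (1234, 237, 754, 9000)]
print(results)
```

The zero padding is what turns the remainder 0 for 9000 into the two characters 00.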
Another way is to use PRXCHANGE, as shown below.
data have;
input val1;
val2= input(prxchange('s/^(.*)(\d{2})$/$2/', -1, trim(put(val1,8.))),best.);
format val2 z2.;
datalines;
1234
237
754
9000
40
;
run;
You can also use the REVERSE and SUBSTR functions as follows, and then apply the z2. format.
var2 = input(reverse(substr(reverse(strip(put(var1, best.))), 1, 2)), 8.);

In sapscript, how do I trim/offset from the right of a string?

I have a need in SAPScript to trim a string from the right. There doesn't appear to be a function to accomplish this. &myfield+3& only trims from the left.
Is there a way to trim from the right? Can offset accept a negative value?
My ultimate goal was to take a number such as a quantity, 12.43, and convert it to 001243:
6 characters long
padded left with zeroes
no special characters (decimals or thousands separators)
Ultimately I had to first define a field and do the initial number formatting:
/:DEFINE &myfield& = &qtyfield(.2CT)&
The above
sets the number to 2 decimal points (.2)
applies space compression (C)
removes the thousands separator (T)
Then I call a function within our print routine to do the special character stripping as such:
/:PERFORM get_unformatted_value IN PROGRAM zbc_rle_ean128_label
/:USING &myfield&
/:CHANGING &myfield&
/:ENDPERFORM
Then I can do the final output as such:
/ &myfield(K6RF0)&
which:
Ignores any conversions (K)
Sets output length to 6 and right aligns it (6R)
Left pads it with zeros (F0)
That seems to work for me. Hopefully this helps someone!
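For anyone who wants to check the expected result of the three steps, here is the same transformation sketched in Python (not SAPscript; to_padded_digits is a hypothetical name, and the sketch does not truncate values that are longer than the target width).

```python
def to_padded_digits(quantity, width=6):
    """Two decimals, digits only, left padded with zeros to the given width."""
    text = f"{quantity:.2f}"                         # fix at 2 decimals, no thousands separator
    digits = "".join(ch for ch in text if ch in "0123456789")  # strip the decimal point
    return digits.rjust(width, "0")                  # right align and pad with zeros


label_value = to_padded_digits(12.43)
print(label_value)
```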

How to convert numbers in a character variable to Numeric in sas

Can anyone help me to resolve this?
I have a very large raw dataset with a character variable that contains text strings along with numbers and dates stored as character values. I want to process the dataset and create a new numeric variable that is populated only when the text in the original variable is a number or a date, and is missing otherwise.
RAWDATA:
ACTUAL_VARIABLE                           NEW_NUM_VARIABLE (Expected Values)
----------------------------------------  ----------------------------------
ODed on pills threw them all up - 2006
Y
1                                         1
5                                         5
ODed on pills
6                                         6
Less than once a week
N
N
2006-11-12                                2006-11-12
Many Thanks in Advance
The easy way to do it (if you know the specific date format) is to use the input function.
If put(input(var, ?? yymmdd10.), yymmdd10.) = var then it's a date!
else if input(var, ?? best.) ne . then it's a number.
Otherwise it's a character string.
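The trick in the first test is a round trip: read the text as a date, write it back in the same layout, and see whether you get the original text again. A small sketch of that idea in Python (not SAS; is_iso_date is a made-up name and only the yyyy-mm-dd layout is assumed):

```python
from datetime import datetime


def is_iso_date(text):
    """True only if text parses as yyyy-mm-dd and formats back to itself."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False                      # not readable as a date at all
    return parsed.strftime("%Y-%m-%d") == text


checks = {value: is_iso_date(value) for value in ("2006-11-12", "2006-1-2", "6", "N")}
print(checks)
```

The comparison after the parse is what rejects values that are readable as a date but are not written in exactly the expected layout.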
This isn't as straightforward as it first looks, so I understand why it would be difficult to search for an answer. Just extracting a number is pretty easy, but when dates are included it becomes a bit more complicated (particularly when the format entered could change, e.g. yyyy-mm-dd, dd-mm-yyyy, dd/mm/yy etc).
One thing to note first. If you want to store the new values as a numeric field then you can't show a mix of numbers and dates. Dates are stored as numbers and formatted to show the date, but you can't apply a format at row level. Therefore I would suggest creating 2 new columns, 1 for numbers and 1 for dates.
My preferred approach is to use the anyalpha function to exclude any records with an alphabetic character, followed by the anypunct function to identify if a punctuation character exists (this should identify dates rather than just numbers). The anydtdte informat is then used to extract the date, this is a very useful informat as it reads dates stored in different ways (as per my note above).
There are clearly some caveats with this method.
If any numbers contain decimals then my method would incorrectly treat these as dates, therefore only integers will be assigned correctly.
It won't pick up dates that contain the month as words, e.g. 15-May-2015, as the anyalpha function would exclude them. They will need to contain numbers only, separated by any punctuation character.
Here's my code.
/* create initial dataset */
data have;
input actual_variable $50.;
datalines;
ODed on pills threw them all up - 2006
Y
1
5
ODed on pills
6
Less than once a week
N
N
2006-11-12
;
run;
/* extract dates and numbers */
data want;
set have;
if not anyalpha(actual_variable) then do; /* exclude records with an alphabetic character */
if anypunct(actual_variable) then new_date_variable = input(actual_variable,anydtdte10.); /* if a punctuation character exists then read in as a date */
else new_num_variable = input(actual_variable,best12.); /* else read in as a number */
end;
format new_date_variable yymmdd10.; /* show date field in required format */
run;
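The decision logic of the data step can be followed more easily in a language-neutral form. Here is a rough sketch in Python (not SAS; classify is a hypothetical name, and unlike the anydtdte informat it only understands the yyyy-mm-dd layout). It returns a (number, date) pair with None standing in for a missing value.

```python
import string
from datetime import date, datetime


def classify(value):
    """Return (number, date); both are None when the text is neither."""
    text = value.strip()
    if not text or any(ch.isalpha() for ch in text):
        return (None, None)               # any letter: leave both missing
    if any(ch in string.punctuation for ch in text):
        try:                              # punctuation present: try it as a date
            return (None, datetime.strptime(text, "%Y-%m-%d").date())
        except ValueError:
            return (None, None)
    try:
        return (int(text), None)          # digits only: read as an integer
    except ValueError:
        return (None, None)


rows = ["ODed on pills threw them all up - 2006", "Y", "1", "5", "6", "2006-11-12"]
classified = [classify(row) for row in rows]
print(classified)
```

As in the caveats above, a decimal such as 1.5 contains punctuation, so this sketch also fails to treat it as a number.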

SAS Converting Characters/Number to Numbers

I am looking for a way to convert the characters into numbers in SAS so that I can use the max function. It would also be helpful if the characters are removed and only the numbers are kept. Below is a list of data for a column in a SAS table.
Column UNK
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718
After stripping out the number values from the column, I would then group the data and use the max function in SAS, which should only generate the value 20150718.
To avoid any confusion, my question is: is there a way to strip out the non-numeric values and then convert the column into a numeric column so I can use the max function?
Thanks.
Sure!
var_num = input(compress(var_char,,'kd'),yymmdd8.);
Compress removes or keeps characters from a list. 'kd' says to 'keep digits'.
You then input using the appropriate informat; yymmdd8. looks right based on the data you provide. Then apply a format, format var_num yymmddn8.; or similar, so it looks like a date visually (even if it's really a number underneath).
As pointed out, this won't work if there are other numeric digits in the values; you need to look at your data and identify how those appear and clean them out separately. You could use a regular expression for example to identify things that have 8 consecutive digits, starting with a 20; but ultimately it is a data analysis issue to handle these as your data require.
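The caveat is easy to reproduce on the sample values. A minimal sketch in Python (not SAS; keep_digits is a made-up name that imitates what compress with 'kd' does):

```python
def keep_digits(text):
    """Drop every character that is not one of the digits 0-9."""
    return "".join(ch for ch in text if ch in "0123456789")


clean = keep_digits("abc20140714xyz")
polluted = keep_digits("123_abc20140714_xyz")
print(clean, polluted)
```

The first value comes out as the eight digits of the date, but the second picks up the leading 123 as well, so it can no longer be read with an eight character date informat.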
To get the first sequence of 8 digits in a row starting with a 1 or a 2 as a numeric value, you can use the following:
data want;
set have;
pos = prxmatch("/[12]\d{7}/", character_string);
if pos > 0 then number = input(substr(character_string, pos, 8), 8.);
else number = .;
drop pos;
run;
The prxmatch expression finds the starting position of the sequence, and the substr expression extracts the sequence, then the input function converts it to a numeric.
(Edited to incorporate Joe's feedback)
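For comparison, the same pattern applied to the sample values from the question, sketched in Python (not SAS; first_date_number is a hypothetical name):

```python
import re


def first_date_number(text):
    """First run of 8 digits that starts with 1 or 2, as an integer, else None."""
    match = re.search(r"[12]\d{7}", text)
    return int(match.group()) if match else None


values = [
    "abc20140714",
    "abc20140714x",
    "abc20140714xyz",
    "123_abc20140714_xyz",
    "abc20150718",
]
numbers = [first_date_number(v) for v in values]
latest = max(n for n in numbers if n is not None)
print(numbers, latest)
```

The leading 123 in the fourth value is skipped because it is not followed by enough digits to complete the pattern.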