SAS: Replacing non-numeric characters in SQL table

I have a table that looks similar to this:
A | B
1234|A1B2C
1124|$1n7
1342|*6675
1189|966
I need to create a column C that takes the data from column B, replaces every non-numeric character with a "9", and makes each value 5 characters long by adding zeros to the front. It should come out like this:
91929
09197
96675
00966
Any assistance would be much appreciated, Thank you!
Edit: Sorry, this is my first time posting on any forum like this and I got a bit ahead of myself. I created the table using SQL to pull data from 3 other tables, and I am a bit more familiar with SQL than SAS, which I have only been using for a few weeks. I tried using COMPRESS, but as I read more about it, it seems that it only removes the characters. So I tried TRANWRD, but from what I was able to figure out I would need to create an entry for each letter and symbol that could appear, i.e.
data Work.temp;
str = b;
Alpha=tranwrd(str, "a", "9");
Alpha=tranwrd(str, "b", "9");
put Alpha;
run;
So then I researched some more and found the question "SAS replace character in ALL columns".
Based on that, I used this code:
data temp;
set work.temp;
array vars [*] _character_;
do i = 1 to dim(vars);
vars[i] = compress(tranwrd(vars[i],"a","9"));
end;
drop i;
run;
That just returns:
|Str|B|Alpha|
|---.|-.|.-------|
(sorry about the bad formatting there, spent 30 min trying to figure out how to make the table look right with spaces but kept coming out wrong. Please imagine the -'s are spaces)
Again, any help would be appreciated. Thank you!

Try this.
data test;
input var1 $5.;
datalines;
A1B2C
$1n7
*6675
966
;
run;
data test1;
set test;
length var2 $5.;
regex = prxparse ("s/[^0-9\s]/9/"); /*holds the regular expression that substitutes every character that is neither a digit nor whitespace; a | inside the brackets would be treated as a literal pipe and left unreplaced*/
var2 = prxchange (regex, -1, var1); /*use this function to substitute all instances of the pattern*/
var3 = put (input (var2, best5.), z5.); /*use input and put to pad the front of the variable with 0s*/
run;
Good luck.

Keeping only the digits is simple. Use the modifiers on the COMPRESS() function.
c=compress(b,,'kd');
There are a number of ways to pad on the left with zeros.
You could convert the digits to a number and then write it back to a string using the Z format.
c=put(input(c,??5.),Z5.);
You could add the zeros yourself, using an IF statement:
if length(c) < 5 then c=repeat('0',5-length(c)-1)||c ;
Or using SUBSTRN() function.
c=substrn('00000',1,5-length(c))||c;
Or have some fun with the REVERSE() function.
c=reverse(substr(reverse(cats('00000',c)),1,5));
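Note that COMPRESS with the 'kd' modifiers drops the non-digit characters, whereas the question asks for them to be replaced with 9. Here is a minimal sketch of that replacement without regular expressions, assuming column b is at most 5 characters wide. The dataset name want and the loop variable i are only for illustration.

```sas
data want;
  input b $5.;
  length c $5;
  c = b;
  /* overwrite every non-digit character, up to the last non-blank one, with '9' */
  do i = 1 to lengthn(c);
    if notdigit(substr(c, i, 1)) then substr(c, i, 1) = '9';
  end;
  /* right-align within the 5 characters, then turn the leading blanks into zeros */
  c = translate(right(c), '0', ' ');
  drop i;
  datalines;
A1B2C
$1n7
*6675
966
;
run;
```

Because RIGHT() keeps the declared length of 5, the TRANSLATE() step does the zero padding in one go, so no separate PUT/INPUT round trip is needed.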

Related

Fuzzy matching on keyword in a larger string - SAS

Using SAS, I have a table with sentences and I am looking to find the rows in the table where the keyword is found in the sentence making use of fuzzy matching (complev function). Is there a way in SAS to find the keyword string in the sentences? I know how to use complev, but I only can use it to compare complete strings, not a string as a part of a larger string. For this example table the keyword would be 'example' and the result of the comparison would be in the column Result.
Thanks for your ideas!
This is an Example sentence : 1
Here is another one : 0
Also an exmple : 1
The examples keep coming : 1
No worries : 0
See if you can use this as a template. I compare the Complev value to three, but you can set it to any fitting value.
data have;
input string $ 1-25;
datalines;
Example sentence
Here is another one
Also an exmple
The examples keep coming
No worries
;
data want;
set have;
result = 0;
do _N_ = 1 to countw(string);
if complev('example', scan(string, _N_)) < 3 then do;
result=1; leave;
end;
end;
run;
EDIT: Use complev('example', scan(string, _N_), 'i') if you want the comparison to be case insensitive.

Pull Strings Before and After Key Words

Not sure if this possible in SAS; although I'm slowly learning pretty much anything is possible in SAS...
I have a data-set of 600 patients and within that data-set I have a comment variable. The comment variable contains a few sentences each patient stated about his/her care. So for example, the data set looks like this:
ID Comment
1 Today we have great service. everyone was really nice.
2 The customer service team did not know what they were talking about and was rude.
3 Everyone was very helpful 5 stars.
4 Not very helpful at all.
5 Staff was nice.
6 All the people was really nice.
Let's say I identify a number of key words I'm interested in, for example nice, rude and helpful.
Is there a way to pull 2 strings that come before these words and produce a frequency table?
WORD Frequency
Was Really Nice 2
And Was Rude 1
Was Very Helpful 1
Not very helpful 1
I have a code written already which will help me to identify the key words, this code creates a count of the freq of each word within the comment variable.
data PG_2 / view=PG_2;
length word $20;
set PG_1;
do i = 1 by 1 until(missing(word));
word = upcase(scan(COMMENT, i));
if not missing(word) then output;
end;
keep word;
run;
proc freq data=PG_2 order=freq;
table word / out=wordfreq(drop=percent);
run;
Have you looked at the Perl regular expression (PRX) functions in SAS? I think they might solve your issue.
You can use RegEx capture groups to pull out the two words directly before your keyword using prxparse and prxposn. The below should grab any two words before the word nice in the comment variable and add them to the firstTwoStrings variable.
data firstTwoStrings;
length firstTwoStrings $200;
retain re;
if _N_ = 1 then
re = prxparse('/(\w+ \w+) nice/'); /*change 'nice' to your desired keyword*/
set comments;
if prxmatch(re, COMMENT) then
do;
firstTwoStrings = prxposn(re, 1, COMMENT);
end;
run;
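To get from there to the frequency table in the question, one possible extension is to match all keywords at once and output one row per match. This is only a sketch: the dataset names phrases and phrasefreq, the variable phrase, and the keyword list in the pattern are assumptions, and the matching is made case insensitive with the i flag.

```sas
data comments;
  infile datalines truncover;
  length COMMENT $100;
  input ID COMMENT $ &;   /* & lets the value contain single blanks */
  datalines;
1 Today we have great service. everyone was really nice.
3 Everyone was very helpful 5 stars.
4 Not very helpful at all.
6 All the people was really nice.
;
run;

data phrases;
  set comments;
  length phrase $60;
  retain re;
  /* two words followed by any one of the keywords */
  if _n_ = 1 then re = prxparse('/(\w+ \w+ (?:nice|rude|helpful)\b)/i');
  start = 1;
  stop = length(COMMENT);
  /* walk through every match in the comment, not just the first */
  call prxnext(re, start, stop, COMMENT, pos, len);
  do while (pos > 0);
    phrase = upcase(substr(COMMENT, pos, len));
    output;
    call prxnext(re, start, stop, COMMENT, pos, len);
  end;
  keep ID phrase;
run;

proc freq data=phrases order=freq;
  tables phrase / out=phrasefreq(drop=percent);
run;
```

Upcasing the phrase before counting means "Was really nice" and "was really nice" land in the same row of the table.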

Rearrange text on SAS

I cannot find a way to reverse text strings.
For example I want to reverse these:
MMMM121231M34 to become 43M132121MMMM
MM1M11M1 to become 1M11M1MM
1111213111 to become 1113121111
Judging from your examples, what you mean by 'rearrange' is actually 'reverse'.
In that case, you've got the very handy reverse() function in SAS.
Used in context:
data test;
length text $32;
infile datalines;
input text $;
result=reverse(strip(text));
datalines;
MMMM121231M34
MM1M11M1
1111213111
;
run;
EDIT on @Joe's request: in the particular example above, I create the test dataset by setting a length of 32 characters for the text variable. Therefore, when reading the values from datalines, these are padded with blanks up to that total of 32 characters. Hence, when reversing that value, the result has that many blanks at the start, followed by the actual value you are looking for. By adding the strip function, you remove the excess blanks from the value of text before reversing, keeping only the "real" value in the result.

split single variable value in two

I have dataset a:
data q7;
input trt$;
cards;
a150
b250
c300
400
abc180
;
run;
We have to create dataset b like this:
trt dose
a150 150mg
b250 250mg
c300 300mg
400 400mg
abc180 180mg
A new dose variable is added, and mg is written after each numeric value.
Here is my solution. Basically, use the compress function to keep (hence the 'k') only the digits from the trt variable. From there it is just a matter of concatenating mg to the digits.
data want;
set q7;
dose = cats(compress(trt,'0123456789','k'),'mg');
run;
The compress function default behaviour is to return a character string with specified characters removed from the original string.
so
compress(trt,'0123456789') would have removed all numbers from the trt variable.
However compress comes with a battery of modifiers that let the user alter the default behaviour.
So in your case, we wanted to keep the digits regardless of the number of preceding letters, so I used the modifier k to keep, rather than remove, the listed characters, in this case 0123456789.
For a full list of modifiers please read the following link
cats is one of the many functions SAS has to concatenate strings, so passing the compress result as the 1st string and mg as the 2nd string will concatenate both to produce your desired result.
hope it helps
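As an illustration of how modifiers can be combined, the digit list can also be left out and replaced by the d modifier, which adds all digits to the character list. This is a sketch of an equivalent variant, not a correction of the code above; the length of 10 for dose is an assumption.

```sas
data q7;
  input trt $;
  datalines;
a150
b250
c300
400
abc180
;
run;

data want;
  set q7;
  length dose $10;
  /* 'k' keeps instead of removes; 'd' adds the digits 0-9 to the list */
  dose = cats(compress(trt, , 'kd'), 'mg');
run;
```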

SAS print ASCII value of special character

I am using the notalnum function in SAS. The input is a db field. Now, the function is returning a value that tells me there is a special character at the end of every string.
It is not a space character, because I have used COMPRESS function on the input field.
How can I print the ASCII value of the special character at the end of each string?
The $HEX. format is the easiest way to see what they are:
data have;
var="Something With A Special Char"||'0D'x;
run;
data _null_;
set have;
rul=repeat('1 2 3 4 5 6 7 8 9 0 ',3); *so we can easily see what char is what;
put rul=;
put var= $HEX.;
run;
You can also use the c option on compress (var=compress(var,,'c');) to compress out control characters (which are often the ones you're going to run into in these situations).
Finally - 'A0'x is a good one to add to the list, the non-breaking space, if your data comes from the web.
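The two suggestions can be combined in one COMPRESS() call, since the c modifier adds control characters to whatever list is given in the second argument. This is a sketch that assumes a single-byte session encoding such as Latin-1, where the non-breaking space is the single byte 'A0'x; in a UTF-8 session that byte can be part of a multi-byte character, so this would not be safe there. The variable name clean is made up for the example.

```sas
data have;
  var = "Something With A Special Char" || '0D'x || 'A0'x;
run;

data _null_;
  set have;
  length clean $40;
  /* remove 'A0'x plus, via the 'c' modifier, all control characters */
  clean = compress(var, 'A0'x, 'c');
  put var= $hex.;     /* hex dump before cleaning */
  put clean= $hex.;   /* hex dump after cleaning */
run;
```

Comparing the two hex dumps in the log shows whether the trailing bytes are gone.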
If you want to see the position of the character within the ASCII table you can use the rank() function, e.g.:
data _null_;
string = 'abc123';
do i = 1 to length(string);
asc = rank(substr(string,i,1));
put i= asc=;
end;
run;
Gives:
i=1 asc=97
i=2 asc=98
i=3 asc=99
i=4 asc=49
i=5 asc=50
i=6 asc=51
Joe's solution is very elegant, but seeing as my hex->decimal conversion skills are pretty poor I tend to do it this way.