SAS search and replace the last word using regex [duplicate] - regex

Match strings ending in certain character
I am trying to create a new variable that indicates whether a string ends with a certain character.
Below is what I have tried, but when this code is run, the variable ending_in_e is all zeros. I would expect that names like "Alice" and "Jane" would be matched by the code below, but they are not:
proc sql;
select *,
case
when prxmatch("/e$/",name) then 1
else 0
end as ending_in_e
from sashelp.class
;quit;

You need to account for the fact that SAS character variables have a fixed length: a value shorter than the variable's length is padded with trailing spaces, so the last character of name is usually a space rather than "e".
Either trim the string:
prxmatch("/e$/",trim(name))
Or add a whitespace pattern:
prxmatch("/e\s*$/",name)
Here, \s* matches zero or more whitespace characters before the end of the string.
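The padding effect is easy to reproduce outside SAS. The following is a minimal Python sketch, not SAS code: it assumes an 8-character variable (the helper pad and the length 8 are illustrative assumptions) and shows why /e$/ fails on a padded value while trimming or /e\s*$/ succeeds.

```python
import re

def pad(value: str, length: int = 8) -> str:
    """Emulate a fixed-length character variable by right-padding with spaces."""
    return value.ljust(length)

def ends_in_e_naive(name: str) -> bool:
    # Same pattern as in the question: "e" must be the very last character.
    return re.search(r"e$", name) is not None

def ends_in_e_trimmed(name: str) -> bool:
    # Counterpart of trim(name): drop trailing spaces before matching.
    return re.search(r"e$", name.rstrip(" ")) is not None

def ends_in_e_padded(name: str) -> bool:
    # Counterpart of /e\s*$/: allow optional whitespace after the "e".
    return re.search(r"e\s*$", name) is not None

for raw in ("Alice", "Jane", "John"):
    name = pad(raw)
    print(repr(name), ends_in_e_naive(name), ends_in_e_trimmed(name), ends_in_e_padded(name))
```

The naive check is False for every padded name, which matches the all-zeros column described in the question.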

SAS character variables are fixed length. So you either need to trim the trailing spaces or include them in your regular expression.
Regular expressions are powerful, but they might be confusing to some. For such a simple pattern it might be clearer to use simpler functions.
proc print data=sashelp.class ;
where char(name,length(name))='e';
run;

Related

regular expression replace for SQL

I have to replace a string pattern in SQL with an empty string. Could anyone suggest how to do it?
Input String 'AC001,AD001,AE001,SA001,AE002,SD001'
Output String 'AE001,AE002'
Every code has the same fixed length: the first two characters are letters and the rest are digits. I have to remove all codes except those starting with "AE".
I can have 0 or more instances of "AE" codes in the string. The final output should be a formatted string "separated by commas" for multiple "AE" codes as mentioned above.
Here is one option: call regexp_replace several times, eliminating the unwanted parts step by step until only the required output remains.
SELECT regexp_replace(
regexp_replace(
regexp_replace(
'AC001,AD001,AE001,SA001,AE002,SD001', '(?<!AE)\d{3},{0,1}', 'X','g'
),'..X','','g'
),',$','','g'
)
I would convert the list to an array, unnest that to rows then filter out those that should be kept and aggregate it back to a string:
select string_agg(t, ',')
from unnest(string_to_array('AC001,AD001,AE001,SA001,AE002,SD001',',')) as x(t)
where x.t like 'AE%'; --<< only keep those
This is independent of the number of elements in the string and can easily be extended to support more complex conditions.
This is a good example of why storing comma-separated values in a single column is not a good idea to begin with.
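The same split, filter and re-join idea can be sketched outside the database. This is a small Python illustration, not PostgreSQL code; the function name keep_codes and its prefix parameter are hypothetical.

```python
def keep_codes(csv: str, prefix: str = "AE") -> str:
    """Split a comma-separated list, keep codes with the given prefix, re-join."""
    codes = csv.split(",")                              # like string_to_array + unnest
    kept = [c for c in codes if c.startswith(prefix)]   # like: where x.t like 'AE%'
    return ",".join(kept)                               # like string_agg(t, ',')

print(keep_codes("AC001,AD001,AE001,SA001,AE002,SD001"))
```

One difference to keep in mind: when no code matches, this sketch returns an empty string, whereas string_agg over zero rows returns NULL.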

sas, remove the comma and period, regex

Do you know how to remove the commas and periods in something like this:
'18430109646000104331929350001,064380958490001,974317618110001,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,. '
I had to concatenate to get a list of claim numbers (with leading zeros). Now I have that string, but I want to delete all the stuff at the end. I tried this, but it didn't work:
data OUT.REQ_1_4_25 ;
set OUT.REQ_1_4_24;
CONCAT1=PRXCHANGE('s/,.//',1,CONCAT);
run;
By the way, I am using SAS and regex, something like prxchange.
This also worked for me
data OUT.REQ_1_4_25 ;
set OUT.REQ_1_4_24;
CONCAT1=TRANWRD(CONCAT, ',.', '');
run;
The second argument to the PRXCHANGE function specifies the number of times the search and replace should be done. Replacing your 1 by -1 will run the replacement until the end of the string, rather than only once.
Also, the pattern ',.' matches a comma followed by any character ('.' is a wildcard). You want to match either a comma (',') or a period ('.'); the latter is a metacharacter, so escape it with '\':
CONCAT1=PRXCHANGE('s/[,\.]//',-1,CONCAT);
If you only want to remove the comma-period pairs, then remove the square brackets:
CONCAT1=PRXCHANGE('s/,\.//',-1,CONCAT);
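To see how the three patterns and the replacement count differ, here is a short Python sketch on a made-up sample string (the value of s is an assumption, not the asker's data). Note that Python's re.sub uses count=0 for "replace all", whereas PRXCHANGE uses -1.

```python
import re

s = "123,456,.,."  # hypothetical sample: two claims followed by empty slots

# Unescaped '.', one replacement only: removes ",4", which is not what was intended.
first_only = re.sub(r",.", "", s, count=1)

# Unescaped '.', all occurrences: every comma plus the character after it is removed.
wildcard_all = re.sub(r",.", "", s)

# Escaped period: only literal ",." pairs are removed.
pairs_all = re.sub(r",\.", "", s)

# Character class: every comma and every period is removed.
class_all = re.sub(r"[,.]", "", s)

print(first_only, wildcard_all, pairs_all, class_all, sep="\n")
```

Removing only the ",." pairs keeps the commas between real claim numbers, which is probably what the question is after.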
No need for regex unless you have something more complicated than actually shown.
Just use the scan() function and tell it to use . and , as delimiters:
data claims;
length claim $50;
list = '18430109646000104331929350001,064380958490001,974317618110001,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.';
cnt=1;
claim=scan(list,cnt,'.,');
do while (claim ne '');
output;
cnt=cnt+1;
claim=scan(list,cnt,'.,');
end;
keep claim;
run;
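For comparison, the delimiter-based approach can be sketched in Python. This is an emulation rather than SAS code: it assumes the default SCAN behaviour of treating runs of delimiters as one and skipping empty tokens, and the function name scan_all is hypothetical.

```python
import re

def scan_all(text: str, delimiters: str = ".,") -> list[str]:
    """Return every non-empty token between runs of delimiter characters."""
    pattern = "[" + re.escape(delimiters) + "]+"   # one or more delimiters in a row
    return [tok for tok in re.split(pattern, text) if tok]

claims = scan_all(
    "18430109646000104331929350001,064380958490001,974317618110001,.,.,.,.,."
)
for claim in claims:
    print(claim)
```

Each claim number comes out as its own item, with leading zeros preserved because the values are handled as text.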

SAS - replacing a character with a space?

Had a quick question - I need to remove punctuation and replace characters with a space (i.e.: if I have a field that contains a * I need to replace it with a white space).
I can't seem to get it right - I was originally doing this to just remove it, but I've found that in some cases my string is being squished together.
Thoughts?
STRING2 = compress(STRING, ":,*~’°-!';()®""##$%^&©+=\/|[]}{]{?><ÉÑËÁ’ÍÓÄö‘—È…...");
The COMPRESS() function will remove the characters. If you want to replace them with spaces then use the TRANSLATE() function. If you want to reduce multiple blanks to a single blank use the COMPBL() function.
STRING2 = compbl(translate(STRING,' ',":,*~’°-!';()®""##$%^&©+=\/|[]}{]{?><ÉÑËÁ’ÍÓÄö‘—È…..."));
Rather than listing the characters that need to be converted to spaces you could use COMPRESS() to turn the problem around to listing the characters that should be kept.
This example uses the 'a' and 'd' modifiers in the COMPRESS() call to strip spaces, letters and digits from STRING; what remains is the set of non-alphanumeric characters, which is passed to the TRANSLATE() call so that they are replaced by spaces.
STRING2 = compbl(translate(STRING,' ',compress(STRING,' ','ad')));
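The "list what to keep" idea can be sketched in Python as follows. This is an illustration, not SAS code: it assumes that only ASCII letters, digits and spaces are kept, and the function name blank_out_non_alnum is hypothetical.

```python
import re

def blank_out_non_alnum(text: str) -> str:
    """Replace every character that is not a letter, digit or space with a space,
    then collapse runs of spaces into one (roughly what COMPBL does)."""
    spaced = re.sub(r"[^A-Za-z0-9 ]", " ", text)
    return re.sub(r" {2,}", " ", spaced)

print(repr(blank_out_non_alnum("Cats, dogs!")))
print(repr(blank_out_non_alnum("a*b--c")))
```

Like COMPBL, the second step only collapses repeated blanks; it does not strip a leading or trailing one.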
Try using the translate function and see if it fits your needs:
data want;
STRING = "!';AAAAÄAA$";
STRING2 = translate(STRING,' ',':;,*~''’°-!()®#""#$%^&©+=\/|[]}{]{?><ÉÑËÁ’ÍÓÄö‘—È…...');
run;
Output:
STRING STRING2
!';AAAAÄAA$ AAAA AA
Try the TRANSLATE() function.
TRANSLATE(SOURCE,TO,FROM);
data test;
string = "1:,*2~’°-ÍÓ3Äö‘—È…...4";
string2 = translate(string,
" ",
":,*~’°-!';()®""##$%^&©+=\/|[]}{]{?><ÉÑËÁ’ÍÓÄö‘—È…...");
put string2=;
run;
I get
string2=1 2 3 4
While the TRANSLATE function could get you there, you could also use a regular expression in SAS. It is arguably more elegant, but you need to escape the special characters in the regex pattern.
data want;
input string $60.;
length new_string $60.;
new_string = prxchange('s/([\:\,\*\~\’\°\-\!\'||"\'"||';\(\)\®\"\"\#\#\$\%\^\&\©\+\=\\\/\|\[\}\{\]\{\\\?\>\<\É\Ñ\Ë\Á\’\Í\Ó\Ä\ö\‘\—\È\…\.\.\.\]])/ /',-1,string);
datalines;
Cats, dogs, and anyone else!
;
Try it with the help of regular expressions.
data have;
old = "AM;'IGH}|GH";
new = prxchange("s/[^A-Z]/ /",-1,old);
run;
proc print data=have noobs;
run;
OUTPUT-
old new
AM;'IGH}|GH AM IGH GH

One function to replace different text with other in SAS

I want to replace one combination of text with another. For example
data test;
a='raja\ram{work}italic';
if index(a,'\') then b=tranwrd(a,'\','\\');
if index(a,'{') then b=tranwrd(a,'{','\{');
if index(a,'}') then b=tranwrd(a,'}','\}');
if index(upcase(a),'ITALIC') then b=tranwrd(a,substr(a,index(upcase(a),'ITALIC'),length('ITALIC')),'\i');
run;
Required Result: b=raja\\ram\{work\}\i;
These are the kinds of combinations I want to replace. I would rather not use a macro, FCMP, or if/else conditions.
Is there any function to do all of this at once? I tried a Perl regular expression, which also works, but only for one replacement at a time: b= prxchange('s/\\/\\\\/', -1, a)
Your regular expression is on the right track. You have a set of characters that you always want to prefix with a \. So search for any one of those characters using a character class [...], wrap it in a capturing group, and put a \ in front of the captured text in the replacement. Because \ is itself the escape character, you have to write two whenever you want one (\\ stands for a single literal \).
data test;
a='Hello\Goodbye{stuff}';
b= prxchange('s/([\\{}])/\\$1/',-1,a);
put b=;
run;
You should do the italic bit in a second expression (or just use tranwrd). That is a totally different replacement, and while it is theoretically possible to fold it into the same expression, doing so would make it too messy.
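The two-step approach (escape the special characters first, then handle the italic keyword) can be sketched in Python. This is an illustration rather than SAS code; the function name escape_then_italic is hypothetical, and the order of the two steps is an assumption that matters, because running the italic step first would cause its \ to be doubled by the escaping step.

```python
import re

def escape_then_italic(text: str) -> str:
    # Step 1: put a backslash in front of every \, { or } via a capturing group.
    escaped = re.sub(r"([\\{}])", r"\\\1", text)
    # Step 2: replace the keyword "italic" (any case) with a literal \i.
    return re.sub(r"italic", r"\\i", escaped, flags=re.IGNORECASE)

print(escape_then_italic(r"raja\ram{work}italic"))
```

On the sample value from the question this produces the required result shown there.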
This question is almost identical to the other question: Multiple search and replace within a string through regular expression in SAS
Is that a coincidence?
Here is the code that worked for the other question.
%let text = abc\pqr{work};
data _null_;
var=prxchange("s/\\/\\\\/",-1,"&text");
var=prxchange("s/\{/\\\{/",-1,var);
var=prxchange("s/\}/\\\}/",-1,var);
put var;
run;
Result: abc\\pqr\{work\};
%let text = BOLD\ITALIC\ITALICBOLD\BOLDITALIC\B\I\IB\BI;
data _null_;
var=prxchange("s/BOLD/b/",-1,"&text");
var=prxchange("s/ITALIC/i/",-1,var);
var=lowcase(var);
put var;
run;
RESULT: b\i\ib\bi\b\i\ib\bi

SAS Like Operator: can underscore "_" match a single space?

I know that the underscore wildcard can be used to match a single character, but I'm not sure if it can be used to match a single space? Can anybody help answer this? Thanks a lot.
Yes.
data test;
x=' ';
run;
proc sql;
select count(1) from test
where x like "_";
quit;
returns 1.
I think you can match a single space with _, as long as the pattern accounts for every character of the value. When you use _ with the like operator, the pattern has to cover the full length of the text stored in the character variable.
E.g. If I want to select "black wolf":
data work.animals;
input name $1-16 weight;
datalines;
monkey           20
shark            500
blue whale       200
black wolf       120
buffalo          400
;
data work.animals3;
set animals;
where name like 'black_wol_';
run;
I can use like 'black_wol_'; because it covers every character of the value stored in the variable. But I can't just use like 'black_' or like 'black_wol';. Those won't work because the pattern and the value contain different numbers of characters.
Alternatively, you can use the % sign which can specify any number of characters before, after or in the middle of a string. E.g. where name like '%e'; or where name like 'blu%e'; can select "blue whale". You can use both _ and % together.
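The matching rules described in these answers can be modelled roughly in Python. This is only a sketch that is consistent with the examples above, not a statement of how SAS implements LIKE: the function name like is hypothetical, and the assumption that trailing padding blanks after the pattern are ignored is mine.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Rough model of LIKE: _ matches exactly one character (a space included),
    % matches any run of characters, everything else matches literally."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    # Assumption: padding blanks left over after the pattern are ignored.
    regex = "".join(parts) + r" *"
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

name = "black wolf".ljust(16)  # emulate the 16-character name variable
print(like(name, "black_wol_"), like(name, "black_"), like(name, "black_wol"))
```

Under this model the full-length pattern matches while the two shorter patterns do not, in line with the behaviour reported above.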