Replacing last occurrence of character group with Oracle REGEXP_REPLACE - regex

I have strings like the following in my Oracle 11g table:
ABCDEF000xyz12345abcdefgh
GHIJK0000def67890abcdefgh
I.e., the strings begin with capital letters followed by a series of zeros, followed by three characters, digits and characters again.
How can I replace the xyz12345abcdefgh and def67890abcdefgh with a certain string using REGEXP_REPLACE in Oracle?

If you need to only select the records of the type you mentioned, consider using
select REGEXP_REPLACE(col, '^([[:upper:]]+0+)[[:alpha:]]{3}\d+[[:alpha:]]+$', '\1NEW_STRING')
where
^ - a start of string
([[:upper:]]+0+) - capturing group #1 matching:
[[:upper:]]+ - 1 or more uppercase letters
0+ - one or more 0 chars
[[:alpha:]]{3} - 3 alphabetic chars
\d+ - 1 or more digits
[[:alpha:]]+ - 1 or more alphabetic chars
$ - end of string.
The \1 in the replacement string is a backreference that inserts the value stored in the capturing group #1 buffer.
See the online demo.
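For quick experimentation outside the database, the same idea can be sketched in Python. This is only an approximation: Python's re module does not support POSIX classes such as [[:upper:]], so ASCII ranges stand in for them, and the name replace_tail is made up for this illustration.

```python
import re

# ASCII ranges approximate [[:upper:]] and [[:alpha:]], which Python's re lacks.
PATTERN = re.compile(r'^([A-Z]+0+)[A-Za-z]{3}\d+[A-Za-z]+$')

def replace_tail(value: str, new_string: str = "NEW_STRING") -> str:
    """Keep the captured prefix (capital letters plus zeros), swap the rest."""
    # A function replacement avoids backslash surprises inside new_string.
    return PATTERN.sub(lambda m: m.group(1) + new_string, value)

if __name__ == "__main__":
    for sample in ("ABCDEF000xyz12345abcdefgh", "GHIJK0000def67890abcdefgh"):
        print(replace_tail(sample))
```

Strings that do not fit the whole pattern are returned unchanged, which mirrors how REGEXP_REPLACE leaves non-matching values alone.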

select regexp_replace(column_name,'(.*)([0]{2,})(.*)','\1\2xxxx') from table_name;

Regex: Replace certain part of the matched characters

I want to be able to match with a certain condition, and keep certain parts of it. For example:
June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin
should turn into:
Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
So, everything until the end of the word Article should be matched; then, only the first three characters of the months is kept, as well as the year, and this part is going to replace the matched characters.
My strong regex knowledge got me as far as:
.+Article(?)
You could use 2 capture groups and use those in the replacement (for example $1$2):
\b([A-Z][a-z]{2})[a-z]*(\s+\d{4})\b.*?\bArticle\b
\b A word boundary to prevent a partial word match
([A-Z][a-z]{2}) Capture group 1, match a single uppercase char and 2 lowercase chars (the first three chars of the month)
[a-z]* Match the rest of the month name, if any
(\s+\d{4})\b Capture group 2, match 1+ whitespace chars and 4 digits followed by a word boundary
.*?\bArticle\b Match as few chars as possible until the word Article
Regex demo
The replaced value will be
Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
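A minimal Python sketch of the two-capture-group idea, using a variant of the pattern that keeps exactly three letters of the month. The function name shorten_header is hypothetical, and the sample line uses an ordinary space between the month and the year.

```python
import re

# Group 1: first three letters of the month; group 2: whitespace plus year.
PATTERN = re.compile(r'\b([A-Z][a-z]{2})[a-z]*(\s+\d{4})\b.*?\bArticle\b')

def shorten_header(line: str) -> str:
    """Replace everything up to 'Article' with the short month and the year."""
    return PATTERN.sub(r'\1\2', line)

if __name__ == "__main__":
    print(shorten_header(
        "June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin"
    ))
```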
You could use positive lookbehinds and replace each match with an empty string:
(?<=^[A-Z][a-z]{2})[a-z]*|(?<=\d{4}).*Article
(?<=^[A-Z][a-z]{2}) - behind me is the start of a line and 3 chars; presumably the first three chars of the month
[a-z]* - optionally, match the rest of the month
| - or
(?<=\d{4}) - behind me is 4 digits; presumably a year
.*Article - match everything leading up to and including "Article"
https://regex101.com/r/fbYdpH/1
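The lookbehind variant can be tried in Python as well, since both lookbehinds have a fixed width. This is a sketch only; the function name shorten_with_lookbehinds is invented here, and each match is simply deleted.

```python
import re

# Alternative 1 removes the tail of the month name; alternative 2 removes
# everything from just after the 4-digit year through "Article".
PATTERN = re.compile(r'(?<=^[A-Z][a-z]{2})[a-z]*|(?<=\d{4}).*Article')

def shorten_with_lookbehinds(line: str) -> str:
    """Delete every match, leaving the short month, the year and the title."""
    return PATTERN.sub('', line)

if __name__ == "__main__":
    print(shorten_with_lookbehinds(
        "June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin"
    ))
```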

regular expression with If condition question

I have the following regular expression that extracts everything after the first two letters:
^([A-Za-z]{2})(\w+)($) $2
Now I want it to extract nothing if the data doesn't start with letters.
Example:
AA123 -> 123
123 -> ""
Can this be accomplished by regex?
Introduce an alternative to match any one or more chars from start to end of string if your regex does not match:
^(?:([A-Za-z]{2})(\w+)|.+)$
See the regex demo. Details:
^ - start of string
(?: - start of a container non-capturing group:
([A-Za-z]{2})(\w+) - Group 1: two ASCII letters, Group 2: one or more word chars
| - or
.+ - one or more chars other than line break chars, as many as possible (use [\w\W]+ to match any chars including line break chars)
) - end of a container non-capturing group
$ - end of string.
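The alternation approach can be checked with a short Python sketch. The helper name extract_after_letters is an assumption for this example; a function replacement is used so that an unmatched Group 2 becomes an empty string explicitly.

```python
import re

# Either two letters plus word chars (groups 1 and 2), or anything else.
PATTERN = re.compile(r'^(?:([A-Za-z]{2})(\w+)|.+)$')

def extract_after_letters(value: str) -> str:
    """Return what follows the two leading letters, or '' when there are none."""
    # Group 2 is None when the fallback branch (.+) matched.
    return PATTERN.sub(lambda m: m.group(2) or '', value)

if __name__ == "__main__":
    print(repr(extract_after_letters("AA123")))
    print(repr(extract_after_letters("123")))
```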
Your pattern already captures 1 or more word characters after matching 2 ASCII letters. The $ does not have to be in a group, and the $2 should not be in the pattern.
^([A-Za-z]{2})(\w+)$
See a regex demo.
Another option could be a pattern with a conditional, capturing data in group 2 only if group 1 exists.
^([A-Z]{2})?(?(1)(\w+)|.+)$
^ Start of string
([A-Z]{2})? Capture 2 uppercase chars in optional group 1
(? Conditional
(1)(\w+) If we have group 1, capture 1+ word chars in group 2
| Or
.+ Match the whole line with at least 1 char to not match an empty string
) Close conditional
$ End of string
Regex demo
For a match only, you could use other variations: \K, as in ^[A-Za-z]{2}\K\w+$, or a lookbehind assertion, as in (?<=^[A-Za-z]{2})\w+$
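Two of these variations can be sketched in Python, whose re module supports the (?(1)...|...) conditional and fixed-width lookbehinds but not \K. The function names below are made up for the illustration.

```python
import re

# Conditional: group 2 is captured only when optional group 1 matched.
CONDITIONAL = re.compile(r'^([A-Z]{2})?(?(1)(\w+)|.+)$')
# Lookbehind: match word chars preceded by exactly two letters at the start.
LOOKBEHIND = re.compile(r'(?<=^[A-Za-z]{2})\w+$')

def extract_conditional(value: str) -> str:
    m = CONDITIONAL.match(value)
    # group(2) is None when the "no group 1" branch matched.
    return (m.group(2) or '') if m else ''

def extract_lookbehind(value: str) -> str:
    m = LOOKBEHIND.search(value)
    return m.group(0) if m else ''

if __name__ == "__main__":
    for sample in ("AA123", "123"):
        print(repr(extract_conditional(sample)), repr(extract_lookbehind(sample)))
```

Note that the conditional pattern only accepts uppercase letters in group 1, while the lookbehind version accepts either case.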

SAS: Remove duplicated expressions from given list using REGEX

I would like to remove duplicated expressions from a given string using SAS code. Each expression is delimited by a space and respects the following REGEX /[A-Z]_\d{2}.\d{2}(.[a-z])?/.
Here is the code:
data want;
text = "X_99.99.a X_99.99.a A_12.00 A_12.00 A_13.00 A_12.00 X_99.99.a";
do i=1 to countw(text);
Nondups=prxchange('s/\b(\w+)\s\1/$1/',-1,compbl(text));
end;
run;
The desired result should be:
Nondups ="X_99.99.a A_12.00 A_13.00"
What should be the regular expression to be used inside the function prxchange?
Any help appreciated.
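You may use the following. Note that it keeps the last occurrence of each expression, so the order can differ from the desired result, and that strip (rather than trim) is needed to drop the leading blank left behind:
Nondups=strip(prxchange('s/\s*([A-Z]_\d{2}\.\d{2}(?:\.[a-z])?)(?=.*\1)//',-1, text));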
See the regex demo
The pattern matches:
\s* - 0+ whitespaces
([A-Z]_\d{2}\.\d{2}(?:\.[a-z])?) - Group 1:
[A-Z] - an uppercase ASCII letter
_ - an underscore
\d{2} - two digits
\. - a dot (must be escaped)
\d{2} - two digits
(?:\.[a-z])? - an optional group matching 1 or 0 sequences of a . and a lowercase ASCII letter
(?=.*\1) - a positive lookahead that requires any 0+ chars other than line break chars, as many as possible, up to the value stored in Group 1 immediately to the right of the current location.
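To see what the lookahead-with-backreference pattern actually keeps, here is a small Python sketch run against the sample string. It also includes a second helper that is not regex-only: it keeps the first occurrence of each expression by collecting tokens into an ordered dict, which is a different technique from the one in the answer. Both function names are hypothetical.

```python
import re

TOKEN = r'[A-Z]_\d{2}\.\d{2}(?:\.[a-z])?'

def dedupe_keep_last(text: str) -> str:
    """Drop every token that occurs again later in the string."""
    return re.sub(r'\s*(' + TOKEN + r')(?=.*\1)', '', text).strip()

def dedupe_keep_first(text: str) -> str:
    """Alternative: keep first occurrences, preserving their original order."""
    return ' '.join(dict.fromkeys(re.findall(TOKEN, text)))

if __name__ == "__main__":
    text = "X_99.99.a X_99.99.a A_12.00 A_12.00 A_13.00 A_12.00 X_99.99.a"
    print(dedupe_keep_last(text))
    print(dedupe_keep_first(text))
```

The first helper yields the surviving last occurrences, while the second reproduces the order given in the question.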

regex for comma separated dates in php

I am having trouble writing a regular expression for multiple dates separated by commas.
I have dates like this:
2017-03-25, 2017-03-27, 2017-03-28
I am trying to set up PHP validation for a multi-date selection (jQuery calendar).
my regex is :
$value = "2017-03-25, 2017-03-27, 2017-03-28";
preg_match("/^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1])([0-9])*$/",$value)
You are matching only a single date with 0+ digits after it with your regex.
You may use the following fix:
^([0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01]))(?:,\s*(?1))*$
See the regex demo
Details:
^ - start of string
([0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])) - Group 1 matching and capturing a date-like substring
(?:,\s*(?1))* - zero or more sequences of:
, - comma
\s* - 0+ whitespaces (remove * to only match one, or use ? to match 1 or 0 whitespaces)
(?1) - recurse Group 1 subpattern
$ - end of string.
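The (?1) subroutine call is a PCRE feature and is not available in Python's built-in re module. As a rough equivalent, the date subpattern can be stored in a string and reused when building the full pattern. This is a sketch of that substitute technique; the names DATE, DATE_LIST and is_date_list are assumptions.

```python
import re

# The date-like subpattern, reused twice instead of recursing with (?1).
DATE = r'[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])'
DATE_LIST = re.compile(DATE + r'(?:,\s*' + DATE + r')*')

def is_date_list(value: str) -> bool:
    """True when value is one or more date-like strings separated by commas."""
    # fullmatch anchors at both ends, so ^ and $ are not needed.
    return DATE_LIST.fullmatch(value) is not None

if __name__ == "__main__":
    print(is_date_list("2017-03-25, 2017-03-27, 2017-03-28"))
    print(is_date_list("2017-13-25"))
```

Like the original pattern, this only checks the shape of each date; it would still accept 2017-02-31.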

Why does (?:\s)\w{2}(?:\s) not match only a 2 letter sub string with spaces around it not the spaces as well?

I am trying to make a regex that matches all substrings of 2 characters or fewer with a space on either side. What did I do wrong?
For example, I want it to match %2 or q, but not 123, and not the surrounding spaces.
Update: something like \b\w{2}\b would do, if it also matched one-letter substrings and did not ignore special characters like - or #.
You should use
(^|\s)\S{1,2}(?=\s)
Since you cannot use a look-behind, you can use a capture group and if you replace text, you can then restore the captured part with $1.
See regex demo here
Regex breakdown:
(^|\s) - Group 1 - either a start of string or a whitespace
\S{1,2} - 1 or 2 non-whitespace characters
(?=\s) - check if after 1 or 2 non-whitespace characters we have a whitespace. If not, fail.
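The capture-and-restore trick can be illustrated with a short Python sketch, where \1 plays the role of $1. The marker <X> and the function name mask_short_tokens are invented for the example.

```python
import re

# Group 1 holds the leading whitespace (or nothing at the start of the string)
# so it can be put back; the lookahead leaves the trailing whitespace untouched.
PATTERN = re.compile(r'(^|\s)\S{1,2}(?=\s)')

def mask_short_tokens(text: str, marker: str = "<X>") -> str:
    """Replace each 1-2 char token with marker, restoring the leading space."""
    return PATTERN.sub(lambda m: m.group(1) + marker, text)

if __name__ == "__main__":
    print(mask_short_tokens("a %2 q 123 end"))
```

Because the lookahead requires a whitespace character, a short token at the very end of the string is not matched by this pattern.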