I have the following set of data where I need to replace the number 41 with another number.
column1
41,46
31,41,48,55,58,121,122
31,60,41
41
We can see four conditions here:
41,
41
,41,
,41
I have written the following query
REGEXP_replace(column1,'^41$|^41,|,41,|,41$','xx')
where xx is the replacement number.
This query replaces the commas as well, which is not what I want.
Example: 41,46 becomes xx46, but the expected output is xx,46. Please note that there are no spaces between the commas and the numbers.
Can somebody help me write the correct regex?
Assuming the string is comma-separated, you can wrap it in commas, use REPLACE on the delimited value, and then TRIM the extra commas off again. No regex is needed. You should avoid regex here, as that solution is likely to be slower.
with t (column1) as (
select '41,46' from dual union all
select '31,41,48,55,58,121,122' from dual union all
select '31,60,41' from dual union all
select '41' from dual
)
-- Test data setup. Actual solution is below: --
select
column1,
trim(',' from replace(','||column1||',', ',41,', ',17,')) as replaced
from t;
Output:
COLUMN1                 REPLACED
41,46                   17,46
31,41,48,55,58,121,122  31,17,48,55,58,121,122
31,60,41                31,60,17
41                      17
4 rows selected.
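The wrap, replace, trim idea can also be checked outside the database. The following is a minimal Python sketch of the same logic, not Oracle code; the function name replace_item and the extra test row 141,410 are made up for illustration.

```python
def replace_item(csv: str, old: str, new: str) -> str:
    """Replace items equal to `old` in a comma-separated string."""
    # Wrapping in commas makes first, middle, last and only items all look like ",item,".
    wrapped = "," + csv + ","
    replaced = wrapped.replace("," + old + ",", "," + new + ",")
    # Strip the helper commas again, like TRIM(',' FROM ...) in the query.
    return replaced.strip(",")


for row in ["41,46", "31,41,48,55,58,121,122", "31,60,41", "41", "141,410"]:
    print(row, "->", replace_item(row, "41", "17"))
```

One caveat of this sketch: two adjacent equal items such as 41,41 share a comma, so a single pass replaces only the first of them. The SQL REPLACE version most likely behaves the same way, so it is worth testing if your data can contain repeated neighbours.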
Also, it's worth noting that comma-separated strings are not the right way to store data. Normalization is your friend.
In simple terms, what I am looking for is this: if a string contains the keyword ZTFN00, the regex should return the 9- to 11-digit number closest to that keyword, whether it sits to the left or to the right of it.
I want to do this with Oracle's REGEXP_REPLACE function.
Below are some of the sample strings:
The following error occurred in the SAP UPDATE_BP service as part of the combine:
(error:653, R11:186:Number 867278489 Already Exists for ID Type ZTFN00)
Expected result: 867278489
The following error occurred in the SAP UPDATE_BP service as part of the combine
(error:653, R11:186:Number ZTFN00 identification number 123456778 already exist)
Expected result: 123456778
I could not find a way to easily do this with regular expressions, but if you want to do the task without PL/SQL, you can do something like the following.
It's a little bit tricky: it combines several calls to regexp functions to evaluate, for each occurrence of a digit string, its distance from your keyword, and then picks the nearest one.
with test(string, keyWord) as
( select
'(error:653, R11:186: 999999999 Number 0000000000 Already Exists for ID Type ZTFN00 hjhk 11111111111 kjh k222222222)',
'ZTFN00'
from dual)
select numberString
from (
select numberString,
decode (greatest (numberPosition, keyWordPosition),
keyWordPosition,
keyWordPosition - numberPosition - numberLength,
numberPosition,
numberPosition - keyWordPosition - keyWordLength
) as distance
from (
select regexp_instr(string, '[0-9]{9,11}', 1, level) as numberPosition,
instr( string, keyWord) as keyWordPosition,
length(regexp_substr(string, '[0-9]{9,11}', 1, level)) as numberLength,
regexp_substr(string, '[0-9]{9,11}', 1, level) as numberString,
length(keyWord) as keyWordLength
from test
connect by regexp_instr(string, '[0-9]{9,11}', 1, level) != 0
)
order by distance
)where rownum = 1
Looking at the single parts:
SQL> with test(string, keyWord) as
( select
'(error:653, R11:186: 999999999 Number 0000000000 Already Exists for ID Type ZTFN00 hjhk 11111111111 kjh k222222222)',
'ZTFN00'
from dual)
select regexp_instr(string, '[0-9]{9,11}', 1, level) as numberPosition,
instr( string, keyWord) as keyWordPosition,
length(regexp_substr(string, '[0-9]{9,11}', 1, level)) as numberLength,
regexp_substr(string, '[0-9]{9,11}', 1, level) as numberString,
length(keyWord) as keyWordLength
from test
connect by regexp_instr(string, '[0-9]{9,11}', 1, level) != 0;
NUMBERPOSITION KEYWORDPOSITION NUMBERLENGTH NUMBERSTRING     KEYWORDLENGTH
-------------- --------------- ------------ ---------------- -------------
            22              77            9 999999999                    6
            39              77           10 0000000000                   6
            91              77           11 11111111111                  6
           108              77            9 222222222                    6
This scans the whole string and iterates while regexp_instr(...) != 0, that is, while there are occurrences left; level is used to look for the first, second, ... occurrence, so that row 1 gives the first occurrence, row 2 the second, and so on, for as long as an nth occurrence exists.
This part only computes some helper fields, which we then use to look both to the right and to the left of your keyword and to compute the exact distance between each number string and the keyword:
select numberString,
decode (greatest (numberPosition, keyWordPosition),
keyWordPosition,
keyWordPosition - numberPosition - numberLength,
numberPosition,
numberPosition - keyWordPosition - keyWordLength
) as distance
The inner query is ordered by distance, so that its first row contains the nearest number string; that's why the outermost query only extracts the row with rownum = 1.
It can be re-written in a more compact way, but this is a bit more readable.
This should even work when you have multiple occurrences of the digit string, even on both sides of your keyword.
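To make the distance rule easier to follow, here is a minimal Python sketch of the same logic, written outside Oracle. The function name nearest_number is hypothetical, and positions are 0-based here rather than 1-based as in the SQL, but the idea is the same: measure the gap between each digit string and the keyword, then keep the smallest.

```python
import re


def nearest_number(text: str, keyword: str):
    """Return the 9-11 digit run closest to `keyword`, or None if there is none."""
    kw_start = text.find(keyword)
    if kw_start < 0:
        return None
    kw_end = kw_start + len(keyword)
    best = None  # (distance, digits) of the closest candidate so far
    for m in re.finditer(r"[0-9]{9,11}", text):
        if m.end() <= kw_start:
            # Number is left of the keyword: gap between its end and the keyword start.
            distance = kw_start - m.end()
        else:
            # Number is right of the keyword: gap between the keyword end and its start.
            distance = m.start() - kw_end
        if best is None or distance < best[0]:
            best = (distance, m.group())
    return best[1] if best else None


sample = ("(error:653, R11:186: 999999999 Number 0000000000 Already Exists "
          "for ID Type ZTFN00 hjhk 11111111111 kjh k222222222)")
print(nearest_number(sample, "ZTFN00"))
```

On a tie the sketch keeps the first candidate it sees; the SQL version leaves that choice to the ORDER BY.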
This regex works for me in RegexBuddy with Oracle mode selected (10g, 11g and 12c):
SELECT REGEXP_SUBSTR(mycolumn,
'\(error:[0-9]+,[ ]+
(
(
([0-9]{9,11})()
|
ZTFN00()
|
[^ ),]+
)
[ ),]+
)+
\4\5',
1, 1, 'cx', 3) FROM mytable;
The regex treats the main body of the string as a series of tokens matching the general pattern [^ ),]+ (one or more of any characters except space, right parenthesis, or comma). But there are two specific tokens that it tries to match first: the keyword (ZTFN00) and a valid ID number ([0-9]{9,11}).
The empty groups at the end of the first two alternatives serve as check boxes; the corresponding backreferences at the end (\4 and \5) will only succeed if those groups participated in the match, meaning both an ID number and the keyword were seen.
(This is an obscure "feature" that definitely doesn't work in many flavors, so I can't be positive it will work in Oracle. Please let me know if it doesn't.)
The ID number is captured in group #3, and that's what the REGEXP_SUBSTR command returns. (Since you only want to retrieve the number, there's no call for REGEXP_REPLACE.)
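For readers who want to see the "check box" trick in action, here is a small sketch using Python's re module, where a backreference to a group that never participated makes the match fail. It is the same pattern written on one line; the helper name extract_id is made up, and a result in Python does not confirm how Oracle treats the construct.

```python
import re

# Groups: 1 = token plus separator, 2 = the alternation, 3 = ID number,
# 4 = empty "check box" for the number, 5 = empty "check box" for the keyword.
PATTERN = re.compile(
    r"\(error:[0-9]+,[ ]+((([0-9]{9,11})()|ZTFN00()|[^ ),]+)[ ),]+)+\4\5"
)


def extract_id(text: str):
    """Return the ID number if both a number and ZTFN00 were seen, else None."""
    m = PATTERN.search(text)
    return m.group(3) if m else None


print(extract_id("(error:653, R11:186:Number 867278489 Already Exists for ID Type ZTFN00)"))
print(extract_id("(error:653, R11:186:Number ZTFN00 identification number 123456778 already exist)"))
print(extract_id("(error:653, R11:186:Number 867278489 Already Exists)"))
```

The third call has no keyword, so group 5 is never set and the trailing \5 cannot match.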
IMO, this query should return A=1,B=2,
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.*?,') as A_and_B FROM dual
But it returns the whole string, A=1,B=2,C=3,, instead. Why?
Update 1:
Oracle 10.2+ is required to use Perl-style metacharacters in regular expressions.
Update 2:
A clearer form of my question (to avoid questions about the Oracle version and the availability of the Perl-style regex extension):
On the same system, why does a non-greedy quantifier sometimes work as expected and sometimes not?
This works correctly:
regexp_substr('A=1,B=2,C=3,', 'B=.*?,')
This doesn't work:
regexp_substr('A=1,B=2,C=3,', '.*B=.*?,')
Fiddle
Update 3:
Yes, it seems to be a bug.
What is the Oracle support reaction on this issue?
Is the bug already known? Does it have an ID?
It's a BUG!
You are right that in Perl, 'A=1,B=2,C=3,' =~ /.*B=.*?,/; print $& prints A=1,B=2,
What you have stumbled upon is a bug that still exists in Oracle Database 11g R2. If the exact same regular expression atom (including the quantifier but excluding the greediness modifier) appears twice in a regular expression, both occurrences will have the greediness indicated by the first appearance regardless of the greediness specified by the second one. That this is a bug is clearly demonstrated by these results (here, "the exact same regular expression atom" is [^B]*):
SQL> SELECT regexp_substr('A=1,B=2,C=3,', '[^B]*B=[^Bx]*?,') as good FROM dual;
GOOD
--------
A=1,B=2,
SQL> SELECT regexp_substr('A=1,B=2,C=3,', '[^B]*B=[^B]*?,') as bad FROM dual;
BAD
-----------
A=1,B=2,C=3,
The only difference between the two regular expressions is that the "good" one excludes 'x' as a possible match in the second matching list. Since 'x' does not appear in the target string, excluding it should make no difference, but as you can see, removing the 'x' makes a big difference. That has to be a bug.
Here are some more examples from Oracle 11.2: (SQL Fiddle with even more examples)
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.*?,') FROM dual; => A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.*,') FROM dual; => A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.*?B=.*?,') FROM dual; => A=1,B=2,
SELECT regexp_substr('A=1,B=2,C=3,', '.*?B=.*,') FROM dual; => A=1,B=2,
-- Changing second operator from * to +
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.+?,') FROM dual; => A=1,B=2,
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.+,') FROM dual; => A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.+B=.+,') FROM dual; => A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.+?B=.+,') FROM dual; => A=1,B=2,
The pattern is consistent: the greediness of the first occurrence is used for the second occurrence whether it should be or not.
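For comparison, the same patterns can be run through an engine with Perl-style quantifier semantics. The sketch below uses Python's re module as that reference engine (an assumption made for illustration; it says nothing about Oracle internals), and the helper name first_match is made up.

```python
import re

TEXT = "A=1,B=2,C=3,"


def first_match(pattern: str) -> str:
    """Return the text matched by `pattern` in TEXT, or '' if nothing matches."""
    m = re.search(pattern, TEXT)
    return m.group(0) if m else ""


for pattern in [".*B=.*?,", ".*B=.*,", ".*?B=.*?,", ".*?B=.*,", "[^B]*B=[^B]*?,"]:
    print(pattern, "=>", first_match(pattern))
```

In this engine the quantifier after B= decides the result: the patterns with a non-greedy second quantifier stop at A=1,B=2, and the ones with a greedy second quantifier run to the end of the string, regardless of what the first quantifier looks like. That is the opposite of the Oracle 11.2 output listed above for '.*B=.*?,' and '.*?B=.*,'.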
Looking at the feedback, I hesitate to jump in, but here I go ;-)
According to the Oracle docs, the *? and +? match a "preceding subexpression". For *? specifically:
Matches zero or more occurrences of the preceding subexpression
(nongreedy). Matches the empty string whenever possible.
To create a subexpression group, use parentheses ():
Treats the expression within the parentheses as a unit. The expression
can be a string or a complex expression containing operators.
You can refer to a subexpression in a back reference.
This will allow you to use greedy and non-greedy (many alternating times actually) in the same regexp, with expected results. For your example:
select regexp_substr('A=1,B=2,C=3,', '(.)*B=(.)*?,') from dual;
To make the point a bit clearer (I hope), this example uses greedy and non-greedy in the same regexp_substr, with different (correct) results depending on where the ? is placed (it does NOT just use the rule for the first subexpression it sees). Also note that the subexpression (\w) will match alphanumerics and underscore only, not #.
-- non-greedy followed by greedy
select regexp_substr('1_#_2_a_3_#_4_a', '(\w)*?#(\w)*') from dual;
result: 1_#_2_a_3_
-- greedy followed by non-greedy
select regexp_substr('1_#_2_a_3_#_4_a', '(\w)*#(\w)*?') from dual;
result: 1_#
You've got a really great bounty, so I'm going to try to nail it comprehensively.
You make assumptions in your regular expression handling that are incorrect.
Oracle is NOT compatible with Perl regular expressions; it is compatible with POSIX. It describes its support for Perl as "Perl-Influenced".
There is an intrinsic syntax conflict around the use of the Perl "*?" in Oracle, if you read that reference the way I do, and Oracle legitimately chooses the POSIX usage.
Your description of how Perl handles "*?" is not quite right.
Here is a mashup of the options we've discussed. The key to this issue is around case 30
CASE  SRC                            TEXT             RE               FROM_WHOM                                           RESULT
----  -----------------------------  ---------------  ---------------  --------------------------------------------------  ------------
1     Egor's original source string  A=1,B=2,C=3,     .*B=.*?,         Egor's original pattern "doesn't work"              A=1,B=2,C=3,
2     Egor's original source string  A=1,B=2,C=3,     .*B=.?,          Egor's "works correctly"                            A=1,B=2,
3     Egor's original source string  A=1,B=2,C=3,     .*B=.+?,         Old Pro comment 1 form 2                            A=1,B=2,
4     Egor's original source string  A=1,B=2,C=3,     .+B=.*?,         Old Pro comment 1 form 1                            A=1,B=2,
5     Egor's original source string  A=1,B=2,C=3,     .*B=.{0,}?,      Old Pro comment 2                                   A=1,B=2,
6     Egor's original source string  A=1,B=2,C=3,     [^B]*B=[^Bx]*?,  Old Pro answer form 1 "good"                        A=1,B=2,
7     Egor's original source string  A=1,B=2,C=3,     [^B]*B=[^B]*?,   Old Pro answer form 2 "bad"                         A=1,B=2,C=3,
8     Egor's original source string  A=1,B=2,C=3,     (.)*B=(.)*?,     TBone answer form 1                                 A=1,B=2,
9     TBone answer example 2         1_#_2_a_3_#_4_a  (\w)*?#(\w)*     TBone answer example 2 form 1                       1_#_2_a_3_
10    TBone answer example 2         1_#_2_a_3_#_4_a  (\w)*#(\w)*?     TBone answer example 2 form 2                       1_#
30    Egor's original source string  A=1,B=2,C=3,     .*B=(.)*?,       Schemaczar Variant to force Perl operation          A=1,B=2,
31    Egor's original source string  A=1,B=2,C=3,     .*B=(.*)?,       Schemaczar Variant of Egor to force POSIX           A=1,B=2,C=3,
32    Egor's original source string  A=1,B=2,C=3,     .*B=.*{0,1}      Schemaczar Applying Egor's 'non-greedy'             A=1,B=2,C=3,
33    Egor's original source string  A=1,B=2,C=3,     .*B=(.)*{0,1}    Schemaczar Another variant of Egor's "non-greedy"   A=1,B=2,C=3,
I am pretty sure that CASE 30 is what you thought you were writing - that is, you thought the "*?" had a stronger association than the "*" by itself. True for Perl, I guess, but for Oracle (and presumably canonical POSIX) RE's, the "*?" has a lower precedence and associativity than "*". So Oracle reads it as "(.*)?" (case 31) whereas Perl reads it as "(.)*?", that is, case 30.
Note cases 32 and 33 indicate that "*{0,1}" does not work like "*?".
Note that Oracle REGEXP does not work like LIKE, that is, it does not require the match pattern to cover the entire test string. Using the "^" begin and "$" end markers might help you with this as well.
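Two of the points above, the different groupings in cases 30 and 31 and the effect of the "^" and "$" anchors, can be tried in any regex engine. The sketch below uses Python's re module purely as an illustration; it is a Perl-style engine, so it shows how the two groupings differ, not which one Oracle picks. The variable names are made up.

```python
import re

TEXT = "A=1,B=2,C=3,"

# Case 30: a non-greedy repetition of single characters stops at the first comma.
lazy_per_char = re.search(r".*B=(.)*?,", TEXT).group(0)

# Case 31: an optional, greedy group runs on to the last comma.
optional_greedy = re.search(r".*B=(.*)?,", TEXT).group(0)

# An unanchored pattern may match anywhere inside the string ...
unanchored = re.search(r"B=.*?,", TEXT)

# ... while ^ and $ force the pattern to account for the whole string.
anchored = re.search(r"^B=.*?,$", TEXT)

print(lazy_per_char)
print(optional_greedy)
print(unanchored.group(0) if unanchored else None)
print(anchored.group(0) if anchored else None)
```

The anchored search finds nothing because the string does not start with B=, which is the kind of partial match that the begin and end markers rule out.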
My script:
SET SERVEROUTPUT ON
<<DISCREET_DROP>> begin
DBMS_OUTPUT.ENABLE;
for dropit in (select 'DROP TABLE ' || TABLE_NAME || ' CASCADE CONSTRAINTS' AS SYNT
FROM TABS WHERE TABLE_NAME IN ('TEST_PATS', 'TEST_STRINGS')
)
LOOP
DBMS_OUTPUT.PUT_LINE('Dropping via ' || dropit.synt);
execute immediate dropit.synt;
END LOOP;
END DISCREET_DROP;
/
--------------------------------------------------------
-- DDL for Table TEST_PATS
--------------------------------------------------------
CREATE TABLE TEST_PATS
( RE VARCHAR2(2000),
FROM_WHOM VARCHAR2(50),
PAT_GROUP VARCHAR2(50),
PAT_ORDER NUMBER(9,0)
) ;
--------------------------------------------------------
-- DDL for Table TEST_STRINGS
--------------------------------------------------------
CREATE TABLE TEST_STRINGS
( TEXT VARCHAR2(2000),
SRC VARCHAR2(200),
TEXT_GROUP VARCHAR2(50),
TEXT_ORDER NUMBER(9,0)
) ;
--------------------------------------------------------
-- DDL for View REGEXP_TESTER_V
--------------------------------------------------------
CREATE OR REPLACE FORCE VIEW REGEXP_TESTER_V (CASE_NUMBER, SRC, TEXT, RE, FROM_WHOM, RESULT) AS
select pat_order as case_number,
src, text, re, from_whom,
regexp_substr (text, re) as result
from test_pats full outer join test_strings on (text_group = pat_group)
order by pat_order, text_order;
/
REM INSERTING into TEST_PATS
SET DEFINE OFF;
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.*?,','Egor''s original pattern "doesn''t work"','Egor',1);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.?,','Egor''s "works correctly"','Egor',2);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=(.)*?,','Schemaczar Variant to force Perl operation','Egor',30);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=(.*)?,','Schemaczar Variant of Egor to force POSIX','Egor',31);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.*{0,1}','Schemaczar Applying Egor''s ''non-greedy''','Egor',32);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=(.)*{0,1}','Schemaczar Another variant of Egor''s "non-greedy"','Egor',33);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('[^B]*B=[^Bx]*?,','Old Pro answer form 1 "good"','Egor',6);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('[^B]*B=[^B]*?,','Old Pro answer form 2 "bad"','Egor',7);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.+?,','Old Pro comment 1 form 2','Egor',3);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.*B=.{0,}?,','Old Pro comment 2','Egor',5);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('.+B=.*?,','Old Pro comment 1 form 1','Egor',4);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('(.)*B=(.)*?,','TBone answer form 1','Egor',8);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('(\w)*?#(\w)*','TBone answer example 2 form 1','TBone',9);
Insert into TEST_PATS (RE,FROM_WHOM,PAT_GROUP,PAT_ORDER) values ('(\w)*#(\w)*?','TBone answer example 2 form 2','TBone',10);
REM INSERTING into TEST_STRINGS
SET DEFINE OFF;
Insert into TEST_STRINGS (TEXT,SRC,TEXT_GROUP,TEXT_ORDER) values ('A=1,B=2,C=3,','Egor''s original source string','Egor',1);
Insert into TEST_STRINGS (TEXT,SRC,TEXT_GROUP,TEXT_ORDER) values ('1_#_2_a_3_#_4_a','TBone answer example 2','TBone',2);
COLUMN SRC FORMAT A50 WORD_WRAP
COLUMN TEXT FORMAT A50 WORD_WRAP
COLUMN RE FORMAT A50 WORD_WRAP
COLUMN FROM_WHOM FORMAT A50 WORD_WRAP
COLUMN RESULT FORMAT A50 WORD_WRAP
SELECT * FROM REGEXP_TESTER_V;
Because you're selecting too much:
SELECT
regexp_substr(
'A=1,B=2,C=3,',
'.*?B=.*?,'
) as A_and_B, -- Now works as expected
regexp_substr(
'A=1,B=2,C=3,',
'B=.*?,'
) as B_only -- works just fine
FROM dual
SQL Fiddle: http://www.sqlfiddle.com/#!4/d41d8/11450