Regex: don't select if more than one group matches (multiple XOR)

The data are held in an Oracle 12c database, one row per ICD-10-CM code, with a patient ID (foreign key) like so (note that there could be many other codes, the following are just the ones pertinent to this question):
ID ICD10CODE
1 S72.91XB
1 S72.92XB
2 S72.211A
3 S72.414A
3 S72.415A
4 S32.509A
5 S32.301A
5 S32.821A
6 S32.421A
6 S32.422A
7 S32.421A
8 S32.421A
8 S32.509A
The task at hand is to select distinct patients that match only one of the following points (using standard regular expression syntax):
Any number of: S32\.1\w\w\w, S32\.2\w\w\w, S32\.3\w\w\w, S32\.5\w\w\w, S32\.6\w\w\w, S32\.7\w\w\w, S32\.8\w\w\w
Any number of: S32\.4\w1\w, S32\.4\w3\w, S32\.4\w4\w, S32\.4\w6\w, S32\.4\w7\w, S32\.4\w8\w, S32\.4\w9\w
Any number of: S32\.4\w2\w, S32\.4\w3\w, S32\.4\w5\w, S32\.4\w6\w, S32\.4\w7\w, S32\.4\w8\w, S32\.4\w9\w
Any number of: S72\.[0-8]\w1\w, S72\.[0-8]\w3\w, S72\.[0-8]\w4\w, S72\.[0-8]\w6\w, S72\.[0-8]\w7\w, S72\.[0-8]\w8\w, S72\.[0-8]\w9\w
Any number of: S72\.[0-8]\w2\w, S72\.[0-8]\w3\w, S72\.[0-8]\w5\w, S72\.[0-8]\w6\w, S72\.[0-8]\w7\w, S72\.[0-8]\w8\w, S72\.[0-8]\w9\w
Any number of: S72\.91\w\w, S72\.93\w\w, S72\.94\w\w, S72\.96\w\w, S72\.97\w\w, S72\.98\w\w, S72\.99\w\w
Any number of: S72\.92\w\w, S72\.93\w\w, S72\.95\w\w, S72\.96\w\w, S72\.97\w\w, S72\.98\w\w, S72\.99\w\w
Any permutation or combination (including repetitions) of the codes listed within a bullet is permitted for each patient, but combinations across bullets must be mutually exclusive for a patient. My method is to apply LISTAGG with GROUP BY ID:
ID LISTAGG(ICD10CODE, ',')
1 S72.91XB,S72.92XB
2 S72.211A
3 S72.414A,S72.415A
4 S32.509A
5 S32.301A,S32.821A
6 S32.421A,S32.422A
7 S32.421A
8 S32.421A,S32.509A
Then I filter using this regular expression, (S32\.(([1-3]|[5-8])|(4\w((1|4)|(2|5)|(3)|([5-9]))))\w+)|(S72\.(([0-8]\w((1|4)|(2|5)|(3)|([5-9])))|(9((1|4)|(2|5)|(3)|([5-9]))))\w+), which is an almost literal representation of the bullets above. My expression is adapted from the idea in this answer, where it seems that ((RB\s+)+|(JJ\s+)+) automatically selects either "RB" or "JJ", but not both.
I cannot get it to work. The result should contain only IDs 2, 4, 5, and 7, but the expression I developed matches all IDs.
What is a solution to this problem?
[Edit] Some more information:
All these S codes relate to injuries to the bones of the lower extremity: S32 is for fractures of the pelvis (hip bone), and S72 is for fractures of the femur (thigh bone). Note that we have two femurs and two acetabula (the sockets of the pelvis where the femurs connect). The S32.4 code denotes the acetabulum (the rest of the S32.[1235678]\w{3} series denotes other parts of the pelvis). Right and left (femur or acetabulum) are denoted by 1|4 and 2|5, respectively, in the 6th character, unless the code starts with S72.9, in which case those digits appear in the 5th character.
The patients to be included in the study population should have only one of the bones broken: one of the two femurs, one of the two acetabula, or the pelvis, but not a combination of them. Combinations of fractures of a single bone do not matter. For example, a single right femur can be broken in 10 different places and ways (the knee area, the middle shaft, the head, etc., each generating a different S72\.[0-8]\w[14]\w code), and the patient should still be selected.
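A minimal Python sketch of the point about ((RB\s+)+|(JJ\s+)+): the alternation on its own does not exclude the other group; what makes it "either RB or JJ, but not both" is anchoring the whole string with ^ and $. Python's re module is only a stand-in for Oracle's regex engine here, and the sample strings are made up for illustration.

```python
import re

# Without anchors, search() is satisfied by any matching substring.
UNANCHORED = re.compile(r"((RB\s+)+|(JJ\s+)+)")
# With ^...$ the WHOLE string must consist of one group only.
ANCHORED = re.compile(r"^((RB\s+)+|(JJ\s+)+)$")

mixed = "RB RB JJ "   # hypothetical input containing both groups
only_rb = "RB RB "    # hypothetical input containing a single group

print(bool(UNANCHORED.search(mixed)))   # True: "RB RB " is found as a substring
print(bool(ANCHORED.search(mixed)))     # False: the string mixes RB and JJ
print(bool(ANCHORED.search(only_rb)))   # True
```

The ICD-10 expression above is not anchored either, so any LISTAGG string containing at least one listed code is matched, which is consistent with every ID being returned.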

Option 1:
You can do it with a single regular expression:
SELECT t.id,
       t.icd10codes
FROM   ( SELECT id,
                LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
                  AS icd10codes
         FROM   table_name
         GROUP BY id
       ) t
WHERE  REGEXP_LIKE(
         t.icd10codes,
            '^(S32\.[1235678]\w\w\w(,|$))+$'
         || '|^(S32\.4\w[1346789]\w(,|$))+$'
         || '|^(S32\.4\w[2356789]\w(,|$))+$'
         || '|^(S72\.[0-8]\w[1346789]\w(,|$))+$'
         || '|^(S72\.[0-8]\w[2356789]\w(,|$))+$'
         || '|^(S72\.9[1346789]\w\w(,|$))+$'
         || '|^(S72\.9[2356789]\w\w(,|$))+$'
       );
Which, for your sample data:
CREATE TABLE table_name (ID, ICD10CODE) AS
SELECT 1, 'S72.91XB' FROM DUAL UNION ALL
SELECT 1, 'S72.92XB' FROM DUAL UNION ALL
SELECT 2, 'S72.211A' FROM DUAL UNION ALL
SELECT 3, 'S72.414A' FROM DUAL UNION ALL
SELECT 3, 'S72.415A' FROM DUAL UNION ALL
SELECT 4, 'S32.509A' FROM DUAL UNION ALL
SELECT 5, 'S32.301A' FROM DUAL UNION ALL
SELECT 5, 'S32.821A' FROM DUAL UNION ALL
SELECT 6, 'S32.421A' FROM DUAL UNION ALL
SELECT 6, 'S32.422A' FROM DUAL UNION ALL
SELECT 7, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.509A' FROM DUAL;
Outputs:
ID | ICD10CODES
---+------------------
2  | S72.211A
4  | S32.509A
5  | S32.301A,S32.821A
7  | S32.421A
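To sanity-check the logic of Option 1 without a database, here is a rough Python equivalent: it groups the sample rows by ID, joins the sorted codes with commas (standing in for LISTAGG ... WITHIN GROUP (ORDER BY icd10code)) and applies the same seven anchored alternatives. It assumes Python's re treats \w, ^ and $ the same way Oracle does for these plain ASCII codes; ROWS, BULLETS, FILTER and selected_ids are illustrative names only.

```python
import re
from itertools import groupby

# Sample rows from the question: (patient ID, ICD-10-CM code).
ROWS = [
    (1, "S72.91XB"), (1, "S72.92XB"), (2, "S72.211A"),
    (3, "S72.414A"), (3, "S72.415A"), (4, "S32.509A"),
    (5, "S32.301A"), (5, "S32.821A"), (6, "S32.421A"),
    (6, "S32.422A"), (7, "S32.421A"), (8, "S32.421A"),
    (8, "S32.509A"),
]

# One pattern per bullet, copied from the Option 1 query; each matches ONE code.
BULLETS = [
    r"S32\.[1235678]\w\w\w",
    r"S32\.4\w[1346789]\w",
    r"S32\.4\w[2356789]\w",
    r"S72\.[0-8]\w[1346789]\w",
    r"S72\.[0-8]\w[2356789]\w",
    r"S72\.9[1346789]\w\w",
    r"S72\.9[2356789]\w\w",
]

# ^(code(,|$))+$ for each bullet: the whole list must come from a single bullet.
FILTER = re.compile("|".join(rf"^({b}(,|$))+$" for b in BULLETS))

def selected_ids(rows):
    selected = []
    # Sorting by (id, code) mimics LISTAGG ... WITHIN GROUP (ORDER BY icd10code).
    for pid, group in groupby(sorted(rows), key=lambda row: row[0]):
        codes = ",".join(code for _, code in group)
        if FILTER.search(codes):
            selected.append(pid)
    return selected

print(selected_ids(ROWS))  # [2, 4, 5, 7]
```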
Option 2:
You can put the regular expressions into a table:
CREATE TABLE matches (id, match) AS
SELECT 1, 'S32\.[1235678]\w\w\w' FROM DUAL UNION ALL
SELECT 2, 'S32\.4\w[1346789]\w' FROM DUAL UNION ALL
SELECT 3, 'S32\.4\w[2356789]\w' FROM DUAL UNION ALL
SELECT 4, 'S72\.[0-8]\w[1346789]\w' FROM DUAL UNION ALL
SELECT 5, 'S72\.[0-8]\w[2356789]\w' FROM DUAL UNION ALL
SELECT 6, 'S72\.9[1346789]\w\w' FROM DUAL UNION ALL
SELECT 7, 'S72\.9[2356789]\w\w' FROM DUAL;
Then you can use the query:
SELECT t.id,
       m.id AS match_id,
       LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
         AS icd10codes
FROM   table_name t
       LEFT OUTER JOIN matches m
       PARTITION BY (m.id)
       ON (REGEXP_LIKE(t.icd10code, '^' || m.match || '$'))
GROUP BY
       t.id,
       m.id
HAVING
       COUNT(m.match) = COUNT(t.id);
Option 3:
Similar to the first option, but you can put the matches into a table and you can determine which match has been used:
SELECT t.id,
       m.id AS match_id,
       t.icd10codes
FROM   ( SELECT id,
                LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
                  AS icd10codes
         FROM   table_name
         GROUP BY id
       ) t
       INNER JOIN matches m
       ON (REGEXP_LIKE(t.icd10codes, '^(' || m.match || '(,|$))+$'));
Options 2 & 3 both output:
ID | MATCH_ID | ICD10CODES
---+----------+------------------
4  | 1        | S32.509A
5  | 1        | S32.301A,S32.821A
7  | 2        | S32.421A
2  | 4        | S72.211A
Option 4:
You can also get rid of the (slow) regular expressions and use LIKE if you store the matches as:
CREATE TABLE matches (id, match) AS
SELECT 1, 'S32.1___' FROM DUAL UNION ALL
SELECT 1, 'S32.2___' FROM DUAL UNION ALL
SELECT 1, 'S32.3___' FROM DUAL UNION ALL
SELECT 1, 'S32.5___' FROM DUAL UNION ALL
SELECT 1, 'S32.6___' FROM DUAL UNION ALL
SELECT 1, 'S32.7___' FROM DUAL UNION ALL
SELECT 1, 'S32.8___' FROM DUAL UNION ALL
SELECT 2, 'S32.4_1_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_4_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_2_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_5_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_1_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_4_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_2_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_5_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 6, 'S72.91__' FROM DUAL UNION ALL
SELECT 6, 'S72.93__' FROM DUAL UNION ALL
SELECT 6, 'S72.94__' FROM DUAL UNION ALL
SELECT 6, 'S72.96__' FROM DUAL UNION ALL
SELECT 6, 'S72.97__' FROM DUAL UNION ALL
SELECT 6, 'S72.98__' FROM DUAL UNION ALL
SELECT 6, 'S72.99__' FROM DUAL UNION ALL
SELECT 7, 'S72.92__' FROM DUAL UNION ALL
SELECT 7, 'S72.93__' FROM DUAL UNION ALL
SELECT 7, 'S72.95__' FROM DUAL UNION ALL
SELECT 7, 'S72.96__' FROM DUAL UNION ALL
SELECT 7, 'S72.97__' FROM DUAL UNION ALL
SELECT 7, 'S72.98__' FROM DUAL UNION ALL
SELECT 7, 'S72.99__' FROM DUAL;
Then use the query:
SELECT t.id,
       m.id AS match_id,
       LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
         AS icd10codes
FROM   table_name t
       LEFT OUTER JOIN matches m
       PARTITION BY (m.id)
       ON (t.icd10code LIKE m.match)
GROUP BY
       t.id,
       m.id
HAVING
       COUNT(m.match) = COUNT(t.id);
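The HAVING COUNT(m.match) = COUNT(t.id) condition in Options 2 and 4 amounts to "every code of this patient matched some pattern of this match group". Below is a Python sketch of that check using the LIKE patterns of Option 4. like_to_regex is a hypothetical helper that only handles the _ and % wildcards (no ESCAPE clause), and the pattern lists are generated with comprehensions instead of being typed out.

```python
import re

def like_to_regex(pattern):
    """Hypothetical helper: translate a LIKE pattern into a compiled regex.

    '_' matches exactly one character, '%' any run of characters;
    everything else (including '.') is treated literally.
    """
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

RIGHT, LEFT = "1346789", "2356789"   # laterality digits from the question's bullets

# Same groups as the Option 4 "matches" table.
MATCHES = {
    1: [f"S32.{d}___" for d in "1235678"],
    2: [f"S32.4_{d}_" for d in RIGHT],
    3: [f"S32.4_{d}_" for d in LEFT],
    4: [f"S72.{n}_{d}_" for n in range(9) for d in RIGHT],
    5: [f"S72.{n}_{d}_" for n in range(9) for d in LEFT],
    6: [f"S72.9{d}__" for d in RIGHT],
    7: [f"S72.9{d}__" for d in LEFT],
}

CODES = {
    1: ["S72.91XB", "S72.92XB"], 2: ["S72.211A"],
    3: ["S72.414A", "S72.415A"], 4: ["S32.509A"],
    5: ["S32.301A", "S32.821A"], 6: ["S32.421A", "S32.422A"],
    7: ["S32.421A"], 8: ["S32.421A", "S32.509A"],
}

def matching_groups(codes_by_patient, matches):
    compiled = {mid: [like_to_regex(p) for p in pats] for mid, pats in matches.items()}
    result = []
    for pid, codes in sorted(codes_by_patient.items()):
        for mid, regexes in sorted(compiled.items()):
            # Equivalent of HAVING COUNT(m.match) = COUNT(t.id):
            # every code of the patient matched some pattern of this group.
            if all(any(r.fullmatch(c) for r in regexes) for c in codes):
                result.append((pid, mid))
    return result

print(matching_groups(CODES, MATCHES))  # [(2, 4), (4, 1), (5, 1), (7, 2)]
```

As in the SQL, a patient whose codes all fall in an overlapping laterality digit (3, 6, 7, 8 or 9) would be reported once per matching group.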

OK, I've added your broken-bone codes to the S32 and S72 series.
That's really all that needed to be done.
Feel free to change ,? to (,|$), but don't change anything else.
Let me know if the broken-bone codes are about right.
S32\.\w[25]\w\w, S32.1\w\w\w, S32.2\w\w\w, S32.3\w\w\w, S32.5\w\w\w, S32.6\w\w\w, S32.7\w\w\w, S32.8\w\w\w
S32\.\w[25]\w\w, S32.4\w1\w, S32.4\w3\w, S32.4\w4\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
S32\.\w[25]\w\w, S32.4\w2\w, S32.4\w3\w, S32.4\w5\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
S72\.\w[14]\w\w, S72.[0-8]\w1\w, S72.[0-8]\w3\w, S72.[0-8]\w4\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
S72\.\w[14]\w\w, S72.[0-8]\w2\w, S72.[0-8]\w3\w, S72.[0-8]\w5\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
S72\.[14]\w\w\w, S72.91\w\w, S72.93\w\w, S72.94\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w
S72\.[14]\w\w\w, S72.92\w\w, S72.93\w\w, S72.95\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w
The new regex is
^((S32\.(\w[25]|[1-35-8]\w)\w\w,?)+|(S32\.(\w[25]\w|4\w[1346-9])\w,?)+|(S32\.(\w[25]\w|4\w[235-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[1346-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[235-9])\w,?)+|(S72\.([14]\w|9[1346-9])\w\w,?)+|(S72\.([14]\w|9[235-9])\w\w,?)+)$
https://regex101.com/r/OAHdCO/1
^
(
( S32 \. ( \w [25] | [1-35-8] \w ) \w\w ,? )+
| ( S32 \. ( \w [25] \w | 4 \w [1346-9] ) \w ,? )+
| ( S32 \. ( \w [25] \w | 4 \w [235-9] ) \w ,? )+
| ( S72 \. ( \w [14] \w | [0-8] \w [1346-9] ) \w ,? )+
| ( S72 \. ( \w [14] \w | [0-8] \w [235-9] ) \w ,? )+
| ( S72 \. ( [14] \w | 9 [1346-9] ) \w\w ,? )+
| ( S72 \. ( [14] \w | 9 [235-9] ) \w\w ,? )+
)
$

Related

Oracle REGEXP_SUBSTR will not match the dot character

I'm trying to extract information from strings like:
FOO-BAR-AUDIT-DATABASE.NUPKG
FOO.BAR.DATABASE-2.0.0.NUPKG
to info like:
'FOO.BAR.DATABASE' '2.0.0'
| |
module_name version
Currently I'm not able to parse correctly when the module_name part contains . chars. See table below.
The example below shows how I extract the information.
The first group of the regexp, '(.*?)', is the one that does not work correctly; the remaining groups handle the varying version information.
select case module_name when expected then 'pass' else 'fail' end as test, y.* from(
select lower(regexp_substr(t.pck, g.regex, 1, 1, '', 1)) as module_name,
t.expected,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 3)) as major,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 5)) as minor,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 7)) as patch,
(t.pck) as package_name
from (select 'FUNKY_LOG_DATABASE-1.0.0.NUPKG' as pck, 'funky_log_database' as expected from dual
union select 'FOO.BAR.DATABASE-2.0.0.NUPKG', 'foo.bar.database' from dual
union select 'FOO-BAR-AUDIT-DATABASE.NUPKG', 'foo-bar-audit-database' from dual
union select 'funk-database-1.nupkg', 'funk-database' from dual
union select 'funk-database-1.2.nupkg', 'funk-database' from dual
union select 'baz-database-1.0.1.nupkg', 'baz-database' from dual) t
cross join (select '(.*?)(-(\d+)(\.(\d+))?(\.(\d+))?)?(\..*)' as regex from dual) g
)y;
The query above yields the following (Oracle 19c):
test | module_name            | expected               | major | minor | patch | package_name
-----+------------------------+------------------------+-------+-------+-------+-------------------------------
pass | foo-bar-audit-database | foo-bar-audit-database |       |       |       | FOO-BAR-AUDIT-DATABASE.NUPKG
fail | foo                    | foo.bar.database       |       |       |       | FOO.BAR.DATABASE-2.0.0.NUPKG
pass | funky_log_database     | funky_log_database     | 1     | 0     | 0     | FUNKY_LOG_DATABASE-1.0.0.NUPKG
pass | baz-database           | baz-database           | 1     | 0     | 1     | baz-database-1.0.1.nupkg
pass | funk-database          | funk-database          | 1     | 2     |       | funk-database-1.2.nupkg
pass | funk-database          | funk-database          | 1     |       |       | funk-database-1.nupkg
I've tried using ([[:alnum:]._-]*?) as the first group, but it yields the same result. Switching to a greedy match matches too much.
Any good suggestions out there?
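The failing row can be reproduced with a short Python sketch (Python's re standing in for Oracle's engine): the lazy (.*?) grows one character at a time, and the first position where the rest of the pattern succeeds is the first dot, because the version group is optional and (\..*) then swallows everything from that dot onwards.

```python
import re

# The regex from the question, unchanged.
QUESTION_REGEX = re.compile(r"(.*?)(-(\d+)(\.(\d+))?(\.(\d+))?)?(\..*)")

for pck in ["FOO.BAR.DATABASE-2.0.0.NUPKG", "baz-database-1.0.1.nupkg"]:
    m = QUESTION_REGEX.search(pck)
    # group 1 = module name, group 3 = major version
    print(pck, "->", m.group(1), m.group(3))
# FOO.BAR.DATABASE-2.0.0.NUPKG -> FOO None
# baz-database-1.0.1.nupkg -> baz-database 1
```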
You can match from the end of the string to get the version, then extract the substring before the version to get the module name:
select case module_name when expected then 'pass' else 'fail' end as test,
y.*
from (
select lower(
substr(
t.pck,
1,
REGEXP_INSTR(t.pck, g.regex) - 1
)
) as module_name,
t.expected,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 2)) as major,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 3)) as minor,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 4)) as patch,
t.pck as package_name
from (
select 'FUNKY_LOG_DATABASE-1.0.0.NUPKG' as pck, 'funky_log_database' as expected from dual
union select 'FOO.BAR.DATABASE-2.0.0.NUPKG', 'foo.bar.database' from dual
union select 'FOO-BAR-AUDIT-DATABASE.NUPKG', 'foo-bar-audit-database' from dual
union select 'funk-database-1.nupkg', 'funk-database' from dual
union select 'funk-database-1.2.nupkg', 'funk-database' from dual
union select 'baz-database-1.0.1.nupkg', 'baz-database' from dual
) t
cross join (
select '(-(\d+)\.?(\d+)?\.?(\d+)?)?\.[^.]+$' as regex from dual
) g
)y;
Outputs:
TEST | MODULE_NAME            | EXPECTED               | MAJOR | MINOR | PATCH | PACKAGE_NAME
-----+------------------------+------------------------+-------+-------+-------+-------------------------------
pass | foo-bar-audit-database | foo-bar-audit-database |       |       |       | FOO-BAR-AUDIT-DATABASE.NUPKG
pass | foo.bar.database       | foo.bar.database       | 2     | 0     | 0     | FOO.BAR.DATABASE-2.0.0.NUPKG
pass | funky_log_database     | funky_log_database     | 1     | 0     | 0     | FUNKY_LOG_DATABASE-1.0.0.NUPKG
pass | baz-database           | baz-database           | 1     | 0     | 1     | baz-database-1.0.1.nupkg
pass | funk-database          | funk-database          | 1     | 2     |       | funk-database-1.2.nupkg
pass | funk-database          | funk-database          | 1     |       |       | funk-database-1.nupkg
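The same "anchor on the end" idea, sketched in Python for anyone who wants to test the pattern outside Oracle: search() returns the leftmost position from which the optional version plus the final extension can match through to $, which plays the role of REGEXP_INSTR, and slicing before that position plays the role of SUBSTR. split_package is a hypothetical name, and the sketch assumes every input ends with an extension such as .nupkg.

```python
import re

# Optional "-major[.minor[.patch]]" followed by the final extension, anchored at $.
SUFFIX = re.compile(r"(-(\d+)\.?(\d+)?\.?(\d+)?)?\.[^.]+$")

def split_package(pck):
    m = SUFFIX.search(pck)            # leftmost match, like REGEXP_INSTR
    if m is None:
        raise ValueError(f"no extension found in {pck!r}")
    module = pck[:m.start()].lower()  # like SUBSTR(pck, 1, REGEXP_INSTR(...) - 1)
    major, minor, patch = (int(g) if g is not None else None
                           for g in m.group(2, 3, 4))
    return module, major, minor, patch

for pck in ["FOO-BAR-AUDIT-DATABASE.NUPKG", "FOO.BAR.DATABASE-2.0.0.NUPKG",
            "funk-database-1.2.nupkg", "funk-database-1.nupkg"]:
    print(split_package(pck))
```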
Would this do? It isn't sophisticated, but it returns the data you wanted (at least, I think so).
lines #1 - 8: sample data
temp CTE: removes the extension (.nupkg), for simplicity
final query:
line #18 is the module name: if the value contains digits, take the substring up to the first digit (without the hyphen in front of it); otherwise, return the whole PCK value
lines #20 - 22 return the version: if there are no digits, return NULL; otherwise, return the substring from the first digit onwards
SQL> with
2 test as
3 (select 'FUNKY_LOG_DATABASE-1.0.0.NUPKG' as pck, 'funky_log_database' as expected from dual
4 union select 'FOO.BAR.DATABASE-2.0.0.NUPKG', 'foo.bar.database' from dual
5 union select 'FOO-BAR-AUDIT-DATABASE.NUPKG', 'foo-bar-audit-database' from dual
6 union select 'funk-database-1.nupkg', 'funk-database' from dual
7 union select 'funk-database-1.2.nupkg', 'funk-database' from dual
8 union select 'baz-database-1.0.1.nupkg', 'baz-database' from dual),
9 temp as
10 -- remove extension
11 (select pck pck_old, expected,
12 replace(lower(pck), '.nupkg', '') pck
13 from test
14 )
15 select pck_old,
16 expected,
17 --
18 nvl(substr(pck, 1, regexp_instr(pck, '\d') - 2), pck) module_name,
19 --
20 case when regexp_instr(pck, '\d') = 0 then null
21 else substr(pck, regexp_instr(pck, '\d'))
22 end version
23 from temp;
PCK_OLD EXPECTED MODULE_NAME VERSION
------------------------------ ---------------------- ----------------------- --------
FOO-BAR-AUDIT-DATABASE.NUPKG foo-bar-audit-database foo-bar-audit-database
FOO.BAR.DATABASE-2.0.0.NUPKG foo.bar.database foo.bar.database 2.0.0
FUNKY_LOG_DATABASE-1.0.0.NUPKG funky_log_database funky_log_database 1.0.0
baz-database-1.0.1.nupkg baz-database baz-database 1.0.1
funk-database-1.2.nupkg funk-database funk-database 1.2
funk-database-1.nupkg funk-database funk-database 1
6 rows selected.
SQL>
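For comparison, a Python sketch of the first-digit approach above (lines #12, #18 and #20 - 22), assuming the same .nupkg extension; first_digit_split is an illustrative name. Because it cuts at the first digit anywhere in the string, a module name that itself contains a digit, such as the made-up log4j-database-1.0.nupkg, would be split inside the name.

```python
import re

def first_digit_split(pck):
    # Line #12: lower-case the value and drop the extension.
    s = pck.lower().replace(".nupkg", "")
    m = re.search(r"\d", s)
    if m is None:
        # No digits: the whole value is the module name and the version is NULL.
        return s, None
    i = m.start()
    # Line #18: stop one character before the first digit (skips the hyphen).
    # Lines #20 - 22: the version is everything from the first digit onwards.
    return s[:i - 1], s[i:]

print(first_digit_split("FOO.BAR.DATABASE-2.0.0.NUPKG"))  # ('foo.bar.database', '2.0.0')
print(first_digit_split("log4j-database-1.0.nupkg"))      # ('lo', '4j-database-1.0')
```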

Use CONNECT BY in REGEXP_SUBSTR without breaking the result into multiple rows

SELECT CHR(91)||'a-zA-Z0-9._%-'||CHR(93)||'+'|| listagg(REGEXP_SUBSTR('aaa#yahoo.com, bbb#hotmail.com', '#'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL), ', ') within group (order by level) as domain
FROM DUAL
CONNECT BY REGEXP_SUBSTR('aaa#yahoo.com, bbb#hotmail.com','#'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL) IS NOT NULL
order by 1;
The script above only puts the regular expression in front of #yahoo.com:
[a-zA-Z0-9._%-]+#yahoo.com, #hotmail.com
Expected result:
[a-zA-Z0-9._%-]+#yahoo.com, [a-zA-Z0-9._%-]+#hotmail.com
Sure; aggregate them back.
SQL> SELECT listagg(REGEXP_SUBSTR('aaa#yahoo.com, bbb#hotmail.com', '#'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL), ', ') within group (order by level) as domain
2 FROM DUAL
3 CONNECT BY REGEXP_SUBSTR('aaa#yahoo.com, bbb#hotmail.com','#'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL) IS NOT NULL
4 order by 1;
DOMAIN
----------------------------------------------------------------------------------------------------
#yahoo.com, #hotmail.com
SQL>
If you want to put the regexp prefix in front of all domains, then:
SQL> SELECT LISTAGG ( '[a-zA-Z0-9._%-]+'
2 || REGEXP_SUBSTR ('aaa#yahoo.com, bbb#hotmail.com',
3 '#' || CHR (91) || '^,' || CHR (93) || '+',
4 1,
5 LEVEL),
6 ', ')
7 WITHIN GROUP (ORDER BY LEVEL) AS domain
8 FROM DUAL
9 CONNECT BY REGEXP_SUBSTR ('aaa#yahoo.com, bbb#hotmail.com',
10 '#' || CHR (91) || '^,' || CHR (93) || '+',
11 1,
12 LEVEL)
13 IS NOT NULL;
DOMAIN
-----------------------------------------------------------------------------------------------
[a-zA-Z0-9._%-]+#yahoo.com, [a-zA-Z0-9._%-]+#hotmail.com
SQL>
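In other words, each CONNECT BY LEVEL row yields one #domain match, and LISTAGG glues them back together. A Python sketch of the same two steps, with re.findall standing in for REGEXP_SUBSTR with increasing occurrence numbers and str.join standing in for LISTAGG; it is only an analogy, not Oracle code.

```python
import re

PREFIX = "[a-zA-Z0-9._%-]+"
source = "aaa#yahoo.com, bbb#hotmail.com"

# Each match plays the role of one CONNECT BY LEVEL row ...
domains = re.findall(r"#[^,]+", source)
# ... and join() plays the role of LISTAGG(..., ', ') ordered by LEVEL.
result = ", ".join(PREFIX + d for d in domains)
print(result)  # [a-zA-Z0-9._%-]+#yahoo.com, [a-zA-Z0-9._%-]+#hotmail.com
```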

Extract Numbers from String - Custom

I'd like to extract "most" of the numbers from a string and add "JW" at the end.
My values look like:
RFID_DP_IDS339020JW3_IDMsg --> 339020JW
RFID_DP_IDSA72130JW_IDMsg --> 72130JW
RFID_DP_IDS337310JW1_IDMsg --> 337310JW
Basically, I want to remove all the leading letters and keep the digits followed by JW.
For now I have this:
regexp_replace(Business_CONTEXT, '[^0-9]', '')||'JW' RegistrationPoint
But that would include the numbers AFTER 'JW'
Any idea?
How about this?
result returns a run of digits followed by exactly two letters
result2 returns a run of digits followed by JW
Pick the one you find most appropriate.
SQL> with test (col) as
2 (select 'RFID_DP_IDS339020JW3_IDMsg' from dual union all
3 select 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
4 select 'RFID_DP_IDS337310JW1_IDMsg' from dual
5 )
6 select col,
7 regexp_substr(col, '\d+[[:alpha:]]{2}') result,
8 regexp_substr(col, '\d+JW') result2
9 from test;
COL RESULT RESULT2
-------------------------- -------------------------- --------------------------
RFID_DP_IDS339020JW3_IDMsg 339020JW 339020JW
RFID_DP_IDSA72130JW_IDMsg 72130JW 72130JW
RFID_DP_IDS337310JW1_IDMsg 337310JW 337310JW
SQL>
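The difference between the two patterns is easy to try in Python, where re.search returns the leftmost match like REGEXP_SUBSTR with occurrence 1. The fourth input below is made up to show where they diverge: the two-letter pattern accepts any letters after the digits, while \d+JW only accepts JW. Python has no [[:alpha:]] class, so [A-Za-z] is used as an ASCII-only stand-in.

```python
import re

samples = [
    "RFID_DP_IDS339020JW3_IDMsg",
    "RFID_DP_IDSA72130JW_IDMsg",
    "RFID_DP_IDS337310JW1_IDMsg",
    "RFID_DP_IDS12345AB_IDMsg",   # hypothetical row without JW
]

def first_match(pattern, text):
    m = re.search(pattern, text)
    return m.group(0) if m else None

for col in samples:
    # Columns: input, "result" analogue, "result2" analogue.
    print(col, first_match(r"\d+[A-Za-z]{2}", col), first_match(r"\d+JW", col))
```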
If you really want to extract the longest digit string from your given strings, you can use the following:
WITH test (Business_CONTEXT) AS
(SELECT 'RFID_DP_IDS339020JW3_I9DMsg' from dual union all
SELECT 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
SELECT 'RFID_DP_IDS337310JW1_IDMsg' from dual
)
SELECT Business_CONTEXT
, (SELECT MAX(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL))
KEEP (dense_rank last ORDER BY LENGTH(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL)))
FROM dual
CONNECT BY regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL) IS NOT NULL) num
FROM test
Result:
Business_CONTEXT | NUM
----------------------------+-----
RFID_DP_IDS339020JW3_I9DMsg | 339020
RFID_DP_IDSA72130JW_IDMsg | 72130
RFID_DP_IDS337310JW1_IDMsg | 337310
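The CONNECT BY plus KEEP (DENSE_RANK LAST ...) query above picks the longest run of digits. A Python sketch of the same selection, assuming the same tie-break (among runs of equal length, MAX picks the largest string), with JW appended afterwards as the question asked; longest_digit_run is an illustrative name.

```python
import re

def longest_digit_run(text):
    runs = re.findall(r"\d+", text)   # like REGEXP_SUBSTR(..., '\d+', 1, LEVEL)
    if not runs:
        return None
    longest = max(len(r) for r in runs)
    # KEEP (DENSE_RANK LAST ORDER BY LENGTH(...)) keeps the longest runs;
    # MAX(...) then picks the largest of them (string comparison).
    return max(r for r in runs if len(r) == longest)

for text in ["RFID_DP_IDS339020JW3_I9DMsg", "RFID_DP_IDSA72130JW_IDMsg",
             "RFID_DP_IDS337310JW1_IDMsg"]:
    num = longest_digit_run(text)
    print(text, num, num + "JW" if num else None)
```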

PL/SQL split one to many rows

I have a table like this.
|PARAMKEY | PARAMVALUE
----------+------------
KEY |[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]
I need to split the values into three columns and I use REGEXP_SUBSTR. Here is my code.
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1,1 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 2) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 3) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 4 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 5) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 6) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 7 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 8) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 9) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY';
And this is the result that I need:
PARAMETER | VERSION | SCHEMA
---------+---------+-------
PAR_A |2 |SCH_A
PAR_B |4 |SCH_B
PAR_C |3 |SCH_C
But the value is long, and I hope there is a simpler way to do this, using a loop or anything else.
Thanks
Try something like this:
with tmp_param_table as
(
select 'KEY' as PARAMKEY , '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]],["PAR_D",4,"SCH_D"]]' as PARAMVALUE from dual
),
levels as (select level as lv from dual connect by level <= 156),
steps as (select lv-2 as step from levels where MOD(lv,3)=0)
select step, (SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') parameter,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+1 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') version,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+2 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') schema
from steps
Here:
levels - returns the numbers from 1 to 156 (52*3), or however many you need
steps - the numbers 1, 4, 7, etc., in steps of 3
Results:
1 PAR_A 2 SCH_A
4 PAR_B 4 SCH_B
7 PAR_C 3 SCH_C
10 PAR_D 4 SCH_D
13
etc..
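The idea behind the steps CTE is that every three consecutive tokens of '[^],["]+' form one (parameter, version, schema) row. A Python sketch of that grouping, using the sample value from the question; here the number of rows follows the data instead of a fixed 156-row generator, and it assumes the token count is a multiple of three.

```python
import re

paramvalue = '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]'

# Same class as the SQL '[^],["]+' (anything except ] , [ and "),
# with the brackets escaped to keep Python's parser unambiguous.
tokens = re.findall(r'[^\],\["]+', paramvalue)

# Tokens 1-3, 4-6, 7-9, ... form one row each, like steps 1, 4, 7, ...
rows = [tuple(tokens[i:i + 3]) for i in range(0, len(tokens), 3)]
for parameter, version, schema in rows:
    print(parameter, version, schema)
```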
I have tried using a regular expression to split the paramvalue column value into comma-separated values:
SELECT
REGEXP_SUBSTR(COL, '[^],["]+', 1, 1) PARAMETER,
REGEXP_SUBSTR(COL, '[^],[",]+', 1, 2) VERSION,
REGEXP_SUBSTR(COL, '[^],["]+', 1, 3) SCHEMA
FROM
(
SELECT paramkey,REGEXP_SUBSTR(to_char(paramvalue),'[^][^]+',1,level ) COL
from tmp_param_table
connect by regexp_substr(to_char(paramvalue),'[^][^]+',1, level) is not null
)
WHERE COL <>','
I hope this may help.

Using Oracle REGEXP_INSTR to find an exact word

I want to return the following positions from these strings using REGEXP_INSTR.
I am looking for an exact match of the word car in the following strings:
car,care,oscar - 1
care,car,oscar - 6
oscar,care,car - 12
Something like:
SELECT REGEXP_INSTR('car,care,oscar', 'car', 1, 1) "REGEXP_INSTR" FROM DUAL;
I am not sure what kind of escape operators to use.
A simpler solution is to surround the source string and search string with commas and find the position using INSTR.
SELECT INSTR(',' || 'car,care,oscar' || ',', ',car,') "INSTR" FROM DUAL;
Example:
with x(y) as (
SELECT 'car,care,oscar' from dual union all
SELECT 'care,car,oscar' from dual union all
SELECT 'oscar,care,car' from dual union all
SELECT 'car' from dual union all
SELECT 'cart,care,oscar' from dual
)
select y, ',' || y || ',' , instr(',' || y || ',',',car,')
from x
| Y | ','||Y||',' | INSTR(','||Y||',',',CAR,') |
|-----------------|-------------------|----------------------------|
| car,care,oscar | ,car,care,oscar, | 1 |
| care,car,oscar | ,care,car,oscar, | 6 |
| oscar,care,car | ,oscar,care,car, | 12 |
| car | ,car, | 1 |
| cart,care,oscar | ,cart,care,oscar, | 0 |
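Why this works: wrapping adds one comma in front of the list, and INSTR reports the position of the leading comma of ,car,, so the two offsets cancel and the result is the 1-based position of car in the original string (0 when there is no whole-item match). A Python sketch of the same trick, where str.find is 0-based and returns -1 when nothing is found, so adding 1 reproduces INSTR's numbering; exact_item_position is an illustrative name.

```python
def exact_item_position(csv, word):
    # Wrap both the list and the word in commas so only a whole item can match.
    wrapped = "," + csv + ","
    # str.find is 0-based (-1 if absent); +1 gives INSTR-style 1-based / 0 results.
    return wrapped.find("," + word + ",") + 1

for y in ["car,care,oscar", "care,car,oscar", "oscar,care,car", "car", "cart,care,oscar"]:
    print(y, exact_item_position(y, "car"))
```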
The following query handles all scenarios. It returns the match position if the string begins with car, or if the whole string is just car. If ,car, is found, or if the string ends with ,car, it returns the match position + 1 to account for the leading comma.
SELECT
CASE
WHEN REGEXP_LIKE('car,care,oscar', '^car,|^car$') THEN REGEXP_INSTR('car,care,oscar', '^car,|^car$', 1, 1)
WHEN REGEXP_LIKE('car,care,oscar', ',car,|,car$') THEN REGEXP_INSTR('car,care,oscar', ',car,|,car$', 1, 1)+1
ELSE 0
END "REGEXP_INSTR"
FROM DUAL;
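The three cases (start of string, between commas, end of string) can also be folded into one pattern, (^|,)(car)(,|$), with the position read from the second group. The sketch below does this in Python; in Oracle the corresponding idea would presumably use the subexpression argument of REGEXP_INSTR (available from 11g), which is not verified here.

```python
import re

# Group 1: start of string or a comma; group 2: the word; group 3: comma or end.
EXACT_CAR = re.compile(r"(^|,)(car)(,|$)")

def car_position(s):
    m = EXACT_CAR.search(s)
    # Report where group 2 starts, as a 1-based position (0 = not found).
    return m.start(2) + 1 if m else 0

for s in ["car,care,oscar", "care,car,oscar", "oscar,care,car", "car", "cart,care,oscar"]:
    print(s, car_position(s))
```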
I like Noel's answer, as it performs very well! Another way around is to split the character-separated string into separate rows:
with pm (nodes) as (select 'a;b;c;d;e;f;g' from dual)
select regexp_substr(pm.nodes, '[^;]+', 1, level) as node
from pm
connect by regexp_substr(pm.nodes, '[^;]+', 1, level) is not null;