Oracle REGEX_SUBSTR Not Honoring null values


I have an issue with REGEXP_SUBSTR not honoring null values.
select
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 1) AS phn_nbr,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 2) AS phn_pos,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 3) AS phn_typ,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 4) AS phn_strt_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5) AS phn_end_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 6) AS pub_indctr
from dual;
If phn_end_dt is null and pub_indctr is not null, the value of pub_indctr is shifted into phn_end_dt.
Result:
PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14   P
whereas it should be:
PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14              P
Any suggestions?
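A quick way to see the shift outside the database is a Python sketch of the same pattern (an assumption: Python's `re` stands in for Oracle's regex engine here, though both read `[^,]+` as one or more non-comma characters). Because `+` can never match an empty field, the empty fifth field produces no match at all, and every later occurrence moves down by one:

```python
import re

CSV = "2035197553,2,S,14-JUN-14,,P"

# Python's re stands in for Oracle's regex engine here (assumption).
# '[^,]+' needs at least one non-comma character, so the empty field
# between ",," yields no match at all.
tokens = re.findall(r"[^,]+", CSV)
print(tokens)     # ['2035197553', '2', 'S', '14-JUN-14', 'P']

# Occurrence 5 (phn_end_dt) therefore picks up the value of field 6.
print(tokens[4])  # P
```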

I'm afraid the accepted answer does not handle the case where you need the value after the empty field. Try to get the 6th field, which comes back empty instead of P (SQL*Plus truncates the column heading to P):
select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 6) phn_end_dt
from dual;
P
-
I believe you need to do this instead (works on 11g):
select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '([^,]*)(,|$)', 1, 6, NULL, 1) phn_end_dt
from dual;
P
-
P
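The fix relies on the sixth argument of REGEXP_SUBSTR (the subexpression, available from 11g) to return only the text captured before the delimiter. A rough Python equivalent takes group 1 of the nth match, assuming Python's `re.finditer` counts occurrences of this pattern the same way Oracle does for these six fields; `nth_field` is a hypothetical helper name:

```python
import re

CSV = "2035197553,2,S,14-JUN-14,,P"

def nth_field(text, n):
    """Hypothetical helper: capture group 1 of the nth match of
    ([^,]*)(,|$), the counterpart of
    REGEXP_SUBSTR(text, '([^,]*)(,|$)', 1, n, NULL, 1)."""
    for i, m in enumerate(re.finditer(r"([^,]*)(,|$)", text), start=1):
        if i == n:
            return m.group(1)
    return None  # fewer than n matches

print([nth_field(CSV, n) for n in range(1, 7)])
# ['2035197553', '2', 'S', '14-JUN-14', '', 'P']
```

Python returns an empty string for the empty fifth field, where Oracle returns NULL because it treats the empty string as NULL.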
I just discovered this after posting my own question: REGEX to select nth value from a list, allowing for nulls

You can solve your task like this:
with t(val) as (
select '2035197553,2,S,14-JUN-14,,P' from dual
), t1 (val) as (
select ',' || val || ',' from t
)
select substr(val, REGEXP_INSTR(val, ',', 1, 1) + 1, REGEXP_INSTR(val, ',', 1, 1 + 1) - REGEXP_INSTR(val, ',', 1, 1) - 1) a
, substr(val, REGEXP_INSTR(val, ',', 1, 2) + 1, REGEXP_INSTR(val, ',', 1, 2 + 1) - REGEXP_INSTR(val, ',', 1, 2) - 1) b
, substr(val, REGEXP_INSTR(val, ',', 1, 3) + 1, REGEXP_INSTR(val, ',', 1, 3 + 1) - REGEXP_INSTR(val, ',', 1, 3) - 1) c
, substr(val, REGEXP_INSTR(val, ',', 1, 4) + 1, REGEXP_INSTR(val, ',', 1, 4 + 1) - REGEXP_INSTR(val, ',', 1, 4) - 1) d
, substr(val, REGEXP_INSTR(val, ',', 1, 5) + 1, REGEXP_INSTR(val, ',', 1, 5 + 1) - REGEXP_INSTR(val, ',', 1, 5) - 1) e
, substr(val, REGEXP_INSTR(val, ',', 1, 6) + 1, REGEXP_INSTR(val, ',', 1, 6 + 1) - REGEXP_INSTR(val, ',', 1, 6) - 1) f
from t1
A B C D E F
-------------------------------------
2035197553 2 S 14-JUN-14 - P
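The SUBSTR/REGEXP_INSTR query avoids empty-match subtleties altogether: after wrapping the value in commas, field n is whatever lies between comma n and comma n + 1. A Python sketch of that idea (`nth_field_by_position` is a hypothetical name; positions are 0-based here, unlike INSTR, and a single-character delimiter is assumed):

```python
CSV = "2035197553,2,S,14-JUN-14,,P"

def nth_field_by_position(text, n, delim=","):
    """Hypothetical helper: wrap the text in delimiters, then return what
    lies between delimiter n and delimiter n + 1."""
    wrapped = delim + text + delim
    # 0-based indexes of every delimiter (INSTR would report 1-based ones)
    positions = [i for i, ch in enumerate(wrapped) if ch == delim]
    if n < 1 or n >= len(positions):
        return None  # there is no nth field
    start, end = positions[n - 1], positions[n]
    return wrapped[start + 1:end]

print([nth_field_by_position(CSV, n) for n in range(1, 7)])
# ['2035197553', '2', 'S', '14-JUN-14', '', 'P']
```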

The typical csv parsing approach is as follows:
WITH t(csv_str) AS
( SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
UNION ALL
SELECT '2035197553,2,S,14-JUN-14,,' FROM dual
)
SELECT LTRIM(REGEXP_SUBSTR(',' || csv_str, ',[^,]*', 1, 1), ',') AS phn_nbr,
       LTRIM(REGEXP_SUBSTR(',' || csv_str, ',[^,]*', 1, 2), ',') AS phn_pos,
       LTRIM(REGEXP_SUBSTR(',' || csv_str, ',[^,]*', 1, 3), ',') AS phn_typ,
       LTRIM(REGEXP_SUBSTR(',' || csv_str, ',[^,]*', 1, 4), ',') AS phn_strt_dt,
       LTRIM(REGEXP_SUBSTR(',' || csv_str, ',[^,]*', 1, 5), ',') AS phn_end_dt,
       LTRIM(REGEXP_SUBSTR(',' || csv_str, ',[^,]*', 1, 6), ',') AS pub_indctr
FROM t
I like to prepend a comma to the CSV string and then count each comma together with the run of non-comma characters that follows it.
Explanation of the search pattern
The search pattern looks for the nth substring (n corresponds to the nth element in the CSV) with the following properties:
- The pattern begins with a ','.
- Next comes '[^,]', a non-matching list expression: the caret, ^, means that the characters listed after it must not be matched.
- This non-matching list has the quantifier *, so it can occur zero or more times.
Once a match is found, LTRIM removes its leading comma.
The nice thing about this approach is that every occurrence of the search pattern corresponds to exactly one comma, so the nth occurrence is always the nth element, even when that element is empty.
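Here is the same leading-comma technique sketched in Python (an illustration, not Oracle code; `csv_fields` is a hypothetical helper). Every match of `,[^,]*` starts with its own comma, so no match is ever empty and the occurrence count stays aligned with the fields, including the two empty trailing fields of the second test row:

```python
import re

CSV_ROWS = ["2035197553,2,S,14-JUN-14,,P", "2035197553,2,S,14-JUN-14,,"]

def csv_fields(text, count):
    """Hypothetical helper: the counterpart of
    LTRIM(REGEXP_SUBSTR(',' || text, ',[^,]*', 1, n), ',') for n = 1..count."""
    # Every match starts with its own comma, so none of them is empty.
    matches = re.findall(r",[^,]*", "," + text)
    fields = [m.lstrip(",") for m in matches]
    # Missing occurrences become None, as Oracle would return NULL.
    return (fields + [None] * count)[:count]

for row in CSV_ROWS:
    print(csv_fields(row, 6))
# ['2035197553', '2', 'S', '14-JUN-14', '', 'P']
# ['2035197553', '2', 'S', '14-JUN-14', '', '']
```

Empty fields come back as '' in Python; in Oracle, LTRIM of a lone comma yields NULL.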

You need to change this line:
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5) AS phn_end_dt,
to:
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 5) AS phn_end_dt,
[^,]+ matches one or more characters other than a comma, while [^,]* matches zero or more of them. So [^,]+ requires at least one non-comma character to be present, but the empty field has none; changing + to * lets the regex engine match an empty string there.

Thanks for pointing me in the right direction; I used this to solve the issue:
SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt ,
REGEXP_SUBSTR (val
|| ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr
FROM
(SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual
);
Oracle version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

I have a generic use case where I don't know in advance which columns the string contains, so I used the function below, which solved the problem.
create or replace function substring_specific_occurence(p_string varchar2
,p_delimiter varchar2
,p_occurence number) return varchar2
is
l_output varchar2(2000);
g_miss_char varchar2(20) := 'fdkjkjhkuhhf7';
l_string varchar2(10000) := replace(p_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'' );
begin
while (l_string like '%'||p_delimiter||p_delimiter||'%' )
loop
l_string := replace(l_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'');
end loop;
select regexp_substr(l_string,'[^'||p_delimiter||']+',1,p_occurence)
into l_output
from dual;
return replace(l_output,g_miss_char);
end substring_specific_occurence;
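The same placeholder idea can be sketched in Python (a hypothetical port, not the author's code; `re.escape` keeps a delimiter such as `^` literal inside the Python character class). Like the original, it assumes a single-character delimiter and only fills empty fields that sit between two delimiters, so an empty first field is not detected: for ',a', occurrence 1 returns 'a'.

```python
import re

# Placeholder copied from the PL/SQL function; any string that never
# occurs in real data would do.
SENTINEL = "fdkjkjhkuhhf7"

def substring_specific_occurrence(text, delim, n):
    """Hypothetical Python port of the sentinel idea: fill each empty field
    between two adjacent delimiters with SENTINEL, take the nth run of
    non-delimiter characters, then strip SENTINEL again.
    Assumes a single-character delimiter, like the original."""
    doubled = delim + delim
    filled = text
    # One pass of replace() leaves ',,' behind for ',,,', hence the loop.
    while doubled in filled:
        filled = filled.replace(doubled, delim + SENTINEL + delim)
    tokens = re.findall("[^" + re.escape(delim) + "]+", filled)
    if n < 1 or n > len(tokens):
        return None  # Oracle's REGEXP_SUBSTR would return NULL
    return tokens[n - 1].replace(SENTINEL, "")

print(substring_specific_occurrence("2035197553,2,S,14-JUN-14,,P", ",", 6))  # P
print(substring_specific_occurrence("a^b^^d^e", "^", 4))  # d
print(substring_specific_occurrence(",a", ",", 1))  # a (empty first field missed)
```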

Related

Separating Text by Delimiter using regexp_subtr

I am using PL/SQL to separate the parts of my text.
The text:
'a^b^c^d^e'
declare
test varchar2(10);
begin
select 'a^b^c^d^e' into test from dual;
dbms_output.put_line('1. '|| regexp_substr(test, '[^^]+', 1, 1));
dbms_output.put_line('2. '|| regexp_substr(test, '[^^]+', 1, 2));
dbms_output.put_line('3. '|| regexp_substr(test, '[^^]+', 1, 3));
dbms_output.put_line('4. '|| regexp_substr(test, '[^^]+', 1, 4));
dbms_output.put_line('5. '|| regexp_substr(test, '[^^]+', 1, 5));
end;
Output:
1. a
2. b
3. c
4. d
5. e
This works as expected until a null appears in the middle (e.g. 'a^b^^d^e').
I expect that output to be:
1. a
2. b
3.
4. d
5. e
but the actual output is:
1. a
2. b
3. d
4. e
5.
I'm not very good at regex, but I am most of the way there.
Any help would be appreciated.

See my answer here for more detail. Don't use the format '[^^]+' for parsing strings! It returns unexpected results when there is a NULL element in the list and will get you in big trouble as it will return the wrong element. Instead use this form of REGEXP_SUBSTR() as it handles NULL list elements:
REGEXP_SUBSTR('a^b^^d^e', '(.*?)(\^|$)', 1, 4, NULL, 1)
Run it and you'll see you will get 'd' returned as expected.
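For readers who want to check the lazy pattern outside Oracle, here is a Python sketch (Python's `re` as a stand-in for Oracle's engine, which is an assumption; `nth_element` is a hypothetical name). `(.*?)` takes as few characters as possible before the next caret or the end of the string, so an empty element still produces its own match, and group 1 holds the element itself:

```python
import re

# Lazy "anything" followed by a caret or the end of the string;
# group 1 is the list element.
PATTERN = re.compile(r"(.*?)(\^|$)")

def nth_element(text, n):
    # Mirrors REGEXP_SUBSTR(text, '(.*?)(\^|$)', 1, n, NULL, 1)
    for i, m in enumerate(PATTERN.finditer(text), start=1):
        if i == n:
            return m.group(1)
    return None  # fewer than n matches

for n in range(1, 6):
    print(f"{n}. {nth_element('a^b^^d^e', n)}")
```

It prints the five lines the question expected, with line 3 empty.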

Reg Exp don't select if more than one group matches (multiple XOR)

The data are held in an Oracle 12c database, one row per ICD-10-CM code, with a patient ID (foreign key) like so (note that there could be many other codes, the following are just the ones pertinent to this question):
ID ICD10CODE
1 S72.91XB
1 S72.92XB
2 S72.211A
3 S72.414A
3 S72.415A
4 S32.509A
5 S32.301A
5 S32.821A
6 S32.421A
6 S32.422A
7 S32.421A
8 S32.421A
8 S32.509A
The task at hand is to select distinct patients that match only one of the following points (using standard regular expression syntax):
Any number of: S32\.1\w\w\w, S32\.2\w\w\w, S32\.3\w\w\w, S32\.5\w\w\w, S32\.6\w\w\w, S32\.7\w\w\w, S32\.8\w\w\w
Any number of: S32\.4\w1\w, S32\.4\w3\w, S32\.4\w4\w, S32\.4\w6\w, S32\.4\w7\w, S32\.4\w8\w, S32\.4\w9\w
Any number of: S32\.4\w2\w, S32\.4\w3\w, S32\.4\w5\w, S32\.4\w6\w, S32\.4\w7\w, S32\.4\w8\w, S32\.4\w9\w
Any number of: S72\.[0-8]\w1\w, S72\.[0-8]\w3\w, S72\.[0-8]\w4\w, S72\.[0-8]\w6\w, S72\.[0-8]\w7\w, S72\.[0-8]\w8\w, S72\.[0-8]\w9\w
Any number of: S72\.[0-8]\w2\w, S72\.[0-8]\w3\w, S72\.[0-8]\w5\w, S72\.[0-8]\w6\w, S72\.[0-8]\w7\w, S72\.[0-8]\w8\w, S72\.[0-8]\w9\w
Any number of: S72\.91\w\w, S72\.93\w\w, S72\.94\w\w, S72\.96\w\w, S72\.97\w\w, S72\.98\w\w, S72\.99\w\w
Any number of: S72\.92\w\w, S72\.93\w\w, S72\.95\w\w, S72\.96\w\w, S72\.97\w\w, S72\.98\w\w, S72\.99\w\w
Any permutation or combination (including repetitions) of the codes listed within a bullet is permitted for each patient, but permutations or combinations across rows should occur mutually exclusively for a patient. My method is to apply LISTAGG with GROUP BY ID:
ID LISTAGG(ICD10CODE, ',')
1 S72.91XB,S72.92XB
2 S72.211A
3 S72.414A,S72.415A
4 S32.509A
5 S32.301A,S32.821A
6 S32.421A,S32.422A
7 S32.421A
8 S32.421A,S32.509A
Then filter using this regular expression, (S32\.(([1-3]|[5-8])|(4\w((1|4)|(2|5)|(3)|([5-9]))))\w+)|(S72\.(([0-8]\w((1|4)|(2|5)|(3)|([5-9])))|(9((1|4)|(2|5)|(3)|([5-9]))))\w+), which is almost a literal representation of the bullets above. My expression is adapted from the idea in this answer, where it seems that ((RB\s+)+|(JJ\s+)+) automatically selects either "RB" or "JJ", but not both.
I cannot get it to work. The answer should contain only IDs 2, 4, 5, and 7, but the expression I developed matches all IDs.
What is a solution to this problem?
[Edit] Some more information:
All these S codes relate to injuries to the bones of the lower extremity: S32 is for fractures of the pelvis (hip bone), and S72 is for fractures of the femur (thigh bone). Note that we have two femurs and two acetabula (the sockets of the pelvis where the femurs connect). The S32.4 code denotes the acetabulum (the rest of the S32.[1235678]\w{3} series denotes other parts of the pelvis). The right and left femur and acetabulum are denoted by 1|4 and 2|5, respectively, in the 6th character, unless the code starts with S72.9, in which case those numbers appear in the 5th character.
The patients to be included in the study population should have only one of the bones broken. That means one of the two femurs, one of the acetabula, or the pelvis, but not a combination of them. Combinations of fractures of a single bone do not matter. For example, a single right femur can be broken in 10 different places and ways (the knee area, the middle shaft, the head, etc., each generating a different S72.\w[1|4]\w{2} code), and the patient should still be selected.

Option 1:
You can do it with a single regular expression:
SELECT t.id,
t.icd10codes
FROM ( SELECT id,
LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
AS icd10codes
FROM table_name
GROUP BY id
) t
WHERE REGEXP_LIKE(
t.icd10codes,
'^(S32\.[1235678]\w\w\w(,|$))+$'
|| '|^(S32\.4\w[1346789]\w(,|$))+$'
|| '|^(S32\.4\w[2356789]\w(,|$))+$'
|| '|^(S72\.[0-8]\w[1346789]\w(,|$))+$'
|| '|^(S72\.[0-8]\w[2356789]\w(,|$))+$'
|| '|^(S72\.9[1346789]\w\w(,|$))+$'
|| '|^(S72\.9[2356789]\w\w(,|$))+$'
)
Which, for your sample data:
CREATE TABLE table_name (ID, ICD10CODE) AS
SELECT 1, 'S72.91XB' FROM DUAL UNION ALL
SELECT 1, 'S72.92XB' FROM DUAL UNION ALL
SELECT 2, 'S72.211A' FROM DUAL UNION ALL
SELECT 3, 'S72.414A' FROM DUAL UNION ALL
SELECT 3, 'S72.415A' FROM DUAL UNION ALL
SELECT 4, 'S32.509A' FROM DUAL UNION ALL
SELECT 5, 'S32.301A' FROM DUAL UNION ALL
SELECT 5, 'S32.821A' FROM DUAL UNION ALL
SELECT 6, 'S32.421A' FROM DUAL UNION ALL
SELECT 6, 'S32.422A' FROM DUAL UNION ALL
SELECT 7, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.509A' FROM DUAL;
Outputs:
ID  ICD10CODES
2   S72.211A
4   S32.509A
5   S32.301A,S32.821A
7   S32.421A
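To check the logic of Option 1 without a database, here is a Python sketch that mimics LISTAGG ... GROUP BY id followed by REGEXP_LIKE (an approximation: Python's `re.search` stands in for REGEXP_LIKE, and `single_bone_patients` is a hypothetical helper). Each alternative is anchored at both ends and repeated with `+`, so a list only passes if every code in it belongs to the same group:

```python
import re
from collections import defaultdict

# Sample rows from the question: (patient id, ICD-10-CM code)
ROWS = [
    (1, "S72.91XB"), (1, "S72.92XB"), (2, "S72.211A"), (3, "S72.414A"),
    (3, "S72.415A"), (4, "S32.509A"), (5, "S32.301A"), (5, "S32.821A"),
    (6, "S32.421A"), (6, "S32.422A"), (7, "S32.421A"), (8, "S32.421A"),
    (8, "S32.509A"),
]

# The seven per-bone patterns from Option 1.
GROUPS = [
    r"S32\.[1235678]\w\w\w",
    r"S32\.4\w[1346789]\w",
    r"S32\.4\w[2356789]\w",
    r"S72\.[0-8]\w[1346789]\w",
    r"S72\.[0-8]\w[2356789]\w",
    r"S72\.9[1346789]\w\w",
    r"S72\.9[2356789]\w\w",
]
# One anchored, repeated alternative per group, joined with |.
PATTERN = re.compile("|".join(rf"^({g}(,|$))+$" for g in GROUPS))

def single_bone_patients(rows):
    """Mimic LISTAGG(...) WITHIN GROUP (ORDER BY icd10code) per id,
    then keep the ids whose list passes the REGEXP_LIKE test."""
    codes = defaultdict(list)
    for pid, code in rows:
        codes[pid].append(code)
    return [pid for pid in sorted(codes)
            if PATTERN.search(",".join(sorted(codes[pid])))]

print(single_bone_patients(ROWS))  # [2, 4, 5, 7]
```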
Option 2:
You can put the regular expressions into a table:
CREATE TABLE matches (id, match) AS
SELECT 1, 'S32\.[1235678]\w\w\w' FROM DUAL UNION ALL
SELECT 2, 'S32\.4\w[1346789]\w' FROM DUAL UNION ALL
SELECT 3, 'S32\.4\w[2356789]\w' FROM DUAL UNION ALL
SELECT 4, 'S72\.[0-8]\w[1346789]\w' FROM DUAL UNION ALL
SELECT 5, 'S72\.[0-8]\w[2356789]\w' FROM DUAL UNION ALL
SELECT 6, 'S72\.9[1346789]\w\w' FROM DUAL UNION ALL
SELECT 7, 'S72\.9[2356789]\w\w' FROM DUAL;
Then you can use the query:
SELECT t.id,
m.id AS match_id,
LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
AS icd10codes
FROM table_name t
LEFT OUTER JOIN matches m
PARTITION BY (m.id)
ON (REGEXP_LIKE(t.icd10code, '^' || m.match || '$'))
GROUP BY
t.id,
m.id
HAVING
COUNT(m.match) = COUNT(t.id);
Option 3:
Similar to the first option, but you can put the matches into a table and you can determine which match has been used:
SELECT t.id,
m.id AS match_id,
t.icd10codes
FROM ( SELECT id,
LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
AS icd10codes
FROM table_name
GROUP BY id
) t
INNER JOIN matches m
ON (REGEXP_LIKE(t.icd10codes, '^(' || m.match || '(,|$))+$' ))
Options 2 & 3 both output:
ID  MATCH_ID  ICD10CODES
4   1         S32.509A
5   1         S32.301A,S32.821A
7   2         S32.421A
2   4         S72.211A
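Options 2 and 3 boil down to one rule: report a (patient, match) pair when every code of the patient fully matches that single pattern, which is what `HAVING COUNT(m.match) = COUNT(t.id)` checks. A Python sketch of that rule (`re.fullmatch` stands in for `'^' || m.match || '$'`; `patient_match_ids` is a hypothetical helper):

```python
import re
from collections import defaultdict

# Sample rows from the question: (patient id, ICD-10-CM code)
ROWS = [
    (1, "S72.91XB"), (1, "S72.92XB"), (2, "S72.211A"), (3, "S72.414A"),
    (3, "S72.415A"), (4, "S32.509A"), (5, "S32.301A"), (5, "S32.821A"),
    (6, "S32.421A"), (6, "S32.422A"), (7, "S32.421A"), (8, "S32.421A"),
    (8, "S32.509A"),
]

# Same contents as the "matches" table: match id -> regular expression
MATCHES = {
    1: r"S32\.[1235678]\w\w\w",
    2: r"S32\.4\w[1346789]\w",
    3: r"S32\.4\w[2356789]\w",
    4: r"S72\.[0-8]\w[1346789]\w",
    5: r"S72\.[0-8]\w[2356789]\w",
    6: r"S72\.9[1346789]\w\w",
    7: r"S72\.9[2356789]\w\w",
}

def patient_match_ids(rows, matches):
    """(patient id, match id) pairs where *every* code of the patient
    fully matches that one pattern."""
    codes = defaultdict(list)
    for pid, code in rows:
        codes[pid].append(code)
    return sorted(
        (pid, mid)
        for pid, pid_codes in codes.items()
        for mid, pattern in matches.items()
        if all(re.fullmatch(pattern, c) for c in pid_codes)
    )

print(patient_match_ids(ROWS, MATCHES))
# [(2, 4), (4, 1), (5, 1), (7, 2)]
```

One consequence of the overlapping character classes: a patient whose only code is, say, S32.433A would be reported under both match 2 and match 3, because 3 appears in both [1346789] and [2356789].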
Option 4:
You can also get rid of the (slow) regular expressions and use LIKE if you store the matches as:
CREATE TABLE matches (id, match) AS
SELECT 1, 'S32.1___' FROM DUAL UNION ALL
SELECT 1, 'S32.2___' FROM DUAL UNION ALL
SELECT 1, 'S32.3___' FROM DUAL UNION ALL
SELECT 1, 'S32.5___' FROM DUAL UNION ALL
SELECT 1, 'S32.6___' FROM DUAL UNION ALL
SELECT 1, 'S32.7___' FROM DUAL UNION ALL
SELECT 1, 'S32.8___' FROM DUAL UNION ALL
SELECT 2, 'S32.4_1_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_4_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_2_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_5_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_1_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_4_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_2_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_5_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 6, 'S72.91__' FROM DUAL UNION ALL
SELECT 6, 'S72.93__' FROM DUAL UNION ALL
SELECT 6, 'S72.94__' FROM DUAL UNION ALL
SELECT 6, 'S72.96__' FROM DUAL UNION ALL
SELECT 6, 'S72.97__' FROM DUAL UNION ALL
SELECT 6, 'S72.98__' FROM DUAL UNION ALL
SELECT 6, 'S72.99__' FROM DUAL UNION ALL
SELECT 7, 'S72.92__' FROM DUAL UNION ALL
SELECT 7, 'S72.93__' FROM DUAL UNION ALL
SELECT 7, 'S72.95__' FROM DUAL UNION ALL
SELECT 7, 'S72.96__' FROM DUAL UNION ALL
SELECT 7, 'S72.97__' FROM DUAL UNION ALL
SELECT 7, 'S72.98__' FROM DUAL UNION ALL
SELECT 7, 'S72.99__' FROM DUAL;
Then use the query:
SELECT t.id,
m.id AS match_id,
LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
AS icd10codes
FROM table_name t
LEFT OUTER JOIN matches m
PARTITION BY (m.id)
ON (t.icd10code LIKE m.match)
GROUP BY
t.id,
m.id
HAVING
COUNT(m.match) = COUNT(t.id);
db<>fiddle here
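Option 4 works because LIKE has only two wildcards, `_` (exactly one character) and `%` (any run of characters), so every character class has to be spelled out as separate rows. A Python sketch that generates the same 161 rows and applies them (the `like_to_regex` translation, which ignores ESCAPE clauses, and the other names are hypothetical):

```python
import re
from collections import defaultdict

# Sample rows from the question: (patient id, ICD-10-CM code)
ROWS = [
    (1, "S72.91XB"), (1, "S72.92XB"), (2, "S72.211A"), (3, "S72.414A"),
    (3, "S72.415A"), (4, "S32.509A"), (5, "S32.301A"), (5, "S32.821A"),
    (6, "S32.421A"), (6, "S32.422A"), (7, "S32.421A"), (8, "S32.421A"),
    (8, "S32.509A"),
]

# The same (match id, LIKE pattern) rows as the Option 4 table; the two
# S72.[0-8] groups are generated for digits 0-8, like CONNECT BY LEVEL <= 9.
MATCHES = (
    [(1, f"S32.{d}___") for d in "1235678"]
    + [(2, f"S32.4_{d}_") for d in "1346789"]
    + [(3, f"S32.4_{d}_") for d in "2356789"]
    + [(4, f"S72.{n}_{d}_") for d in "1346789" for n in range(9)]
    + [(5, f"S72.{n}_{d}_") for d in "2356789" for n in range(9)]
    + [(6, f"S72.9{d}__") for d in "1346789"]
    + [(7, f"S72.9{d}__") for d in "2356789"]
)

def like_to_regex(pattern):
    """Hypothetical helper: translate a LIKE pattern (no ESCAPE clause)
    into a regex. '_' is one character, '%' any run, the rest literal."""
    out = []
    for ch in pattern:
        if ch == "_":
            out.append(".")
        elif ch == "%":
            out.append(".*")
        else:
            out.append(re.escape(ch))
    return "".join(out)

def patient_match_ids(rows, matches):
    """(patient id, match id) pairs where every code is LIKE at least one
    pattern of that match id. With this data each code matches at most one
    pattern per id, so this agrees with COUNT(m.match) = COUNT(t.id)."""
    compiled = defaultdict(list)
    for mid, like in matches:
        compiled[mid].append(re.compile(like_to_regex(like), re.DOTALL))
    codes = defaultdict(list)
    for pid, code in rows:
        codes[pid].append(code)
    return sorted(
        (pid, mid)
        for pid, pid_codes in codes.items()
        for mid, regexes in compiled.items()
        if all(any(r.fullmatch(c) for r in regexes) for c in pid_codes)
    )

print(patient_match_ids(ROWS, MATCHES))  # [(2, 4), (4, 1), (5, 1), (7, 2)]
```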

OK, I've added your broken-bone codes to the S32 and S72 series.
That's really all that needed to be done.
Feel free to change ,? to (,|$), but don't change anything else.
Let me know if the broken-bone codes are about right.
S32\.\w[25]\w\w, S32.1\w\w\w, S32.2\w\w\w, S32.3\w\w\w, S32.5\w\w\w, S32.6\w\w\w, S32.7\w\w\w, S32.8\w\w\w
S32\.\w[25]\w\w, S32.4\w1\w, S32.4\w3\w, S32.4\w4\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
S32\.\w[25]\w\w, S32.4\w2\w, S32.4\w3\w, S32.4\w5\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
S72\.\w[14]\w\w, S72.[0-8]\w1\w, S72.[0-8]\w3\w, S72.[0-8]\w4\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
S72\.\w[14]\w\w, S72.[0-8]\w2\w, S72.[0-8]\w3\w, S72.[0-8]\w5\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
S72\.[14]\w\w\w, S72.91\w\w, S72.93\w\w, S72.94\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w
S72\.[14]\w\w\w, S72.92\w\w, S72.93\w\w, S72.95\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w
The new regex is:
^((S32\.(\w[25]|[1-35-8]\w)\w\w,?)+|(S32\.(\w[25]\w|4\w[1346-9])\w,?)+|(S32\.(\w[25]\w|4\w[235-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[1346-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[235-9])\w,?)+|(S72\.([14]\w|9[1346-9])\w\w,?)+|(S72\.([14]\w|9[235-9])\w\w,?)+)$
https://regex101.com/r/OAHdCO/1
^
(
( S32 \. ( \w [25] | [1-35-8] \w ) \w\w ,? )+
| ( S32 \. ( \w [25] \w | 4 \w [1346-9] ) \w ,? )+
| ( S32 \. ( \w [25] \w | 4 \w [235-9] ) \w ,? )+
| ( S72 \. ( \w [14] \w | [0-8] \w [1346-9] ) \w ,? )+
| ( S72 \. ( \w [14] \w | [0-8] \w [235-9] ) \w ,? )+
| ( S72 \. ( [14] \w | 9 [1346-9] ) \w\w ,? )+
| ( S72 \. ( [14] \w | 9 [235-9] ) \w\w ,? )+
)
$

Use connect by by in REGEXP_SUBSTR without breaking result to multiple rows

SELECT CHR(91)||'a-zA-Z0-9._%-'||CHR(93)||'+'|| listagg(REGEXP_SUBSTR('aaa#yahoo.com, bbb#hotmail.com', '#'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL), ', ') within group (order by level) as domain
FROM DUAL
CONNECT BY REGEXP_SUBSTR('aaa#yahoo.com, bbb#hotmail.com','#'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL) IS NOT NULL
order by 1;
The script above puts the regular-expression prefix only in front of #yahoo.com:
[a-zA-Z0-9._%-]+#yahoo.com, #hotmail.com
Expected result:
[a-zA-Z0-9._%-]+#yahoo.com, [a-zA-Z0-9._%-]+#hotmail.com

Sure; aggregate them back.
SELECT listagg(REGEXP_SUBSTR('aaa#yahoo.com, bbb#hotmail.com', '#'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL), ', ') within group (order by level) as domain
FROM DUAL
CONNECT BY REGEXP_SUBSTR('aaa#yahoo.com, bbb#hotmail.com','#'||CHR(91)||'^,'||CHR(93)||'+', 1, LEVEL) IS NOT NULL
order by 1;
Result:
DOMAIN
----------------------------------------------------------------------------------------------------
#yahoo.com, #hotmail.com
If you want to put the regexp prefix in front of every domain, then:
SELECT LISTAGG ( '[a-zA-Z0-9._%-]+'
                 || REGEXP_SUBSTR ('aaa#yahoo.com, bbb#hotmail.com',
                                   '#' || CHR (91) || '^,' || CHR (93) || '+',
                                   1,
                                   LEVEL),
                 ', ')
       WITHIN GROUP (ORDER BY LEVEL) AS domain
  FROM DUAL
CONNECT BY REGEXP_SUBSTR ('aaa#yahoo.com, bbb#hotmail.com',
                          '#' || CHR (91) || '^,' || CHR (93) || '+',
                          1,
                          LEVEL)
             IS NOT NULL;
Result:
DOMAIN
-----------------------------------------------------------------------------------------------
[a-zA-Z0-9._%-]+#yahoo.com, [a-zA-Z0-9._%-]+#hotmail.com
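CHR(91) and CHR(93) are just '[' and ']', so the pattern built in the query is '#[^,]+'. The key change in the answer is applying the prefix to each element inside LISTAGG rather than once outside it. The same order of operations in a short Python sketch (illustration only, not Oracle code):

```python
import re

EMAILS = "aaa#yahoo.com, bbb#hotmail.com"
PREFIX = "[a-zA-Z0-9._%-]+"

# CHR(91) || '^,' || CHR(93) builds '[^,]', so the pattern is '#[^,]+'.
domains = re.findall(r"#[^,]+", EMAILS)

# Put the prefix on every element *before* joining, as the LISTAGG fix does.
result = ", ".join(PREFIX + d for d in domains)
print(result)  # [a-zA-Z0-9._%-]+#yahoo.com, [a-zA-Z0-9._%-]+#hotmail.com
```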

PL/SQL split one to many rows

I have a table like this.
|PARAMKEY | PARAMVALUE
----------+------------
KEY |[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]
I need to split the values into three columns and I use REGEXP_SUBSTR. Here is my code.
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1,1 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 2) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 3) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 4 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 5) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 6) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 7 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 8) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 9) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY';
And this is the result that I need:
PARAMETER | VERSION | SCHEMA
---------+---------+-------
PAR_A |2 |SCH_A
PAR_B |4 |SCH_B
PAR_C |3 |SCH_C
But the value is too long, and I hope there is a simpler way to do it, using a loop or anything else.
Thanks

Try something like this:
with tmp_param_table as
(
select 'KEY' as PARAMKEY , '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]],["PAR_D",4,"SCH_D"]]' as PARAMVALUE from dual
),
levels as (select level as lv from dual connect by level <= 156),
steps as (select lv-2 as step from levels where MOD(lv,3)=0)
select step, (SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') parameter,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+1 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') version,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+2 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') schema
from steps
Here:
levels - returns the numbers from 1 to 156 (52*3), or however many you need
steps - the numbers 1, 4, 7, etc., in increments of 3
Results:
1 PAR_A 2 SCH_A
4 PAR_B 4 SCH_B
7 PAR_C 3 SCH_C
10 PAR_D 4 SCH_D
13
etc..
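The stepping trick amounts to tokenizing the whole value once and reading the tokens three at a time. A Python sketch of that idea (not Oracle code; the escaped class below is meant to match the same characters as '[^],["]+'). Unlike the fixed upper bound of 156 in the query, it stops when the tokens run out, so no empty trailing rows are produced:

```python
import re

PARAMVALUE = '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]'

# Runs of anything except ] , [ and " (the same set as Oracle's '[^],["]+').
tokens = re.findall(r'[^\],\["]+', PARAMVALUE)

# Read the flat token list three at a time: occurrences 1, 4, 7, ...
# Assumes every group holds exactly three values.
rows = [tuple(tokens[i:i + 3]) for i in range(0, len(tokens), 3)]
for parameter, version, schema in rows:
    print(parameter, version, schema)
```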

I tried using a regular expression to first split the paramvalue column value into its comma-separated groups:
SELECT
REGEXP_SUBSTR(COL, '[^],["]+', 1, 1) PARAMETER,
REGEXP_SUBSTR(COL, '[^],[",]+', 1, 2) VERSION,
REGEXP_SUBSTR(COL, '[^],["]+', 1, 3) SCHEMA
FROM
(
SELECT paramkey,REGEXP_SUBSTR(to_char(paramvalue),'[^][^]+',1,level ) COL
from tmp_param_table
connect by regexp_substr(to_char(paramvalue),'[^][^]+',1, level) is not null
)
WHERE COL <>','
I hope this may help.
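This answer works in two stages: first split the value into its bracketed groups (the class '[^][^]+' excludes ']', '[' and '^'), discard the lone ',' pieces between them, and then tokenize each group. A Python sketch of the same two stages (illustration only; the names are hypothetical):

```python
import re

PARAMVALUE = '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]'

# Stage 1: runs of anything except ] [ and ^ (like '[^][^]+'),
# dropping the lone "," pieces that separate the groups (COL <> ',').
groups = [g for g in re.findall(r"[^\]\[^]+", PARAMVALUE) if g != ","]

# Stage 2: tokenize each group and keep its first three values.
rows = []
for group in groups:
    parameter, version, schema = re.findall(r'[^\],\["]+', group)[:3]
    rows.append((parameter, version, schema))

print(rows)
# [('PAR_A', '2', 'SCH_A'), ('PAR_B', '4', 'SCH_B'), ('PAR_C', '3', 'SCH_C')]
```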

Oracle - how to convert string to row pair with out using WITH clause

In one of the columns I have position and organization pairs.
For example, the position is 1 and the organization is 310492 ...
1|310492|1|12319|1|562548|1|5202558
I need to convert this string to multiple rows
1,310492
1,12319
1,562548
1,5202558
I cannot use a WITH clause, as I need to have it as a correlated subquery.
SELECT EXTRACT (VALUE (d), '//row/text()').getstringval ()
FROM (SELECT XMLTYPE ( '<rows><row>' || REPLACE (USERPROF.FIELD1, '|', '</row><row>') || '</row></rows>' ) AS xmlval FROM USERPROF WHERE FIELD1 IS NOT NULL ) x, TABLE (XMLSEQUENCE (EXTRACT (x.xmlval, '/rows/row'))) d
However, this converts the entire string into one row per element.
I tried playing with REGEXP_SUBSTR and CONNECT BY, but that is not helping either: it fetches the content of the entire table, ignoring the WHERE condition.
select regexp_substr(FIELD1,'[^|]+', 1, LEVEL) from USERPROF WHERE USERS_ID = 23502
connect by regexp_substr(FIELD1, '[^|]+', 1, level ) is not null;
Thanks in advance.

The SQL below:
with data as
(select '1|310492|1|12319|1|562548|1|5202558' as x from dual)
select fin from(
select 1+level-1 as occurrence
, instr(x,'|',1,1+level-1) as pos
, nvl(lead(instr(x,'|',1,1+level-1),1) over (order by 1+level-1)
, length(x))
as xxxx
, case when
nvl(lead(instr(x,'|',1,1+level-1),1) over (order by 1+level-1)
, length(x)) = length(x)
then instr(x,'|',1,1+level-1)
else
nvl(lag(instr(x,'|',1,1+level-1),1) over (order by 1+level-1),1) end as yyyy
, substr(x
,case when
nvl(lead(instr(x,'|',1,1+level-1),1) over (order by 1+level-1)
, length(x)) = length(x)
then instr(x,'|',1,1+level-1)
else
nvl(lag(instr(x,'|',1,1+level-1),1) over (order by 1+level-1),1) end
,nvl(lead(instr(x,'|',1,1+level-1),1) over (order by 1+level-1)
, length(x))
- case when
nvl(lead(instr(x,'|',1,1+level-1),1) over (order by 1+level-1)
, length(x)) = length(x)
then instr(x,'|',1,1+level-1)
else
nvl(lag(instr(x,'|',1,1+level-1),1) over (order by 1+level-1),1) end
) as fin
, length(x) as lastrw
from data
connect by level <= length(x) - length(replace(x, '|')) - 1
order by 1) x
where mod(occurrence,2) = 1 or xxxx = lastrw
Results in:
FIN
1|310492
|1|12319
|1|562548
|1|520255
Note that I'm just using the with clause to use the data you gave as an example.
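A different technique from the answer above, sketched in Python: let each regex match consume two fields, position|organization, so the pairing falls out of the match itself. This is an illustration under the assumption that no field is empty; the same pattern could presumably be passed to REGEXP_SUBSTR with an increasing occurrence, but that has not been tested here:

```python
import re

FIELD1 = "1|310492|1|12319|1|562548|1|5202558"

# Each match consumes two pipe-separated fields: position|organization.
# Inside [^|] the pipe is literal; outside it must be escaped as \|.
pairs = re.findall(r"([^|]+)\|([^|]+)", FIELD1)
for position, organization in pairs:
    print(f"{position},{organization}")
```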
