I'm getting a text oracle enclosed between delimiters. If possible, please help in creating a Regex for the text. I've an example of text
12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||
Till now I'm only able to fetch:
||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!
using this (\|\|(.*))+([^\|\|]).
But I need this data to be separated from || and then split from !!. After which I need to save it into an array like this:
array[1]= (123,word1 ,word2, word3)
array[2]=(789,word4,word5 , word6)
array[3]=(2345 ,word7,word8, 890)
This one should work:
with v1 as
(
select '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' t from dual
)
select level -1 id, trim(',' from regexp_replace(regexp_substr(t,'[^\|]+',1,level),'!!',',')) array from v1
where level > 1
connect by level <= regexp_count(t,'\|\|');
Output:
ID ARRAY
---------- --------------------------
1 123,word1 ,word2, word3
2 789,word4,word5 , word6
3 2345 ,word7,word8, 890
And if number of parts is constant (4) and You want them in separate columns:
with v1 as
(
select '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' t from dual
), v2 as
(
select level -1 id, trim(',' from regexp_replace(regexp_substr(t,'[^\|]+',1,level),'!!',',')) array
from v1
where level > 1
connect by level <= regexp_count(t,'\|\|')
)
select id,
regexp_substr(array,'[^,]+',1,1) val1,
regexp_substr(array,'[^,]+',1,2) val2,
regexp_substr(array,'[^,]+',1,3) val3,
regexp_substr(array,'[^,]+',1,4) val4
from v2;
Output:
ID VAL1 VAL2 VAL3 VAL4
---------- ---------- ---------- ---------- ----------
1 123 word1 word2 word3
2 789 word4 word5 word6
3 2345 word7 word8 890
PLSQL STYLE:
declare
type t_text_array is table of varchar2(4000);
v_text_array t_text_array := t_text_array();
val varchar2(4000);
cursor c1 is
select '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' t from dual;
begin
open c1;
fetch c1 bulk collect into v_text_array;
for i in 1..v_text_array.count loop
for j in 2..regexp_count(v_text_array(i),'\|\|') loop
val := trim(',' from regexp_replace(regexp_substr(v_text_array(i),'[^\|]+',1,j),'!!',','));
for k in 1..regexp_count(val,',')+1 loop
--display to console or further process...
dbms_output.put_line(regexp_substr(val,'[^,]+',1,k));
end loop;
end loop;
end loop;
end;
/
The below one returns expected results:
with x as
(select '2322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' str
from dual),
y as (
select regexp_substr(str,'[^||]+[!!]*', 1, level) str from x
where level > 1
connect by regexp_substr(str, '[^||]+[!!]*', 1, level) is not null
)
select
regexp_replace (
regexp_replace (
regexp_replace(str, '^!!', '(') ,
'!!$', ')'),
'[ ]*!![ ]*', ',') str
from y
You need apply twice the split on delimiter as described here.
Finally get the values (word) flat again using LISTAGG and finalize with some string concatenation.
I'm providing a complete example with two input records, so it can scale for any number of your parsed lines.
You may need to adjust the T2table limiting the number of splits. Some special handling is additionally needed if you can have NULL values in your keyword.
The query - commented below
WITH t1 AS
(SELECT 1 id,
'12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!|| ' col
FROM dual
UNION ALL
SELECT 2 id,
'22222ACCCC12Y||!!567!!word21 !!word22!! word23!!||!!789!!word24!!word25 !! word26!!||!!2345 !!word27!!word28!! 890!!|| ' col
FROM dual
),
t2 AS
(SELECT rownum colnum
FROM dual
CONNECT BY level < 10
/* (max) number of columns */
),
t3 AS
(SELECT t1.id,
t2.colnum,
regexp_substr(t1.col,'[^|]+', 1, t2.colnum) col
FROM t1,
t2
WHERE regexp_substr(t1.col, '[^|]+', 1, t2.colnum) IS NOT NULL
),
first_split AS
( SELECT id, colnum, col FROM t3 WHERE col LIKE '%!!%'
),
second_split AS
(SELECT t1.id,
t1.colnum linenum,
t2.colnum,
regexp_substr(t1.col,'[^!]+', 1, t2.colnum) col
FROM first_split t1,
t2
WHERE regexp_substr(t1.col, '[^!]+', 1, t2.colnum) IS NOT NULL
),
agg_values AS
(SELECT id,
linenum,
LISTAGG(col, ',') WITHIN GROUP (
ORDER BY colnum) val_lst
FROM second_split
GROUP BY id,
linenum
)
SELECT id,
'array['
|| row_number() over (partition BY ID order by linenum)
|| ']= ('
||val_lst
||')' array_text
FROM agg_values
ORDER BY 1,2
Yields as requested
ID ARRAY_TEXT
1 array[1]= (123, word1, word2, word3)
1 array[2]= (789, word4, word5, word6)
1 array[3]= (2345, word7, word8, 890)
2 array[1]= (567, word21, word22, word23)
2 array[2]= (789, word24, word25, word26)
2 array[3]= (2345, word27, word28, 890)
This is the result of the first_split query. You break the data in lines.
ID COLNUM COL
---------- ---------- ------------------------------------------
1 2 !!123!!word1 !!word2!! word3!!
1 3 !!789!!word4!!word5 !! word6!!
1 4 !!2345 !!word7!!word8!! 890!!
2 2 !!567!!word21 !!word22!! word23!!
2 3 !!789!!word24!!word25 !! word26!!
2 4 !!2345 !!word27!!word28!! 890!!
The second_split query breaks the lines in word.
ID LINENUM COLNUM COL
---------- ---------- ---------- --------------------------------------------------------------------------------------------------------------------------
1 2 1 123
1 2 2 word1
1 2 3 word2
1 2 4 word3
1 3 1 789
1 3 2 word4
1 3 3 word5
.....
The rest is LISTAGG to get the csv keyword list and a ROW_NUMBER function to get nice sequential array_ids
If you want to extract the values in separate columns use PIVOT instead of LISTAGG. The drawback is that you must adjust the query for the actual number of the values.
WITH t1 AS
(SELECT 1 id,
'12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!|| ' col
FROM dual
UNION ALL
SELECT 2 id,
'22222ACCCC12Y||!!567!!word21 !!word22!! word23!!||!!789!!word24!!word25 !! word26!!||!!2345 !!word27!!word28!! 890!!|| ' col
FROM dual
),
t2 AS
(SELECT rownum colnum
FROM dual
CONNECT BY level < 10
/* (max) number of columns */
),
t3 AS
(SELECT t1.id,
t2.colnum,
regexp_substr(t1.col,'[^|]+', 1, t2.colnum) col
FROM t1,
t2
WHERE regexp_substr(t1.col, '[^|]+', 1, t2.colnum) IS NOT NULL
),
first_split AS
( SELECT id, colnum, col FROM t3 WHERE col LIKE '%!!%'
),
--select * from first_split order by 1,2,3;
second_split AS
(SELECT t1.id,
t1.colnum linenum,
t2.colnum,
regexp_substr(t1.col,'[^!]+', 1, t2.colnum) col
FROM first_split t1,
t2
WHERE regexp_substr(t1.col, '[^!]+', 1, t2.colnum) IS NOT NULL
),
pivot_values AS
(SELECT *
FROM second_split PIVOT (MAX(col) col FOR (colnum) IN (1 AS "K1", 2 AS "K2", 3 AS "K3", 4 AS "K4"))
)
SELECT id,
row_number() over (partition BY ID order by linenum) AS array_id,
K1_COL,
K2_COL,
K3_COL,
K4_COL
FROM pivot_values
ORDER BY 1,2;
gives the relational view
ID ARRAY_ID K1_COL K2_COL K3_COL K4_COL
---------- ---------- -------- -------- -------- --------
1 1 123 word1 word2 word3
1 2 789 word4 word5 word6
1 3 2345 word7 word8 890
2 1 567 word21 word22 word23
2 2 789 word24 word25 word26
2 3 2345 word27 word28 890
Oracle Setup:
CREATE TABLE table_name ( id, value ) AS
SELECT 1, '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' FROM DUAL UNION ALL
SELECT 2, '12322ABCD124A||!!321!!word1a !!word2a!! word3a!!||!!987!!word4a!!word5a !! word6a!!||!!5432 !!word7a!!word8a!! 098!!||' FROM DUAL;
Query 1:
SELECT id,
grp_no,
CAST(
MULTISET(
SELECT REGEXP_SUBSTR( t.text, '!\s*([^!]+?)\s*!', 1, LEVEL, NULL, 1 )
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.text, '!\s*([^!]+?)\s*!' )
)
AS SYS.ODCIVARCHAR2LIST
) AS words
FROM (
SELECT id,
COLUMN_VALUE AS grp_no,
REGEXP_SUBSTR( value, '\|([^|]+)\|', 1, COLUMN_VALUE, NULL, 1 ) AS text
FROM table_name t,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '\|([^|]+)\|' )
)
AS SYS.ODCINUMBERLIST
)
)
) t;
Results:
ID GRP_NO WORDS
---------- ---------- --------------------------------------------------------
1 1 SYS.ODCIVARCHAR2LIST('123','word1','word2','word3')
1 2 SYS.ODCIVARCHAR2LIST('789','word4','word5','word6')
1 3 SYS.ODCIVARCHAR2LIST('2345','word7','word8','890')
2 1 SYS.ODCIVARCHAR2LIST('321','word1a','word2a','word3a')
2 2 SYS.ODCIVARCHAR2LIST('987','word4a','word5a','word6a')
2 3 SYS.ODCIVARCHAR2LIST('5432','word7a','word8a','098')
Query 2:
SELECT id,
grp_no,
REGEXP_SUBSTR( t.text, '!\s*([^!]+)!', 1, 1, NULL, 1 ) AS Word1,
REGEXP_SUBSTR( t.text, '!\s*([^!]+)!', 1, 2, NULL, 1 ) AS Word2,
REGEXP_SUBSTR( t.text, '!\s*([^!]+)!', 1, 3, NULL, 1 ) AS Word3,
REGEXP_SUBSTR( t.text, '!\s*([^!]+)!', 1, 4, NULL, 1 ) AS Word4
FROM (
SELECT id,
COLUMN_VALUE AS grp_no,
REGEXP_SUBSTR( value, '\|([^|]+)\|', 1, COLUMN_VALUE, NULL, 1 ) AS text
FROM table_name t,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '\|([^|]+)\|' )
)
AS SYS.ODCINUMBERLIST
)
)
) t;
Results:
ID GRP_NO WORD1 WORD2 WORD3 WORD4
---- ------ ------- ------- ------- -------
1 1 123 word1 word2 word3
1 2 789 word4 word5 word6
1 3 2345 word7 word8 890
2 1 321 word1a word2a word3a
2 2 987 word4a word5a word6a
2 3 5432 word7a word8a 098
I have String as below.
select b.col1,a.col2,lower(a.col3) from table1 a inner join table2 b on a.col = b.col and a.col = b.col
inner join (select col1, col2, col3,col4 from tablename ) c on a.col1=b.col2
where
a.col = 'value'
Output need to be table1,table2 and tablename from above string. please let me know the regex to get the result.
Should be a simple one :-)
SQL> WITH DATA AS(
2 select q'[select b.col1,a.col2,lower(a.col3) from table1 a inner join table2 b on
3 a.col = b.col and a.col = b.col inner join (select col1, col2, col3,col4 from tablename )
4 c on a.col1=b.col2 where a.col = 'value']' str
5 FROM DUAL)
6 SELECT LISTAGG(TABLE_NAMES, ' , ') WITHIN GROUP (
7 ORDER BY val) table_names
8 FROM
9 (SELECT 1 val,
10 regexp_substr(str,'table[[:alnum:]]+',1,level) table_names
11 FROM DATA
12 CONNECT BY level <= regexp_count(str,'table')
13 )
14 /
TABLE_NAMES
--------------------------------------------------------------------------------
table1 , table2 , tablename
SQL>
Brief explanation, so that OP/even others might find it useful :
The REGEXP_SUBSTR looks for the words 'table', it could be followed
by a number or string like 1,2, name etc.
To find all such words, I used connect by level technique, but it
gives the output in different rows.
Finally, to put them in a single row as comma separated values, I
used LISTAGG.
Oh yes, and that q'[]' is the string literal technique.