Fetch text between delimiters using regex in Oracle
I'm getting text in Oracle that is enclosed between delimiters. If possible, please help me create a regex for it. Here is an example of the text:
12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||
So far I have only been able to fetch:
||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!
using this (\|\|(.*))+([^\|\|]).
But I need this data to be split on || first and then on !!. After that I need to save it into an array like this:
array[1]= (123,word1 ,word2, word3)
array[2]=(789,word4,word5 , word6)
array[3]=(2345 ,word7,word8, 890)
This one should work:
with v1 as
(
select '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' t from dual
)
select level -1 id, trim(',' from regexp_replace(regexp_substr(t,'[^\|]+',1,level),'!!',',')) array from v1
where level > 1
connect by level <= regexp_count(t,'\|\|');
Output:
ID ARRAY
---------- --------------------------
1 123,word1 ,word2, word3
2 789,word4,word5 , word6
3 2345 ,word7,word8, 890
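To see why the `level > 1` filter is there, it may help to look at the raw pieces before the `!!` replacement is applied. The following is only a sketch against the same sample string; the aliases `piece_no` and `piece` are my own naming, not part of the answer.

```sql
-- Sketch: list every pipe-free piece of the sample string.
-- Piece 1 is the leading header (12322ABCD124A), which the answer skips with level > 1.
with v1 as
(
  select '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' t from dual
)
select level as piece_no,
       regexp_substr(t, '[^|]+', 1, level) as piece
from v1
connect by level <= regexp_count(t, '\|\|');
```

Note that `regexp_count(t,'\|\|')` counts the `||` delimiters (4 here), which happens to equal the number of pieces because the string ends with a delimiter.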
And if the number of parts is constant (4) and you want them in separate columns:
with v1 as
(
select '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' t from dual
), v2 as
(
select level -1 id, trim(',' from regexp_replace(regexp_substr(t,'[^\|]+',1,level),'!!',',')) array
from v1
where level > 1
connect by level <= regexp_count(t,'\|\|')
)
select id,
regexp_substr(array,'[^,]+',1,1) val1,
regexp_substr(array,'[^,]+',1,2) val2,
regexp_substr(array,'[^,]+',1,3) val3,
regexp_substr(array,'[^,]+',1,4) val4
from v2;
Output:
ID VAL1 VAL2 VAL3 VAL4
---------- ---------- ---------- ---------- ----------
1 123 word1 word2 word3
2 789 word4 word5 word6
3 2345 word7 word8 890
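The values returned above still carry the spaces that surround some of the `!!` delimiters in the input (for example `word1 ` and ` word3`). If those are unwanted, one option is to let the replacement pattern swallow the whitespace. This is a sketch of that variation, assuming the padding is plain whitespace; the alias `arr` is mine.

```sql
-- Sketch: same split as above, but '\s*!!\s*' also removes whitespace around each delimiter.
with v1 as
(
  select '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' t from dual
)
select level - 1 as id,
       trim(',' from regexp_replace(regexp_substr(t, '[^|]+', 1, level), '\s*!!\s*', ',')) as arr
from v1
where level > 1
connect by level <= regexp_count(t, '\|\|');
```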
PL/SQL style:
declare
type t_text_array is table of varchar2(4000);
v_text_array t_text_array := t_text_array();
val varchar2(4000);
cursor c1 is
select '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' t from dual;
begin
open c1;
fetch c1 bulk collect into v_text_array;
close c1; -- release the cursor once the rows are in the collection
for i in 1..v_text_array.count loop
for j in 2..regexp_count(v_text_array(i),'\|\|') loop
val := trim(',' from regexp_replace(regexp_substr(v_text_array(i),'[^\|]+',1,j),'!!',','));
for k in 1..regexp_count(val,',')+1 loop
--display to console or further process...
dbms_output.put_line(regexp_substr(val,'[^,]+',1,k));
end loop;
end loop;
end loop;
end;
/
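The block above only prints the words. Since the question asks for the values to be saved into an array, here is a hedged sketch of one way to keep them: an associative array whose elements are `SYS.ODCIVARCHAR2LIST` collections. The type name `t_groups` and the variable names are hypothetical, and the whitespace-trimming pattern `\s*!!\s*` is my own addition rather than part of the answer.

```sql
declare
  -- Hypothetical type: one list of words per || group, indexed 1..n.
  type t_groups is table of sys.odcivarchar2list index by pls_integer;
  v_groups t_groups;
  v_src    varchar2(4000) := '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||';
  v_piece  varchar2(4000);
begin
  -- Piece 1 is the header, so the groups start at piece 2.
  for j in 2 .. regexp_count(v_src, '\|\|') loop
    v_piece := trim(',' from regexp_replace(regexp_substr(v_src, '[^|]+', 1, j), '\s*!!\s*', ','));
    v_groups(j - 1) := sys.odcivarchar2list();
    for k in 1 .. regexp_count(v_piece, ',') + 1 loop
      v_groups(j - 1).extend;
      v_groups(j - 1)(k) := regexp_substr(v_piece, '[^,]+', 1, k);
    end loop;
  end loop;

  -- Show what was stored.
  for i in 1 .. v_groups.count loop
    dbms_output.put_line('array[' || i || '] has ' || v_groups(i).count || ' elements, first = ' || v_groups(i)(1));
  end loop;
end;
/
```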
The query below returns the expected results:
with x as
(select '2322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' str
from dual),
y as (
select regexp_substr(str,'[^||]+[!!]*', 1, level) str from x
where level > 1
connect by regexp_substr(str, '[^||]+[!!]*', 1, level) is not null
)
select
regexp_replace (
regexp_replace (
regexp_replace(str, '^!!', '(') ,
'!!$', ')'),
'[ ]*!![ ]*', ',') str
from y
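A note on the pattern used above: `[^||]` and `[!!]` are bracket expressions, so the repeated character adds nothing. `[^||]` is the same set as `[^|]`, and the trailing `[!!]*` never gets anything to match because `[^|]+` has already consumed every `!`. The sketch below puts the original pattern next to the shorter one so the two can be compared row by row; the column aliases are mine.

```sql
-- Sketch: compare the answer's pattern with the simplified '[^|]+'.
with x as
(
  select '2322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' str
  from dual
)
select level as n,
       regexp_substr(str, '[^||]+[!!]*', 1, level) as original_pattern,
       regexp_substr(str, '[^|]+', 1, level)       as simplified_pattern
from x
connect by level <= regexp_count(str, '[^|]+');
```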
You need to apply the split on the delimiter twice, as described here.
Finally, get the values (words) flat again using LISTAGG and finish with some string concatenation.
I'm providing a complete example with two input records, so it can scale to any number of parsed lines.
You may need to adjust the T2 table, which limits the number of splits. Some special handling is additionally needed if your keywords can contain NULL values.
The query (commented below):
WITH t1 AS
(SELECT 1 id,
'12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!|| ' col
FROM dual
UNION ALL
SELECT 2 id,
'22222ACCCC12Y||!!567!!word21 !!word22!! word23!!||!!789!!word24!!word25 !! word26!!||!!2345 !!word27!!word28!! 890!!|| ' col
FROM dual
),
t2 AS
(SELECT rownum colnum
FROM dual
CONNECT BY level < 10
/* (max) number of columns */
),
t3 AS
(SELECT t1.id,
t2.colnum,
regexp_substr(t1.col,'[^|]+', 1, t2.colnum) col
FROM t1,
t2
WHERE regexp_substr(t1.col, '[^|]+', 1, t2.colnum) IS NOT NULL
),
first_split AS
( SELECT id, colnum, col FROM t3 WHERE col LIKE '%!!%'
),
second_split AS
(SELECT t1.id,
t1.colnum linenum,
t2.colnum,
regexp_substr(t1.col,'[^!]+', 1, t2.colnum) col
FROM first_split t1,
t2
WHERE regexp_substr(t1.col, '[^!]+', 1, t2.colnum) IS NOT NULL
),
agg_values AS
(SELECT id,
linenum,
LISTAGG(col, ',') WITHIN GROUP (
ORDER BY colnum) val_lst
FROM second_split
GROUP BY id,
linenum
)
SELECT id,
'array['
|| row_number() over (partition BY ID order by linenum)
|| ']= ('
||val_lst
||')' array_text
FROM agg_values
ORDER BY 1,2
This yields the requested output:
ID ARRAY_TEXT
1 array[1]= (123, word1, word2, word3)
1 array[2]= (789, word4, word5, word6)
1 array[3]= (2345, word7, word8, 890)
2 array[1]= (567, word21, word22, word23)
2 array[2]= (789, word24, word25, word26)
2 array[3]= (2345, word27, word28, 890)
This is the result of the first_split query. It breaks the data into lines.
ID COLNUM COL
---------- ---------- ------------------------------------------
1 2 !!123!!word1 !!word2!! word3!!
1 3 !!789!!word4!!word5 !! word6!!
1 4 !!2345 !!word7!!word8!! 890!!
2 2 !!567!!word21 !!word22!! word23!!
2 3 !!789!!word24!!word25 !! word26!!
2 4 !!2345 !!word27!!word28!! 890!!
The second_split query breaks the lines into words.
ID LINENUM COLNUM COL
---------- ---------- ---------- ----------
1 2 1 123
1 2 2 word1
1 2 3 word2
1 2 4 word3
1 3 1 789
1 3 2 word4
1 3 3 word5
.....
The rest is LISTAGG to build the comma-separated keyword list and a ROW_NUMBER function to get nice sequential array ids.
If you want to extract the values into separate columns, use PIVOT instead of LISTAGG. The drawback is that you must adjust the query to the actual number of values.
WITH t1 AS
(SELECT 1 id,
'12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!|| ' col
FROM dual
UNION ALL
SELECT 2 id,
'22222ACCCC12Y||!!567!!word21 !!word22!! word23!!||!!789!!word24!!word25 !! word26!!||!!2345 !!word27!!word28!! 890!!|| ' col
FROM dual
),
t2 AS
(SELECT rownum colnum
FROM dual
CONNECT BY level < 10
/* (max) number of columns */
),
t3 AS
(SELECT t1.id,
t2.colnum,
regexp_substr(t1.col,'[^|]+', 1, t2.colnum) col
FROM t1,
t2
WHERE regexp_substr(t1.col, '[^|]+', 1, t2.colnum) IS NOT NULL
),
first_split AS
( SELECT id, colnum, col FROM t3 WHERE col LIKE '%!!%'
),
--select * from first_split order by 1,2,3;
second_split AS
(SELECT t1.id,
t1.colnum linenum,
t2.colnum,
regexp_substr(t1.col,'[^!]+', 1, t2.colnum) col
FROM first_split t1,
t2
WHERE regexp_substr(t1.col, '[^!]+', 1, t2.colnum) IS NOT NULL
),
pivot_values AS
(SELECT *
FROM second_split PIVOT (MAX(col) col FOR (colnum) IN (1 AS "K1", 2 AS "K2", 3 AS "K3", 4 AS "K4"))
)
SELECT id,
row_number() over (partition BY ID order by linenum) AS array_id,
K1_COL,
K2_COL,
K3_COL,
K4_COL
FROM pivot_values
ORDER BY 1,2;
This gives the relational view:
ID ARRAY_ID K1_COL K2_COL K3_COL K4_COL
---------- ---------- -------- -------- -------- --------
1 1 123 word1 word2 word3
1 2 789 word4 word5 word6
1 3 2345 word7 word8 890
2 1 567 word21 word22 word23
2 2 789 word24 word25 word26
2 3 2345 word27 word28 890
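On the NULL handling mentioned earlier in this answer: a pattern such as `[^!]+` silently skips empty fields, so positions shift when a value is missing. A commonly used alternative is to match "anything, lazily, up to the next delimiter or the end of the string" and return only the captured part. The sketch below assumes the leading and trailing `!!` have already been removed; the sample value with an empty second field is made up for illustration.

```sql
-- Sketch: split on '!!' while keeping empty fields in position.
-- The empty second field comes back as NULL instead of being skipped.
with demo as
(
  select '123!!!!word3' s from dual
)
select level as pos,
       regexp_substr(s, '(.*?)(!!|$)', 1, level, null, 1) as val
from demo
connect by level <= regexp_count(s, '!!') + 1;
```

The sixth argument of REGEXP_SUBSTR (the subexpression number) requires Oracle 11g or later.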
Oracle Setup:
CREATE TABLE table_name ( id, value ) AS
SELECT 1, '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!||' FROM DUAL UNION ALL
SELECT 2, '12322ABCD124A||!!321!!word1a !!word2a!! word3a!!||!!987!!word4a!!word5a !! word6a!!||!!5432 !!word7a!!word8a!! 098!!||' FROM DUAL;
Query 1:
SELECT id,
grp_no,
CAST(
MULTISET(
SELECT REGEXP_SUBSTR( t.text, '!\s*([^!]+?)\s*!', 1, LEVEL, NULL, 1 )
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.text, '!\s*([^!]+?)\s*!' )
)
AS SYS.ODCIVARCHAR2LIST
) AS words
FROM (
SELECT id,
COLUMN_VALUE AS grp_no,
REGEXP_SUBSTR( value, '\|([^|]+)\|', 1, COLUMN_VALUE, NULL, 1 ) AS text
FROM table_name t,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '\|([^|]+)\|' )
)
AS SYS.ODCINUMBERLIST
)
)
) t;
Results:
ID GRP_NO WORDS
---------- ---------- --------------------------------------------------------
1 1 SYS.ODCIVARCHAR2LIST('123','word1','word2','word3')
1 2 SYS.ODCIVARCHAR2LIST('789','word4','word5','word6')
1 3 SYS.ODCIVARCHAR2LIST('2345','word7','word8','890')
2 1 SYS.ODCIVARCHAR2LIST('321','word1a','word2a','word3a')
2 2 SYS.ODCIVARCHAR2LIST('987','word4a','word5a','word6a')
2 3 SYS.ODCIVARCHAR2LIST('5432','word7a','word8a','098')
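If the collection returned in the WORDS column needs to become rows again, the TABLE() operator can unnest it. A minimal sketch with a literal collection (in a real query the collection column from Query 1 would be passed instead):

```sql
-- Sketch: unnest a SYS.ODCIVARCHAR2LIST into one row per element.
select column_value as word
from table(sys.odcivarchar2list('123', 'word1', 'word2', 'word3'));
```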
Query 2:
SELECT id,
grp_no,
REGEXP_SUBSTR( t.text, '!\s*([^!]+)!', 1, 1, NULL, 1 ) AS Word1,
REGEXP_SUBSTR( t.text, '!\s*([^!]+)!', 1, 2, NULL, 1 ) AS Word2,
REGEXP_SUBSTR( t.text, '!\s*([^!]+)!', 1, 3, NULL, 1 ) AS Word3,
REGEXP_SUBSTR( t.text, '!\s*([^!]+)!', 1, 4, NULL, 1 ) AS Word4
FROM (
SELECT id,
COLUMN_VALUE AS grp_no,
REGEXP_SUBSTR( value, '\|([^|]+)\|', 1, COLUMN_VALUE, NULL, 1 ) AS text
FROM table_name t,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '\|([^|]+)\|' )
)
AS SYS.ODCINUMBERLIST
)
)
) t;
Results:
ID GRP_NO WORD1 WORD2 WORD3 WORD4
---- ------ ------- ------- ------- -------
1 1 123 word1 word2 word3
1 2 789 word4 word5 word6
1 3 2345 word7 word8 890
2 1 321 word1a word2a word3a
2 2 987 word4a word5a word6a
2 3 5432 word7a word8a 098
Related
Oracle: want to split multiple tags into rows using regex [duplicate]
I know this has been answered to some degree with PHP and MySQL, but I was wondering if someone could teach me the simplest approach to splitting a comma-delimited string into multiple rows in Oracle 10g (preferably) and 11g. The table is as follows:
Name | Project | Error
108 | test | Err1, Err2, Err3
109 | test2 | Err1
I want to create the following:
Name | Project | Error
108 | Test | Err1
108 | Test | Err2
108 | Test | Err3
109 | Test2 | Err1
I've seen a few potential solutions around Stack Overflow, however they only accounted for a single column (the comma-delimited string itself). Any help would be greatly appreciated.
This may be an improved way (also with regexp and connect by):
with temp as (
  select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual
  union all
  select 109, 'test2', 'Err1' from dual
)
select distinct
  t.name, t.project,
  trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) as error
from
  temp t,
  table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels
order by name
EDIT: Here is a simple (as in, "not in depth") explanation of the query.
1. length (regexp_replace(t.error, '[^,]+')) + 1 uses regexp_replace to erase anything that is not the delimiter (a comma in this case), and length + 1 to get how many elements (errors) there are.
2. The select level from dual connect by level <= (...) uses a hierarchical query to create a column with an increasing number of matches found, from 1 to the total number of errors. Preview:
select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1 as max
from dual
connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1
3. table(cast(multiset(.....) as sys.OdciNumberList)) does some casting of Oracle types. The cast(multiset(.....)) as sys.OdciNumberList transforms multiple collections (one collection for each row in the original data set) into a single collection of numbers, OdciNumberList. The table() function transforms a collection into a result set.
4. FROM without a join creates a cross join between your data set and the multiset. As a result, a row in the data set with 4 matches will repeat 4 times (with an increasing number in the column named "column_value"). Preview:
select *
from
  temp t,
  table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels
5. trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) uses column_value as the nth occurrence parameter for regexp_substr.
6. You can add some other columns from your data set (t.name, t.project as an example) for easy visualization.
Some references to Oracle docs:
REGEXP_REPLACE
REGEXP_SUBSTR
Extensibility Constants, Types, and Mappings (OdciNumberList)
CAST (multiset)
Hierarchical Queries
Regular expressions are a wonderful thing :)
with temp as (
  select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual
  union all
  select 109, 'test2', 'Err1' from dual
)
SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str
FROM (SELECT Name, Project, Error str FROM temp) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by Name
There is a huge difference between the following two:
- splitting a single delimited string
- splitting delimited strings for multiple rows in a table.
If you do not restrict the rows, then the CONNECT BY clause would produce multiple rows and will not give the desired output.
For a single delimited string, look at Split single comma delimited string into rows
For splitting delimited strings in a table, look at Split comma delimited strings in a table
Apart from regular expressions, a few other alternatives are:
- XMLTable
- MODEL clause
Setup
CREATE TABLE t (
  ID NUMBER GENERATED ALWAYS AS IDENTITY,
  text VARCHAR2(100)
);
INSERT INTO t (text) VALUES ('word1, word2, word3');
INSERT INTO t (text) VALUES ('word4, word5, word6');
INSERT INTO t (text) VALUES ('word7, word8, word9');
COMMIT;
SELECT * FROM t;
ID TEXT
---------- ----------------------------------------------
1 word1, word2, word3
2 word4, word5, word6
3 word7, word8, word9
Using XMLTABLE:
SELECT id,
       trim(COLUMN_VALUE) text
FROM t,
     xmltable(('"'
       || REPLACE(text, ',', '","')
       || '"'))
/
ID TEXT
---------- ------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9
9 rows selected.
Using MODEL clause:
WITH
model_param AS
(
  SELECT id,
         text AS orig_str ,
         ','
         || text
         || ',' AS mod_str ,
         1 AS start_pos ,
         Length(text) AS end_pos ,
         (Length(text) - Length(Replace(text, ','))) + 1 AS element_count ,
         0 AS element_no ,
         ROWNUM AS rn
  FROM t )
SELECT id,
       trim(Substr(mod_str, start_pos, end_pos-start_pos)) text
FROM (
  SELECT *
  FROM model_param MODEL PARTITION BY (id, rn, orig_str, mod_str)
  DIMENSION BY (element_no)
  MEASURES (start_pos, end_pos, element_count)
  RULES ITERATE (2000)
  UNTIL (ITERATION_NUMBER+1 = element_count[0])
  ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1,
    end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) )
)
WHERE element_no != 0
ORDER BY mod_str ,
         element_no
/
ID TEXT
---------- --------------------------------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9
9 rows selected.
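The remark above about restricting the rows can be made concrete. When CONNECT BY LEVEL runs over a multi-row table, every row connects to every other row unless the hierarchy is tied to a key. One widely used idiom, which is not part of this answer, ties each level to its own row with PRIOR and adds a non-deterministic PRIOR condition so Oracle does not report a loop. A sketch against the table t from the setup above:

```sql
-- Sketch: split each row of t independently.
-- PRIOR id = id keeps the hierarchy inside one source row;
-- PRIOR SYS_GUID() IS NOT NULL prevents the CONNECT BY loop error.
SELECT id,
       TRIM(REGEXP_SUBSTR(text, '[^,]+', 1, LEVEL)) AS word
FROM t
CONNECT BY LEVEL <= REGEXP_COUNT(text, ',') + 1
       AND PRIOR id = id
       AND PRIOR SYS_GUID() IS NOT NULL
ORDER BY id, LEVEL;
```

This assumes id is unique; with a non-unique key the rows would multiply again.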
A couple more examples of the same:
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1
/
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1
/
You may also use DBMS_UTILITY.comma_to_table and table_to_comma: http://www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table
I would like to propose a different approach using a PIPELINED table function. It's somewhat similar to the XMLTABLE technique, except that you provide your own custom function to split the character string:
-- Create a collection type to hold the results
CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30);
/
-- Split the string according to the specified delimiter
CREATE OR REPLACE FUNCTION str2tbl (
  p_string VARCHAR2,
  p_delimiter CHAR DEFAULT ','
)
RETURN typ_str2tbl_nst PIPELINED
AS
  l_tmp VARCHAR2(32000) := p_string || p_delimiter;
  l_pos NUMBER;
BEGIN
  LOOP
    l_pos := INSTR( l_tmp, p_delimiter );
    EXIT WHEN NVL( l_pos, 0 ) = 0;
    PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) );
    l_tmp := SUBSTR( l_tmp, l_pos+1 );
  END LOOP;
END str2tbl;
/
-- The problem solution
SELECT name, project, TRIM(COLUMN_VALUE) error
FROM t, TABLE(str2tbl(error));
Results:
NAME PROJECT ERROR
---------- ---------- --------------------
108 test Err1
108 test Err2
108 test Err3
109 test2 Err1
The problem with this type of approach is that often the optimizer won't know the cardinality of the table function and will have to make a guess. This could be potentially harmful to your execution plans, so this solution can be extended to provide execution statistics for the optimizer.
You can see this optimizer estimate by running an EXPLAIN PLAN on the query above:
Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806
----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |  8168 | 16336 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Even though the collection has only 3 values, the optimizer estimated 8168 rows for it (the default value). This may seem irrelevant at first, but it may be enough for the optimizer to choose a sub-optimal plan.
The solution is to use the optimizer extensions to provide statistics for the collection:
-- Create the optimizer interface to the str2tbl function
CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT (
  dummy NUMBER,
  STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
  RETURN NUMBER,
  STATIC FUNCTION ODCIStatsTableFunction (
    p_function IN SYS.ODCIFuncInfo,
    p_stats OUT SYS.ODCITabFuncStats,
    p_args IN SYS.ODCIArgDescList,
    p_string IN VARCHAR2,
    p_delimiter IN CHAR DEFAULT ',' )
  RETURN NUMBER
);
/
-- Optimizer interface implementation
CREATE OR REPLACE TYPE BODY typ_str2tbl_stats
AS
  STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
  RETURN NUMBER
  AS
  BEGIN
    p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') );
    RETURN ODCIConst.SUCCESS;
  END ODCIGetInterfaces;
  -- This function is responsible for returning the cardinality estimate
  STATIC FUNCTION ODCIStatsTableFunction (
    p_function IN SYS.ODCIFuncInfo,
    p_stats OUT SYS.ODCITabFuncStats,
    p_args IN SYS.ODCIArgDescList,
    p_string IN VARCHAR2,
    p_delimiter IN CHAR DEFAULT ',' )
  RETURN NUMBER
  AS
  BEGIN
    -- I'm using basically half the string length as an estimator for its cardinality
    p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) );
    RETURN ODCIConst.SUCCESS;
  END ODCIStatsTableFunction;
END;
/
-- Associate our optimizer extension with the PIPELINED function
ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats;
Testing the resulting execution plan:
Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806
----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         |     1 |    23 |    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         |     1 |    23 |    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |     1 |     2 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
As you can see, the cardinality on the plan above is no longer the 8168 guessed value. It's still not correct, because we are passing a column instead of a string literal to the function. Some tweaking of the function code would be necessary to give a closer estimate in this particular case, but I think the overall concept is pretty much explained here.
The str2tbl function used in this answer was originally developed by Tom Kyte: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:110612348061
The concept of associating statistics with object types can be further explored by reading this article: http://www.oracle-developer.net/display.php?id=427
The technique described here works in 10g+.
Starting from Oracle 12c you could use JSON_TABLE and JSON_ARRAY:
CREATE TABLE tab(Name, Project, Error) AS
SELECT 108,'test' ,'Err1, Err2, Err3' FROM dual UNION
SELECT 109,'test2','Err1' FROM dual;
And query:
SELECT *
FROM tab t
OUTER APPLY (SELECT TRIM(p) AS p
             FROM JSON_TABLE(REPLACE(JSON_ARRAY(t.Error), ',', '","'),
                  '$[*]' COLUMNS (p VARCHAR2(4000) PATH '$'))) s;
Output:
┌──────┬─────────┬──────────────────┬──────┐
│ Name │ Project │ Error            │ P    │
├──────┼─────────┼──────────────────┼──────┤
│ 108  │ test    │ Err1, Err2, Err3 │ Err1 │
│ 108  │ test    │ Err1, Err2, Err3 │ Err2 │
│ 108  │ test    │ Err1, Err2, Err3 │ Err3 │
│ 109  │ test2   │ Err1             │ Err1 │
└──────┴─────────┴──────────────────┴──────┘
db<>fiddle demo
REGEXP_COUNT wasn't added until Oracle 11g. Here's an Oracle 10g solution, adapted from Art's solution.
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= LENGTH('Err1, Err2, Err3') - LENGTH(REPLACE('Err1, Err2, Err3', ',', '')) + 1;
Here is an alternative implementation using XMLTABLE that allows for casting to different data types:
select
  xmltab.txt
from xmltable(
  'for $text in tokenize("a,b,c", ",") return $text'
  columns
    txt varchar2(4000) path '.'
) xmltab
;
... or if your delimited strings are stored in one or more rows of a table:
select
  xmltab.txt
from (
  select 'a;b;c' inpt from dual union all
  select 'd;e;f' from dual
) base
inner join xmltable(
  'for $text in tokenize($input, ";") return $text'
  passing base.inpt as "input"
  columns
    txt varchar2(4000) path '.'
) xmltab
on 1=1
;
I had the same problem, and xmltable helped me:
SELECT id, trim(COLUMN_VALUE) text
FROM t, xmltable(('"' || REPLACE(text, ',', '","') || '"'))
I'd like to add another method. This one uses recursive queries, something I haven't seen in the other answers. Oracle has supported it since 11gR2.
with cte0 as (
  select phone_number x
  from hr.employees
), cte1(xstr,xrest,xremoved) as (
  select x, x, null
  from cte0
  union all
  select xstr,
         case when instr(xrest,'.') = 0 then null else substr(xrest,instr(xrest,'.')+1) end,
         case when instr(xrest,'.') = 0 then xrest else substr(xrest,1,instr(xrest,'.') - 1) end
  from cte1
  where xrest is not null
)
select xstr, xremoved from cte1
where xremoved is not null
order by xstr
It is quite flexible with the splitting character. Simply change it in the INSTR calls.
Without using connect by or regexp:
with mytable as (
  select 108 name, 'test' project, 'Err1,Err2,Err3' error from dual
  union all
  select 109, 'test2', 'Err1' from dual
)
,x as (
  select name
        ,project
        ,','||error||',' error
  from mytable
)
,iter as (SELECT rownum AS pos
  FROM all_objects
)
select x.name,x.project
      ,SUBSTR(x.error
             ,INSTR(x.error, ',', 1, iter.pos) + 1
             ,INSTR(x.error, ',', 1, iter.pos + 1)-INSTR(x.error, ',', 1, iter.pos)-1
       ) error
from x, iter
where iter.pos <= (LENGTH(x.error) - LENGTH(REPLACE(x.error, ','))) - 1;
In Oracle 11g and later, you can use a recursive sub-query and simple string functions (which may be faster than regular expressions and correlated hierarchical sub-queries):
Oracle Setup:
CREATE TABLE table_name ( name, project, error ) as
select 108, 'test', 'Err1, Err2, Err3' from dual union all
select 109, 'test2', 'Err1' from dual;
Query:
WITH table_name_error_bounds ( name, project, error, start_pos, end_pos ) AS (
  SELECT name,
         project,
         error,
         1,
         INSTR( error, ', ', 1 )
  FROM table_name
UNION ALL
  SELECT name,
         project,
         error,
         end_pos + 2,
         INSTR( error, ', ', end_pos + 2 )
  FROM table_name_error_bounds
  WHERE end_pos > 0
)
SELECT name,
       project,
       CASE end_pos
       WHEN 0
       THEN SUBSTR( error, start_pos )
       ELSE SUBSTR( error, start_pos, end_pos - start_pos )
       END AS error
FROM table_name_error_bounds
Output:
NAME | PROJECT | ERROR
108 | test | Err1
109 | test2 | Err1
108 | test | Err2
108 | test | Err3
db<>fiddle here
If you have Oracle APEX 5.1 or later installed, you can use the convenient APEX_STRING.split function, e.g.:
select q.Name, q.Project, s.column_value as Error
from mytable q,
     APEX_STRING.split(q.Error, ',') s
The second parameter is the delimiter string. It also accepts a third parameter to limit how many splits you want it to perform.
https://docs.oracle.com/en/database/oracle/application-express/20.1/aeapi/SPLIT-Function-Signature-1.html#GUID-3BE7FF37-E54F-4503-91B8-94F374E243E6
I used the DBMS_UTILITY.comma_to_table procedure and it works. The code is as follows:
declare
  l_tablen BINARY_INTEGER;
  l_tab DBMS_UTILITY.uncl_array;
  cursor cur is select * from qwer;
  rec cur%rowtype;
begin
  open cur;
  loop
    fetch cur into rec;
    exit when cur%notfound;
    DBMS_UTILITY.comma_to_table (
      list => rec.val,
      tablen => l_tablen,
      tab => l_tab);
    FOR i IN 1 .. l_tablen LOOP
      DBMS_OUTPUT.put_line(i || ' : ' || l_tab(i));
    END LOOP;
  end loop;
  close cur;
end;
I used my own table and column names.
search substring in string
I'm looking for a regexp to get the correct output. For my example:
SELECT regexp_substr('brablcdefghig', '[^(bl)]+$') FROM dual;
I expect everything that follows 'bl': cdefghig, and that works. But when I modify the input and add a 'b' character at the end, I get NULL in the output. Why?
SELECT regexp_substr('brablcdefghigb', '[^(bl)]+$') FROM dual;
That's a simple substr + instr; you don't need regular expressions. If it has to be regexp, see lines #8 and 9.
SQL> with test (id, col) as
  2    (select 1, 'brablcdefghig' from dual union all
  3     select 2, 'brablcdefghigb' from dual
  4    )
  5  select id,
  6         col,
  7         substr(col, instr(col, 'bl') + 2) result,
  8         regexp_substr(replace(col, 'bl', '#'), '[^#]+$') result2,
  9         regexp_replace(col, '.+bl', '') result3
 10  from test;
ID COL RESULT RESULT2 RESULT3
---------- -------------- ---------- ---------- ----------
1 brablcdefghig cdefghig cdefghig cdefghig
2 brablcdefghigb cdefghigb cdefghigb cdefghigb
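As for why the original pattern returns NULL: inside a bracket expression the parentheses are ordinary characters, so `[^(bl)]` does not mean "not the string bl". It means "any single character other than (, b, l or )". With `+$` the match must end at the end of the string, and a string whose last character is b offers no such character there. The sketch below writes the same set as `[^bl()]` to make that visible; the column aliases are mine.

```sql
-- Sketch: '[^(bl)]' and '[^bl()]' are the same character set.
SELECT regexp_substr('brablcdefghig',  '[^bl()]+$') AS ends_in_g,  -- cdefghig
       regexp_substr('brablcdefghigb', '[^bl()]+$') AS ends_in_b   -- NULL: last character is 'b'
FROM dual;
```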
Oracle - split the string by comma and get the last sub-str
I wanted to write an Oracle query to extract only the last sub-string of a comma-separated string like the one below:
DEST = "1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H"
I am interested only in G12. How do I get it in an Oracle query? Thanks
Try
REGEXP_SUBSTR('1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H', '[^,]+$')
But that will fetch G12 47H. You may consider
REGEXP_SUBSTR('1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H', '([^, ]+)( +[^,]*)?$', 1,1,NULL,1)
This will give G12.
A little bit of substringing (see comments within the code):
with test (dest) as
  (select '1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H' from dual)
select
  regexp_substr(dest,                                    --> out of the DEST, give me ...
                '\w+',                                   --> ... the first word that begins right after ...
                instr(dest, ',', 1, regexp_count(dest, ',')) + 1 --> ... position of the last
               ) result                                  --> comma in the source string
from test;
RESULT
--------------------
G12
Or, by splitting the comma-separated values into rows:
with test (dest) as
  (select '1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H' from dual)
select regexp_substr(col, '\w+') result
from (select regexp_substr(dest, '[^,]+', 1, level) col, --> split column to rows
             row_number() over (order by level desc) rn  --> the last row will be RN = 1
      from test
      connect by level <= regexp_count(dest, ',') + 1
     )
where rn = 1;
RESULT
--------------------
G12
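For comparison, the position of the last comma can also be found without counting commas, because INSTR accepts a negative start position and then searches backwards from the end of the string. This is a sketch of that variation, not something taken from the answers above; the column aliases are mine.

```sql
-- Sketch: INSTR(dest, ',', -1) finds the last comma by searching from the end.
with test (dest) as
  (select '1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H' from dual)
select substr(dest, instr(dest, ',', -1) + 1)                          as last_item,   -- G12 47H
       regexp_substr(substr(dest, instr(dest, ',', -1) + 1), '[^ ]+') as first_word   -- G12
from test;
```

If the string contains no comma at all, INSTR returns 0 and the whole string is treated as the last item.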
Replace null value with previous value - SQL Server 2008 R2
I'm posting this question again with the full code. Last time I didn't include all of it, which resulted in answers that I could not use.
I have the query below and want to replace the latest NULL value with the previous value for that currency. Sometimes there are many NULL values on the same date and sometimes there is only one.
I guess I have to do something with the left join on cteB? Any ideas? See the result and desired result below the query.
With cte as (
SELECT
  PositionDate,
  c.Currency,
  DepositLclCcy
FROM [Static].[tbl_DateTable] dt
CROSS JOIN (Values ('DKK'), ('EUR'), ('SEK')) as c (Currency)
Left join
(
SELECT
  BalanceDate,
  Currency,
  'DepositLclCcy' = Sum(Case when Activity = 'Deposit' then BalanceCcy else 0 END)
FROM
  [Position].[vw_InternalBank]
Group By
  BalanceDate,
  Currency
) ib
on dt.PositionDate = ib.BalanceDate
and c.Currency = ib.Currency
Where WeekDate = 'Yes')
Select *
From cte cteA
Left join
( Select ... from Cte ) as cteB
on .....
Order by
  cteA.PositionDate desc,
  cteA.Currency
Current Result
PositionDate Currency DepositLclCcy
2017-04-11 SEK 1
2017-04-11 DKK 3
2017-04-11 EUR 7
2017-04-10 SEK NULL
2017-04-10 DKK 3
2017-04-10 EUR 5
2017-04-07 SEK 5
2017-04-07 DKK 3
2017-04-07 EUR 5
Desired Result
PositionDate Currency DepositLclCcy
2017-04-11 SEK 1
2017-04-11 DKK 3
2017-04-11 EUR 7
2017-04-10 SEK 5
2017-04-10 DKK 3
2017-04-10 EUR 5
2017-04-07 SEK 5
2017-04-07 DKK 3
2017-04-07 EUR 5
Use outer apply() to get the previous value for DepositLclCcy, and replace NULL values using coalesce().
with cte as (
  select
      PositionDate
    , c.Currency
    , DepositLclCcy
  from [Static].[tbl_DateTable] dt
    cross join (values ('DKK') , ('EUR') , ('SEK')) as c(Currency)
    left join (
      select
          BalanceDate
        , Currency
        , DepositLclCcy = Sum(case when Activity = 'Deposit' then BalanceCcy else 0 end)
      from [Position].[vw_InternalBank]
      group by BalanceDate, Currency
    ) ib
      on dt.PositionDate = ib.BalanceDate
     and c.Currency = ib.Currency
  where WeekDate = 'Yes'
)
select
    cte.PositionDate
  , cte.Currency
  , DepositLclCcy = coalesce(cte.DepositLclCcy,x.DepositLclCcy)
from cte
  outer apply (
    select top 1
      i.DepositLclCcy
    from cte as i
    where i.PositionDate < cte.PositionDate
      and i.Currency = cte.Currency
    order by i.PositionDate desc
  ) as x
Skipping the initial left join and using outer apply() there instead:
with cte as (
  select
      dt.PositionDate
    , c.Currency
    , ib.DepositLclCcy
  from [Static].[tbl_DateTable] dt
    cross join (values ('DKK'), ('EUR'), ('SEK')) as c(Currency)
    outer apply (
      select top 1
        DepositLclCcy = sum(BalanceCcy)
      from [Position].[vw_InternalBank] as i
      where i.Activity = 'Deposit'
        and i.Currency = c.Currency
        and i.BalanceDate <= dt.PositionDate
      group by i.BalanceDate, i.Currency
      order by i.BalanceDate desc
    ) as ib
  where dt.WeekDate = 'Yes'
)
select *
from cte
Extract string from a large string oracle regexp
I have a string as below.
select b.col1,a.col2,lower(a.col3) from table1 a inner join table2 b on a.col = b.col and a.col = b.col inner join (select col1, col2, col3,col4 from tablename ) c on a.col1=b.col2 where a.col = 'value'
The output needs to be table1, table2 and tablename from the above string. Please let me know the regex to get the result.
Should be a simple one :-)
WITH DATA AS(
  select q'[select b.col1,a.col2,lower(a.col3) from table1 a inner join table2 b on
a.col = b.col and a.col = b.col inner join (select col1, col2, col3,col4 from tablename )
c on a.col1=b.col2 where a.col = 'value']' str
  FROM DUAL)
SELECT LISTAGG(TABLE_NAMES, ' , ') WITHIN GROUP (
  ORDER BY val) table_names
FROM
  (SELECT 1 val,
          regexp_substr(str,'table[[:alnum:]]+',1,level) table_names
   FROM DATA
   CONNECT BY level <= regexp_count(str,'table')
  )
/
TABLE_NAMES
--------------------------------------------------------------------------------
table1 , table2 , tablename
Brief explanation, so that the OP and others might find it useful:
The REGEXP_SUBSTR looks for words starting with 'table', which may be followed by a number or string such as 1, 2, name, etc.
To find all such words, I used the connect by level technique, but it gives the output in different rows.
Finally, to put them in a single row as comma-separated values, I used LISTAGG.
Oh yes, and that q'[]' is the alternative quoting syntax for string literals.
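The pattern above relies on every table name in the sample beginning with the literal text "table". As a hedged sketch of a slightly more general idea, one could instead capture the identifier that directly follows a FROM or JOIN keyword. This is my own variation, not the answer's method, and it remains a heuristic: it does not handle quoted identifiers, schema-qualified names (it would stop at the dot), or comma-separated table lists.

```sql
-- Sketch: capture the word that follows FROM or JOIN (case-insensitive).
-- 'join (select ...' is skipped because '(' is not an identifier character.
with data as (
  select q'[select b.col1,a.col2,lower(a.col3) from table1 a inner join table2 b on a.col = b.col and a.col = b.col inner join (select col1, col2, col3,col4 from tablename ) c on a.col1=b.col2 where a.col = 'value']' str
  from dual
)
select listagg(table_name, ',') within group (order by lvl) as table_names
from (
  select level as lvl,
         regexp_substr(str, '(from|join)\s+([[:alnum:]_$#]+)', 1, level, 'i', 2) as table_name
  from data
  connect by level <= regexp_count(str, '(from|join)\s+[[:alnum:]_$#]+', 1, 'i')
);
```

For the sample string this should pick up table1, table2 and tablename, in that order.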