why '['||chr(128)||'-'||chr(255)||']' doesn't work

I want to find strings in an Oracle query that contain characters above chr(127).
I see a lot of suggestions that '['||chr(128)||'-'||chr(255)||']' should work, but it doesn't.
So the next query should return OK, but it doesn't:
select 'OK' as result from dual where regexp_like('why Ä ?', '['||chr(128)||'-'||chr(255)||']')
And the next one should not return OK, but it does:
select 'OK' as result from dual where regexp_like('why - ?', '['||chr(128)||'-'||chr(255)||']')
UPD: Sorry, capital A umlaut in my case is \xC4 (ISO 8859 Latin 1) , but here it turns into unicode chr(50052)

How about a different approach? Split string into characters and check whether maximum value is higher than 127.
For example:
SQL> with test (col) as
2 (select 'why Ä ?' from dual)
3 select substr(col, level, 1) one_character,
4 ascii(substr(col, level, 1)) ascii_of_one_character
5 from test
6 connect by level <= length(col);
ONE_ ASCII_OF_ONE_CHARACTER
---- ----------------------
w 119
h 104
y 121
32
Ä 50621 --> here it is!
32
? 63
7 rows selected.
SQL>
Now, move it into a subquery and fetch the result:
SQL> with test (col) as
2 (select 'why Ä ?' from dual)
3 select case when max(ascii_of_one_character) > 127 then 'OK'
4 else 'Not OK'
5 end result
6 from (select substr(col, level, 1) one_character,
7 ascii(substr(col, level, 1)) ascii_of_one_character
8 from test
9 connect by level <= length(col)
10 );
RESULT
------
OK
Or:
SQL> with test (col) as
2 (select 'why - ?' from dual)
3 select case when max(ascii_of_one_character) > 127 then 'OK'
4 else 'Not OK'
5 end result
6 from (select substr(col, level, 1) one_character,
7 ascii(substr(col, level, 1)) ascii_of_one_character
8 from test
9 connect by level <= length(col)
10 );
RESULT
------
Not OK
Millions of rows? Well, the queries I posted wouldn't work properly even for two rows, because their CONNECT BY is not restricted to one row at a time. Switch to
SQL> with test (col) as
2 (select 'why - ?' from dual union all
3 select 'why Ä ?' from dual
4 )
5 select col,
6 case when max(ascii_of_one_character) > 127 then 'OK'
7 else 'Not OK'
8 end result
9 from (select col,
10 substr(col, column_value, 1) one_character,
11 ascii(substr(col, column_value, 1)) ascii_of_one_character
12 from test cross join table(cast(multiset(select level from dual
13 connect by level <= length(col)
14 ) as sys.odcinumberlist))
15 )
16 group by col;
COL RESULT
-------- ------
why - ? Not OK
why Ä ? OK
SQL>
How will it behave? I don't know, try it and tell us. Note that for large data sets regular expressions might actually be slower than a simple substr option.
Yet another option: how about TRANSLATE? You don't have to split anything in that case. For example:
SQL> with test (col) as
2 (select 'why - ?' from dual union all
3 select 'why Ä ?' from dual
4 )
5 select col,
6 case when nvl(length(res), 0) > 0 then 'OK'
7 else 'Not OK'
8 end result
9 from (select col,
10 translate
11 (col,
12 '!"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ',
13 '!') res
14 from test
15 );
COL RESULT
-------- ------
why - ? Not OK
why Ä ? OK
SQL>

There is also another approach:
with t(str) as (
select 'why Ä ?' from dual union all
select 'why - ?' from dual union all
select 'why - ? Ä' from dual union all
select 'why' from dual
)
select
str,
case
when regexp_like(str, '[^'||chr(1)||'-'||chr(127)||']')
then 'Ok'
else 'Not ok'
end as res,
xmlcast(
xmlquery(
'count(string-to-codepoints(.)[. > 127])'
passing t.str
returning content)
as int) cnt_over_127
from t;
Results:
STR RES CNT_OVER_127
---------- ------ ------------
why Ä ? Ok 1
why - ? Not ok 0
why - ? Ä Ok 1
why Not ok 0
As you can see, I've used xmlquery() with the string-to-codepoints XPath function, kept only the codepoints above 127, and returned their count().
You can also use dump or utl_raw.cast_to_raw(), but that's a bit more complex and I'm too lazy to write full solutions using them.
Here's just a small draft:
with t(str) as (
select 'why Ä ?' from dual union all
select 'why - ?' from dual union all
select 'why - ? Ä' from dual union all
select 'why' from dual
)
select
str,
case
when regexp_like(str, '[^'||chr(1)||'-'||chr(127)||']')
then 'Ok'
else 'Not ok'
end as res,
dump(str,1016) dmp,
dump(str,1015) dmp,
utl_raw.cast_to_raw(str) as_row,
regexp_count(dump(str,1016)||',', '[89a-f][0-9a-f],') xs
from t;
Results:
STR RES DMP DMP AS_ROW XS
---------- ------ ------------------------------------------------------------------- ----------------------------------------------------------------------- -------------------- --
why Ä ? Ok Typ=1 Len=8 CharacterSet=AL32UTF8: 77,68,79,20,c3,84,20,3f Typ=1 Len=8 CharacterSet=AL32UTF8: 119,104,121,32,195,132,32,63 77687920C384203F 2
why - ? Not ok Typ=1 Len=7 CharacterSet=AL32UTF8: 77,68,79,20,2d,20,3f Typ=1 Len=7 CharacterSet=AL32UTF8: 119,104,121,32,45,32,63 776879202D203F 0
why - ? Ä Ok Typ=1 Len=10 CharacterSet=AL32UTF8: 77,68,79,20,2d,20,3f,20,c3,84 Typ=1 Len=10 CharacterSet=AL32UTF8: 119,104,121,32,45,32,63,32,195,132 776879202D203F20C384 2
why Not ok Typ=1 Len=3 CharacterSet=AL32UTF8: 77,68,79 Typ=1 Len=3 CharacterSet=AL32UTF8: 119,104,121 776879 0
Note: since this is Unicode (AL32UTF8), a first byte above 127 means a multibyte character, so 'Ä' is counted twice: both of its bytes, c3 and 84, are higher than 127.
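If you'd rather count characters than bytes, one option is to reuse the inverted bracket expression from the draft above with regexp_count. This is only a sketch: it assumes Oracle 11g or later (where regexp_count exists) and a multibyte character set such as AL32UTF8, and the column aliases are made up for illustration.

```sql
with t(str) as (
  select 'why Ä ?'   from dual union all
  select 'why - ?'   from dual union all
  select 'why - ? Ä' from dual union all
  select 'why'       from dual
)
select str,
       -- number of characters (not bytes) outside chr(1)..chr(127)
       regexp_count(str, '[^' || chr(1) || '-' || chr(127) || ']') as chars_over_127,
       -- number of extra bytes taken by multibyte characters
       lengthb(str) - length(str) as extra_bytes
from t;
```

Unlike the dump-based count, this should report each 'Ä' once, because the regular expression works on characters rather than on the underlying bytes.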

Don't know why you want to use codepoints instead of character sets, but you can invert the logic: match anything that is not in the range chr(1) to chr(127), i.e. [^1-127]:
select 'OK' as result
from dual
where regexp_like('why Ä ?', '[^'||chr(1)||'-'||chr(127)||']');
select regexp_substr('why Ä ?', '[^'||chr(1)||'-'||chr(127)||']') x from dual;
And do not forget that some characters can be special characters like ] or even non-printable
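To see what the original range is really built from, it may help to look at the bytes. The sketch below uses only standard pieces (DUMP, CHR and the NLS_DATABASE_PARAMETERS view); the column aliases are illustrative, and what you get back depends on your database character set.

```sql
-- Database character set (the DUMP output elsewhere on this page shows AL32UTF8)
select value
from nls_database_parameters
where parameter = 'NLS_CHARACTERSET';

-- Bytes behind the two range endpoints and behind the test character
select dump(chr(128), 1016) as chr_128,
       dump(chr(255), 1016) as chr_255,
       dump('Ä', 1016)      as a_umlaut
from dual;
```

If the endpoints come back as single bytes while 'Ä' is stored as two, the range is not made of the characters you had in mind. That would be consistent with the behaviour described in the question.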

Related

Oracle: want to split multiple tags into rows using regex [duplicate]

I know this has been answered to some degree with PHP and MYSQL, but I was wondering if someone could teach me the simplest approach to splitting a string (comma delimited) into multiple rows in Oracle 10g (preferably) and 11g.
The table is as follows:
Name | Project | Error
108  | test    | Err1, Err2, Err3
109  | test2   | Err1
I want to create the following:
Name | Project | Error
108  | Test    | Err1
108  | Test    | Err2
108  | Test    | Err3
109  | Test2   | Err1
I've seen a few potential solutions around stack, however they only accounted for a single column (being the comma delimited string). Any help would be greatly appreciated.

This may be an improved way (also with regexp and connect by):
with temp as
(
select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual
union all
select 109, 'test2', 'Err1' from dual
)
select distinct
t.name, t.project,
trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) as error
from
temp t,
table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels
order by name
EDIT:
Here is a simple (as in, "not in depth") explanation of the query.
length (regexp_replace(t.error, '[^,]+')) + 1 uses regexp_replace to erase anything that is not the delimiter (comma in this case) and length +1 to get how many elements (errors) are there.
The select level from dual connect by level <= (...) uses a hierarchical query to create a column with an increasing number of matches found, from 1 to the total number of errors.
Preview:
select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1 as max
from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1
table(cast(multiset(.....) as sys.OdciNumberList)) does some casting of oracle types.
The cast(multiset(.....)) as sys.OdciNumberList transforms multiple collections (one collection for each row in the original data set) into a single collection of numbers, OdciNumberList.
The table() function transforms a collection into a resultset.
FROM without a join creates a cross join between your dataset and the multiset.
As a result, a row in the data set with 4 matches will repeat 4 times (with an increasing number in the column named "column_value").
Preview:
select * from
temp t,
table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels
trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) uses the column_value as the nth_appearance/occurrence parameter for regexp_substr.
You can add some other columns from your data set (t.name, t.project as an example) for easy visualization.
Some references to Oracle docs:
REGEXP_REPLACE
REGEXP_SUBSTR
Extensibility Constants, Types, and Mappings (OdciNumberList)
CAST (multiset)
Hierarchical Queries

Regular expressions are a wonderful thing :)
with temp as (
select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual
union all
select 109, 'test2', 'Err1' from dual
)
SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str
FROM (SELECT Name, Project, Error str FROM temp) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by Name

There is a huge difference between the below two:
splitting a single delimited string
splitting delimited strings for multiple rows in a table.
If you do not restrict the rows, then the CONNECT BY clause would produce multiple rows and will not give the desired output.
For single delimited string, look at Split single comma delimited string into rows
For splitting delimited strings in a table, look at Split comma delimited strings in a table
Apart from Regular Expressions, a few other alternatives are using:
XMLTable
MODEL clause
Setup
SQL> CREATE TABLE t (
2 ID NUMBER GENERATED ALWAYS AS IDENTITY,
3 text VARCHAR2(100)
4 );
Table created.
SQL>
SQL> INSERT INTO t (text) VALUES ('word1, word2, word3');
1 row created.
SQL> INSERT INTO t (text) VALUES ('word4, word5, word6');
1 row created.
SQL> INSERT INTO t (text) VALUES ('word7, word8, word9');
1 row created.
SQL> COMMIT;
Commit complete.
SQL>
SQL> SELECT * FROM t;
ID TEXT
---------- ----------------------------------------------
1 word1, word2, word3
2 word4, word5, word6
3 word7, word8, word9
SQL>
Using XMLTABLE:
SQL> SELECT id,
2 trim(COLUMN_VALUE) text
3 FROM t,
4 xmltable(('"'
5 || REPLACE(text, ',', '","')
6 || '"'))
7 /
ID TEXT
---------- ------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9
9 rows selected.
SQL>
Using MODEL clause:
SQL> WITH
2 model_param AS
3 (
4 SELECT id,
5 text AS orig_str ,
6 ','
7 || text
8 || ',' AS mod_str ,
9 1 AS start_pos ,
10 Length(text) AS end_pos ,
11 (Length(text) - Length(Replace(text, ','))) + 1 AS element_count ,
12 0 AS element_no ,
13 ROWNUM AS rn
14 FROM t )
15 SELECT id,
16 trim(Substr(mod_str, start_pos, end_pos-start_pos)) text
17 FROM (
18 SELECT *
19 FROM model_param MODEL PARTITION BY (id, rn, orig_str, mod_str)
20 DIMENSION BY (element_no)
21 MEASURES (start_pos, end_pos, element_count)
22 RULES ITERATE (2000)
23 UNTIL (ITERATION_NUMBER+1 = element_count[0])
24 ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1,
25 end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) )
26 )
27 WHERE element_no != 0
28 ORDER BY mod_str ,
29 element_no
30 /
ID TEXT
---------- --------------------------------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9
9 rows selected.
SQL>
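For comparison with the XMLTABLE and MODEL versions, here is the commonly seen way to restrict CONNECT BY so that each row only expands against itself. It's a sketch against the t table from the Setup above, assuming id is unique and REGEXP_COUNT is available (11g and later):

```sql
SELECT t.id,
       TRIM(REGEXP_SUBSTR(t.text, '[^,]+', 1, LEVEL)) AS text
FROM   t
CONNECT BY LEVEL <= REGEXP_COUNT(t.text, ',') + 1
       AND PRIOR t.id = t.id              -- stay within the same source row
       AND PRIOR SYS_GUID() IS NOT NULL   -- non-deterministic call, avoids the "CONNECT BY loop" error
ORDER  BY t.id, LEVEL;
```

Without the two PRIOR conditions, every row would connect to every other row at each level, which is the "multiple rows" problem mentioned at the top of this answer.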

A couple more examples of the same:
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1
/
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1
/
Also, may use DBMS_UTILITY.comma_to_table & table_to_comma:
http://www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table

I would like to propose a different approach using a PIPELINED table function. It's somewhat similar to the technique of the XMLTABLE, except that you are providing your own custom function to split the character string:
-- Create a collection type to hold the results
CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30);
/
-- Split the string according to the specified delimiter
CREATE OR REPLACE FUNCTION str2tbl (
p_string VARCHAR2,
p_delimiter CHAR DEFAULT ','
)
RETURN typ_str2tbl_nst PIPELINED
AS
l_tmp VARCHAR2(32000) := p_string || p_delimiter;
l_pos NUMBER;
BEGIN
LOOP
l_pos := INSTR( l_tmp, p_delimiter );
EXIT WHEN NVL( l_pos, 0 ) = 0;
PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) );
l_tmp := SUBSTR( l_tmp, l_pos+1 );
END LOOP;
END str2tbl;
/
-- The problem solution
SELECT name,
project,
TRIM(COLUMN_VALUE) error
FROM t, TABLE(str2tbl(error));
Results:
NAME PROJECT ERROR
---------- ---------- --------------------
108 test Err1
108 test Err2
108 test Err3
109 test2 Err1
The problem with this type of approach is that often the optimizer won't know the cardinality of the table function and it will have to make a guess. This could be potentially harmful to your execution plans, so this solution can be extended to provide execution statistics for the optimizer.
You can see this optimizer estimate by running an EXPLAIN PLAN on the query above:
Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16336 | 366K| 59 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 16336 | 366K| 59 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 8168 | 16336 | 28 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Even though the collection has only 3 values, the optimizer estimated 8168 rows for it (default value). This may seem irrelevant at first, but it may be enough for the optimizer to decide for a sub-optimal plan.
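If you want to reproduce the estimate yourself, one way (a sketch; the plan shown above may well have come from SQL*Plus autotrace instead) is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY, using the same query as in the problem solution:

```sql
EXPLAIN PLAN FOR
SELECT name,
       project,
       TRIM(COLUMN_VALUE) error
FROM   t, TABLE(str2tbl(error));

-- Show the plan that was just explained
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The Rows column on the COLLECTION ITERATOR PICKLER FETCH line is the number to watch.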
The solution is to use the optimizer extensions to provide statistics for the collection:
-- Create the optimizer interface to the str2tbl function
CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT (
dummy NUMBER,
STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
RETURN NUMBER,
STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo,
p_stats OUT SYS.ODCITabFuncStats,
p_args IN SYS.ODCIArgDescList,
p_string IN VARCHAR2,
p_delimiter IN CHAR DEFAULT ',' )
RETURN NUMBER
);
/
-- Optimizer interface implementation
CREATE OR REPLACE TYPE BODY typ_str2tbl_stats
AS
STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
RETURN NUMBER
AS
BEGIN
p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') );
RETURN ODCIConst.SUCCESS;
END ODCIGetInterfaces;
-- This function is responsible for returning the cardinality estimate
STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo,
p_stats OUT SYS.ODCITabFuncStats,
p_args IN SYS.ODCIArgDescList,
p_string IN VARCHAR2,
p_delimiter IN CHAR DEFAULT ',' )
RETURN NUMBER
AS
BEGIN
-- I'm basically using half the string length as an estimator for its cardinality
p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) );
RETURN ODCIConst.SUCCESS;
END ODCIStatsTableFunction;
END;
/
-- Associate our optimizer extension with the PIPELINED function
ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats;
Testing the resulting execution plan:
Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 23 | 59 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 23 | 59 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 1 | 2 | 28 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
As you can see, the cardinality on the plan above is not the guessed 8168 value anymore. It's still not correct, because we are passing a column instead of a string literal to the function.
Some tweaking to the function code would be necessary to give a closer estimate in this particular case, but I think the overall concept is pretty much explained here.
The str2tbl function used in this answer was originally developed by Tom Kyte:
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:110612348061
The concept of associating statistics with object types can be further explored by reading this article:
http://www.oracle-developer.net/display.php?id=427
The technique described here works in 10g+.

Starting from Oracle 12c you could use JSON_TABLE and JSON_ARRAY:
CREATE TABLE tab(Name, Project, Error) AS
SELECT 108,'test' ,'Err1, Err2, Err3' FROM dual UNION
SELECT 109,'test2','Err1' FROM dual;
And query:
SELECT *
FROM tab t
OUTER APPLY (SELECT TRIM(p) AS p
FROM JSON_TABLE(REPLACE(JSON_ARRAY(t.Error), ',', '","'),
'$[*]' COLUMNS (p VARCHAR2(4000) PATH '$'))) s;
Output:
┌──────┬─────────┬──────────────────┬──────┐
│ Name │ Project │ Error │ P │
├──────┼─────────┼──────────────────┼──────┤
│ 108 │ test │ Err1, Err2, Err3 │ Err1 │
│ 108 │ test │ Err1, Err2, Err3 │ Err2 │
│ 108 │ test │ Err1, Err2, Err3 │ Err3 │
│ 109 │ test2 │ Err1 │ Err1 │
└──────┴─────────┴──────────────────┴──────┘

REGEXP_COUNT wasn't added until Oracle 11g. Here's an Oracle 10g solution, adapted from Art's solution.
SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <=
LENGTH('Err1, Err2, Err3')
- LENGTH(REPLACE('Err1, Err2, Err3', ',', ''))
+ 1;

Here is an alternative implementation using XMLTABLE that allows for casting to different data types:
select
xmltab.txt
from xmltable(
'for $text in tokenize("a,b,c", ",") return $text'
columns
txt varchar2(4000) path '.'
) xmltab
;
... or if your delimited strings are stored in one or more rows of a table:
select
xmltab.txt
from (
select 'a;b;c' inpt from dual union all
select 'd;e;f' from dual
) base
inner join xmltable(
'for $text in tokenize($input, ";") return $text'
passing base.inpt as "input"
columns
txt varchar2(4000) path '.'
) xmltab
on 1=1
;

I had the same problem, and xmltable helped me:
SELECT id, trim(COLUMN_VALUE) text
FROM t, xmltable(('"' || REPLACE(text, ',', '","') || '"'))

I'd like to add another method. This one uses recursive queries, something I haven't seen in the other answers. Oracle has supported it since 11gR2.
with cte0 as (
select phone_number x
from hr.employees
), cte1(xstr,xrest,xremoved) as (
select x, x, null
from cte0
union all
select xstr,
case when instr(xrest,'.') = 0 then null else substr(xrest,instr(xrest,'.')+1) end,
case when instr(xrest,'.') = 0 then xrest else substr(xrest,1,instr(xrest,'.') - 1) end
from cte1
where xrest is not null
)
select xstr, xremoved from cte1
where xremoved is not null
order by xstr
It is quite flexible with the splitting character. Simply change it in the INSTR calls.

Without using connect by or regexp:
with mytable as (
select 108 name, 'test' project, 'Err1,Err2,Err3' error from dual
union all
select 109, 'test2', 'Err1' from dual
)
,x as (
select name
,project
,','||error||',' error
from mytable
)
,iter as (SELECT rownum AS pos
FROM all_objects
)
select x.name,x.project
,SUBSTR(x.error
,INSTR(x.error, ',', 1, iter.pos) + 1
,INSTR(x.error, ',', 1, iter.pos + 1)-INSTR(x.error, ',', 1, iter.pos)-1
) error
from x, iter
where iter.pos <= (LENGTH(x.error) - LENGTH(REPLACE(x.error, ','))) - 1;

In Oracle 11gR2 and later, you can use a recursive sub-query and simple string functions (which may be faster than regular expressions and correlated hierarchical sub-queries):
Oracle Setup:
CREATE TABLE table_name ( name, project, error ) as
select 108, 'test', 'Err1, Err2, Err3' from dual union all
select 109, 'test2', 'Err1' from dual;
Query:
WITH table_name_error_bounds ( name, project, error, start_pos, end_pos ) AS (
SELECT name,
project,
error,
1,
INSTR( error, ', ', 1 )
FROM table_name
UNION ALL
SELECT name,
project,
error,
end_pos + 2,
INSTR( error, ', ', end_pos + 2 )
FROM table_name_error_bounds
WHERE end_pos > 0
)
SELECT name,
project,
CASE end_pos
WHEN 0
THEN SUBSTR( error, start_pos )
ELSE SUBSTR( error, start_pos, end_pos - start_pos )
END AS error
FROM table_name_error_bounds
Output:
NAME | PROJECT | ERROR
---: | :------ | :----
108 | test | Err1
109 | test2 | Err1
108 | test | Err2
108 | test | Err3

If you have Oracle APEX 5.1 or later installed, you can use the convenient APEX_STRING.split function, e.g.:
select q.Name, q.Project, s.column_value as Error
from mytable q,
APEX_STRING.split(q.Error, ',') s
The second parameter is the delimiter string. It also accepts a 3rd parameter to limit how many splits you want it to perform.
https://docs.oracle.com/en/database/oracle/application-express/20.1/aeapi/SPLIT-Function-Signature-1.html#GUID-3BE7FF37-E54F-4503-91B8-94F374E243E6

I used the DBMS_UTILITY.comma_to_table procedure and it actually works.
The code is as follows:
declare
l_tablen BINARY_INTEGER;
l_tab DBMS_UTILITY.uncl_array;
cursor cur is select * from qwer;
rec cur%rowtype;
begin
open cur;
loop
fetch cur into rec;
exit when cur%notfound;
DBMS_UTILITY.comma_to_table (
list => rec.val,
tablen => l_tablen,
tab => l_tab);
FOR i IN 1 .. l_tablen LOOP
DBMS_OUTPUT.put_line(i || ' : ' || l_tab(i));
END LOOP;
end loop;
close cur;
end;
I used my own table and column names.

search substring in string

I'm looking for a regexp to get the correct output
For my example:
SELECT regexp_substr('brablcdefghig', '[^(bl)]+$') FROM dual;
I expect everything that follows 'bl': cdefghig, and that works.
But when I modify the input and add a 'b' character at the end, I get NULL in the output. Why?
SELECT regexp_substr('brablcdefghigb', '[^(bl)]+$') FROM dual;

That's a simple substr + instr; you don't need regular expressions. If it has to be regexp, see lines #8 and 9
SQL> with test (id, col) as
2 (select 1, 'brablcdefghig' from dual union all
3 select 2, 'brablcdefghigb' from dual
4 )
5 select id,
6 col,
7 substr(col, instr(col, 'bl') + 2) result,
8 regexp_substr(replace(col, 'bl', '#'), '[^#]+$') result2,
9 regexp_replace(col, '.+bl', '') result3
10 from test;
ID COL RESULT RESULT2 RESULT3
---------- -------------- ---------- ---------- ----------
1 brablcdefghig cdefghig cdefghig cdefghig
2 brablcdefghigb cdefghigb cdefghigb cdefghigb
SQL>
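As for why the original pattern returns NULL: inside square brackets the parentheses are ordinary characters, so [^(bl)] does not mean "not the string bl" but "any single character other than (, b, l or )". With +$ it matches the run of such characters at the very end of the string, and a string that ends in b has no such run. The sketch below (column aliases are illustrative) writes the same bracket expression in a less misleading order and should behave just like the two queries in the question:

```sql
select regexp_substr('brablcdefghig',  '[^bl()]+$') as without_trailing_b,
       regexp_substr('brablcdefghigb', '[^bl()]+$') as with_trailing_b
from dual;
```

That is also why the first example only looked right: it returned the trailing characters that are not b or l, which happened to be everything after 'bl'.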

finding date after specific text (oracle)

I am trying to extract a date after a specific string (we will call it IMP for now) in a text field. It may appear in upper or lower case and show up as IMP 1/1/10, IMP happened on 1/1/10 or IMP-1/1/10.
For example, this code:
SELECT
REGEXP_SUBSTR('abc 3/4/16 blah blah IMP 3/7/16',
'(\d{1,2}/\d{1,2}/\d{2,4})') "REGEXP_SUBSTR" from dual
will get the first date, but not the one I want.
I have tried
'(IMP) (.|(a-z){1-10}) (\d{1,2}/\d{1,2}/\d{2,4})'
and other permutations.
If I include the (IMP) (.|(a-z){1-10}) part I get NULL results; if I just use
'(\d{1,2}/\d{1,2}/\d{2,4})' I get the first date that appears.

Something like this?
SQL> with test (id, col) as
2 (select 1, 'abc 3/4/16 blah blah IMP 3/7/16' from dual union all
3 select 2, 'abc 3/4/16 blah blah 3/7/16 imp 2/8/15 xxx cc2' from dual union all
4 select 3, 'xxx 3/5/18 ccdd 234 imp happened on 5/8/19 some 23f' from dual union all
5 select 4, '3/10/18 bla bla imp-3/9/17 xfe 334 3/4/13 x' from dual
6 )
7 select id,
8 regexp_substr(substr(col, instr(lower(col), 'imp ') + 4), '\d+/\d+/\d+') result
9 from test;
ID RESULT
---------- --------------------
1 3/7/16
2 2/8/15
3 5/8/19
4 3/9/17
SQL>
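If you'd rather do it in a single regular expression, here is a sketch that anchors on IMP regardless of case and of the separator that follows it. It assumes there are no digits between IMP and the date, and it needs the subexpression argument of REGEXP_SUBSTR (the 6th one, available from 11g on):

```sql
with test (id, col) as
  (select 1, 'abc 3/4/16 blah blah IMP 3/7/16' from dual union all
   select 2, 'xxx 3/5/18 ccdd 234 imp happened on 5/8/19 some 23f' from dual union all
   select 3, '3/10/18 bla bla imp-3/9/17 xfe 334 3/4/13 x' from dual
  )
select id,
       regexp_substr(col,
                     'imp[^0-9]*(\d{1,2}/\d{1,2}/\d{2,4})',  -- IMP, non-digits, then the date
                     1, 1,
                     'i',   -- case-insensitive match
                     1      -- return only the first subexpression (the date)
                    ) as result
from test;
```

Since it doesn't look for a space after IMP, the imp-3/9/17 form is handled the same way as the others.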

Oracle - split the string by comma and get the last sub-str

I wanted to write an Oracle query to extract only the last sub-string of comma separated string like below:
DEST = "1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H"
I am interested only in G12. How do I get that in an Oracle query?
Thanks

Try
REGEXP_SUBSTR('1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H', '[^,]+$')
But that will fetch G12 47H. You may consider
REGEXP_SUBSTR('1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H', '([^, ]+)( +[^,]*)?$', 1,1,NULL,1)
This will give G12.

A little bit of substringing (see comments within the code):
SQL> with test (dest) as
2 (select '1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H' from dual)
3 select
4 regexp_substr(dest, --> out of the DEST, give me ...
5 '\w+', --> ... the first word that begins right after ...
6 instr(dest, ',', 1, regexp_count(dest, ',')) + 1 --> ... position of the last
7 ) result --> comma in the source string
8 from test;
RESULT
--------------------
G12
SQL>
Or, by splitting the comma-separated values into rows:
SQL> with test (dest) as
2 (select '1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H' from dual)
3 select regexp_substr(col, '\w+') result
4 from (select regexp_substr(dest, '[^,]+', 1, level) col, --> split column to rows
5 row_number() over (order by level desc) rn --> the last row will be RN = 1
6 from test
7 connect by level <= regexp_count(dest, ',') + 1
8 )
9 where rn = 1;
RESULT
--------------------
G12
SQL>
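A variation on the first query, as a sketch: INSTR with a negative position searches backwards from the end of the string, so the last comma can be found without REGEXP_COUNT. The column aliases are illustrative.

```sql
with test (dest) as
  (select '1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H' from dual)
select substr(dest, instr(dest, ',', -1) + 1) as last_item,   -- everything after the last comma
       regexp_substr(substr(dest, instr(dest, ',', -1) + 1),
                     '[^ ]+') as first_word                   -- first space-delimited word of it
from test;
```

If the string contains no comma at all, INSTR returns 0 and the whole string is treated as the last item.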

oracle regexp_like , specific number format without negative lookahead

I'm using regexp_like function in Oracle in order to match the following number format : xxxyxxx
I'm trying this :
select 1 "val"
from dual
where regexp_like('5553555','^(\d){3}(?!\1)\d\1{3}$')
but as I realized, negative lookahead is not supported in Oracle.
How can I do it without negative lookahead?

Indeed, lookaround is not supported. Please note that you also have another issue: (\d){3} will also match 3 different digits. You would need (\d)\1\1 to match three of the same digit.
For your particular case you could still use a regular expression. What I could think of is using a particular property: numbers with all the same 7 digits (xxxxxxx) are divisible by 1111111.
With regexp_like and an additional modulo test:
with tbl(val) as (
select '5555555' from dual union
select '5553555' from dual union
select 'nothing' from dual
)
select val
from tbl
where regexp_like(val,'^(\d)\1\1\d\1{3}$') and mod(val, 1111111) > 0;
Or you could use two regexes:
with tbl(val) as (
select '5555555' from dual union
select '5553555' from dual union
select 'nothing' from dual
)
select val
from tbl
where regexp_like(val,'^(\d)\1\1\d\1{3}$') and not regexp_like(val,'^(\d)..\1');
Admittedly, neither is really elegant, and also not the most efficient. For more efficiency you should not use regular expressions.
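For what it's worth, here is a sketch of what a version without regular expressions could look like for the xxxyxxx format, using only LENGTH, TRANSLATE, SUBSTR and RPAD. I haven't benchmarked it, so treat the efficiency claim as something to verify on your own data.

```sql
with tbl(val) as (
  select '5555555' from dual union all
  select '5553555' from dual union all
  select 'nothing' from dual
)
select val
from tbl
where length(val) = 7
  -- digits only: removing every digit leaves an empty (NULL) string
  and translate(val, 'x0123456789', 'x') is null
  -- xxx || y || xxx, built from the first and the fourth character
  and val = rpad(substr(val, 1, 1), 3, substr(val, 1, 1))
            || substr(val, 4, 1)
            || rpad(substr(val, 1, 1), 3, substr(val, 1, 1))
  -- the middle digit must differ from the others
  and substr(val, 4, 1) <> substr(val, 1, 1);
```

Of the three sample values, only 5553555 should pass all four conditions.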

Maybe old-fashioned SUBSTR might help. Something like this: split the input string (COL) into two equal pieces, and compare whether they match. LEN is used to distinguish odd from even lengths and to decide where the second part of the string starts.
A few examples:
SQL> WITH test (col) AS (SELECT '5554555' FROM DUAL),
2 len AS (SELECT LENGTH (col) len FROM test)
3 SELECT CASE
4 WHEN SUBSTR (col, 1, TRUNC (LENGTH (col) / 2)) =
5 SUBSTR (
6 col,
7 TRUNC (LENGTH (col) / 2)
8 + CASE WHEN MOD (l.len, 2) = 0 THEN 1 ELSE 2 END)
9 THEN
10 'OK'
11 ELSE
12 'Not OK'
13 END
14 result
15 FROM test t, len l;
RESULT
------
OK
SQL> l1
1* WITH test (col) AS (SELECT '5554555' FROM DUAL),
SQL> c/5554/2234/
1* WITH test (col) AS (SELECT '2234555' FROM DUAL),
SQL> /
RESULT
------
Not OK
SQL> l1
1* WITH test (col) AS (SELECT '2234555' FROM DUAL),
SQL> c/2234555/1221/
1* WITH test (col) AS (SELECT '1221' FROM DUAL),
SQL> /
RESULT
------
Not OK
SQL> l1
1* WITH test (col) AS (SELECT '1221' FROM DUAL),
SQL> c/1221/8888/
1* WITH test (col) AS (SELECT '8888' FROM DUAL),
SQL> /
RESULT
------
OK
SQL>

Use of the Trim Function to Trim Off The 'X' Values from 'Y'
This is just another approach to solving this subset of numeric palindrome problems.
If this were just a numeric palindrome, the undocumented function, reverse, could be used. Since we have a Y for the midvalue and we are testing to make sure that Y is not equal to X, the reverse function does not help us a lot here.
Borrowing on the use of subexpressions (aka character grouping) approach that Trincot uses, I just create a second subexpression for the midvalue and then I trim off the midvalue. If the trimmed expression is equal to original value, then we can be assured that Y != X.
SCOTT@db>WITH tst ( val ) AS (
2 SELECT '5555555' FROM DUAL UNION ALL
3 SELECT '12121' FROM DUAL UNION ALL
4 SELECT '5553555' FROM DUAL UNION ALL
5 SELECT 'amanaplanpanama' FROM DUAL UNION ALL
6 SELECT '' FROM DUAL
7 ) SELECT
8 val,
9 REGEXP_SUBSTR(val,'^(\d)\1\1(\d)\1{3}$',1,1,NULL,2) midval,
10 TRIM(BOTH REGEXP_SUBSTR(val,'^(\d)\1\1(\d)\1{3}$',1,1,NULL,2) FROM val) trim_midval
11 FROM
12 tst
13 WHERE
14 1 = 1
15 AND val = TRIM(BOTH regexp_substr(val,'^(\d)\1\1(\d)\1{3}$',1,1,NULL,2) FROM val);
-----------------------------
VAL MIDVAL TRIM_MIDVAL
5553555 3 5553555
-----------------------------
Littlefoot's non-regular expression solution appears to be the most straightforward here.
