RegEx in Oracle SQL, positive lookahead

I have a string and I'd like to capture everything before "\Close_Out":
string: \fileshare\R and G\123456\Close_Out\Warranty Letter.pdf
The only solution I have come up with uses a positive lookahead, which works when I test it on https://regex101.com/:
(.*)(?=\\Close_Out)
But now I need to use it in an Oracle SQL statement:
select REGEXP_SUBSTR('\\fileshare\R and G\123456\Close_Out\Warranty Letter.pdf', '(.*)(?=\\Close_Out)') from dual
and it does not work, since (I think) lookahead is not supported. Can someone suggest an alternative expression that will work in SQL?

If regular expressions aren't a must, then substr + instr does the job:
SQL> with test (col) as
2 (select '\fileshare\R and G\123456\Close_Out\Warranty Letter.pdf' from dual)
3 select substr(col, 1, instr(col,'\Close_Out') - 1) result
4 from test;
RESULT
-------------------------
\fileshare\R and G\123456
SQL>
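One thing the query above leaves open is what happens when \Close_Out is missing: instr then returns 0, and substr with a length below 1 returns NULL. A minimal sketch of one way to handle that, assuming you would rather get the whole path back in that case (the second sample row is hypothetical, added only to show the fallback):

```sql
with test (col) as (
  select '\fileshare\R and G\123456\Close_Out\Warranty Letter.pdf' from dual union all
  -- hypothetical row without the marker, for illustration only
  select '\fileshare\R and G\123456\Other\Warranty Letter.pdf' from dual
)
select col,
       case
         when instr(col, '\Close_Out') > 0
           -- marker found: keep everything before it
           then substr(col, 1, instr(col, '\Close_Out') - 1)
         -- marker not found: fall back to the full path (an assumption about the desired behaviour)
         else col
       end as result
from test;
```

If NULL is the preferred result for paths without the marker, the original substr + instr expression already does that and no CASE is needed.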

Just for completeness, this regular expression returns the result including the \Close_Out:
select REGEXP_SUBSTR('\\fileshare\R and G\123456\Close_Out\Warranty Letter.pdf', '.*\\Close_Out') reg from dual;
REG
------------------------------------
\\fileshare\R and G\123456\Close_Out
To get the string before it, use a subexpression (a part enclosed in parentheses) and reference it by setting the subexpression parameter to 1 (the last parameter; see the documentation for details).
select REGEXP_SUBSTR('\\fileshare\R and G\123456\Close_Out\Warranty Letter.pdf', '(.*)\\Close_Out', 1, 1, null, 1) reg from dual;
REG
--------------------------
\\fileshare\R and G\123456
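Note that (.*) is greedy, so if \Close_Out occurs more than once the capture runs up to the last occurrence. A small sketch of the difference, using a hypothetical path that contains the marker twice and the non-greedy quantifier .*? (which Oracle's regular expressions accept):

```sql
-- hypothetical input with two \Close_Out segments, for illustration only
with test (col) as (
  select '\fileshare\Close_Out\archive\Close_Out\a.pdf' from dual
)
select
  -- greedy: captures up to the LAST \Close_Out  -> \fileshare\Close_Out\archive
  regexp_substr(col, '(.*)\\Close_Out',  1, 1, null, 1) as greedy_part,
  -- non-greedy: stops at the FIRST \Close_Out   -> \fileshare
  regexp_substr(col, '(.*?)\\Close_Out', 1, 1, null, 1) as lazy_part
from test;
```

For the path in the question there is only one \Close_Out, so both forms return the same value.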

Related

Oracle REGEXP_SUBSTR - SEMICOLON STRING EXTRACTION

I have the input below and need the output shown. How can I get it? I tried different patterns but could not get it to work.
In brief: the value that has all three parts (1#2#3), if one is present, should be returned; otherwise the first value.
2#9#;2#37#65 -> 2#37#65
2#9#;2#37#65;2#37# -> 2#37#65
2#9#;2#37#65;2#37#;2#37#56 -> 2#37#65 or 2#37#56
2#37#65;2#99 -> 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 -> 3#37#65 or 3#37#36 or 2#37#56
2#37#;2#99# -> 2#37 or 2#99# ( in this case any value)
I tried a few patterns, including the ones below, but none of them helped.
regexp_substr('2#9#;2#37#65;2#37#','#[^;]+',1)
SUBSTR(REGEXP_SUBSTR(SUBSTR(uo_filiere,1,INSTR(uo_filiere,';',1)-1), '#[^#]+$'),2)
You can use a REGEXP_REPLACE here:
REGEXP_REPLACE(uo_filiere, '^(.*;)?([0-9]+(#[0-9]+){2,}).*|^([^;]+).*', '\2\4')
Details:
^ - start of string
(.*;)? - an optional Group 1 capturing any text and then a ;
([0-9]+(#[0-9]+){2,}) - Group 2 (\2): one or more digits, and then two or more occurrences of # followed by one or more digits
.* - the rest of the string
| - or
^([^;]+).* - start of string, Group 4 capturing one or more chars other than ; and then any text till end of string.
The replacement is Group 2 + Group 4 values.
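To try the expression, here is a small sketch that runs it over a few of the inputs from the question. The CTE name sample_data and the alias picked are only illustrative:

```sql
with sample_data (uo_filiere) as (
  select '2#9#;2#37#65' from dual union all
  select '2#37#65;2#99' from dual union all
  select '2#37#;2#99#'  from dual
)
select uo_filiere,
       -- \2 is filled when a three-part token exists, \4 otherwise;
       -- the group that did not take part in the match contributes nothing
       regexp_replace(uo_filiere,
                      '^(.*;)?([0-9]+(#[0-9]+){2,}).*|^([^;]+).*',
                      '\2\4') as picked
from sample_data;
```

Because (.*;)? is greedy, an input with several three-part tokens should yield the last one rather than the first, which the question allows.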
Here is a simple-minded way to solve this. It may prove more efficient than other approaches, given the particular nature of the problem.
First, use a regular expression to find the first token that has all three parts. This part of the solution should be the most efficient approach for those strings that do have a three-part token, and it performs work that must be performed on all input strings in any case.
In the second part, wrap within nvl - if no three-part token is found, select the first token regardless of how many parts are present. This part uses only substr and instr in a trivial manner, so it should be very fast too.
Here's the query, run on a few more sample inputs to test those cases too.
with
sample_data (uo_filiere) as (
select '2#9#;2#37#65' from dual union all
select '2#9#;2#37#65;2#37#' from dual union all
select '2#9#;2#37#65;2#37#;2#37#56' from dual union all
select '2#37#65;2#99' from dual union all
select '3#9#;3#37#65;3#37#36;2#37#56' from dual union all
select '2#37#;2#99#' from dual union all
select '1#22#333' from dual union all
select '33#444#' from dual
)
select uo_filiere,
nvl(regexp_substr(uo_filiere, '(;|^)(([^;#]+#){2}[^;]+)', 1, 1, null, 2)
, substr(uo_filiere, 1, instr(uo_filiere || ';', ';') - 1)
) as first_value
from sample_data
;
UO_FILIERE FIRST_VALUE
---------------------------- ----------------------------
2#9#;2#37#65 2#37#65
2#9#;2#37#65;2#37# 2#37#65
2#9#;2#37#65;2#37#;2#37#56 2#37#65
2#37#65;2#99 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 3#37#65
2#37#;2#99# 2#37#
1#22#333 1#22#333
33#444# 33#444#

Regular expression to get last part of a string before a character

I have this string on a single column of a single row on an oracle table:
(test-1#gmail.com-1234567)
(testAAAcccc#gmail.com-7654321)
..
The above is a single big string.
I need a regular expression to extract all the occurrences (there could be 1 or more; 2 in the example above) of the 7-digit numbers, so the results should be:
1234567
7654321
I'm trying to do that with various regular expressions and Oracle functions, but I'm not able to get both occurrences.
Could you please help me?
If you specifically need a regular expression:
select regexp_substr('(test-1#gmail.com-1234567)', '\d{7}' ) from dual
To find all occurrences:
select *
from t,
lateral(select level occurence_number, regexp_substr(str, '(\d{7})',1,level ) digits7
from dual
connect by level<=regexp_count(str, '(\d{7})' )
);
Or a test query with test data (you can run it as-is to see how it works):
with t(str) as (
select '(test-1#gmail.com-1234567)' from dual union all
select '(testAAAcccc#gmail.com-7654321)' from dual union all
select '7654321 1234567 2345678' from dual
)
select *
from t,
lateral(select level occurence_number, regexp_substr(str, '(\d{7})',1,level ) digits7
from dual
connect by level<=regexp_count(str, '(\d{7})' )
);
If you only need the number after the last dash, you can get it with SUBSTR and INSTR (str and t stand for your column and table; COLUMN itself is a reserved word and cannot be used as a name):
Select SUBSTR(str, INSTR(str, '-', -1) + 1)
from t;
-- This returns everything in str after the last dash (-).
If the value ends with a parenthesis, you can replace it with a space and then TRIM:
Select TRIM(replace(SUBSTR(str, INSTR(str, '-', -1) + 1), ')', ' '))
from t;

Url regex oracle

I want to do something like this:
This is the link I want to replace. I only want to keep the "textIwantToKeep" part:
http://mylink/aaa-bbb/textIwantToKeep
And I want this:
http://mySecondLink/ccc-ddd/textIwantToKeep
I want to use a regular expression in Oracle SQL Developer. I thought about counting the slashes (4) and splitting off the part before the 4th slash, but it doesn't work.
Thank you for your help.
REGEXP_SUBSTR might be one option; \w+$ returns the last word (i.e. the one "anchored" to the end of the string):
SQL> with test (link) as
2 (select 'http://mylink/aaa-bbb/textIwantToKeep' from dual union all
3 select 'http://mySecondLink/ccc-ddd/textIwantToKeep' from dual
4 )
5 select link,
6 regexp_substr(link, '\w+$') result
7 from test;
LINK RESULT
------------------------------------------- --------------------
http://mylink/aaa-bbb/textIwantToKeep textIwantToKeep
http://mySecondLink/ccc-ddd/textIwantToKeep textIwantToKeep
SQL>
There could be other alternatives, but here is something that came to me quickly -
WITH main_table AS (
SELECT 'http://mylink/aaa-bbb/textIwantToKeep' AS original_string FROM dual
)
,
second_table AS (
SELECT 'http://mySecondLink/ccc-ddd/' AS my_second_link FROM dual
)
SELECT
second_table.my_second_link
|| regexp_substr(main_table.original_string, '[^/]+', 1, 4) AS final_string
FROM
main_table,
second_table;
Let me know if that works.
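Another possible route, sketched here under the assumption that the new prefix is the fixed string http://mySecondLink/ccc-ddd/ from the question, is a single REGEXP_REPLACE. The greedy ^.*/ consumes everything up to and including the last slash, so only the final path segment is kept:

```sql
with test (link) as (
  select 'http://mylink/aaa-bbb/textIwantToKeep' from dual
)
select link,
       -- replace everything through the last '/' with the new prefix
       regexp_replace(link, '^.*/', 'http://mySecondLink/ccc-ddd/') as new_link
from test;
-- new_link: http://mySecondLink/ccc-ddd/textIwantToKeep
```

This does not depend on the number of slashes, so it should also cope with links that have more or fewer path segments than the example.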

How do I convert this pcre regex to be used with Oracle's REGEXP_SUBSTR?

I have this pcre regular expression that I want to port to an Oracle-supported regex:
^.*pdf_(\w+-\w+).*$
It is designed to match only the part that follows pdf_ (bolded in the original post):
roundBox indent pdf_placement
pdf_grade
indent pdf_placement1 roundBox
What is the equivalent expression in Oracle's regex syntax?
Edit:
I tried what was suggested by sln in the comments:
SELECT REGEXP_SUBSTR(class, '^.*pdf_(\w+(?:-\w+)*).*$') FROM ...
And all I'm getting is the entire value returned, not just the match:
roundBox indent pdf_placement
instead of
placement
The expression I ended up going with was:
pdf_(\w+(-\w*)*)
In full, the SELECT clause looked like this:
SELECT REGEXP_SUBSTR(class, 'pdf_(\w+(-\w*)*)', 1, 1, 'i', 1) FROM ...
You could take the approach of replacing what's unwanted:
SQL> with t (txt) as (
2 select 'roundBox indent pdf_placement' from dual union all
3 select 'PDF_grade' from dual union all
4 select 'indent pdf_placement1 roundBox' from dual
5 ) -- end of sample data
6 select regexp_replace(txt, '^.*pdf_(\w+).*$', '\1', 1, 0, 'i')
7 from t;
REGEXP_REPLACE(TXT,'^.*PDF_(\W
--------------------------------------------------------------------------------
placement
grade
placement1
I used the parameter 'i' to make it case insensitive and work with capital letters PDF as well. Feel free to play with it as needed.

PL/SQL Oracle regular expression doesn't work for occurrence of zero

I have a problem in matching regular expression in Oracle PL/SQL.
To be more specific, the problem is that regex doesn't want to match any zero occurrence.
For example, I have something like:
select * from dual where regexp_like('', '[[:alpha:]]*');
and this doesn't work. But if I put space in this statement:
select * from dual where regexp_like(' ', '[[:alpha:]]*');
it works.
I want the first example to work, so that a person doesn't have to enter a space for it to match.
Any help is appreciated, and thank you for your time.
For better or worse, empty strings in Oracle are treated as NULL:
SQL> select * from dual where '' like '%';
DUMMY
-----
Take that into account when querying with Oracle:
SQL> SELECT *
2 FROM dual
3 WHERE regexp_like('', '[[:alpha:]]*')
4 OR '' IS NULL;
DUMMY
-----
X
Does Oracle still treat empty strings as NULLs? And if so, does regexp_like with a NULL input source string return UNKNOWN? Both of these would be semi-reasonable, and a reason why your test does not work as you expected.