PLSQL select substr between Nth and Mth occurrence of character - regex

I'm sure there is a simple function for exactly this problem, but I can't seem to find it...
I have a string containing multiple slashes, for example a URL. Let's say I want to obtain the substring between the second and fourth occurrence of the slash if a fourth slash exists; otherwise I want everything following the second slash, or simply '' if the string contains fewer than two slashes.
Hence: 'ab/cd/ef/gh/ij' should be selected as 'ef/gh' and 'abc/d' should be selected as ''.
What is the magical function or combination of functions I'm looking for? I tried to play around with substr and regexp_substr, but it got messy quite rapidly, without the desired result.

Apparently I wasn't searching hard enough. The function instr does the trick, hence in combination with substr:
SUBSTR(string, INSTR(string,'/',1,2) + 1, INSTR(string,'/',1,4) - INSTR(string,'/',1,2)-1)
Still looks kind of dirty to me though, creativity is more than welcome.
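One caveat with the expression above: when the fourth slash is missing, the computed length is negative and SUBSTR returns NULL, so the "everything after the second slash" case from the question is not covered. A minimal sketch that handles all three cases with a CASE expression might look like this (the names tbl, str and between_slashes are only for illustration):

```sql
with tbl (str) as (
  select 'ab/cd/ef/gh/ij' from dual union all  -- four slashes
  select 'aa/bb/cc'       from dual union all  -- two slashes
  select 'abc/d'          from dual            -- one slash
)
select str,
       case
         -- fewer than two slashes: nothing to return
         when instr(str, '/', 1, 2) = 0 then null
         -- no fourth slash: everything after the second slash
         when instr(str, '/', 1, 4) = 0 then
           substr(str, instr(str, '/', 1, 2) + 1)
         -- otherwise: the text between the second and fourth slash
         else
           substr(str,
                  instr(str, '/', 1, 2) + 1,
                  instr(str, '/', 1, 4) - instr(str, '/', 1, 2) - 1)
       end as between_slashes
from tbl;
```

For the three sample rows this should give 'ef/gh', 'cc' and NULL (Oracle treats '' as NULL).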

Give this a try. I suspect the regexes could be simpler, but it meets your requirements. Note that the order of the tests in the case statement is very important, lest the string fall into the wrong branch.
with tbl(rownbr, str) as (
  select 1, 'ab/cd/ef/gh/ij/x/x/x' from dual union
  select 2, 'aa/bb/cc' from dual union
  select 3, 'gg/hh/ii/jj' from dual union
  select 4, 'abc/d' from dual union
  select 5, 'zz' from dual
)
select rownbr,
       case
         -- four or more slashes: keep the text between the 2nd and 4th slash
         -- (>= rather than >, so a string with exactly four slashes is covered)
         when regexp_count(str, '/') >= 4 then
           regexp_replace(str, '^.*?/.*?/(.*?/.*?)/.*$', '\1')
         -- fewer than two slashes: nothing to return
         when regexp_count(str, '/') < 2 then
           NULL
         -- two or three slashes: keep everything after the 2nd slash
         when regexp_count(str, '/') < 4 then
           regexp_replace(str, '^.*?/.*?/(.*)$', '\1')
       end result
from tbl;

Related

Oracle REGEXP_SUBSTR - SEMICOLON STRING EXTRACTION

I have the input below and need the output shown. How can I get it? I tried different patterns but could not get it to work.
In brief: return any value that has all three parts (1#2#3) if one is present; otherwise return the first value.
2#9#;2#37#65 -> 2#37#65
2#9#;2#37#65;2#37# -> 2#37#65
2#9#;2#37#65;2#37#;2#37#56 -> 2#37#65 or 2#37#56
2#37#65;2#99 -> 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 -> 3#37#65 or 3#37#36 or 2#37#56
2#37#;2#99# -> 2#37# or 2#99# (in this case, any value)
I tried these patterns, among others, without success:
regexp_substr('2#9#;2#37#65;2#37#','#[^;]+',1)
SUBSTR(REGEXP_SUBSTR(SUBSTR(uo_filiere,1,INSTR(uo_filiere,';',1)-1), '#[^#]+$'),2)

You can use a REGEXP_REPLACE here:
REGEXP_REPLACE(uo_filiere, '^(.*;)?([0-9]+(#[0-9]+){2,}).*|^([^;]+).*', '\2\4')
Details:
^ - start of string
(.*;)? - an optional Group 1 capturing any text and then a ;
([0-9]+(#[0-9]+){2,}) - Group 2 (\2): one or more digits, and then two or more occurrences of # followed with one or more digits
.* - the rest of the string
| - or
^([^;]+).* - start of string, Group 4 capturing one or more chars other than ; and then any text till end of string.
The replacement is Group 2 + Group 4 values.

Here is a simple-minded way to solve this. It may prove more efficient than other approaches, given the particular nature of the problem.
First, use a regular expression to find the first token that has all three parts. This part of the solution should be the most efficient approach for those strings that do have a three-part token, and it performs work that must be performed on all input strings in any case.
In the second part, wrap within nvl - if no three-part token is found, select the first token regardless of how many parts are present. This part uses only substr and instr in a trivial manner, so it should be very fast too.
Here's the query, run on a few more sample inputs to test those cases too.
with
  sample_data (uo_filiere) as (
    select '2#9#;2#37#65' from dual union all
    select '2#9#;2#37#65;2#37#' from dual union all
    select '2#9#;2#37#65;2#37#;2#37#56' from dual union all
    select '2#37#65;2#99' from dual union all
    select '3#9#;3#37#65;3#37#36;2#37#56' from dual union all
    select '2#37#;2#99#' from dual union all
    select '1#22#333' from dual union all
    select '33#444#' from dual
  )
select uo_filiere,
       nvl(regexp_substr(uo_filiere, '(;|^)(([^;#]+#){2}[^;]+)', 1, 1, null, 2)
          , substr(uo_filiere, 1, instr(uo_filiere || ';', ';') - 1)
          ) as first_value
from sample_data
;
UO_FILIERE                   FIRST_VALUE
---------------------------- ----------------------------
2#9#;2#37#65                 2#37#65
2#9#;2#37#65;2#37#           2#37#65
2#9#;2#37#65;2#37#;2#37#56   2#37#65
2#37#65;2#99                 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 3#37#65
2#37#;2#99#                  2#37#
1#22#333                     1#22#333
33#444#                      33#444#

Url regex oracle

I want to do something like this.
This is the link I want to replace, keeping only the "textIwantToKeep" part:
http://mylink/aaa-bbb/textIwantToKeep
And I want this:
http://mySecondLink/ccc-ddd/textIwantToKeep
I want to use a regular expression in Oracle SQL Developer. I thought about counting the slashes (4) and splitting off only the part before the 4th slash, but it doesn't work.
Thank you for your help.

REGEXP_SUBSTR might be one option; \w+$ returns the last word (i.e. the one "anchored" to the end of the string):
with test (link) as
  (select 'http://mylink/aaa-bbb/textIwantToKeep' from dual union all
   select 'http://mySecondLink/ccc-ddd/textIwantToKeep' from dual
  )
select link,
       regexp_substr(link, '\w+$') result
from test;
LINK                                        RESULT
------------------------------------------- --------------------
http://mylink/aaa-bbb/textIwantToKeep       textIwantToKeep
http://mySecondLink/ccc-ddd/textIwantToKeep textIwantToKeep

There could be other alternatives, but here is something that came to me quickly -
WITH main_table AS (
SELECT 'http://mylink/aaa-bbb/textIwantToKeep' AS original_string FROM dual
)
,
second_table AS (
SELECT 'http://mySecondLink/ccc-ddd/' AS my_second_link FROM dual
)
SELECT
second_table.my_second_link
|| regexp_substr(main_table.original_string, '[^/]+', 1, 4) AS final_string
FROM
main_table,
second_table;
Let me know if that works.
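If the goal is simply to swap everything up to and including the last slash for the new prefix, a sketch along these lines might also do. It assumes the part to keep never contains a slash; the table name test and the column aliases are made up for the example.

```sql
with test (link) as (
  select 'http://mylink/aaa-bbb/textIwantToKeep' from dual
)
select
  -- '^.*/' is greedy, so it matches up to and including the LAST slash
  regexp_replace(link, '^.*/', 'http://mySecondLink/ccc-ddd/') as via_regexp,
  -- same idea without a regex: INSTR with -1 searches backwards from the end
  'http://mySecondLink/ccc-ddd/'
    || substr(link, instr(link, '/', -1) + 1) as via_instr
from test;
```

Both columns should come out as http://mySecondLink/ccc-ddd/textIwantToKeep for the sample row.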

Oracle SQL String Manipulation

My field contains short codes that I want to access, such as C-COR3.
The issue is some records have additional information (F and H with numbers). An example is C-COR3 F1.54H19, I only care about C-COR3. Anything after "F" I want to ignore.
Code below works, but only if I hard-code the full F1.54H19. I want to use wildcards to abstract this for other occurrences that have F and H info in the field. (Ex C-R3 F0.18H18 -> C-R3 or C-COR3 F0.23H8.5 -> C-COR3), note varying short code string lengths.
/* Translates C-COR3 F1.54H19 to C-COR3. */
select distinct SUBSTR(lud_code_short,1,INSTR(lud_code_short, 'F1.54H19')-2)
from rep_dba.mytable
I've read that SUBSTR does not allow wildcards, but have had no luck trying my hand at REGEXP_INSTR and REGEXP_SUBSTR instead. Any help appreciated.

Assuming that the "code" is always the first continuous sequence of non-space characters (and that there are no leading spaces - if there are, that's easy to handle), you could do something like this. Note the str || ' ' in the call to instr() - that takes care of the case when the input string has no spaces in it to begin with. Also notice the last input - since there are no spaces anywhere, the output is the same as the input. (Showing that if the "code" is not always separated from the "additional information" by at least one space, the solution would not work.)
with
  test_data (str) as (
    select 'C-COR3 F14H2.5' from dual union all
    select 'C-AB3' from dual union all
    select null from dual union all
    select 'C-AB2F14H2.5' from dual
  )
select str, substr(str, 1, instr(str || ' ', ' ') - 1) as code
from test_data
;
STR            CODE
-------------- --------------
C-COR3 F14H2.5 C-COR3
C-AB3          C-AB3
C-AB2F14H2.5   C-AB2F14H2.5

Try using regexp_replace within your query like below
SELECT
regexp_replace('C-COR3 F14H2.5', '(C-[[:alnum:]]+) [FH].*', '\1')
FROM dual;
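A slightly more general variant, in case the short code does not always start with "C-", is to take the leading run of non-space characters with REGEXP_SUBSTR. This is only a sketch run against the examples from the question; it assumes the code is always separated from the F/H information by a space, and the CTE name test_data is made up.

```sql
with test_data (lud_code_short) as (
  select 'C-COR3 F1.54H19'  from dual union all
  select 'C-R3 F0.18H18'    from dual union all
  select 'C-COR3 F0.23H8.5' from dual union all
  select 'C-COR3'           from dual
)
select lud_code_short,
       -- '^[^ ]+' = one or more non-space characters at the start of the string
       regexp_substr(lud_code_short, '^[^ ]+') as code
from test_data;
```

This should return C-COR3, C-R3, C-COR3 and C-COR3 for the four rows.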

Oracle regexp cant seem to get it right

I have some values like
CR-123456
ECR-12345
BCY-499494
134-ABC
ECW-ECR1233
CR-123344
I want to match all lines which do not start with ECR, and the regex for doing so is ^((?!ECR)\w+), which seems to do what I want.
But then I want to replace the matched prefixes (those that are not ECR) with ECR, and I am stuck because the following doesn't seem to work:
select regexp_replace('CR-123344','^((?!ECR)\w+)','ECR') from dual
Any ideas where I have gone wrong?
I want the result to be
ECR-123456
ECR-12345
ECR-499494
ECR-ABC
ECR-ECR1233
ECR-123344

You don't absolutely need to use regex here, you can just use Oracle's base string functions.
SELECT
'ECR-' || SUBSTR(col,
INSTR(col, '-') + 1,
LENGTH(col) - INSTR(col, '-')) AS new_col
FROM yourTable
WHERE col NOT LIKE 'ECR-%'
The advantage of this approach is that it might run faster than a regex. The disadvantage is that the code is a bit less tidy, but if you understand how it works then this is the most important thing.

I would use substring and instr to replace everything before the dash, but here is your answer using regexp:
WITH aset
AS (SELECT 'CR-123456' a
FROM DUAL
UNION ALL
SELECT 'BCY-12345' a
FROM DUAL
UNION ALL
SELECT 'ECR-499494' a
FROM DUAL
UNION ALL
SELECT '134-ABC' a
FROM DUAL
UNION ALL
SELECT 'ECW-ECR1233' a
FROM DUAL
UNION ALL
SELECT 'CR-123344'
FROM DUAL)
SELECT a, regexp_replace(a, '^([^-]*)','ECR') b
FROM aset;
Results in
A,B
CR-123456,ECR-123456
BCY-12345,ECR-12345
ECR-499494,ECR-499494
134-ABC,ECR-ABC
ECW-ECR1233,ECR-ECR1233
CR-123344,ECR-123344

Looks like you are replacing characters before the '-' with ECR. Do you need to check if it does not match 'ECR' at all?
Because this will give you what you want, will it not?
select regexp_replace('CR-123344','(.*)-','ECR-')
from dual;
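As for why the original attempt fails: Oracle's regular expressions do not support lookahead assertions such as (?!ECR), so the "does not start with ECR" test has to be expressed some other way. A possible sketch that leaves values already starting with ECR- untouched and rewrites the rest, using LIKE plus the base string functions (the CTE name aset and the columns a and b are just sample names):

```sql
with aset (a) as (
  select 'CR-123456'   from dual union all
  select 'ECR-12345'   from dual union all
  select '134-ABC'     from dual union all
  select 'ECW-ECR1233' from dual
)
select a,
       case
         -- already has the wanted prefix: keep the value as it is
         when a like 'ECR-%' then a
         -- otherwise replace everything up to the first dash with ECR
         else 'ECR-' || substr(a, instr(a, '-') + 1)
       end as b
from aset;
```

For these rows the result should be ECR-123456, ECR-12345, ECR-ABC and ECR-ECR1233.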

Last word in a sentence: In SQL (regular expressions possible?)

I need this to be done in Oracle SQL (10gR2). But I guess, I would rather put it plainly, any good, efficient algorithm is fine.
Given a line (or sentence, containing one or many words, English), how will you find the last word of the sentence?
Here is what I have tried in SQL. But, I would like to see an efficient way of doing this.
select reverse(substr(reverse(&p_word_in)
, 0
, instr(reverse(&p_word_in), ' ')
)
)
from dual;
The idea was to reverse the string, find the first occurring space, retrieve the substring and reverse it back. Is that efficient? Is a regular expression available? I am on Oracle 10g R2, but I don't mind seeing an attempt in any other programming language, and I won't mind writing a PL/SQL function if need be.
Update:
Jeffery Kemp has given a wonderful answer. This works perfectly.
Answer
SELECT SUBSTR(&sentence, INSTR(&sentence,' ',-1) + 1)
FROM dual

I reckon it's simpler with INSTR/SUBSTR:
WITH q AS (SELECT 'abc def ghi' AS sentence FROM DUAL)
SELECT SUBSTR(sentence, INSTR(sentence,' ',-1) + 1)
FROM q;
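One thing to watch with this approach: if the sentence ends in a space, INSTR finds that trailing space and the result is NULL. A small sketch that trims first, with a few sample rows made up for illustration (it also shows that a single word comes back unchanged, because INSTR returns 0 when there is no space):

```sql
WITH q AS (
  SELECT 'abc def ghi'     AS sentence FROM DUAL UNION ALL
  SELECT 'single'          AS sentence FROM DUAL UNION ALL
  SELECT 'trailing space ' AS sentence FROM DUAL
)
SELECT sentence,
       -- RTRIM removes trailing blanks so the backward search finds the
       -- space in front of the last word, not the one after it
       SUBSTR(RTRIM(sentence), INSTR(RTRIM(sentence), ' ', -1) + 1) AS last_word
FROM q;
```

This should return ghi, single and space respectively.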

Not sure how it is performance wise, but this should do it:
select regexp_substr(&p_word_in, '\S+$') from dual;

I'm not sure if you can use a regex in Oracle, but wouldn't
(\w+)\W*$
work?

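This regex matches the last word on a line:
\w+$
And RegexBuddy gives this code for use in Oracle:
DECLARE
  subject VARCHAR2(255) := 'abc def ghi';  -- sample input, not part of the generated code
  match   VARCHAR2(255);
BEGIN
  -- [[:alnum:]_] is the POSIX spelling of \w (the underscore belongs inside the brackets)
  match := REGEXP_SUBSTR(subject, '[[:alnum:]_]+$', 1, 1, 'c');
  DBMS_OUTPUT.PUT_LINE(match);
END;
/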

This leaves the trailing punctuation in place but gets the final word:
with datam as (
  SELECT 'abc asdb.' A FROM DUAL UNION
  select 'ipso factum' a from dual union
  select 'ipso factum' a from dual union
  SELECT 'ipso factum2' A FROM DUAL UNION
  SELECT 'ipso factum!' A FROM DUAL UNION
  SELECT 'ipso factum !' A FROM DUAL UNION
  SELECT 'ipso factum/**//*/?.?' A FROM DUAL UNION
  SELECT 'ipso factum ...??!?!**' A FROM DUAL UNION
  select 'ipso factum ..d.../.>' a from dual
)
SELECT a,
       --REGEXP_SUBSTR(A, '[[:alnum:]_]+$', 1, 1, 'c') , /** these are the other examples*/
       --REGEXP_SUBSTR(A, '\S+$') , /** these are the other examples*/
       regexp_substr(a, '[a-zA-Z]+[^a-zA-Z]*$')
from datam;

This works too, even for non-English words:
SELECT REGEXP_SUBSTR ('San maria Calle Cáceres Numéro 25 principal izquierda, España', '[^ .]+$') FROM DUAL;
