Extracting strings using Oracle REGEXP_SUBSTR

Extracting strings using Oracle REGEXP_SUBSTR - regex

I am using REGEXP_SUBSTR in Oracle 11g and I am having difficulty trying to extract the following strings.
My query is:
SELECT regexp_substr('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', '[^CN=]*\,', 1, rownum) line
FROM dual
CONNECT BY LEVEL <= length('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,') -
length(REPLACE('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', ',', ''))
From this query, I am having issues trying to match on exact string 'CN=' as from this query, I need the output to appear as follows:
CN=aTIGERAdmin-Admin,
CN=D0902498,
CN=ea90045052,
CN=aTIGERCall-Admin,
And in this format, with the comma at the end.
The way I am doing it at the moment is chopping off the "CN=" but I actually require this part.

I think this will return the resultset you are looking for:
SELECT REGEXP_SUBSTR(d.s,'CN=.*?,', 1, ROWNUM) line
FROM (SELECT 'CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,'
AS s FROM dual) d
CONNECT BY LEVEL <= LENGTH(d.s) - LENGTH(REPLACE(d.s,',',''))
The regular expression trick used here is to specify the ? modifier (following the .*) to make the match "non-greedy". The default match (without the ? modifier) is "greedy" in that it will match as much of the string as possible. In your case, you want the match to end at the first comma found. The intent here is to match literal string 'CN=' followed by any number of characters (zero, one or more) up to the first comma encountered.
This will work in Oracle 10g as well as 11g.
In 11g, the REGEXP_COUNT function can replace your "count of comman" calculation of occurrences.
CONNECT BY LEVEL <= REGEXP_COUNT(d.s,'CN=.*?,')
(BTW... by using a subquery to return the literal string, the literal string only has to be specified once. That makes it much easier to change the string for testing, rather than having to change it in multiple places.)
Addendum:
I can confirm that the comma is included in the returned value. Sample output:
LINE
-----------------------
CN=aTIGERAdmin-Admin,
CN=D0902498,
CN=ea90045052,
CN=aTIGERCall-Admin,

I'm not an LDAP master, but will the regular expression CN=[^,]+ (C, then N, then equals sign, greedily followed by more than one non-comma) work for you?
Also, do you know about REGEXP_COUNT, new in 11g?
SQL> SELECT REGEXP_SUBSTR('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', 'CN=[^,]+', 1, ROWNUM) line
2 FROM dual
3 CONNECT BY LEVEL <= REGEXP_COUNT('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', 'CN=[^,]+')
4 /
LINE
----------------------------------------------------------------------------------------------------
CN=aTIGERAdmin-Admin
CN=D0902498
CN=ea90045052
CN=aTIGERCall-Admin
SQL>

Related

Oracle REGEXP_SUBSTR - SEMICOLON STRING EXTRACTION

I have below input and need mentioned output. How I can get it. I tried different pattern but could not get through it.
so in brief, any value having all three 1#2#3 parts(if it is present) or first value should be returned
2#9#;2#37#65 -> 2#37#65
2#9#;2#37#65;2#37# -> 2#37#65
2#9#;2#37#65;2#37#;2#37#56 -> 2#37#65 or 2#37#56
2#37#65;2#99 -> 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 -> 3#37#65 or 3#37#36 or 2#37#56
2#37#;2#99# -> 2#37 or 2#99# ( in this case any value)
I tried few patterns and other pattern but no help.
regexp_substr('2#9#;2#37#65;2#37#','#[^;]+',1)
SUBSTR(REGEXP_SUBSTR(SUBSTR(uo_filiere,1,INSTR(uo_filiere,';',1)-1), '#[^#]+$'),2)

You can use a REGEXP_REPLACE here:
REGEXP_REPLACE(uo_filiere, '^(.*;)?([0-9]+(#[0-9]+){2,}).*|^([^;]+).*', '\2\4')
See the regexp demo
Details:
^ - start of string
(.*;)? - an optional Group 1 capturing any text and then a ;
([0-9]+(#[0-9]+){2,}) - Group 2 (\2): one or more digits, and then two or more occurrences of # followed with one or more digits
.* - the rest of the string
| - or
^([^;]+).* - start of string, Group 4 capturing one or more chars other than ; and then any text till end of string.
The replacement is Group 2 + Group 4 values.

Here is a simple-minded way to solve this. It may prove more efficient than other approaches, given the particular nature of the problem.
First, use a regular expression to find the first token that has all three parts. This part of the solution should be the most efficient approach for those strings that do have a three-part token, and it performs work that must be performed on all input strings in any case.
In the second part, wrap within nvl - if no three-part token is found, select the first token regardless of how many parts are present. This part uses only substr and instr in a trivial manner, so it should be very fast too.
Here's the query, run on a few more sample inputs to test those cases too.
with
sample_data (uo_filiere) as (
select '2#9#;2#37#65' from dual union all
select '2#9#;2#37#65;2#37#' from dual union all
select '2#9#;2#37#65;2#37#;2#37#56' from dual union all
select '2#37#65;2#99' from dual union all
select '3#9#;3#37#65;3#37#36;2#37#56' from dual union all
select '2#37#;2#99#' from dual union all
select '1#22#333' from dual union all
select '33#444#' from dual
)
select uo_filiere,
nvl(regexp_substr(uo_filiere, '(;|^)(([^;#]+#){2}[^;]+)', 1, 1, null, 2)
, substr(uo_filiere, 1, instr(uo_filiere || ';', ';') - 1)
) as first_value
from sample_data
;
UO_FILIERE FIRST_VALUE
---------------------------- ----------------------------
2#9#;2#37#65 2#37#65
2#9#;2#37#65;2#37# 2#37#65
2#9#;2#37#65;2#37#;2#37#56 2#37#65
2#37#65;2#99 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 3#37#65
2#37#;2#99# 2#37#
1#22#333 1#22#333
33#444# 33#444#

Oracle regex and replace

I have varchar field in the database that contains text. I need to replace every occurrence of a any 2 letter + 8 digits string to a link, such as VA12345678 will return /cs/page.asp?id=VA12345678
I have a regex that replaces the string but how can I replace it with a string where part of it is the string itself?
SELECT REGEXP_REPLACE ('test PI20099742', '[A-Z]{2}[0-9]{8}$', 'link to replace with')
FROM dual;
I can have more than one of these strings in one varchar field and ideally I would like to have them replaced in one statement instead of a loop.

As mathguy had said, you can use backreferences for your use case. Try a query like this one.
SELECT REGEXP_REPLACE ('test PI20099742', '([A-Z]{2}[0-9]{8})', '/cs/page.asp?id=\1')
FROM DUAL;

For such cases, you may want to keep the "text to add" somewhere at the top of the query, so that if you ever need to change it, you don't have to hunt for it.
You can do that with a with clause, as shown below. I also put some input data for testing in the with clause, but you should remove that and reference your actual table in your query.
I used the [:alpha:] character class, to match all letters - upper or lower case, accented or not, etc. [A-Z] will work until it doesn't.
with
text_to_add (link) as (
select '/cs/page.asp?id=' from dual
)
, sample_strings (str) as (
select 'test VA12398403 and PI83048203 to PT3904' from dual
)
select regexp_replace(str, '([[:alpha:]]{2}\d{8})', link || '\1')
as str_with_links
from sample_strings cross join text_to_add
;
STR_WITH_LINKS
------------------------------------------------------------------------
test /cs/page.asp?id=VA12398403 and /cs/page.asp?id=PI83048203 to PT3904

Vertica REGEXP_SUBSTR use /g flag

I am trying to extract all occurrences of a word before '=' in a string, i tried to use this regex '/\w+(?=\=)/g' but it returns null, when i remove the first '/' and the last '/g' it returns only one occurrence that's why i need the global flag, any suggestions?

As Wiktor pointed out, by default, you only get the first string in a REGEXP_SUBSTR() call. But you can get the second, third, fourth, etc.
Embedded into SQL, you need to treat regular expressions differently from the way you would treat them in perl, for example. The pattern is just the pattern, modifiers go elsewhere, you can't use $n to get the n-th captured sub-expression, and you need to proceed in a specific way to get the n-th match of a pattern, etc.
The trick is to CROSS JOIN your queried table with an in-line created index table, consisting of as many consecutive integers as you expect occurrences of your pattern - and a few more for safety. And Vertica's REGEXP_SUBSTR() call allows for additional parameters to do that. See this example:
WITH
-- one exemplary input row; concatenating substrings for
-- readability
input(s) AS (
SELECT 'DRIVER={Vertica};COLUMNSASCHAR=1;CONNECTIONLOADBALANCE=True;'
||'CONNSETTINGS=set+search_path+to+public;DATABASE=sbx;'
||'LABEL=dbman;PORT=5433;PWD=;SERVERNAME=127.0.0.1;UID=dbadmin;'
)
,
-- an index table to CROSS JOIN with ... maybe you need more integers ...
loop_idx(i) AS (
SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 10
)
,
-- the query containing the REGEXP_SUBSTR() call
find_token AS (
SELECT
i -- the index from the in-line index table, needed
-- for ordering the outermost SELECT
, REGEXP_SUBSTR (
s -- the input string
, '(\w+)=' -- the pattern - a word followed by an equal sign; capture the word
, 1 -- start from pos 1
, i -- the i-th occurrence of the match
, '' -- no modifiers to regexp
, 1 -- the first and only sub-pattern captured
) AS token
FROM input CROSS JOIN loop_idx -- the CROSS JOIN with the in-line index table
)
-- the outermost query filtering the non-matches - the empty strings - away...
SELECT
token
FROM find_token
WHERE token <> ''
ORDER BY i
;
The result will be one row per found pattern:
token
DRIVER
COLUMNSASCHAR
CONNECTIONLOADBALANCE
CONNSETTINGS
DATABASE
LABEL
PORT
PWD
SERVERNAME
UID
You can do all sorts of things in modern SQL - but you need to stick to the SQL and to the relational paradigm - that's all ...
Happy playing ...
Marco

Find a string with or without space in oracle using like or regex

I have a string which contains specific 'winner code' which needs to be matched exactly but in the database some records contains spaces and extra characters within 'winners code' and if I use 'like operator' it only returns the matching criteria. I want to use one simplified query which can return all the records if it contains the winner code.Please find below my query and details
Winner code - أ4 ب3 ج10
Records with spaces - أ4 ب 3 ج 10
Records with extra character - (أ(4) ب(3) ج(10
My Query -
SELECT COLUMN_NAME,
FROM TABLE_NAME
WHERE
((COLUMN_NAME LIKE '%أ4%ب3%ج10%') or(COLUMN_NAME LIKE '%أ 4%ب 3%ج 10%'))
The above query returns with and without space data as its matching the criteria.
Thanks

If I correctly understand your need, you may try :
with test(str) as (
select '10X3Y4Z' from dual union all
select '10 X 3 Y 4 Z' from dual union all
select '(10)X(3)Y(4)Z' from dual union all
select '10#X3Y4 Z' from dual union all
select '10 # X3Y4Z' from dual )
select str
from test
where regexp_instr(str, '10[ |\)]{0,1}X[ |\(]{0,1}3[ |\)]{0,1}Y[ |\(]{0,1}4[ |\)]{0,1}Z') != 0
This matches your "winner code" ( I used different characters to simplify my test) even if the numbers are surrounded by '()' or a single space.
This can be re-written in a more compact way, but I believe this form is clear enough; it uses regular expressions like [ |\)]{0,1} to match a space or a parenthesis, with zero or one occurrence.

Oracle: Extract number from String

I've reviewed this question and I'm wondering my output seems to be a little skewed.
From my understanding the REGEXP_REPLACE method, takes a string that you want to replace content with, followed by a pattern to match, then anything that does not match that pattern is replaced with the substitution param.
I've written the following function to extract distance from a text field, in which a spatial query will be performed on the result.
CREATE OR REPLACE FUNCTION extract_distance
(
p_search_string VARCHAR2
)
RETURN VARCHAR2
IS
l_distance VARCHAR2(25);
BEGIN
SELECT REGEXP_REPLACE(UPPER(p_search_string), '(([0-9]{0,4}) ?MILES)', '')
INTO l_distance FROM SYS.DUAL;
RETURN l_distance;
END extract_distance;
When I run this in a block to test:
DECLARE
l_output VARCHAR2(25);
BEGIN
l_output := extract_distance('Stores selling COD4 in 400 Miles');
DBMS_OUTPUT.PUT_LINE(l_output);
END;
I'd expect the output 400 miles but in-fact I get Stores selling COD4 in. Where have I gone wrong?

"REGEXP_REPLACE extends the functionality of the REPLACE function by letting you search a string for a regular expression pattern. By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string." from Oracle docu
You could use, e.g.,
SELECT REGEXP_REPLACE('Stores selling COD4 in 400 Miles', '^.*?(\d+ ?MILES).*$', '\1', 1, 0, 'i') FROM DUAL;
or alternatively
SELECT REGEXP_SUBSTR('Stores selling COD4 in 400 Miles', '(\d+ ?MILES)', 1, 1, 'i') FROM DUAL;

You'll want to use, regexp_substr which returns a substring that matches the regular expression.
REGEX_SUBSTR

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Extracting strings using Oracle REGEXP_SUBSTR - regex

Related

Oracle REGEXP_SUBSTR - SEMICOLON STRING EXTRACTION

Oracle regex and replace

Vertica REGEXP_SUBSTR use /g flag

Find a string with or without space in oracle using like or regex

Oracle: Extract number from String

Categories

Resources