Oracle regex eliminate all duplicate words - regex

I would like to eliminate all duplicate words in a comma separated list.
I've tried with:
SELECT
REGEXP_REPLACE(
'1234,234,1234,1234,928,1234,123,1234,Abcd,1234,1234',
'([^,\w]+)(,[ ]*[\1])+') AS r
FROM dual
It should return
1234,234,928,123,Abcd
But in fact it returns
1234,234,234,234
Also tried with ([^,\w]+)(,[ ]*\1)+ but with '1234,1234,1234' it returns (null)
Also tried with
SELECT
REGEXP_REPLACE(
'1234,234,1234,1234,928,1234,123,1234,Abcd,1234,1234',
'([^,\w]+)(,[ ]*[\1])+', '\1') AS r
FROM dual
and with other replacement strings, even '\1\2', but none of them gives the desired result.
Please, any ideas?

I know this isn't exactly the method you were asking for, but it still achieves the same result:
WITH DATA AS
( SELECT '1234,234,1234,1234,928,1234,123,1234,Abcd,1234,1234' str FROM dual)
SELECT DISTINCT trim(regexp_substr(str, '[^,]+', 1, LEVEL)) str
FROM DATA
CONNECT BY instr(str, ',', 1, LEVEL - 1) > 0
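Note that the query above returns one row per distinct word, in no guaranteed order, rather than a single comma-separated string. If you need the list back in its original order, a minimal sketch could look like the following. It assumes Oracle 11gR2 or later (for LISTAGG and REGEXP_COUNT) and no empty elements in the list; the names tokens, first_seen, pos and first_pos are only illustrative.

```sql
WITH data AS (
  SELECT '1234,234,1234,1234,928,1234,123,1234,Abcd,1234,1234' AS str FROM dual
),
tokens AS (
  -- split the list into one row per element, remembering each element's position
  SELECT TRIM(REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL)) AS token,
         LEVEL AS pos
  FROM data
  CONNECT BY LEVEL <= REGEXP_COUNT(str, ',') + 1
),
first_seen AS (
  -- keep each distinct element once, with the position of its first occurrence
  SELECT token, MIN(pos) AS first_pos
  FROM tokens
  GROUP BY token
)
-- glue the distinct elements back together in order of first appearance
SELECT LISTAGG(token, ',') WITHIN GROUP (ORDER BY first_pos) AS r
FROM first_seen;
```

For the sample string this should give 1234,234,928,123,Abcd.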

Related

Regular expression to get last part of a string before a character

I have this string on a single column of a single row on an oracle table:
(test-1#gmail.com-1234567)
(testAAAcccc#gmail.com-7654321)
..
The above is a single big string.
I need a regular expression to extract all the occurrences (could be 1 or more, 2 in above example) of the 7 numbers above, so the results should be:
1234567
7654321
I've tried various regular expressions and Oracle functions, but I'm not able to get both occurrences.
Could you please help me?
If you specifically need a regular expression:
select regexp_substr('(test-1#gmail.com-1234567)', '\d{7}' ) from dual
To find all occurrences:
select *
from t,
lateral(select level occurence_number, regexp_substr(str, '(\d{7})',1,level ) digits7
from dual
connect by level<=regexp_count(str, '(\d{7})' )
);
Or a test query with test data (you can run it as-is to see how it works):
with t(str) as (
select '(test-1#gmail.com-1234567)' from dual union all
select '(testAAAcccc#gmail.com-7654321)' from dual union all
select '7654321 1234567 2345678' from dual
)
select *
from t,
lateral(select level occurence_number, regexp_substr(str, '(\d{7})',1,level ) digits7
from dual
connect by level<=regexp_count(str, '(\d{7})' )
);
If you only want to extract the numbers, you can do that with:
SELECT SUBSTR(col, INSTR(col, '-', -1) + 1)
FROM t;
-- Returns everything in col after the last dash (-); col and t stand for your own column and table.
If the value ends with a parenthesis, you can replace it with a space and then TRIM:
SELECT TRIM(REPLACE(SUBSTR(col, INSTR(col, '-', -1) + 1), ')', ' '))
FROM t;

RegExp not returning the desired results

I would expect the following code to return these two lines
88518-008
89274-021(08518-008,09274-021)
But it only returns the second one, and I don't understand why. Any help would be great!
WITH DATA AS
(
SELECT '88518-008,89274-021(08518-008,09274-021)' str
FROM dual
)
SELECT TRIM(REGEXP_SUBSTR(str, '[^,]+\((.+)\)|[^,]+(?![^\(]*\))+', 1, LEVEL)) str
FROM DATA
CONNECT BY REGEXP_INSTR(str, '\,(?![^\(]*\))', 1, LEVEL - 1) > 0
I have tested the regexes online and they work as expected. I pulled the query from another example and tried replacing the values to match my needs.
Oracle's regular expressions do not support lookahead such as (?!...), so a pattern that works in an online tester may not behave the same way in Oracle. You need the following regex instead:
'([^,]*),(.*\([^\)]+\))'
It starts with Group 1, which matches anything but a comma, followed by a comma. Group 2 then matches anything up to a left parenthesis, the left parenthesis itself, anything up to a right parenthesis, and finally the right parenthesis.
That will give you the first value in Group 1, and the second value in Group 2.
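As a rough sketch of how the two groups could be pulled out separately, assuming Oracle 11g or later (where REGEXP_SUBSTR accepts a subexpression number as its sixth argument); the column aliases first_value and second_value are only illustrative:

```sql
WITH data AS (
  SELECT '88518-008,89274-021(08518-008,09274-021)' AS str FROM dual
)
SELECT
  -- sixth argument = 1: return what Group 1 matched
  REGEXP_SUBSTR(str, '([^,]*),(.*\([^\)]+\))', 1, 1, NULL, 1) AS first_value,
  -- sixth argument = 2: return what Group 2 matched
  REGEXP_SUBSTR(str, '([^,]*),(.*\([^\)]+\))', 1, 1, NULL, 2) AS second_value
FROM data;
```

With the sample string, first_value should be 88518-008 and second_value should be 89274-021(08518-008,09274-021). Unlike the CONNECT BY approach, this returns the values as two columns of one row rather than as two rows.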
Thanks for your help. The query below returns the desired results:
WITH DATA AS
(
SELECT 'word1, word2, word3, word4, word5, word6 (word7, word8)' str FROM dual
)
SELECT trim(regexp_substr(str, '[^,]+\((.+)\)|[^,]+(?![^\(]*\))+|[^,]+', 1, LEVEL)) str
FROM DATA
CONNECT BY REGEXP_INSTR(str, ',', 1, LEVEL) > 0

Using Oracle Regular Expression - Masking based on pattern

With Oracle 11g PL/SQL, can I get the positions of the capture groups for the query below (something like what Matcher.start() provides in Java)?
`select regexp_replace('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', '\2') from dual`
Result should look like : "zone", 9(start of text "zone").
The bigger problem I was trying to solve is to mask data like account number using patterns like '^.....(.*)..$' (this pattern can vary depending on installation).
Will something like below work for you?
select regexp_replace('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', '\2') expr
,instr('1234bankzone1234',regexp_replace('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', '\2')) pos from dual
or more readable subquery like
select a.*, instr(a.value,a.expr) from (
select '1234bankzone1234' value,
regexp_replace('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', '\2') expr from dual
) a
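Another option, sketched here on the assumption that you are on 11g (as the question states), is to pass a subexpression number to REGEXP_SUBSTR and REGEXP_INSTR. Unlike the INSTR approach, this reports where the group itself matched, even if the same text also appears earlier in the value:

```sql
SELECT
  -- text matched by capture group 2
  REGEXP_SUBSTR('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', 1, 1, 'c', 2) AS expr,
  -- return_opt = 0 gives the start position of the match; last argument selects group 2
  REGEXP_INSTR('1234bankzone1234', '^..(.*)bank(zone).(.*)..$', 1, 1, 0, 'c', 2) AS pos
FROM dual;
```

For this input, expr should be zone and pos should be 9.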
I couldn't find any direct equivalent of the Matcher API, and I found no way to access the group position buffer in SQL, so I used the following steps.
1: Reverse pattern using this
regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace(
pattern, '(\()', '\1#') , '(\))', '#\1') , '\(#', ')#') , '\^\)#', '^') , '#\)\$', '$') , '#\)', '(#') , '#', '') , '\^([^\(]+\))', '^(\1') , '\(([^\)]+)\$', '(\1)$');
So, "^(.)..(.).$" becomes "^.(..).(.)$"
2: Use this to bulk collect index and count of capture groups within both patterns
SELECT REGEXP_instr(pattern, '\(.*?\)+', 1, LEVEL) bulk collect into posCapture FROM v CONNECT BY LEVEL <= REGEXP_COUNT(pattern, '\(.*?\)');
3: Match both patterns against the text-to-be-masked. Merge them by the order found in step 2.
select regexp_replace(v_src, pattern, '\' || captureIndex) into tempStr from dual;

Oracle regular expression to return substring between specific start and end strings

I'm trying to do a regex match to return a substring between a start and end point.
Given the following table:
WITH test AS (SELECT 'ABCD_EFGH_THIS_IJKL' AS thetext FROM DUAL
UNION SELECT 'ABAB CDCD EG BCD' FROM DUAL)
SELECT *
FROM test
I would want to return the results:
'THIS'
NULL
So it would match THIS in the first string, and nothing in the second string.
For this, it is safe to assume that ABCD_EFGH_ precedes the text I want to match, and _ follows it.
Thanks for any help!
EDIT: This needs to work on 10g. Sorry for not making that clear turbanoff.
Use REGEXP_SUBSTR; the subexpression argument used here requires 11g:
WITH test AS (SELECT 'ABCD_EFGH_THIS_IJKL' AS thetext FROM DUAL
UNION SELECT 'ABAB CDCD EG BCD' FROM DUAL)
SELECT REGEXP_SUBSTR( TEST.THETEXT, 'ABCD_EFGH_([^_]*).*', 1, 1, 'i', 1)
FROM test
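Since the question asks for 10g, where REGEXP_SUBSTR has no subexpression argument, one possible workaround is REGEXP_REPLACE with a backreference. REGEXP_REPLACE returns the input unchanged when nothing matches, so this sketch guards it with REGEXP_LIKE to get NULL instead. It also assumes the prefix sits at the start of the string:

```sql
WITH test AS (SELECT 'ABCD_EFGH_THIS_IJKL' AS thetext FROM DUAL
              UNION SELECT 'ABAB CDCD EG BCD' FROM DUAL)
SELECT thetext,
       CASE
         -- only rewrite rows that actually contain prefix + text + underscore
         WHEN REGEXP_LIKE(thetext, '^ABCD_EFGH_[^_]*_')
         THEN REGEXP_REPLACE(thetext, '^ABCD_EFGH_([^_]*)_.*$', '\1')
       END AS result  -- CASE without ELSE yields NULL for non-matching rows
FROM test;
```

This should return THIS for the first row and NULL for the second.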
Edit
This can be done without using regular expressions.
WITH test AS (SELECT 'ABCD_EFGH_THIS_IJKL' AS thetext FROM DUAL
UNION SELECT 'ABAB CDCD EG BCD' FROM DUAL)
select TEST.thetext
, instr(TEST.thetext, 'ABCD_EFGH_') + length('ABCD_EFGH_') START_POS
, instr(TEST.thetext, '_', length('ABCD_EFGH_') + 1) END_POS
, substr
(TEST.thetext
,instr(TEST.thetext, 'ABCD_EFGH_') + length('ABCD_EFGH_') --START_POS
,instr(TEST.thetext, '_', length('ABCD_EFGH_') + 1) - (instr(TEST.thetext, 'ABCD_EFGH_') + length('ABCD_EFGH_')) --END_POS - START_POS
) RESULT
FROM test

Last word in a sentence: In SQL (regular expressions possible?)

I need this to be done in Oracle SQL (10gR2), but to put it plainly, any good, efficient algorithm is fine.
Given a line (or sentence, containing one or many words, English), how will you find the last word of the sentence?
Here is what I have tried in SQL. But, I would like to see an efficient way of doing this.
select reverse(substr(reverse(&p_word_in)
, 0
, instr(reverse(&p_word_in), ' ')
)
)
from dual;
The idea was to reverse the string, find the first space, take the substring up to it, and reverse the result. Is that efficient? Is a regular expression available? I am on Oracle 10g R2, but I don't mind seeing an attempt in another programming language, and I won't mind writing a PL/SQL function if need be.
Update:
Jeffery Kemp has given a wonderful answer. This works perfectly.
Answer
SELECT SUBSTR(&sentence, INSTR(&sentence,' ',-1) + 1)
FROM dual
I reckon it's simpler with INSTR/SUBSTR:
WITH q AS (SELECT 'abc def ghi' AS sentence FROM DUAL)
SELECT SUBSTR(sentence, INSTR(sentence,' ',-1) + 1)
FROM q;
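One edge case to watch with this approach: if the sentence ends with a space, INSTR finds that trailing space and the result is empty. A small variation, sketched below with made-up sample rows, trims trailing spaces first. A sentence with no space at all already works, because INSTR returns 0 and SUBSTR then starts at position 1.

```sql
WITH q AS (
  SELECT 'abc def ghi' AS sentence FROM DUAL UNION ALL
  SELECT 'single' FROM DUAL UNION ALL
  SELECT 'trailing space  ' FROM DUAL
)
SELECT sentence,
       -- RTRIM removes trailing spaces so the last space found precedes a real word
       SUBSTR(RTRIM(sentence), INSTR(RTRIM(sentence), ' ', -1) + 1) AS last_word
FROM q;
```

The three rows should yield ghi, single and space respectively.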
I'm not sure how it performs, but this should do it:
select regexp_substr(&p_word_in, '\S+$') from dual;
I'm not sure if you can use a regex in Oracle, but wouldn't
(\w+)\W*$
work?
This regex matches the last word on a line:
\w+$
And RegexBuddy gives this code for use in Oracle:
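DECLARE
subject VARCHAR2(255) := 'abc def ghi'; -- sample input; the original snippet left subject undeclared
match VARCHAR2(255);
BEGIN
-- [[:alnum:]_]+ is the POSIX equivalent of \w+
match := REGEXP_SUBSTR(subject, '[[:alnum:]_]+$', 1, 1, 'c');
END;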
This leaves the trailing punctuation in place but gets the final word:
with datam as (
SELECT 'abc asdb.' A FROM DUAL UNION
select 'ipso factum' a from dual union
select 'ipso factum' a from dual union
SELECT 'ipso factum2' A FROM DUAL UNION
SELECT 'ipso factum!' A FROM DUAL UNION
SELECT 'ipso factum !' A FROM DUAL UNION
SELECT 'ipso factum/**//*/?.?' A FROM DUAL UNION
SELECT 'ipso factum ...??!?!**' A FROM DUAL UNION
select 'ipso factum ..d.../.>' a from dual
)
SELECT a,
--REGEXP_SUBSTR(A, '[[:alnum:]_]+$', 1, 1, 'c') , /** these are the other examples*/
--REGEXP_SUBSTR(A, '\S+$') , /** these are the other examples*/
regexp_substr(a, '[a-zA-Z]+[^a-zA-Z]*$')
from datam
This works too, even for non-English words:
SELECT REGEXP_SUBSTR ('San maria Calle Cáceres Numéro 25 principal izquierda, España', '[^ .]+$') FROM DUAL;