Oracle regexp: can't seem to get it right


I have some values like
CR-123456
ECR-12345
BCY-499494
134-ABC
ECW-ECR1233
CR-123344
I want to match all values that do not start with ECR. The regex I came up with is ^((?!ECR)\w+), which seems to do what I want when I test it.
I then want to replace the matched part of those values with ECR, but I am stuck, because the following doesn't seem to work:
select regexp_replace('CR-123344','^((?!ECR)\w+)','ECR') from dual
Any idea where I have gone wrong?
I want the result to be
ECR-123456
ECR-12345
ECR-499494
ECR-ABC
ECR-ECR1233
ECR-123344

You don't strictly need a regex here; Oracle's base string functions are enough.
SELECT
'ECR-' || SUBSTR(col,
INSTR(col, '-') + 1,
LENGTH(col) - INSTR(col, '-')) AS new_col
FROM yourTable
WHERE col NOT LIKE 'ECR-%'
The advantage of this approach is that it may run faster than a regex. The disadvantage is that the code is a bit less tidy, but as long as you understand how it works, that matters less.

I would use SUBSTR and INSTR to replace everything before the dash, but here is your answer using a regexp:
WITH aset
AS (SELECT 'CR-123456' a
FROM DUAL
UNION ALL
SELECT 'BCY-12345' a
FROM DUAL
UNION ALL
SELECT 'ECR-499494' a
FROM DUAL
UNION ALL
SELECT '134-ABC' a
FROM DUAL
UNION ALL
SELECT 'ECW-ECR1233' a
FROM DUAL
UNION ALL
SELECT 'CR-123344'
FROM DUAL)
SELECT a, regexp_replace(a, '^([^-]*)','ECR') b
FROM aset;
Results in
A,B
CR-123456,ECR-123456
BCY-12345,ECR-12345
ECR-499494,ECR-499494
134-ABC,ECR-ABC
ECW-ECR1233,ECR-ECR1233
CR-123344,ECR-123344
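For readers who want to check the logic outside the database, here is a minimal Python sketch of the same idea as the `^([^-]*)` pattern above: replace whatever precedes the first dash with ECR. The function name `force_ecr_prefix` is made up for illustration, and Python's `re` module is not Oracle's regex engine, so treat this as a check of the pattern's intent rather than of Oracle's behaviour.

```python
import re

def force_ecr_prefix(value: str) -> str:
    """Replace everything before the first dash with 'ECR' (hypothetical helper)."""
    # ^[^-]* matches the (possibly empty) run of non-dash characters at the start.
    return re.sub(r'^[^-]*', 'ECR', value, count=1)

samples = ['CR-123456', 'BCY-12345', 'ECR-499494', '134-ABC', 'ECW-ECR1233', 'CR-123344']
for s in samples:
    print(s, '->', force_ecr_prefix(s))
```

Values that already start with ECR come out unchanged, which is why no negative lookahead is needed for this particular task.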

It looks like you are replacing the characters before the '-' with ECR. Do you need to check whether the value starts with 'ECR' at all?
Because this will give you what you want, won't it?
select regexp_replace('CR-123344','(.*)-','ECR-')
from dual;

Related

Oracle regex and replace

I have a varchar field in the database that contains text. I need to replace every occurrence of any string of 2 letters + 8 digits with a link, so that VA12345678 becomes /cs/page.asp?id=VA12345678.
I have a regex that replaces the string, but how can I replace it with a string that contains the matched string itself?
SELECT REGEXP_REPLACE ('test PI20099742', '[A-Z]{2}[0-9]{8}$', 'link to replace with')
FROM dual;
I can have more than one of these strings in one varchar field and ideally I would like to have them replaced in one statement instead of a loop.

As mathguy had said, you can use backreferences for your use case. Try a query like this one.
SELECT REGEXP_REPLACE ('test PI20099742', '([A-Z]{2}[0-9]{8})', '/cs/page.asp?id=\1')
FROM DUAL;

For such cases, you may want to keep the "text to add" somewhere at the top of the query, so that if you ever need to change it, you don't have to hunt for it.
You can do that with a with clause, as shown below. I also put some input data for testing in the with clause, but you should remove that and reference your actual table in your query.
I used the [:alpha:] character class, to match all letters - upper or lower case, accented or not, etc. [A-Z] will work until it doesn't.
with
text_to_add (link) as (
select '/cs/page.asp?id=' from dual
)
, sample_strings (str) as (
select 'test VA12398403 and PI83048203 to PT3904' from dual
)
select regexp_replace(str, '([[:alpha:]]{2}\d{8})', link || '\1')
as str_with_links
from sample_strings cross join text_to_add
;
STR_WITH_LINKS
------------------------------------------------------------------------
test /cs/page.asp?id=VA12398403 and /cs/page.asp?id=PI83048203 to PT3904
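The backreference technique used in both answers can be tried outside Oracle as well. The following Python sketch is only an illustration: `LINK_PREFIX` and `add_links` are hypothetical names, `[A-Za-z]` stands in for `[[:alpha:]]` (so accented letters are not covered), and Python writes the backreference as `\g<1>` where Oracle uses `\1`.

```python
import re

# Kept in one place, in the spirit of the WITH clause above (hypothetical name).
LINK_PREFIX = '/cs/page.asp?id='

def add_links(text: str) -> str:
    """Prefix every 2-letter + 8-digit code with the link text."""
    # Group 1 captures the code; \g<1> re-inserts it after the prefix.
    return re.sub(r'([A-Za-z]{2}\d{8})', LINK_PREFIX + r'\g<1>', text)

print(add_links('test VA12398403 and PI83048203 to PT3904'))
```

All occurrences are replaced in a single call, which matches the requirement of avoiding a loop.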

regex to find alphanumeric combination of number+text only no special

I have to find a fixed pattern of 4 alphanumeric characters in an input string.
I have tried numeric-only and alnum classes, but I can't figure out how to limit the match to letters plus digits, with no special characters and no purely numeric groups.
WITH tab AS (
SELECT '''1234,4565,1212,7658''' AS str FROM dual UNION ALL
SELECT '''abce,dddd,jdjd,rdrd,dder''' AS str FROM dual UNION ALL
SELECT '''123m,d565,1dd2,7fur' AS str FROM dual UNION ALL
SELECT '''1m#4,4u#5,1212,abcd' AS str FROM dual UNION ALL
SELECT '''abcd,456a,d212,7658''' AS str FROM dual UNION ALL
SELECT '''1234,4565,1212'',7658''' AS str FROM dual
)
SELECT * FROM tab t
WHERE REGEXP_LIKE(t.str ,'^['']([[:alnum:]]{4},)+([[:alnum:]]{4})['']$')
AND NOT REGEXP_LIKE(t.str ,'^['']([[:digit:]]{4},)+([[:digit:]]{4})['']$')
Expected
abce,dddd,jdjd,rdrd,dder
123m,d565,1dd2,7fur
Not expected
1m#4,4u#5,1212,abcd' --since this one has only 'abcd' valid but not others
abcd,456a,d212,7658 --since this one has '7658' which is invalid but others are
1234,4565,1212 --all numeric should be ignored

A regular expression similar to this will capture what you have outlined in words:
^(([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]]),)*([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]])$
SELECT * FROM tab WHERE REGEXP_LIKE(str, '^(([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]]),)*([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]])$', 'i');
However I can't work out your use of single quotes in your example, so you'll need to modify this to handle your quotes.
I would recommend updating your question to be more clear about quotes.
Also note I'm not explicitly familiar with PLSQL - written with MySQL in mind.

All you need in the second REGEXP_LIKE is to reject rows that contain any character that is neither alphanumeric nor a comma, or that contain a run of four digits (a purely numeric group). This is necessary because Oracle does not support lookahead, according to this web site.
The solution that I propose is:
SELECT * FROM tab t
WHERE REGEXP_LIKE(t.str ,'^(([[:alnum:]]{4}),)*([[:alnum:]]{4})$')
AND NOT REGEXP_LIKE(t.str ,'[^[:alnum:],]|[0-9]{4}');
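The two conditions above amount to a simple per-group rule: every comma-separated group must be exactly 4 alphanumeric characters and must not consist of digits only. A minimal Python sketch of that rule follows; the function name `all_groups_valid` is invented for illustration, and the surrounding single quotes from the question's sample data are assumed to have been stripped already.

```python
def all_groups_valid(s: str) -> bool:
    """True if every comma-separated group is 4 ASCII alphanumerics, not all digits."""
    return all(
        len(g) == 4 and g.isascii() and g.isalnum() and not g.isdigit()
        for g in s.split(',')
    )

rows = [
    '1234,4565,1212,7658',
    'abce,dddd,jdjd,rdrd,dder',
    '123m,d565,1dd2,7fur',
    '1m#4,4u#5,1212,abcd',
    'abcd,456a,d212,7658',
]
for row in rows:
    print(row, all_groups_valid(row))
```

Only the second and third rows pass, which agrees with the "Expected" list in the question.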

PLSQL select substr between Nth and Mth occurrence of character

I'm sure there is a simple function for exactly this problem, but I can't seem to find it...
I have a string containing multiple slashes, for example a URL. Let's say I want to obtain the substring between the second and fourth occurrence of the slash if it exists; otherwise I want everything following the second slash, or simply "" if the string contains fewer than 2 slashes.
Hence: 'ab/cd/ef/gh/ij' should be selected as 'ef/gh' and 'abc/d' should be selected as ''.
What is the magical function/combination of functions I'm looking for? Tried to play around with substr and regexp_substr, but it got messy quite rapidly, without the desired result.

Apparently I wasn't searching hard enough. The function instr does the trick, hence in combination with substr:
SUBSTR(string, INSTR(string,'/',1,2) + 1, INSTR(string,'/',1,4) - INSTR(string,'/',1,2)-1)
Still looks kind of dirty to me though, creativity is more than welcome.

Give this a try. I suspect the regexes could be simpler, but it meets your requirements. Note that the order in which you make the tests against the string in the CASE statement is very important, lest the string fall into the wrong test.
with tbl(rownbr, str) as (
select 1, 'ab/cd/ef/gh/ij/x/x/x' from dual union
select 2, 'aa/bb/cc' from dual union
select 3, 'gg/hh/ii/jj' from dual union
select 4, 'abc/d' from dual union
select 5, 'zz' from dual
)
select rownbr,
case
when regexp_count(str, '/') >= 4 then
regexp_replace(str, '^.*?/.*?/(.*?/.*?)/.*$', '\1')
when regexp_count(str, '/') < 2 then
NULL
when regexp_count(str, '/') < 4 then
regexp_replace(str, '^.*?/.*?/(.*)$', '\1')
end result
from tbl;
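The three cases in the question (four or more slashes, two or three slashes, fewer than two) can be cross-checked with a short Python sketch. This is an assumption-laden illustration, not Oracle code: `between_2nd_and_4th` is a hypothetical name, and the empty string stands in for Oracle's NULL.

```python
def between_2nd_and_4th(s: str, sep: str = '/') -> str:
    """Text between the 2nd and 4th separator, or everything after the 2nd."""
    parts = s.split(sep)
    if len(parts) < 3:          # fewer than 2 separators
        return ''
    # Pieces 2 and 3 lie between the 2nd and 4th separator; with only
    # 2 or 3 separators the slice simply returns whatever follows the 2nd.
    return sep.join(parts[2:4])

for s in ['ab/cd/ef/gh/ij', 'ab/cd/ef/gh/ij/x/x/x', 'aa/bb/cc', 'gg/hh/ii/jj', 'abc/d', 'zz']:
    print(repr(s), '->', repr(between_2nd_and_4th(s)))
```

Note that the question's own example, 'ab/cd/ef/gh/ij', contains exactly four slashes, so that boundary is worth testing in any SQL version too.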

How to check if a string matches multiple conditions in Oracle using regular expressions?

After struggling with regular expressions, I came up with the pattern ^(ABC_)\w*(_USER[0-9]*)\w*(_MOD_)\w*, which should match a string of this kind:
The string starts with ABC_, contains _USER followed by any number of digits, and contains _MOD_ after that.
Examples of matching strings:
ABC_sssss_USER0000000000_sssss_MOD_sssss
ABC_SCssB_USER0332_MOD_REG_SP
I tested it in this tool:
http://www.regexpal.com/
but I can't get it to work in Oracle SQL.
Here is my testing code:
SELECT
OBJECT_NAME,
REGEXP_INSTR(OBJECT_NAME, '^(ABC_)\w*(_USER[0-9]*)\w*(_MOD_)\w*') AS IS_MATCH
FROM
(
SELECT 'ABC_SCssB_USER0332_MOD_REG_SP' OBJECT_NAME FROM DUAL UNION
SELECT 'ABC_SCssB_USER0332_REG_SP' FROM DUAL UNION
SELECT 'SCssB_USER0332_MOD_REG_SP' FROM DUAL UNION
SELECT 'ABC_SCssB_MOD_REG_SP' FROM DUAL
)
Result:
ABC_SCssB_MOD_REG_SP 0
ABC_SCssB_USER0332_MOD_REG_SP 0
ABC_SCssB_USER0332_REG_SP 0
SCssB_USER0332_MOD_REG_SP 0
Expected Result:
ABC_SCssB_MOD_REG_SP 0
ABC_SCssB_USER0332_MOD_REG_SP 1
ABC_SCssB_USER0332_REG_SP 0
SCssB_USER0332_MOD_REG_SP 0
How can I achieve that in Oracle?

If regular expressions are not mandated you could do this, assuming you need 1 or more digits after '_USER':
select
object_name,
case when translate(OBJECT_NAME, '#0123456789', ' ##########')
like 'ABC\_%\_USER#%\_MOD\_%' escape '\'
then 1
else 0
end as is_match
from
(
select 'ABC_SCssB_USER0332_MOD_REG_SP' object_name from dual union
select 'ABC_SCssB_USER0332_REG_SP' from dual union
select 'SCssB_USER0332_MOD_REG_SP' from dual union
select 'ABC_SCssB_MOD_REG_SP' from dual
);
This runs a bit quicker than the regexp version for me (on 12.1.0.1.0) - about 75% of the time taken by the regexp version.
If there can be 0 or more digits after '_USER' then this will do:
select
object_name,
case when OBJECT_NAME like 'ABC\_%\_USER%\_MOD\_%' escape '\'
then 1
else 0
end as is_match
from
(
select 'ABC_SCssB_USER0332_MOD_REG_SP' object_name from dual union
select 'ABC_SCssB_USER0332_REG_SP' from dual union
select 'SCssB_USER0332_MOD_REG_SP' from dual union
select 'ABC_SCssB_MOD_REG_SP' from dual
);

OK, so it turns out it will work if you change \w* to .*. It's still not clear what causes \w to fail, though.
I once encountered non-Latin ranges in character classes (like [A-z] but for Cyrillic, [А-я]) not working properly because of NLS_SORT settings. Perhaps something similar is affecting \w?
@simsim, please post your exact database version and NLS settings, so that we can try to get to the root of the problem and make this question more useful to others.
EDIT:
The reason turns out to be much simpler: database version 10.1 is the culprit. Regexp support was first added in 10g, and \w is simply not supported in that version. My instance is 10.2, and the "perl-influenced extensions" were only added in 10.2; see this table for a full list of things that were added, and this link to see what's available in 10.1. Be aware that in 10.1 you also don't have support for non-greedy quantifiers (.*?, .+?) or for similar character classes such as \d.
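To see that the pattern itself is sound once \w* is swapped for .*, the four test strings from the question can be run through an equivalent Python check. This is a sketch under assumptions: `is_match` is a hypothetical helper that returns 1 or 0 like the expected output, and Python's `re` is used only to confirm the logic, not to reproduce Oracle 10.1 behaviour.

```python
import re

# Same structure as the question's pattern, with .* in place of \w*.
PATTERN = re.compile(r'^(ABC_).*(_USER[0-9]*).*(_MOD_).*')

def is_match(name: str) -> int:
    """Return 1 if the name fits the ABC_ ... _USER<digits> ... _MOD_ shape, else 0."""
    return 1 if PATTERN.match(name) else 0

for name in ['ABC_SCssB_MOD_REG_SP',
             'ABC_SCssB_USER0332_MOD_REG_SP',
             'ABC_SCssB_USER0332_REG_SP',
             'SCssB_USER0332_MOD_REG_SP']:
    print(name, is_match(name))
```

Only the second name yields 1, in line with the "Expected Result" table in the question.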

Last word in a sentence: In SQL (regular expressions possible?)

I need to do this in Oracle SQL (10gR2), but to put it plainly, any good, efficient algorithm is fine.
Given a line (or sentence) containing one or more English words, how do you find the last word?
Here is what I have tried in SQL. But, I would like to see an efficient way of doing this.
select reverse(substr(reverse(&p_word_in)
, 0
, instr(reverse(&p_word_in), ' ')
)
)
from dual;
The idea was to reverse the string, find the first space, take the substring up to it, and reverse the result. Is that efficient? Is a regular expression available? I am on Oracle 10g R2, but I don't mind seeing an attempt in another programming language, and I won't mind writing a PL/SQL function if need be.
Update:
Jeffery Kemp has given a wonderful answer. This works perfectly.
Answer
SELECT SUBSTR(&sentence, INSTR(&sentence,' ',-1) + 1)
FROM dual

I reckon it's simpler with INSTR/SUBSTR:
WITH q AS (SELECT 'abc def ghi' AS sentence FROM DUAL)
SELECT SUBSTR(sentence, INSTR(sentence,' ',-1) + 1)
FROM q;
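Since the question welcomes attempts in other languages, here is the same "search backwards for the last space" idea as a minimal Python sketch. The function name `last_word` is made up for illustration. `str.rfind` plays the role of INSTR with a negative position: it returns -1 when there is no space, so the slice starts at 0 and the whole string is returned, just as INSTR returning 0 makes SUBSTR start at position 1.

```python
def last_word(sentence: str) -> str:
    """Return the text after the last space, or the whole string if there is none."""
    # rfind returns -1 when no space exists, so the slice then starts at index 0.
    return sentence[sentence.rfind(' ') + 1:]

print(last_word('abc def ghi'))
print(last_word('single'))
```

Like the SQL version, this returns an empty string when the sentence ends with a space, so trailing whitespace may need trimming first.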

Not sure how it performs, but this should do it:
select regexp_substr(&p_word_in, '\S+$') from dual;

I'm not sure if you can use a regex in Oracle, but wouldn't
(\w+)\W*$
work?

This regex matches the last word on a line:
\w+$
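And RegexBuddy gives code along these lines for use in Oracle (with the underscore placed inside the bracket expression, so that the class is the POSIX equivalent of \w):
DECLARE
subject VARCHAR2(255) := 'abc def ghi'; -- sample input, for illustration only
match VARCHAR2(255);
BEGIN
-- [[:alnum:]_]+$ matches a trailing run of letters, digits and underscores
match := REGEXP_SUBSTR(subject, '[[:alnum:]_]+$', 1, 1, 'c');
END;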

This leaves the punctuation but gets the final word:
with datam as (
SELECT 'abc asdb.' A FROM DUAL UNION
select 'ipso factum' a from dual union
select 'ipso factum' a from dual union
SELECT 'ipso factum2' A FROM DUAL UNION
SELECT 'ipso factum!' A FROM DUAL UNION
SELECT 'ipso factum !' A FROM DUAL UNION
SELECT 'ipso factum/**//*/?.?' A FROM DUAL UNION
SELECT 'ipso factum ...??!?!**' A FROM DUAL UNION
select 'ipso factum ..d.../.>' a from dual
)
SELECT a,
--REGEXP_SUBSTR(A, '[[:alnum:]]_+$', 1, 1, 'c') , /** these are the other examples*/
--REGEXP_SUBSTR(A, '\S+$') , /** these are the other examples*/
regexp_substr(a, '[a-zA-Z]+[^a-zA-Z]*$')
from datam

This works too, even for non-English words:
SELECT REGEXP_SUBSTR ('San maria Calle Cáceres Numéro 25 principal izquierda, España', '[^ .]+$') FROM DUAL;

We Keep Coding
