Regex to find alphanumeric combinations of number + text only, with no special characters

I have to find a fixed pattern of 4-character alphanumeric groups in an input string.
I have tried numeric-only and alnum patterns, but I can't figure out how to accept only letters plus digits while rejecting any special character and any group that is purely numeric.
WITH tab AS (
SELECT '''1234,4565,1212,7658''' AS str FROM dual UNION ALL
SELECT '''abce,dddd,jdjd,rdrd,dder''' AS str FROM dual UNION ALL
SELECT '''123m,d565,1dd2,7fur' AS str FROM dual UNION ALL
SELECT '''1m#4,4u#5,1212,abcd' AS str FROM dual UNION ALL
SELECT '''abcd,456a,d212,7658''' AS str FROM dual UNION ALL
SELECT '''1234,4565,1212'',7658''' AS str FROM dual
)
SELECT * FROM tab t
WHERE REGEXP_LIKE(t.str ,'^['']([[:alnum:]]{4},)+([[:alnum:]]{4})['']$')
AND NOT REGEXP_LIKE(t.str ,'^['']([[:digit:]]{4},)+([[:digit:]]{4})['']$')
Expected
abce,dddd,jdjd,rdrd,dder
123m,d565,1dd2,7fur
Not expected
1m#4,4u#5,1212,abcd' --since this one has only 'abcd' valid but not others
abcd,456a,d212,7658 --since this one has '7658' which is invalid but others are
1234,4565,1212 --all numeric should be ignored

A regular expression like this one captures what you have described in words. It requires every 4-character group to contain a letter in at least one of its four positions:
^(([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]]),)*([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]])$
SELECT * FROM tab WHERE REGEXP_LIKE(str, '^(([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]]),)*([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]])$', 'i');
However, I can't work out how single quotes are used in your example, so you'll need to modify this to handle them.
I would recommend updating your question to be clearer about the quotes.
Also note that I'm not very familiar with PL/SQL; this was written with MySQL in mind.
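The long pattern above is easier to check when it is assembled from its parts. The following is a minimal Python sketch of the same idea, not the answerer's code: it assumes ASCII letters and digits only, uses Python's `re` syntax instead of POSIX classes, and runs on the sample rows with their surrounding quotes stripped. The names `GROUP`, `LIST_PATTERN` and `all_groups_have_letter` are made up for illustration.

```python
import re

ALPHA = '[A-Za-z]'      # stands in for [[:alpha:]] (ASCII only, an assumption)
ALNUM = '[A-Za-z0-9]'   # stands in for [[:alnum:]] (ASCII only, an assumption)

# One 4-character group with a letter in position 1, 2, 3 or 4.
GROUP = '(?:' + '|'.join([
    ALPHA + ALNUM * 3,
    ALNUM + ALPHA + ALNUM * 2,
    ALNUM * 2 + ALPHA + ALNUM,
    ALNUM * 3 + ALPHA,
]) + ')'

# Zero or more "group," followed by a final group, anchored at both ends.
LIST_PATTERN = re.compile('^(?:' + GROUP + ',)*' + GROUP + '$')


def all_groups_have_letter(value):
    """Return True when every comma-separated group is 4 alphanumerics with a letter."""
    return LIST_PATTERN.search(value) is not None


rows = [
    '1234,4565,1212,7658',
    'abce,dddd,jdjd,rdrd,dder',
    '123m,d565,1dd2,7fur',
    '1m#4,4u#5,1212,abcd',
    'abcd,456a,d212,7658',
]
matching = [row for row in rows if all_groups_have_letter(row)]
```

Building the pattern from `GROUP` keeps the four alternatives in one place, so the repeated part and the final part cannot drift apart.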

All the second REGEXP_LIKE needs to do is reject rows that contain a character that is neither alphanumeric nor a comma, or that contain a group of 4 digits. This workaround is necessary because Oracle does not support positive lookahead, according to this web site.
The solution that I propose is:
SELECT * FROM tab t
WHERE REGEXP_LIKE(t.str ,'^(([[:alnum:]]{4}),)*([[:alnum:]]{4})$')
AND NOT REGEXP_LIKE(t.str ,'[^[:alnum:],]|[0-9]{4}');
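The two-condition filter can be sanity-checked outside the database. This is a rough Python sketch of the same logic under a few assumptions: ASCII-only character classes, Python's `re` rather than Oracle's regex engine, and sample rows with the surrounding quotes removed. `SHAPE`, `REJECT` and `keep_row` are illustrative names.

```python
import re

# Condition 1: the whole string is a comma-separated list of 4-character alphanumeric groups.
SHAPE = re.compile(r'^(?:[A-Za-z0-9]{4},)*[A-Za-z0-9]{4}$')

# Condition 2 (negated): a character outside alphanumerics/comma, or 4 digits in a row.
# Because condition 1 fixes every group at 4 characters, 4 consecutive digits
# can only be a group that is entirely numeric.
REJECT = re.compile(r'[^A-Za-z0-9,]|[0-9]{4}')


def keep_row(value):
    """Mirror of: REGEXP_LIKE(shape) AND NOT REGEXP_LIKE(reject)."""
    return SHAPE.search(value) is not None and REJECT.search(value) is None


rows = [
    '1234,4565,1212,7658',
    'abce,dddd,jdjd,rdrd,dder',
    '123m,d565,1dd2,7fur',
    '1m#4,4u#5,1212,abcd',
    'abcd,456a,d212,7658',
    "1234,4565,1212',7658",
]
kept = [row for row in rows if keep_row(row)]
```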

Related

Bigquery SQL Regex - Either start/end of string or not followed by/following any alphabet

I want to find if a string (already lowercase) contains an exact word. It can be anywhere within the string. For example, let's say the word is pot.
I initially used
regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$")
But this is unable to catch cases where pot comes at the start or end of the string. As I understand it, [^a-z] has to match some character other than a letter, and at the start or end of the string there is no character for it to match.
So I added * so that the match still succeeds when no such character is present.
regexp_contains(lower(string), "^.*[^a-z]*pot[^a-z]*.*$")
But then it matches cases where pot is part of a larger word, e.g. honeypot.
I don't think this problem is restricted to Bigquery SQL's regexp_contains.
Consider the example below, which uses the word-boundary assertion \b:
#standardSQL
with `project.dataset.table` as (
select 'pot asdf' sentence union all
select 'rtui pot' union all
select 'rtui pot dfgrert' union all
select 'sdpot potdf lkpotij' union all
select 'fjkhgsiejur sldkkr'
)
select sentence
from `project.dataset.table`
where regexp_contains(lower(sentence), r'\bpot\b')
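Alternatively, without \b, spell out the start-of-string and end-of-string cases as separate alternatives:
regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$|^pot[^a-z].*$|^.*[^a-z]pot$|^pot$")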
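The difference between the two approaches can be seen in a small Python sketch. It is only an approximation of BigQuery's behaviour: Python's `\b` is defined in terms of word characters (letters, digits and underscore), whereas the question asks for "not next to a letter", so the two patterns disagree on input such as pot1. `WORD_BOUNDARY`, `NON_LETTER` and `contains_word` are illustrative names.

```python
import re

# Word-boundary version, as in the first answer.
WORD_BOUNDARY = re.compile(r'\bpot\b')

# Explicit version: start of string or a non-letter before, a non-letter or end of string after.
NON_LETTER = re.compile(r'(?:^|[^a-z])pot(?:[^a-z]|$)')


def contains_word(sentence, pattern=WORD_BOUNDARY):
    """Return True when the lowercased sentence contains 'pot' as a separate word."""
    return pattern.search(sentence.lower()) is not None


sentences = [
    'pot asdf',
    'rtui pot',
    'rtui pot dfgrert',
    'sdpot potdf lkpotij',
    'fjkhgsiejur sldkkr',
]
found = [s for s in sentences if contains_word(s)]
```

Which pattern is right depends on whether digits and underscores should count as part of a word.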

Oracle regex and replace

I have a varchar field in the database that contains text. I need to replace every occurrence of any string made of 2 letters + 8 digits with a link, so that VA12345678 becomes /cs/page.asp?id=VA12345678
I have a regex that matches the string, but how can I replace it with a string that contains the matched text itself?
SELECT REGEXP_REPLACE ('test PI20099742', '[A-Z]{2}[0-9]{8}$', 'link to replace with')
FROM dual;
I can have more than one of these strings in one varchar field and ideally I would like to have them replaced in one statement instead of a loop.
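As mathguy said, you can use backreferences for your use case. Try a query like this one. Note that the pattern drops the $ anchor, so occurrences anywhere in the text are replaced, and wraps the match in parentheses so that \1 can refer to it.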
SELECT REGEXP_REPLACE ('test PI20099742', '([A-Z]{2}[0-9]{8})', '/cs/page.asp?id=\1')
FROM DUAL;
For such cases, you may want to keep the "text to add" somewhere at the top of the query, so that if you ever need to change it, you don't have to hunt for it.
You can do that with a with clause, as shown below. I also put some input data for testing in the with clause, but you should remove that and reference your actual table in your query.
I used the [:alpha:] character class, to match all letters - upper or lower case, accented or not, etc. [A-Z] will work until it doesn't.
with
text_to_add (link) as (
select '/cs/page.asp?id=' from dual
)
, sample_strings (str) as (
select 'test VA12398403 and PI83048203 to PT3904' from dual
)
select regexp_replace(str, '([[:alpha:]]{2}\d{8})', link || '\1')
as str_with_links
from sample_strings cross join text_to_add
;
STR_WITH_LINKS
------------------------------------------------------------------------
test /cs/page.asp?id=VA12398403 and /cs/page.asp?id=PI83048203 to PT3904
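For comparison, the same backreference idea looks like this in Python. This is only a sketch: it assumes ASCII letters (`[A-Za-z]`) where the query uses [[:alpha:]], and `LINK_PREFIX` and `add_links` are names invented for the example.

```python
import re

# Kept in one place so the link text is easy to change, as the answer suggests.
LINK_PREFIX = '/cs/page.asp?id='

# Two letters followed by eight digits; the parentheses make the match available as \1.
CODE = re.compile(r'([A-Za-z]{2}[0-9]{8})')


def add_links(text):
    """Prefix every 2-letter + 8-digit code with the link, in a single pass."""
    return CODE.sub(LINK_PREFIX + r'\1', text)


result = add_links('test VA12398403 and PI83048203 to PT3904')
```

PT3904 is left alone because it has only four digits, matching the output shown above.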

Find a string with or without space in oracle using like or regex

I have a string containing a specific 'winner code' that needs to be matched exactly, but in the database some records contain spaces and extra characters within the winner code. If I use the LIKE operator, it only returns the rows that match the criteria literally. I want a single simplified query that returns every record containing the winner code. Please find my query and details below.
Winner code - أ4 ب3 ج10
Records with spaces - أ4 ب 3 ج 10
Records with extra character - (أ(4)
ب(3)
ج(10
My Query -
SELECT COLUMN_NAME
FROM TABLE_NAME
WHERE
((COLUMN_NAME LIKE '%أ4%ب3%ج10%') or(COLUMN_NAME LIKE '%أ 4%ب 3%ج 10%'))
The above query returns the rows with and without spaces, since those match the criteria.
Thanks
If I understand your need correctly, you may try:
with test(str) as (
select '10X3Y4Z' from dual union all
select '10 X 3 Y 4 Z' from dual union all
select '(10)X(3)Y(4)Z' from dual union all
select '10#X3Y4 Z' from dual union all
select '10 # X3Y4Z' from dual )
select str
from test
where regexp_instr(str, '10[ )]{0,1}X[ (]{0,1}3[ )]{0,1}Y[ (]{0,1}4[ )]{0,1}Z') != 0
This matches your "winner code" (I used different characters to simplify my test) even if the numbers are surrounded by '()' or a single space.
This can be rewritten in a more compact way, but I believe this form is clear enough; it uses bracket expressions like [ )]{0,1} to match a space or a parenthesis, with zero or one occurrence. Note that inside brackets the characters are simply listed; | is not an alternation operator there and would be matched literally.
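One way to avoid writing the tolerant pattern by hand is to generate it from the pieces of the code. The sketch below is a looser variant than the query above, stated as an assumption: it allows at most one space or parenthesis of either kind between any two pieces, rather than alternating ) and (. `tolerant_pattern` and `OPTIONAL_NOISE` are hypothetical names.

```python
import re

# At most one space, '(' or ')' is tolerated between consecutive pieces (an assumption).
OPTIONAL_NOISE = r'[ ()]?'


def tolerant_pattern(pieces):
    """Build a regex that finds the pieces in order, allowing optional noise between them."""
    return re.compile(OPTIONAL_NOISE.join(re.escape(piece) for piece in pieces))


winner = tolerant_pattern(['10', 'X', '3', 'Y', '4', 'Z'])

rows = [
    '10X3Y4Z',
    '10 X 3 Y 4 Z',
    '(10)X(3)Y(4)Z',
    '10#X3Y4 Z',
    '10 # X3Y4Z',
]
matches = [row for row in rows if winner.search(row)]
```

`re.escape` keeps any regex metacharacters in the pieces literal, which matters if the real code contains characters such as `.` or `+`.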

Oracle regex to find the special character in name field

I'm trying to filter out the names which have special characters.
Requirement:
1) Filter the names which have characters other than a-zA-Z , space and forward slash(/).
Regex being tried out:
1) regexp_like (customername,'[^a-zA-Z[:space:]\/]'))
2) regexp_like (customername,'[^a-zA-Z \/]'))
The above two regexes help find the names with special characters like ? and dot (.)
For example:
LEAL/JO?O
FRANCO/DIVALDO Sr.
But I couldn't figure out why some names (listed below) that contain only the allowed characters (a-zA-Z, space and forward slash) are also retrieved.
For example:
ESTEVES/MARIA INES
PEREZ/JOSE
DUTRA SILVA/LIGIA
Please help to figure out the mistake in the regex being used.
Many thanks in advance!
Your regex #1 worked for me on 11g with the name data copied/pasted from this page. I wonder if you have non-printable control characters in the data? Try adding [:cntrl:] to the regex: because the class is negated, control characters then no longer flag a row, so if the unexpected names disappear from the result, control characters were the cause. P.S. The backslash is not needed before the slash inside a character class (square brackets).
SQL> with tbl(name) as (
select 'LEAL/JO?O' from dual union
select 'FRANCO/DIVALDO Sr.' from dual union
select 'ESTEVES/MARIA INES' from dual union
select 'PEREZ/JOSE' from dual union
select 'DUTRA SILVA/LIGIA' from dual
)
select *
from tbl
where regexp_like(name, '[^a-zA-Z[:space:][:cntrl:]/]');
NAME
------------------
FRANCO/DIVALDO Sr.
LEAL/JO?O
SQL>
If you can copy/paste this, run it and get the same results, then something is up with the data in your table. Have a look at the data in HEX which will bring to light a previously hidden character perhaps. Here's a simple example which shows the name "JOSE" in HEX. Using one of the numerous ASCII charts out there like http://www.asciitable.com/ you can see there are no hidden characters:
SQL> select 'JOSE' as chr, rawtohex('JOSE') as hex from dual;
CHR HEX
---- --------
JOSE 4A4F5345
SQL>
So, have a look at a name or two and see if you have any hidden characters. If not, I suspect a conflicting character set issue.
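The same diagnosis can be reproduced on a value copied out of the table. This Python sketch is illustrative only: it assumes the value is encoded as UTF-8, uses Python's `\s` in place of [:space:], and the names `has_special` and `to_hex` are made up. The zero-width space in the example is just one possible hidden character.

```python
import re

# Anything other than ASCII letters, whitespace and the forward slash counts as special.
DISALLOWED = re.compile(r'[^a-zA-Z\s/]')


def has_special(name):
    """Return True when the name contains a character outside the allowed set."""
    return DISALLOWED.search(name) is not None


def to_hex(name, encoding='utf-8'):
    """Hex dump of the encoded name, comparable to RAWTOHEX output."""
    return name.encode(encoding).hex().upper()


clean = 'PEREZ/JOSE'
dirty = 'PEREZ/JOSE\u200b'  # looks identical on screen, ends with a zero-width space
```

Comparing `to_hex(clean)` with `to_hex(dirty)` shows the extra bytes that explain why a name that looks valid is still flagged.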
@gary_w has most of the bases well covered....
Here's my SQL version of the Unix command cat -vet MyFile:
select replace(regexp_replace(my_column,'[^[:print:]]', '!ACK!'),' ','.') as CAT_VET
from my_table
... all the non-printing characters become !ACK! and spaces become dots (.). You still need to determine what the characters actually ARE, but it's useful for finding the looney-toon characters in your data.
Also, select dump(my_column) ... is another way to view the raw column values.
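A comparable helper is easy to write on the client side as well. The following Python sketch approximates the query above, with the caveat that Python's `str.isprintable` is Unicode-aware and so does not draw exactly the same line as [:print:] in Oracle. `cat_vet` is an illustrative name.

```python
def cat_vet(value):
    """Make spaces and non-printing characters visible, in the spirit of cat -vet."""
    out = []
    for ch in value:
        if ch == ' ':
            out.append('.')       # spaces become dots
        elif ch.isprintable():
            out.append(ch)        # ordinary characters pass through unchanged
        else:
            out.append('!ACK!')   # tabs, control characters, zero-width characters, ...
    return ''.join(out)


shown = cat_vet('JOSE\tA B')
```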

How to check if a string matches multiple conditions in Oracle using regular expressions?

After struggling with regular expressions, I've come up with this pattern ^(ABC_)\w*(_USER[0-9]*)\w*(_MOD_)\w* that matches this kind of word:
the string starts with ABC_, contains _USER followed by any number of digits, and contains _MOD_ after that.
Examples of matching strings:
ABC_sssss_USER0000000000_sssss_MOD_sssss
ABC_SCssB_USER0332_MOD_REG_SP
Tested in this tool:
http://www.regexpal.com/
but I can't get it to work in Oracle SQL.
Here is my testing code:
SELECT
OBJECT_NAME,
REGEXP_INSTR(OBJECT_NAME, '^(ABC_)\w*(_USER[0-9]*)\w*(_MOD_)\w*') AS IS_MATCH
FROM
(
SELECT 'ABC_SCssB_USER0332_MOD_REG_SP' OBJECT_NAME FROM DUAL UNION
SELECT 'ABC_SCssB_USER0332_REG_SP' FROM DUAL UNION
SELECT 'SCssB_USER0332_MOD_REG_SP' FROM DUAL UNION
SELECT 'ABC_SCssB_MOD_REG_SP' FROM DUAL
)
Result:
ABC_SCssB_MOD_REG_SP 0
ABC_SCssB_USER0332_MOD_REG_SP 0
ABC_SCssB_USER0332_REG_SP 0
SCssB_USER0332_MOD_REG_SP 0
Expected Result:
ABC_SCssB_MOD_REG_SP 0
ABC_SCssB_USER0332_MOD_REG_SP 1
ABC_SCssB_USER0332_REG_SP 0
SCssB_USER0332_MOD_REG_SP 0
How can I achieve that in Oracle?
If regular expressions are not mandated you could do this, assuming you need 1 or more digits after '_USER':
select
object_name,
case when translate(OBJECT_NAME, '#0123456789', ' ##########')
like 'ABC\_%\_USER#%\_MOD\_%' escape '\'
then 1
else 0
end as is_match
from
(
select 'ABC_SCssB_USER0332_MOD_REG_SP' object_name from dual union
select 'ABC_SCssB_USER0332_REG_SP' from dual union
select 'SCssB_USER0332_MOD_REG_SP' from dual union
select 'ABC_SCssB_MOD_REG_SP' from dual
);
This runs a bit quicker than the regexp version for me (on 12.1.0.1.0) - about 75% of the time taken by the regexp version.
If there can be 0 or more digits after '_USER' then this will do:
select
object_name,
case when OBJECT_NAME like 'ABC\_%\_USER%\_MOD\_%' escape '\'
then 1
else 0
end as is_match
from
(
select 'ABC_SCssB_USER0332_MOD_REG_SP' object_name from dual union
select 'ABC_SCssB_USER0332_REG_SP' from dual union
select 'SCssB_USER0332_MOD_REG_SP' from dual union
select 'ABC_SCssB_MOD_REG_SP' from dual
);
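The TRANSLATE trick in the first query is worth spelling out: every digit is turned into #, and any # already present is turned into a space so that it cannot be mistaken for a digit. A rough Python sketch of the same steps follows; the LIKE pattern is rewritten by hand as a regex (% becomes .*, escaped underscores become literal ones), and `MASK` and `is_match` are names chosen for the example.

```python
import re

# Same mapping as translate(OBJECT_NAME, '#0123456789', ' ##########').
MASK = str.maketrans('#0123456789', ' ##########')

# Hand translation of: LIKE 'ABC\_%\_USER#%\_MOD\_%' ESCAPE '\'
MASKED_PATTERN = re.compile(r'ABC_.*_USER#.*_MOD_.*', re.DOTALL)


def is_match(object_name):
    """Return 1 when the name starts with ABC_, has _USER plus a digit, then _MOD_; else 0."""
    masked = object_name.translate(MASK)
    return 1 if MASKED_PATTERN.fullmatch(masked) else 0


names = [
    'ABC_SCssB_MOD_REG_SP',
    'ABC_SCssB_USER0332_MOD_REG_SP',
    'ABC_SCssB_USER0332_REG_SP',
    'SCssB_USER0332_MOD_REG_SP',
]
results = [is_match(name) for name in names]
```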
Ok, so it turns out it will work if you change \w* to .*. It's still not clear what causes \w to fail, though.
I once encountered non-Latin ranges in character classes (like [A-z] but for Cyrillic, [А-я]) not working properly because of NLS_SORT settings. Perhaps something similar is affecting \w?
@simsim, please post your exact database version and NLS settings, so that we can try to get to the root of the problem and make this question more useful to others.
EDIT:
The reason turns out to be much simpler: database version 10.1 is the culprit. Regexp support was only added in 10g, and \w is simply not supported in this version. My instance is 10.2, and "perl-influenced extensions" were only added in 10.2; see this table for a full list of things that were added, and this link to see what's available in 10.1. Be aware that in 10.1 you also don't have support for non-greedy quantifiers (.*?, .+?) or for similar character classes like \d.
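Where \w is unavailable, the usual workaround is to spell the class out explicitly. The Python sketch below shows the original pattern with \w replaced by an explicit class; in Oracle the corresponding POSIX spelling would presumably be [[:alnum:]_], which is an assumption here and has not been tested on 10.1. `WORD`, `EXPLICIT` and `is_match` are illustrative names.

```python
import re

# Explicit stand-in for \w: ASCII letters, digits and underscore.
WORD = '[A-Za-z0-9_]'

EXPLICIT = re.compile(
    '^ABC_' + WORD + '*_USER[0-9]*' + WORD + '*_MOD_' + WORD + '*'
)


def is_match(object_name):
    """Return 1 when the pattern matches at the start of the name, else 0."""
    return 1 if EXPLICIT.match(object_name) else 0


names = [
    'ABC_SCssB_MOD_REG_SP',
    'ABC_SCssB_USER0332_MOD_REG_SP',
    'ABC_SCssB_USER0332_REG_SP',
    'SCssB_USER0332_MOD_REG_SP',
]
results = [is_match(name) for name in names]
```

Unlike the .* workaround, the explicit class still rejects names that contain characters outside letters, digits and underscore between the required parts.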