REGEXP_LIKE to match two words in a sentence in Oracle 11g - regex

Sentence: "STANDARD domain WARNING encountered in the PROCESS"
I want to identify all the sentences that contain the words STANDARD and WARNING using REGEXP_LIKE. The search also has to be case insensitive.
I want to replace the following code with REGEXP_LIKE:
Select * from table where upper(sentence) like 'STANDARD%WARNING%'

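You can specify the 'i' match parameter for case insensitivity. Note that Oracle's regular expressions do not support \b word boundaries or lookahead such as (?=...), so the boundaries have to be spelled out with \W (any non-word character) and the ^ and $ anchors:
Select * from table where REGEXP_LIKE (sentence, '(^|\W)standard(\W|\W.*\W)warning(\W|$)', 'i')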
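A quick way to try a word-boundary pattern on edge cases is to run it against a few inline rows. The sketch below is only an illustration: the samples CTE and its rows are made up, and it spells the boundaries out with \W, ^ and $ rather than \b, on the assumption that Oracle's regex flavor has no \b.

```sql
-- Hypothetical sample rows; replace the WITH clause with your own table.
with samples (sentence) as (
  select 'STANDARD domain WARNING encountered in the PROCESS' from dual union all
  select 'Standard warning raised' from dual union all
  select 'NONSTANDARD domain WARNINGS encountered' from dual union all
  select 'WARNING raised before STANDARD check' from dual
)
select sentence
from samples
-- (^|\W) and (\W|$) stand in for word boundaries; 'i' makes the match case insensitive.
where regexp_like(sentence, '(^|\W)standard(\W|\W.*\W)warning(\W|$)', 'i');
```

With these rows, only the first two should come back: in the third row both words are embedded in longer words, and in the last row the two words appear in the wrong order.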

Regex is powerful but often slower than a plain LIKE.
Try this:
Select * from table
where upper(sentence) like '%STANDARD%' and
upper(sentence) like '%WARNING%'
Easy to read and serves the purpose.

Related

Bigquery SQL Regex - Either start/end of string or not followed by/following any alphabet

I want to find if a string (already lowercase) contains an exact word. It can be anywhere within the string. For example, let's say the word is pot.
I initially used
regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$")
But this is unable to catch cases where pot comes at the start/end of the string. In my understanding, [^a-z] needs to match some character other than a letter, and in the start/end cases there is no such character to match.
So, I added * so that it is OK even if there is no such character.
regexp_contains(lower(string), "^.*[^a-z]*pot[^a-z]*.*$")
But then it matches cases where pot is part of a larger word, e.g. honeypot.
I don't think this problem is restricted to Bigquery SQL's regexp_contains.
Consider the example below, which uses the \b word-boundary assertion:
#standardSQL
with `project.dataset.table` as (
select 'pot asdf' sentence union all
select 'rtui pot' union all
select 'rtui pot dfgrert' union all
select 'sdpot potdf lkpotij' union all
select 'fjkhgsiejur sldkkr'
)
select sentence
from `project.dataset.table`
where regexp_contains(lower(sentence), r'\bpot\b')
Alternatively, handle the start and end of the string explicitly:
regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$|^pot[^a-z].*$|^.*[^a-z]pot$|^pot$")
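The four alternatives in that last pattern can be folded into one expression by letting "start of string" and "end of string" stand in for the non-letter on each side. A minimal sketch on made-up sample rows (the samples CTE is hypothetical):

```sql
#standardSQL
with samples as (
  select 'pot asdf' as sentence union all
  select 'rtui pot' union all
  select 'rtui pot dfgrert' union all
  select 'sdpot potdf lkpotij' union all
  select 'pot'
)
select sentence
from samples
-- (^|[^a-z]) : start of string or a non-letter before "pot"
-- ([^a-z]|$) : a non-letter or end of string after "pot"
where regexp_contains(lower(sentence), r'(^|[^a-z])pot([^a-z]|$)')
```

Every row except 'sdpot potdf lkpotij' should be returned. Unlike \b, this version treats digits and underscores as boundaries, so a value such as pot2 counts as containing the word; pick whichever definition fits your data.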

Oracle regex and replace

I have a varchar field in the database that contains text. I need to replace every occurrence of any 2-letter + 8-digit string with a link, e.g. VA12345678 should become /cs/page.asp?id=VA12345678
I have a regex that replaces the string but how can I replace it with a string where part of it is the string itself?
SELECT REGEXP_REPLACE ('test PI20099742', '[A-Z]{2}[0-9]{8}$', 'link to replace with')
FROM dual;
I can have more than one of these strings in one varchar field and ideally I would like to have them replaced in one statement instead of a loop.
As mathguy had said, you can use backreferences for your use case. Try a query like this one.
SELECT REGEXP_REPLACE ('test PI20099742', '([A-Z]{2}[0-9]{8})', '/cs/page.asp?id=\1')
FROM DUAL;
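Note that the $ anchor from the question's pattern is gone here: with it, only a code at the very end of the string can match. A small side-by-side sketch on a made-up input string shows the difference:

```sql
-- Made-up input with two codes; only the second one sits at the end of the string.
select regexp_replace(str, '([A-Z]{2}[0-9]{8})$', '/cs/page.asp?id=\1') as anchored,
       regexp_replace(str, '([A-Z]{2}[0-9]{8})',  '/cs/page.asp?id=\1') as unanchored
from (select 'see VA12398403 and PI20099742' as str from dual);
```

The anchored column should leave VA12398403 untouched and rewrite only PI20099742, while the unanchored column rewrites both, because REGEXP_REPLACE replaces every occurrence by default.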
For such cases, you may want to keep the "text to add" somewhere at the top of the query, so that if you ever need to change it, you don't have to hunt for it.
You can do that with a with clause, as shown below. I also put some input data for testing in the with clause, but you should remove that and reference your actual table in your query.
I used the [:alpha:] character class, to match all letters - upper or lower case, accented or not, etc. [A-Z] will work until it doesn't.
with
text_to_add (link) as (
select '/cs/page.asp?id=' from dual
)
, sample_strings (str) as (
select 'test VA12398403 and PI83048203 to PT3904' from dual
)
select regexp_replace(str, '([[:alpha:]]{2}\d{8})', link || '\1')
as str_with_links
from sample_strings cross join text_to_add
;
STR_WITH_LINKS
------------------------------------------------------------------------
test /cs/page.asp?id=VA12398403 and /cs/page.asp?id=PI83048203 to PT3904

Oracle regex to find the special character in name field

I'm trying to filter out the names which have special characters.
Requirement:
1) Filter the names which have characters other than a-zA-Z , space and forward slash(/).
Regex being tried out:
1) regexp_like (customername,'[^a-zA-Z[:space:]\/]'))
2) regexp_like (customername,'[^a-zA-Z \/]'))
The above two regexes help in finding the names with special characters like ? and dot (.)
For example:
LEAL/JO?O
FRANCO/DIVALDO Sr.
But I couldn't figure out why some names (listed below) containing only the allowed characters (a-zA-Z, space and forward slash (/)) also get retrieved.
For example:
ESTEVES/MARIA INES
PEREZ/JOSE
DUTRA SILVA/LIGIA
Please help to figure out the mistake in the regex being used.
Many thanks in advance!
Your regex #1 worked for me on 11g with the name data copied/pasted from this page. I wonder if you have non-printable control characters in the data? Try adding [:cntrl:] to the negated character class so that control characters are ignored; if the unexpected names then drop out of the result, hidden control characters are the cause. P.S. The backslash is not needed before the slash inside a character class (square brackets).
SQL> with tbl(name) as (
select 'LEAL/JO?O' from dual union
select 'FRANCO/DIVALDO Sr.' from dual union
select 'ESTEVES/MARIA INES' from dual union
select 'PEREZ/JOSE' from dual union
select 'DUTRA SILVA/LIGIA' from dual
)
select *
from tbl
where regexp_like(name, '[^a-zA-Z[:space:][:cntrl:]/]');
NAME
------------------
FRANCO/DIVALDO Sr.
LEAL/JO?O
SQL>
If you can copy/paste this, run it and get the same results, then something is up with the data in your table. Have a look at the data in HEX which will bring to light a previously hidden character perhaps. Here's a simple example which shows the name "JOSE" in HEX. Using one of the numerous ASCII charts out there like http://www.asciitable.com/ you can see there are no hidden characters:
SQL> select 'JOSE' as chr, rawtohex('JOSE') as hex from dual;
CHR HEX
---- --------
JOSE 4A4F5345
SQL>
So, have a look at a name or two and see if you have any hidden characters. If not, I suspect a character set conflict.
@gary_w has most of the bases well covered....
Here's my sql version of unix: cat -vet MyFile
select replace(regexp_replace(my_column,'[^[:print:]]', '!ACK!'),' ','.') as CAT_VET
from my_table
... all the non-printing characters become !ACK! and spaces become . You still need to determine what the characters actually ARE, but it's useful to find the looney-toon characters in your data.
Also, select dump(my_column) ... is another way to view the raw column values.
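To see how a hidden character produces exactly the symptom in the question, here is a hedged sketch that puts the checks side by side. The second row is a made-up name with a BEL character (chr(7)) appended, which is an assumption about what the real data might contain, not something known about it.

```sql
with tbl (name) as (
  select 'PEREZ/JOSE' from dual union all
  select 'PEREZ/JOSE' || chr(7) from dual   -- hypothetical hidden control character
)
select name,
       dump(name) as raw_bytes,             -- byte values, including invisible ones
       case when regexp_like(name, '[^a-zA-Z[:space:]/]')
            then 'flagged' else 'ok' end as strict_check,
       case when regexp_like(name, '[[:cntrl:]]')
            then 'yes' else 'no' end as has_cntrl
from tbl;
```

Both names look identical on screen, but the second one should be flagged by the original regex, and the DUMP column shows the extra byte that explains why.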

regular expression clob field

I have a question related to a regular expression in Oracle 10.
Assuming I have a value like 123456;12345;454545 stored in a clob field, is there a way via a regular expression to only filter on the second pattern (12345), knowing that the value can be more than 5 digits but always occurs after the first semicolon and always has a trailing semicolon at the end?
Thanks a lot for your support in that matter,
Have a nice day,
This query should give you your desired output.
SELECT REGEXP_REPLACE(REGEXP_SUBSTR('123456;12345;454545;45634',';[0-9]+;'),';')
FROM dual;
You can extract any element with this query: just change 2 to the position you want, which should be less than or equal to the number of elements in the string.
with tab(value) as
(select '123456;12345;454545' from dual)
select regexp_substr(value, '[^;]+', 1, 2) from tab;
This can be done easily with a single call:
select regexp_replace('123456;12345;454545','^[0-9]+;([0-9]+);.*$','\1')
from dual;
The regular expression could perhaps be tidied up or adapted to your business logic, but I think the idea is clear.
select regexp_replace(regexp_substr(Col_name,';\d+;'),';','') from your_table;
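For the occurrence-based approach, it may help to see what the fourth argument of REGEXP_SUBSTR does at each position, and where the '[^;]+' pattern can surprise you. A minimal sketch; the column alias val and the '1;;3' literal are made up for illustration:

```sql
-- The 4th argument (occurrence) picks the n-th run of non-semicolon characters.
select regexp_substr(val, '[^;]+', 1, 1) as part_1,
       regexp_substr(val, '[^;]+', 1, 2) as part_2,
       regexp_substr(val, '[^;]+', 1, 3) as part_3,
       regexp_substr(val, '[^;]+', 1, 4) as part_4,         -- no 4th element: NULL
       regexp_substr('1;;3', '[^;]+', 1, 2) as skips_empty  -- returns 3, not NULL
from (select '123456;12345;454545' as val from dual);
```

The last column is the caveat: an empty element between two semicolons is not counted, so positions shift if the data can contain empty values. The patterns that match the surrounding semicolons explicitly do not have that problem for the second element.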

Regex CHECK constraint not working with SQL server

I'm trying to reject all inputs not in the format "03 xxxx xxxx", so I created a table like
create table records
(
....
num varchar(255) NOT NULL,
...
CONSTRAINT num_check CHECK (num like '03 [0-9]{4} [0-9]{4}')
)
which should (I think?) accept, for example, "03 1234 1234". But if I try to add this via SQL manager I get an error with the message:
"the INSERT statement conflicted with the CHECK constraint "num_check" "
At first I thought my regex was off, but I've tried it in a few other places and it accepts the example above.
Any ideas?
LIKE does not work with regular expressions; it has its own, much simpler wildcard patterns, which only support %, _, [a-z], and [^a-z]. That's it. {4} will not work, and neither will most other regex features.
You should be able to use:
like '03 [0-9][0-9][0-9][0-9] [0-9][0-9][0-9][0-9]'
Another option, a little less repetitive:
declare @digitChar nvarchar(12)
set @digitChar = '[0-9]'
Where clause:
like '03 ' + replicate(@digitChar,4) + ' ' + replicate(@digitChar,4)
Example: http://sqlfiddle.com/#!3/d41d8/3251
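Putting the first suggestion back into the table definition, a cut-down sketch of the constraint might look like this. Only the num column from the question is kept; the elided columns are left out rather than guessed, and the two test values are made up.

```sql
create table records
(
    num varchar(255) not null,
    -- LIKE has no {4} quantifier, so each digit gets its own [0-9] class.
    constraint num_check
        check (num like '03 [0-9][0-9][0-9][0-9] [0-9][0-9][0-9][0-9]')
);

insert into records (num) values ('03 1234 1234');  -- matches the pattern
insert into records (num) values ('03 12345 123');  -- should violate num_check
```

Because the pattern contains no % wildcard, it has to match the value as a whole, so it also rejects inputs that merely contain a valid number somewhere inside a longer string.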