Simple regular expression in Oracle - regex

I have a table with the following values:
ID NAME ADDRESS
1 Bob Super stree1 here goes
2 Alice stree100 here goes
3 Clark Fast left stree1005
4 Magie Right stree1580 here goes
I need a query using LIKE that returns only the row containing stree1 (in this case, only the row with ID=1). I use the following query:
select * from table t1 WHERE t1.ADDRESS LIKE '%stree1%';
But the problem is that I get all rows, because each of them contains stree1; the other rows just have more digits after it.
I found out that I can use REGEXP_LIKE, as I am using Oracle. What would be the proper regex to use in:
select * from table t1 WHERE regexp_like(t1.ADDRESS ,'stree1');

I would think that this is the regex you are seeking:
select * from table t1 WHERE regexp_like(t1.ADDRESS ,'stree1([^[:alnum:]_]|$)');
If you want to, you can further simplify this to:
select * from table t1 WHERE regexp_like(t1.ADDRESS ,'stree1(\W|$)');
That is, 'stree1' is either followed by a non-word character (space, punctuation, etc.) or appears at the end of the string. Oracle's regex flavor has no non-capturing groups (?:...), no [:word:] class and no \b word boundary, so a plain group is used here. There are other ways to do the same thing, such as expecting a particular character after the 1 in stree1 (e.g., white space with 'stree1\s'), although that variant misses 'stree1' at the very end of the string.

This may help in regex engines that support word boundaries (Oracle's REGEXP_LIKE does not recognize \b):
stree1\b

The '\W' matches a non-word character, since nothing but a space or a similar separator may follow 'stree1',
and '$' also accepts a string that ends with stree1:
select *
from table1
where regexp_like(address,'stree1(\W|$)')
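To see why the substring test returns every row while the bounded pattern returns only one, the comparison can be sketched outside the database. The snippet below uses Python's re module purely as a stand-in (an assumption: it is not Oracle's regex engine, although \W and $ behave the same way for this data) together with the four sample rows from the question.

```python
import re

# Sample rows from the question: (ID, NAME, ADDRESS).
rows = [
    (1, "Bob", "Super stree1 here goes"),
    (2, "Alice", "stree100 here goes"),
    (3, "Clark", "Fast left stree1005"),
    (4, "Magie", "Right stree1580 here goes"),
]

# Equivalent of LIKE '%stree1%': a plain substring test.
like_ids = [row_id for row_id, _, address in rows if "stree1" in address]

# 'stree1' followed by a non-word character or by the end of the string.
bounded = re.compile(r"stree1(\W|$)")
regex_ids = [row_id for row_id, _, address in rows if bounded.search(address)]

print("LIKE-style match:", like_ids)
print("bounded regex match:", regex_ids)
```

The substring test accepts all four rows because 'stree100', 'stree1005' and 'stree1580' each begin with 'stree1'; the bounded pattern rejects them because a digit follows the 1.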

Related

Bigquery SQL Regex - Either start/end of string or not followed by/following any alphabet

I want to find if a string (already lowercase) contains an exact word. It can be anywhere within the string. For example, let's say the word is pot.
I initially used
regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$")
But this is unable to catch cases where pot comes at the start/end of the string. In my understanding, [^a-z] needs to match a character other than a letter, and in the start/end cases there is no character for it to match.
So I added * so that the pattern also matches when no such character is present.
regexp_contains(lower(string), "^.*[^a-z]*pot[^a-z]*.*$")
But then it matches cases where pot is part of a larger word, e.g. honeypot.
I don't think this problem is restricted to Bigquery SQL's regexp_contains.
Consider the example below:
#standardSQL
with `project.dataset.table` as (
  select 'pot asdf' sentence union all
  select 'rtui pot' union all
  select 'rtui pot dfgrert' union all
  select 'sdpot potdf lkpotij' union all
  select 'fjkhgsiejur sldkkr'
)
select sentence
from `project.dataset.table`
where regexp_contains(lower(sentence), r'\bpot\b')
Alternatively, spell out the start-of-string and end-of-string cases explicitly:
regexp_contains(lower(string), "^.*[^a-z]pot[^a-z].*$|^pot[^a-z].*$|^.*[^a-z]pot$|^pot$")
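The two approaches above (a word boundary versus explicit start/end alternatives) can be compared side by side. This is a rough sketch using Python's re module as a stand-in for BigQuery's RE2 engine, which is an assumption; the sample sentences are the ones from the answer, and the compact pattern `(^|[^a-z])pot([^a-z]|$)` is my own shorthand for the four-way alternation.

```python
import re

sentences = [
    "pot asdf",
    "rtui pot",
    "rtui pot dfgrert",
    "sdpot potdf lkpotij",
    "fjkhgsiejur sldkkr",
]

# Word-boundary version, as in the first answer.
word_boundary = re.compile(r"\bpot\b")

# Explicit version: start of string or a non-letter before 'pot',
# and a non-letter or end of string after it.
explicit = re.compile(r"(^|[^a-z])pot([^a-z]|$)")

with_boundary = [s for s in sentences if word_boundary.search(s.lower())]
with_explicit = [s for s in sentences if explicit.search(s.lower())]

print(with_boundary)
print(with_explicit)

# The two differ when a digit or underscore touches the word:
# \b treats '2' as a word character, [^a-z] treats it as a separator.
print(bool(word_boundary.search("pot2")), bool(explicit.search("pot2")))
```

Which behavior is wanted for inputs such as 'pot2' depends on what counts as a word in the data, so it is worth deciding that before picking one of the two patterns.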

Vertica REGEXP_SUBSTR use /g flag

I am trying to extract all occurrences of a word before '=' in a string. I tried the regex '/\w+(?=\=)/g', but it returns null. When I remove the leading '/' and the trailing '/g', it returns only one occurrence, which is why I need the global flag. Any suggestions?
As Wiktor pointed out, by default you only get the first match from a REGEXP_SUBSTR() call. But you can also get the second, third, fourth, and so on.
Embedded in SQL, regular expressions need to be treated differently from the way you would treat them in Perl, for example. The pattern is just the pattern, modifiers go elsewhere, you can't use $n to get the n-th captured sub-expression, and you need to proceed in a specific way to get the n-th match of a pattern.
The trick is to CROSS JOIN your queried table with an index table created in-line, consisting of as many consecutive integers as you expect occurrences of your pattern, plus a few more for safety. Vertica's REGEXP_SUBSTR() call accepts additional parameters that make this possible. See this example:
WITH
-- one exemplary input row; concatenating substrings for
-- readability
input(s) AS (
  SELECT 'DRIVER={Vertica};COLUMNSASCHAR=1;CONNECTIONLOADBALANCE=True;'
       ||'CONNSETTINGS=set+search_path+to+public;DATABASE=sbx;'
       ||'LABEL=dbman;PORT=5433;PWD=;SERVERNAME=127.0.0.1;UID=dbadmin;'
)
,
-- an index table to CROSS JOIN with ... maybe you need more integers ...
loop_idx(i) AS (
        SELECT 1
  UNION SELECT 2
  UNION SELECT 3
  UNION SELECT 4
  UNION SELECT 5
  UNION SELECT 6
  UNION SELECT 7
  UNION SELECT 8
  UNION SELECT 9
  UNION SELECT 10
)
,
-- the query containing the REGEXP_SUBSTR() call
find_token AS (
  SELECT
    i          -- the index from the in-line index table, needed
               -- for ordering the outermost SELECT
  , REGEXP_SUBSTR (
      s        -- the input string
    , '(\w+)=' -- the pattern - a word followed by an equal sign; capture the word
    , 1        -- start from pos 1
    , i        -- the i-th occurrence of the match
    , ''       -- no modifiers to regexp
    , 1        -- the first and only sub-pattern captured
    ) AS token
  FROM input CROSS JOIN loop_idx -- the CROSS JOIN with the in-line index table
)
-- the outermost query filtering the non-matches - the empty strings - away...
SELECT
  token
FROM find_token
WHERE token <> ''
ORDER BY i
;
The result will be one row per found pattern:
token
DRIVER
COLUMNSASCHAR
CONNECTIONLOADBALANCE
CONNSETTINGS
DATABASE
LABEL
PORT
PWD
SERVERNAME
UID
You can do all sorts of things in modern SQL - but you need to stick to the SQL and to the relational paradigm - that's all ...
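For comparison, the "i-th occurrence" idea behind the CROSS JOIN can be sketched in a general-purpose language. The snippet below is an illustration in Python, not Vertica code; `nth_token` is a hypothetical helper that mimics asking for the i-th match and getting an empty string when there is none, and the loop over 1..12 plays the role of the in-line index table.

```python
import re

# The same input string as in the SQL example above.
s = (
    "DRIVER={Vertica};COLUMNSASCHAR=1;CONNECTIONLOADBALANCE=True;"
    "CONNSETTINGS=set+search_path+to+public;DATABASE=sbx;"
    "LABEL=dbman;PORT=5433;PWD=;SERVERNAME=127.0.0.1;UID=dbadmin;"
)

PATTERN = re.compile(r"(\w+)=")  # a word followed by '='; capture the word


def nth_token(text: str, n: int) -> str:
    """Return the n-th (1-based) captured word, or '' if there is no such match."""
    matches = PATTERN.findall(text)
    return matches[n - 1] if 0 < n <= len(matches) else ""


# "Index table": more integers than expected matches, then filter out the empties.
tokens = [t for t in (nth_token(s, i) for i in range(1, 13)) if t != ""]

for token in tokens:
    print(token)
```

In a language with a "find all matches" call the index table is unnecessary, of course; the loop is kept here only to show what the SQL version is emulating.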

Find a string with or without space in oracle using like or regex

I have a string that contains a specific 'winner code' which needs to be matched exactly. In the database, however, some records contain spaces and extra characters within the winner code, and the LIKE operator only returns rows that match its pattern literally. I want one simple query that returns every record containing the winner code. Please find my query and details below.
Winner code - أ4 ب3 ج10
Records with spaces - أ4 ب 3 ج 10
Records with extra character - (أ(4)
ب(3)
ج(10
My Query -
SELECT COLUMN_NAME
FROM TABLE_NAME
WHERE
((COLUMN_NAME LIKE '%أ4%ب3%ج10%') OR (COLUMN_NAME LIKE '%أ 4%ب 3%ج 10%'))
The above query returns only the records with and without spaces, as those are the ones matching its criteria.
Thanks
If I understand your need correctly, you may try:
with test(str) as (
select '10X3Y4Z' from dual union all
select '10 X 3 Y 4 Z' from dual union all
select '(10)X(3)Y(4)Z' from dual union all
select '10#X3Y4 Z' from dual union all
select '10 # X3Y4Z' from dual )
select str
from test
where regexp_instr(str, '10[ )]{0,1}X[ (]{0,1}3[ )]{0,1}Y[ (]{0,1}4[ )]{0,1}Z') != 0
This matches your "winner code" (I used different characters to simplify my test) even if the numbers are surrounded by '()' or by single spaces.
This can be rewritten in a more compact way, but I believe this form is clear enough; it uses bracket expressions like [ )]{0,1} to match a space or a parenthesis, with zero or one occurrence. Inside brackets no '|' or backslash is needed, because every character listed there already stands for itself.
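The same tolerant-matching idea can be generated from the code itself rather than typed by hand. The sketch below is in Python (an assumption: it only illustrates the pattern logic and is not Oracle's regex engine); `tolerant_pattern` is a hypothetical helper, and it uses the answer's simplified test strings rather than the real winner code.

```python
import re


def tolerant_pattern(pairs):
    """Build a regex from (number, letter) pairs that allows at most one
    space or parenthesis directly before and after each number."""
    pieces = []
    for number, letter in pairs:
        pieces.append(r"[ (]?" + re.escape(number) + r"[ )]?" + re.escape(letter))
    return re.compile("".join(pieces))


# Simplified winner code from the answer: 10X 3Y 4Z.
pattern = tolerant_pattern([("10", "X"), ("3", "Y"), ("4", "Z")])

candidates = [
    "10X3Y4Z",
    "10 X 3 Y 4 Z",
    "(10)X(3)Y(4)Z",
    "10#X3Y4 Z",
    "10 # X3Y4Z",
]

matched = [c for c in candidates if pattern.search(c)]
print(matched)
```

Generating the pattern keeps the optional-character rule in one place, which helps if more separators than a space or parenthesis turn up in the data later.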

Remove substrings that vary in value in Oracle

I have a column in Oracle which can contain up to 5 separate values, each separated by a '|'. Any of the values can be present or missing. Here are some examples of how the data might look:
100-1
10-3|25-1|120/240
15-1|15-3|15-2|120/208
15-1|15-3|15-2|120/208|STA-2
112-123|120/208|STA-3
The values are arbitrary except for their order. The numerical values separated by dashes always come first; there can be 1 to 3 of them. The numerical value separated by a slash, if present, comes next. The string 'STA' followed by a dash and a number always comes last, if it is present.
What I would like to do is reformat this column so that it only ever includes the first three possible values, that is, the numerical values separated by dashes. Afterwards, I want to replace the second number in each value (the number after the dash) using the following mapping:
1 = A
2 = B
3 = C
I would also like to remove the dash afterwards, but not the '|' that separates the values, unless it is a trailing '|'.
To give you an idea, here's how the values at the beginning of the post would look after the reformatting:
100A
10C|25A
15A|15C|15B
15A|15C|15B
112ABC
I'm thinking this can be done with regular expressions, but it's got me a little confused. Does anyone have a solution?
If I had to solve this problem, I would do it as follows.
SELECT
REGEXP_REPLACE(column,'\|\d+\/\d+(\|STA-\d+)?',''),
REGEXP_REPLACE(column,'(\d+)-(1)([^\d])','\1A\3'),
REGEXP_REPLACE(column,'(\d+)-(2)([^\d])','\1B\3'),
REGEXP_REPLACE(column,'(\d+)-(3)([^\d])','\1C\3'),
REGEXP_REPLACE(column,'(\d+)-(123)([^\d])','\1ABC')
FROM table;
Explanation: Let us break down each REGEXP_REPLACE statement one by one.
REGEXP_REPLACE(column,'\|\d+\/\d+(\|STA-\d+)?','')
This replaces the trailing part, such as |120/208|STA-2, with an empty string so that further processing is easier.
Finding the match was easy, but mapping 1 to A, 2 to B and 3 to C within a single replacement was not possible (as far as I know), so I did those matches and replacements separately.
In each regex from the second statement onwards, (\d+)-(yourNumber)([^\d]), the first group is the number before the dash, yourNumber is 1, 2, 3 or 123, and the last group is the character that follows it, such as |.
So the replacement depends on yourNumber.
All demos here from version 1 to 5.
Note: I have only written replacements for the yourNumber combinations that appear in the question. You can add other combinations in the same way.
You can do this in one statement with nested calls, or you can write a simple function to do it:
SELECT str, REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?','') cut
, REGEXP_REPLACE(REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?',''), '(\-)([1,2]*)(3)([1,2]*)', '\1\2C\4') rep3toC
, REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?',''), '(\-)([1,2]*)(3)([1,2]*)', '\1\2C\4'), '(\-)([1,C]*)(2)([1,C]*)', '\1\2B\4') rep2toB
, REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?',''), '(\-)([1,2]*)(3)([1,2]*)', '\1\2C\4'), '(\-)([1,C]*)(2)([1,C]*)', '\1\2B\4'), '(\-)([B,C]*)(1)([B,C]*)', '\1\2A\4') rep1toA
, REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?',''), '(\-)([1,2]*)(3)([1,2]*)', '\1\2C\4'), '(\-)([1,C]*)(2)([1,C]*)', '\1\2B\4'), '(\-)([B,C]*)(1)([B,C]*)', '\1\2A\4'), '-', '') "rep-"
FROM (
SELECT '100-1' str FROM dual UNION
SELECT '10-3|25-1|120/240' str FROM dual UNION
SELECT '15-1|15-3|15-2|120/208' str FROM dual UNION
SELECT '15-1|15-3|15-2|120/208|STA-2' str FROM dual UNION
SELECT '112-123|120/208|STA-3' FROM dual
) tab
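The "simple function" route mentioned above could look roughly like the following. This is a sketch in Python rather than PL/SQL, so it shows the logic only; `reformat_value` is a hypothetical name, and it assumes that digits other than 1, 2 and 3 after the dash should be left unchanged, which the question does not specify.

```python
import re

# Mapping from the question: 1 = A, 2 = B, 3 = C.
DIGIT_TO_LETTER = str.maketrans("123", "ABC")


def reformat_value(value: str) -> str:
    """Keep up to three leading 'number-number' parts, map the digits after
    the dash to letters, and drop the dash."""
    parts = value.split("|")
    # Only parts shaped like 15-3 qualify; 120/208 and STA-2 are dropped.
    dashed = [p for p in parts if re.fullmatch(r"\d+-\d+", p)][:3]
    converted = []
    for part in dashed:
        left, right = part.split("-")
        converted.append(left + right.translate(DIGIT_TO_LETTER))
    # Joining with '|' never leaves a trailing separator.
    return "|".join(converted)


samples = [
    "100-1",
    "10-3|25-1|120/240",
    "15-1|15-3|15-2|120/208",
    "15-1|15-3|15-2|120/208|STA-2",
    "112-123|120/208|STA-3",
]

for sample in samples:
    print(sample, "->", reformat_value(sample))
```

Splitting on '|' first avoids having to write one regex per digit combination, since each part is translated character by character.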

Oracle regex to find the special character in name field

I'm trying to filter out the names which have special characters.
Requirement:
1) Filter the names which have characters other than a-zA-Z , space and forward slash(/).
Regex being tried out:
1) regexp_like (customername,'[^a-zA-Z[:space:]\/]')
2) regexp_like (customername,'[^a-zA-Z \/]')
The above two regexes help in finding the names with special characters like ? and dot (.).
For example:
LEAL/JO?O
FRANCO/DIVALDO Sr.
But I couldn't figure out why some names (listed below) that contain only the allowed characters (a-zA-Z, space and forward slash) also get retrieved.
For example:
ESTEVES/MARIA INES
PEREZ/JOSE
DUTRA SILVA/LIGIA
Please help to figure out the mistake in the regex being used.
Many thanks in advance!
Your regex #1 worked for me on 11g with the name data copied/pasted from this page. I wonder if you have non-printable control characters in the data? Try adding [:cntrl:] to the regex to catch control characters. P.S. the backslash is not needed before the slash when inside of a character class (square brackets).
SQL> with tbl(name) as (
select 'LEAL/JO?O' from dual union
select 'FRANCO/DIVALDO Sr.' from dual union
select 'ESTEVES/MARIA INES' from dual union
select 'PEREZ/JOSE' from dual union
select 'DUTRA SILVA/LIGIA' from dual
)
select *
from tbl
where regexp_like(name, '[^a-zA-Z[:space:][:cntrl:]/]');
NAME
------------------
FRANCO/DIVALDO Sr.
LEAL/JO?O
SQL>
If you can copy/paste this, run it and get the same results, then something is up with the data in your table. Have a look at the data in HEX which will bring to light a previously hidden character perhaps. Here's a simple example which shows the name "JOSE" in HEX. Using one of the numerous ASCII charts out there like http://www.asciitable.com/ you can see there are no hidden characters:
SQL> select 'JOSE' as chr, rawtohex('JOSE') as hex from dual;
CHR HEX
---- --------
JOSE 4A4F5345
SQL>
So, have a look at a name or two and see if you have any hidden characters. If not, I suspect a conflicting character set issue, maybe.
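To illustrate how a hidden character makes an apparently clean name fail the check, here is a small sketch in Python (an assumption: it stands in for the Oracle query and is not Oracle code). The zero-width space appended to the second name is a made-up example of a hidden character, not something taken from the asker's data, and `hex_dump` is a hypothetical helper playing the role of rawtohex.

```python
import re

# Same idea as regex #2 in the question: anything other than
# letters, a space, or a forward slash.
disallowed = re.compile(r"[^a-zA-Z /]")

clean_name = "PEREZ/JOSE"
dirty_name = "PEREZ/JOSE\u200b"  # hypothetical: trailing zero-width space


def hex_dump(text: str) -> str:
    """Return the UTF-8 bytes of the text as upper-case hex."""
    return text.encode("utf-8").hex().upper()


for name in (clean_name, dirty_name):
    flagged = disallowed.search(name) is not None
    print(repr(name), "flagged:", flagged, "hex:", hex_dump(name))

print(hex_dump("JOSE"))
```

Both names look identical when printed normally; only the hex output shows the extra bytes at the end of the second one.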
@gary_w has most of the bases well covered....
Here's my SQL version of the Unix command cat -vet MyFile:
select replace(regexp_replace(my_column,'[^[:print:]]', '!ACK!'),' ','.') as CAT_VET
from my_table
... all the non-printing characters become !ACK! and spaces become a dot (.). You still need to determine what the characters actually are, but it's useful for finding the looney-toon characters in your data.
Also, select dump(my_column) ... is another way to view the raw column values.
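The same "make the invisible visible" trick can be sketched outside the database. The function below is a rough Python analogue, under the assumption that Python's str.isprintable() is close enough to the [:print:] class for a quick look (the two are not defined identically); `cat_vet` is a hypothetical name.

```python
def cat_vet(text: str) -> str:
    """Replace non-printable characters with !ACK! and spaces with dots."""
    marked = "".join(ch if ch.isprintable() else "!ACK!" for ch in text)
    return marked.replace(" ", ".")


# A NUL byte and a tab hidden inside otherwise ordinary names (made-up data).
print(cat_vet("JO\x00SE MARIA"))
print(cat_vet("DUTRA SILVA/LIGIA\t"))
```

As with the SQL version, this only shows where the odd characters are; a hex dump is still needed to find out what they are.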