How to define regexp for text in Postgres - regex

Please help to define Postgres regexp for this case:
I have string field:
union all select 'AbC-345776-2345' /*comment*/ union all select 'Fgr-sdf344-111a' /*BN34*/ some text union all select 'sss-sdf34-123' /*some text*/ some text
Here is the same text in select statement for convinience:
select 'union all select ''AbC-345776-2345'' /*comment*/ union all select ''Fgr-sdf344-111a'' /*BN34*/ some text union all select ''sss-sdf34-123'' /*some text*/ some text' as str
I need to get from this mess text only values in '...' and select it into separated rows like this:
AbC-345776-2345
Fgr-sdf344-111a
sss-sdf34-123
Pattern: 'first 2-3 letters - several letters and numbers - several letters and numbers'
I created this select but it contains all comments and "sometext" as well:
select regexp_split_to_table(trim(replace(replace(replace(replace(t1.str,'union all select',''),'from DUAL',''),chr(10),''),'''','') ), E'\\s+')
from (select 'union all select ''AbC-345776-2345'' /*comment*/ union all select ''Fgr-sdf344-111a'' /*BN34*/ some text union all select ''sss-sdf34-123'' /*some text*/ some text' as str) t1;

The following should do it:
select (regexp_matches(str, $$'([a-zA-Z]{2,3}-[a-zA-Z0-9]+-[a-zA-Z0-9]+)'$$, 'g'))[1]
from the_table;
Given your sample data it returns:
regexp_matches
---------------
AbC-345776-2345
Fgr-sdf344-111a
sss-sdf34-123
The regex checks for the pattern you specified inside single quotes. By using a group (...) I excluded the single quotes from the result.
regexp_matches() returns one row for each match, containing an array of matches. But as the regex only contains a single group, the first element of the array is what we are interested in.
I used dollar quoting to avoid escaping the single quotes in the regex
Online example

Related

BigQuery regexp replace character between quotes

I'm trying to use the BigQuery function regexp_replace for the following scenario:
Given a string field with comma as a delimiter, I need to only remove the commas within double quotes.
I found the following regex to work in the website but it seems that the BigQuery function doesn't support Lookahead groups. Could you please help me find an equivalent expression that is supported by the Big Query function regexp_replace?
https://regex101.com/r/nxkqtb/3
Big Query example code not supported:
WITH tbl AS (
SELECT 'LINE_NR="1",TXT_FIELD="Some text",CID="0"' as text
UNION ALL
SELECT 'LINE_NR="2",TXT_FIELD=",,Some text",CID="0"' as text
UNION ALL
SELECT 'LINE_NR="3",TXT_FIELD="Some text ,",CID="0"' as text
UNION ALL
SELECT 'LINE_NR="4",TXT_FIELD=",Some ,text,",CID="0"' as text
)
SELECT
REGEXP_REPLACE(text, r'(?m),(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*$)', "")
FROM tbl;
Thank you
Consider below approach (assuming you know in advance keys within the text field)
select text,
( select string_agg(replace(kv, ',', ''), ',' order by offset)
from unnest(regexp_extract_all(text, r'((?:LINE_NR|TXT_FIELD|CID)=".*?")')) kv with offset
) corrected_text
from tbl;
if applied to sample data in your question - output is

Oracle regex and replace

I have varchar field in the database that contains text. I need to replace every occurrence of a any 2 letter + 8 digits string to a link, such as VA12345678 will return /cs/page.asp?id=VA12345678
I have a regex that replaces the string but how can I replace it with a string where part of it is the string itself?
SELECT REGEXP_REPLACE ('test PI20099742', '[A-Z]{2}[0-9]{8}$', 'link to replace with')
FROM dual;
I can have more than one of these strings in one varchar field and ideally I would like to have them replaced in one statement instead of a loop.
As mathguy had said, you can use backreferences for your use case. Try a query like this one.
SELECT REGEXP_REPLACE ('test PI20099742', '([A-Z]{2}[0-9]{8})', '/cs/page.asp?id=\1')
FROM DUAL;
For such cases, you may want to keep the "text to add" somewhere at the top of the query, so that if you ever need to change it, you don't have to hunt for it.
You can do that with a with clause, as shown below. I also put some input data for testing in the with clause, but you should remove that and reference your actual table in your query.
I used the [:alpha:] character class, to match all letters - upper or lower case, accented or not, etc. [A-Z] will work until it doesn't.
with
text_to_add (link) as (
select '/cs/page.asp?id=' from dual
)
, sample_strings (str) as (
select 'test VA12398403 and PI83048203 to PT3904' from dual
)
select regexp_replace(str, '([[:alpha:]]{2}\d{8})', link || '\1')
as str_with_links
from sample_strings cross join text_to_add
;
STR_WITH_LINKS
------------------------------------------------------------------------
test /cs/page.asp?id=VA12398403 and /cs/page.asp?id=PI83048203 to PT3904

regex to find alphanumeric combination of number+text only no special

have to find fix pattern of length 4 alphanumeric in input string
i have tried numeric only and alnum but cant figure out how i would only limit to char+num and no other special character or Numeric by itself
WITH tab AS (
SELECT '''1234,4565,1212,7658''' AS str FROM dual UNION ALL
SELECT '''abce,dddd,jdjd,rdrd,dder''' AS str FROM dual UNION ALL
SELECT '''123m,d565,1dd2,7fur' AS str FROM dual UNION ALL
SELECT '''1m#4,4u#5,1212,abcd' AS str FROM dual UNION ALL
SELECT '''abcd,456a,d212,7658''' AS str FROM dual UNION ALL
SELECT '''1234,4565,1212'',7658''' AS str FROM dual
)
SELECT * FROM tab t
WHERE REGEXP_LIKE(t.str ,'^['']([[:alnum:]]{4},)+([[:alnum:]]{4})['']$')
AND NOT REGEXP_LIKE(t.str ,'^['']([[:digit:]]{4},)+([[:digit:]]{4})['']$')
Expected
abce,dddd,jdjd,rdrd,dder
123m,d565,1dd2,7fur
Not expected
1m#4,4u#5,1212,abcd' --since this one has only 'abcd' valid but not others
abcd,456a,d212,7658 --since this one has '7658' which is invalid but others are
1234,4565,1212 --all numeric should be ignored
A regular expression similar to this will capture what you have outlined in words:
^(([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]]),)*([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]])$
SELECT * FROM tab WHERE REGEXP_LIKE(str, '^(([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]]),)*([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]])$', 'i');
However I can't work out your use of single quotes in your example, so you'll need to modify this to handle your quotes.
I would recommend updating your question to be more clear about quotes.
Also note I'm not explicitly familiar with PLSQL - written with MySQL in mind.
All you need in the second REGEXP is ignore rows that have characters that are not alphanumeric (except comma) and number groups with a size equivalent to 4. This is necesary because Oracle does not support positive lookahead according to this web site.
The solution that I propose is...
SELECT * FROM tab t
WHERE REGEXP_LIKE(t.str ,'^(([[:alnum:]]{4}),)*([[:alnum:]]{4})$')
AND NOT REGEXP_LIKE(t.str ,'[^[:alnum:],]|[0-9]{4}');

Find a string with or without space in oracle using like or regex

I have a string which contains specific 'winner code' which needs to be matched exactly but in the database some records contains spaces and extra characters within 'winners code' and if I use 'like operator' it only returns the matching criteria. I want to use one simplified query which can return all the records if it contains the winner code.Please find below my query and details
Winner code - أ4 ب3 ج10
Records with spaces - أ4 ب 3 ج 10
Records with extra character - (أ(4)
ب(3)
ج(10
My Query -
SELECT COLUMN_NAME,
FROM TABLE_NAME
WHERE
((COLUMN_NAME LIKE '%أ4%ب3%ج10%') or(COLUMN_NAME LIKE '%أ 4%ب 3%ج 10%'))
The above query returns with and without space data as its matching the criteria.
Thanks
If I correctly understand your need, you may try :
with test(str) as (
select '10X3Y4Z' from dual union all
select '10 X 3 Y 4 Z' from dual union all
select '(10)X(3)Y(4)Z' from dual union all
select '10#X3Y4 Z' from dual union all
select '10 # X3Y4Z' from dual )
select str
from test
where regexp_instr(str, '10[ |\)]{0,1}X[ |\(]{0,1}3[ |\)]{0,1}Y[ |\(]{0,1}4[ |\)]{0,1}Z') != 0
This matches your "winner code" ( I used different characters to simplify my test) even if the numbers are surrounded by '()' or a single space.
This can be re-written in a more compact way, but I believe this form is clear enough; it uses regular expressions like [ |\)]{0,1} to match a space or a parenthesis, with zero or one occurrence.

Oracle regex to find the special character in name field

I'm trying to filter out the names which have special characters.
Requirement:
1) Filter the names which have characters other than a-zA-Z , space and forward slash(/).
Regex being tried out:
1) regexp_like (customername,'[^a-zA-Z[:space:]\/]'))
2) regexp_like (customername,'[^a-zA-Z \/]'))
The above two regex helps in finding the names with special characters like ? and dot(.)
For example:
LEAL/JO?O
FRANCO/DIVALDO Sr.
But I couldn't figure out why some names(listed below) with the allowed characters(a-zA-Z , space and forward slash(/)) also get retrieved.
For example:
ESTEVES/MARIA INES
PEREZ/JOSE
DUTRA SILVA/LIGIA
Please help to figure out the mistake in the regex being used.
Many thanks in advance!
Your regex #1 worked for me on 11g with the name data copied/pasted from this page. I wonder if you have non-printable control characters in the data? Try adding [:cntrl:] to the regex to catch control characters. P.S. the backslash is not needed before the slash when inside of a character class (square brackets).
SQL> with tbl(name) as (
select 'LEAL/JO?O' from dual union
select 'FRANCO/DIVALDO Sr.' from dual union
select 'ESTEVES/MARIA INES' from dual union
select 'PEREZ/JOSE' from dual union
select 'DUTRA SILVA/LIGIA' from dual
)
select *
from tbl
where regexp_like(name, '[^a-zA-Z[:space:][:cntrl:]/]');
NAME
------------------
FRANCO/DIVALDO Sr.
LEAL/JO?O
SQL>
If you can copy/paste this, run it and get the same results, then something is up with the data in your table. Have a look at the data in HEX which will bring to light a previously hidden character perhaps. Here's a simple example which shows the name "JOSE" in HEX. Using one of the numerous ASCII charts out there like http://www.asciitable.com/ you can see there are no hidden characters:
SQL> select 'JOSE' as chr, rawtohex('JOSE') as hex from dual;
CHR HEX
---- --------
JOSE 4A4F5345
SQL>
So, have a look at a name or two and see if you have any hidden characters. If not, I suspect a conflicting characterset issue maybe.
#gary_w has most of the bases well covered....
Here's my sql version of unix: cat -vet MyFile
select replace(regexp_replace(my_column,'[^[:print:]]', '!ACK!'),' ','.') as CAT_VET
from my_table
... all the non-printing characters become !ACK! and spaces become . You still need to determine what the characters actually ARE, but it's useful to find the looney-toon characters in your data.
Also, select dump(my_column) ... is another way to view the raw column values.