Regex to remove <> from column

I have a column that holds a name and an email ID, like this:
Column A
ABX <ABX#gmail.com>
hfgfg <shantanu #gmail.com>
I want to use a regex in a SQL query to retrieve only the name from the above column, excluding the <> and the email ID inside it.
I tried:
SELECT REPLACE('s <abc#gmail.com>', SUBSTR('s <abc#gmail.com>', instr('(', 's <abc#gmail.com>'), LENGTH('s <abc#gmail.com>') - instr(')', reverse('s <abc#gmail.com>')) - instr('(', 's <abc#gmail.com>') + 2), '')
FROM dual;

You could use regular expressions: either keep everything before the first opening angle bracket, trimming any trailing spaces as well:
select rtrim(regexp_substr('s <abc#gmail.com>', '[^<]*'), ' ') as name from dual;
Or replace the angle brackets and whatever is inside them, and any immediately preceding whitespace, with null:
select regexp_replace('s <abc#gmail.com>', '\s?<.*>', null) as name from dual;
With some sample data:
with your_table(column_a) as (
select 'Some Name <some.name#example.com>' from dual
union all select 'SingleName <single#example.com>' from dual
)
select column_a,
rtrim(regexp_substr(column_a, '[^<]*'), ' ') as name1,
regexp_replace(column_a, '\s?<.*>', null) as name2
from your_table;
COLUMN_A                          NAME1           NAME2
--------------------------------- --------------- ---------------
Some Name <some.name#example.com> Some Name       Some Name
SingleName <single#example.com>   SingleName      SingleName
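To check the two patterns outside the database, here is a minimal sketch using Python's `re` module. The helper names `name_before_bracket` and `name_without_brackets` are hypothetical, and it assumes Python's regex flavour behaves like Oracle's for these simple patterns.

```python
import re

def name_before_bracket(value: str) -> str:
    """Keep everything before the first '<', then trim trailing spaces."""
    match = re.match(r'[^<]*', value)  # always matches, possibly the empty string
    return match.group(0).rstrip(' ')

def name_without_brackets(value: str) -> str:
    """Drop '<...>' and one optional whitespace character before it."""
    return re.sub(r'\s?<.*>', '', value)

samples = [
    'Some Name <some.name#example.com>',
    'SingleName <single#example.com>',
]
for s in samples:
    print(name_before_bracket(s), '|', name_without_brackets(s))
```

Both helpers should return the same name for the sample rows; they can differ when a value contains more than one '<'.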
If you want the email address as well you could use:
select regexp_substr('s <abc#gmail.com>', '([^<>]*)', 1, 3) as email from dual;
... though there might be a better way. Demoing that too:
with your_table(column_a) as (
select 'Some Name <some.name#example.com>' from dual
union all select 'SingleName <single#example.com>' from dual
)
select column_a,
rtrim(regexp_substr(column_a, '[^<]*'), ' ') as name1,
regexp_replace(column_a, '\s?<.*>', null) as name2,
regexp_substr(column_a, '([^<>]*)', 1, 3) as email
from your_table;
COLUMN_A                          NAME1      NAME2      EMAIL
--------------------------------- ---------- ---------- ---------------------
Some Name <some.name#example.com> Some Name  Some Name  some.name#example.com
SingleName <single#example.com>   SingleName SingleName single#example.com
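One possible "better way" for the email part is a capturing group around the text between the brackets, rather than counting occurrences. The sketch below shows that idea in Python; the function name `split_name_and_email` is hypothetical and the code is only an illustration, not a drop-in replacement for the Oracle query.

```python
import re

def split_name_and_email(value: str):
    """Return (name, email); email is None when there is no '<...>' part."""
    match = re.search(r'<([^<>]*)>', value)      # group 1 = text inside the brackets
    email = match.group(1) if match else None
    name = value.split('<', 1)[0].rstrip()       # text before the first '<'
    return name, email

print(split_name_and_email('Some Name <some.name#example.com>'))
```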

Why not try something like this, which keeps only the text before the first '<':
UPDATE your_table SET A = TRIM(SUBSTR(A, 1, INSTR(A, '<') - 1)) WHERE INSTR(A, '<') > 0;

Related

Regular Expression: changing matching method from OR to AND

I have a regular expression like the following (running on Oracle's regexp_like(), although the question isn't Oracle-specific):
abc|bcd|def|xyz
It checks whether the tags field in the database contains abc OR bcd OR def OR xyz when the user has entered the search query "abc bcd def xyz".
The tags field on the database holds keywords separated by spaces, e.g. "cdefg abcd xyz"
On Oracle, this would be something like:
select ... from ... where
regexp_like(tags, 'abc|bcd|def|xyz');
It works fine as it is, but I want to add an extra option for users to search for results that match all keywords. How should I change the regular expression so that it matches abc AND bcd AND def AND xyz ?
Note: Because I won't know what exact keywords the user will enter, I can't pre-structure the query in the PL/SQL like this:
select ... from ... where
tags like '%abc%' AND
tags like '%bcd%' AND
tags like '%def%' AND
tags like '%xyz%';
You can split the input pattern and check that all the parts of the pattern match:
SELECT t.*
FROM table_name t
CROSS APPLY(
WITH input (match) AS (
SELECT 'abc bcd def xyz' FROM DUAL
)
SELECT 1
FROM input
CONNECT BY LEVEL <= REGEXP_COUNT(match, '\S+')
HAVING COUNT(
REGEXP_SUBSTR(
t.tags,
REGEXP_SUBSTR(match, '\S+', 1, LEVEL)
)
) = REGEXP_COUNT(match, '\S+')
)
Or, if you have Java enabled in the database then you can create a Java function to match regular expressions:
CREATE AND COMPILE JAVA SOURCE NAMED RegexParser AS
import java.util.regex.Pattern;
public class RegexpMatch {
public static int match(
final String value,
final String regex
){
final Pattern pattern = Pattern.compile(regex);
// find(), not matches(): a pattern made only of zero-width lookaheads never consumes the whole input
return pattern.matcher(value).find() ? 1 : 0;
}
}
/
Then wrap it in an SQL function:
CREATE FUNCTION regexp_java_match(value IN VARCHAR2, regex IN VARCHAR2) RETURN NUMBER
AS LANGUAGE JAVA NAME 'RegexpMatch.match( java.lang.String, java.lang.String ) return int';
/
Then use it in SQL:
SELECT *
FROM table_name
WHERE regexp_java_match(tags, '(?=.*abc)(?=.*bcd)(?=.*def)(?=.*xyz)') = 1;
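Because the keywords are not known in advance, the lookahead pattern has to be built from the user's input. The following is a minimal sketch of that step in Python, whose `re` module also supports lookaheads; the function names are hypothetical, and escaping each keyword with `re.escape` is an assumption about the desired behaviour (keywords treated as literal text).

```python
import re

def build_all_keywords_pattern(query: str) -> str:
    """Turn 'abc bcd' into '(?=.*abc)(?=.*bcd)', escaping each keyword."""
    return ''.join(f'(?=.*{re.escape(word)})' for word in query.split())

def matches_all(tags: str, query: str) -> bool:
    # Every lookahead starts at the same position and scans the rest of
    # the string for its own keyword, so all of them must succeed.
    return re.search(build_all_keywords_pattern(query), tags) is not None

print(build_all_keywords_pattern('abc bcd def xyz'))
print(matches_all('cdefg abcd xyz', 'abc bcd def xyz'))
```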
Try this; the idea is to check that the number of matching patterns equals the number of patterns:
with data(val) AS (
select 'cdefg abcd xyz' from dual union all
select 'cba lmnop xyz' from dual
),
targets(s) as (
select regexp_substr('abc bcd def xyz', '[^ ]+', 1, LEVEL) from dual
connect by regexp_substr('abc bcd def xyz', '[^ ]+', 1, LEVEL) is not null
)
select val from data d
join targets t on
regexp_like(val,s)
group by val having(count(*) = (select count(*) from targets))
;
Result:
cdefg abcd xyz
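The same counting idea can be sketched outside SQL. In this illustrative Python version (the function name `rows_matching_all` is hypothetical), each keyword is used as a regular expression, as `regexp_like(val, s)` does, and a row is kept only when the hit count equals the keyword count.

```python
import re

def rows_matching_all(rows, query):
    """Keep rows where the number of matching keywords equals the keyword count."""
    targets = query.split()
    kept = []
    for val in rows:
        hits = sum(1 for t in targets if re.search(t, val))
        if hits == len(targets):
            kept.append(val)
    return kept

data = ['cdefg abcd xyz', 'cba lmnop xyz']
print(rows_matching_all(data, 'abc bcd def xyz'))
```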
I think dynamic SQL will be needed for this. The match-all option requires matching each keyword individually, with logic to ensure every individual match is found.
An easy way would be to build a condition for each keyword, concatenate the conditions into a string, and use dynamic SQL to execute the string as a query.
The example below uses the customers table from the sample schemas provided by Oracle.
DECLARE
-- match string should be just the values to match with spaces in between
p_match_string VARCHAR2(200) := 'abc bcd def xyz';
-- need logic to determine match one (OR) versus match all (AND)
p_match_type VARCHAR2(3) := 'OR';
l_sql_statement VARCHAR2(4000);
-- create type if bulk collect is needed
TYPE t_email_address_tab IS TABLE OF customers.EMAIL_ADDRESS%TYPE INDEX BY PLS_INTEGER;
l_email_address_tab t_email_address_tab;
BEGIN
WITH sql_clauses(row_idx,sql_text) AS
(SELECT 0 row_idx -- build select plus beginning of where clause
,'SELECT email_address '
|| 'FROM customers '
|| 'WHERE 1 = '
|| DECODE(p_match_type, 'AND', '1', '0') sql_text
FROM DUAL
UNION
SELECT LEVEL row_idx -- build joins for each keyword
,DECODE(p_match_type, 'AND', ' AND ', ' OR ')
|| 'email_address'
|| ' LIKE ''%'
|| REGEXP_SUBSTR( p_match_string,'[^ ]+',1,level)
|| '%''' sql_text
FROM DUAL
CONNECT BY LEVEL <= LENGTH(p_match_string) - LENGTH(REPLACE( p_match_string, ' ' )) + 1
)
-- put it all together by row_idx
SELECT LISTAGG(sql_text, '') WITHIN GROUP (ORDER BY row_idx)
INTO l_sql_statement
FROM sql_clauses;
dbms_output.put_line(l_sql_statement);
-- can use execute immediate (or ref cursor) for dynamic sql
EXECUTE IMMEDIATE l_sql_statement
BULK COLLECT
INTO l_email_address_tab;
END;
Variable         Value
---------------- ----------------------------------------------------------------
p_match_string   abc bcd def xyz
p_match_type     AND
l_sql_statement  SELECT email_address FROM customers WHERE 1 = 1 AND email_address LIKE '%abc%' AND email_address LIKE '%bcd%' AND email_address LIKE '%def%' AND email_address LIKE '%xyz%'
Variable         Value
---------------- ----------------------------------------------------------------
p_match_string   abc bcd def xyz
p_match_type     OR
l_sql_statement  SELECT email_address FROM customers WHERE 1 = 0 OR email_address LIKE '%abc%' OR email_address LIKE '%bcd%' OR email_address LIKE '%def%' OR email_address LIKE '%xyz%'
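When the statement is built in application code rather than PL/SQL, the same 1 = 1 / 1 = 0 seed can be combined with bind placeholders so the keywords never end up inside the SQL text. The sketch below is only an illustration under stated assumptions: it uses Python's `sqlite3` with an in-memory `customers` table standing in for the Oracle sample schema, and `build_query` is a hypothetical helper.

```python
import sqlite3

def build_query(match_string: str, match_type: str):
    """Return (sql, params) for a match-any ('OR') or match-all ('AND') search."""
    if match_type not in ('AND', 'OR'):
        raise ValueError('match_type must be AND or OR')
    keywords = match_string.split()
    # Same seed as the PL/SQL version: 1 = 1 for AND, 1 = 0 for OR.
    seed = '1 = 1' if match_type == 'AND' else '1 = 0'
    clauses = [seed] + ['email_address LIKE ?'] * len(keywords)
    sql = 'SELECT email_address FROM customers WHERE ' + f' {match_type} '.join(clauses)
    params = [f'%{k}%' for k in keywords]
    return sql, params

# Stand-in table (assumption): an in-memory SQLite copy of customers.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE customers (email_address TEXT)')
conn.executemany('INSERT INTO customers VALUES (?)',
                 [('abcdef@example.com',), ('xyz@example.com',)])

sql, params = build_query('abc def', 'AND')
print(sql)
print(conn.execute(sql, params).fetchall())
```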

duckdb - aggregate string with a given separator

The standard aggregator produces a comma-separated list:
$ SELECT list_string_agg([1, 2, 'sdsd'])
'1,2,sdsd'
How can I make a semicolon-separated or '/'-separated list, like '1;2;sdsd' or '1/2/sdsd'?
I believe the string_agg function is what you want; it takes the separator as its second argument and also supports DISTINCT.
# Python example
import duckdb as dd
CURR_QUERY = \
'''
SELECT string_agg(distinct a.c, ' || ') AS str_con
FROM (SELECT 'string 1' AS c
UNION ALL
SELECT 'string 2' AS c,
UNION ALL
SELECT 'string 1' AS c) AS a
'''
print(dd.query(CURR_QUERY))
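The above gives you "string 1 || string 2" (the order of the distinct values is not guaranteed). For the lists in the question, pass ';' or '/' as the separator instead.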

no preceding characters in regexp statement

I have tried to use a negative lookbehind in a regexp statement and have looked at other solutions online, but they don't seem to work for me, so I am obviously doing something wrong.
I am looking for a match on the first row; the others should return null. Essentially I need CT CHEST or CT LUNG.
Any assistance appreciated, TIA.
with test (id, description) as (
select 1, 'CT CHEST HIGH RESOLUTION, NO CONTRAST' from dual union all --want this
select 2, 'INJECTION, THORACIC TRANSFORAMEN EPIDURAL, NON NEUROLYTIC W IMAGE GUIDANCE.' from dual union all --do not want this
select 3, 'The cow came back. But the dog went for a walk' from dual) --do not want this
select id, description, regexp_substr(description, '(?<![a-z]ct).{1,20}(CHEST|THOR|LUNG)',1,1,'i') from test;
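Oracle regular expressions do not support lookbehind, so match the preceding character (or the start of the string) explicitly instead:
regexp_substr(description,'([^A-Z]|^)CT.{1,20}(CHEST|THOR|LUNG)',1,1,'i')
works. Note the literal CT: a character class [CT] would match a single C or T rather than the two letters in sequence.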
Leverage Oracle's subexpression parameter to check for CT
I would use subexpressions with a pattern like this:
regexp_substr(description, '(^| )((ct ).*((CHEST)|(THOR)|(LUNG)))', 1, 1, 'i', 2)
- subexpression 1 looks for the beginning of the line or a space: (^| )
- subexpression 3 looks for 'CT': (ct )
- .* allows for other characters in between
- subexpressions 5, 6 and 7 are the alternatives: (CHEST)|(THOR)|(LUNG)
- subexpression 2 contains subexpression 3 and subexpression 4
The last optional parameter says that I want subexpression 2 returned.
WITH test (id, description) as (
SELECT 1
, 'CT CHEST HIGH RESOLUTION , NO CONTRAST'
FROM dual
UNION ALL --want this
SELECT 2
, 'INJECTION , THORACIC TRANSFORAMEN EPIDURAL , NON NEUROLYTIC W IMAGE GUIDANCE.'
FROM dual
UNION ALL --do not want this
SELECT 3
, 'The cow came back. But the dog went FOR a walk'
FROM dual
) --do not want this
SELECT id
, description
, regexp_substr(description, '(^| )((ct ).*((CHEST)|(THOR)|(LUNG)))', 1, 1,'i', 2)
FROM test;
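For comparison, in a regex engine that does support lookbehind the original intent can be written directly. The sketch below uses Python's `re` module purely as an illustration; the pattern and the name `find_ct_study` are assumptions, not part of either answer.

```python
import re

# (?<![a-z]) is a negative lookbehind: "ct" must not be preceded by a letter.
# With IGNORECASE the class [a-z] also covers uppercase letters.
CT_PATTERN = re.compile(r'(?<![a-z])ct\s.{0,20}?(?:chest|thor|lung)', re.IGNORECASE)

def find_ct_study(description: str):
    """Return the matched 'CT ... CHEST/THOR/LUNG' text, or None."""
    match = CT_PATTERN.search(description)
    return match.group(0) if match else None

rows = [
    'CT CHEST HIGH RESOLUTION, NO CONTRAST',
    'INJECTION, THORACIC TRANSFORAMEN EPIDURAL, NON NEUROLYTIC W IMAGE GUIDANCE.',
    'The cow came back. But the dog went for a walk',
]
for row in rows:
    print(find_ct_study(row))
```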

Replace brackets and splitting a column into multiple rows based on a delimiter in Postgres

I have a table with a column whose values are separated by ';'. The data looks like this:
row_id col
1 p.[D389R;D393_W394delinsRD]
2 p.[D390R;D393_W394delinsRD]
3 p.D389R
4 p.[D370R;D393_W394delinsRD]
I would like to remove the '[]' brackets wherever they occur and fetch the text. Then I would like to split the string by ';', prepend 'p.' to each split part (if it is not already there), and create a new row for each part.
The expected output is:
row_id new_col
1 p.D389R
2 p.D393_W394delinsRD
3 p.D390R
4 p.D393_W394delinsRD
5 p.D389R
6 p.D370R
7 p.D393_W394delinsRD
I have tried the query below to get the desired output.
SELECT *,
CASE
WHEN regexp_split_to_table(regexp_replace(col, '\[|\]', '', 'g'), E';') NOT LIKE 'p.[%'
THEN 'p.' || (regexp_split_to_table(regexp_replace(col, '\[|\]', '', 'g'), E';'))[1]
ELSE regexp_split_to_table(regexp_replace(col, '\[|\]', '', 'g'), E';')[2]
END AS new_col
FROM table;
Any suggestions would be really helpful.
I would first remove the constant values ( p.[ and ]) from the string and then unnest it.
with clean as (
select row_id, regexp_replace(col, '^p\.(\[){0,1}|\]$', '', 'g') as col
from the_table
)
select row_id, 'p.'|| t.c
from clean c
cross join unnest(string_to_array(c.col, ';')) as t(c)
The CTE (with ...) isn't really necessary, but that way the unnest(...) stays readable.
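The clean-then-unnest steps can be traced on the sample data with a small sketch. It is written in Python rather than Postgres and is only meant to illustrate the logic; `expand_variants` is a hypothetical name.

```python
import re

def expand_variants(rows):
    """rows: iterable of (row_id, col). Yields (row_id, 'p.<part>') per ';' part."""
    for row_id, col in rows:
        # Remove the leading 'p.' (with an optional '[') and a trailing ']'.
        cleaned = re.sub(r'^p\.\[?|\]$', '', col)
        for part in cleaned.split(';'):
            yield row_id, 'p.' + part

sample = [
    (1, 'p.[D389R;D393_W394delinsRD]'),
    (3, 'p.D389R'),
]
for row in expand_variants(sample):
    print(row)
```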

Inconsistent results from Oracle's REGEXP_SUBSTR

Given a string of key-value pairs: /* USER='Administrator'; UNV='Universe'; DOC='WebIntellignceReport'; */
My goal is to extract values associated with the USER, UNV, and DOC keys.
Using a pattern of (?<=UNV=')(.*?)(?='), I get the expected value of Universe associated with the UNV key (Fiddle).
However, when I use the pattern with REGEXP_SUBSTR, I get a NULL:
SELECT text
,REGEXP_SUBSTR(text,'(?<=UNV='')(.*?)(?='')') UNV
FROM (
SELECT '/* USER=''Administrator''; UNV=''Universe''; DOC=''WebIntellignceReport''; */' as text
FROM dual
) v
What am I missing?
Oracle's regular expressions do not support lookbehind or lookahead, so use a capturing group instead and extract the contents of group 1:
SELECT text, REGEXP_SUBSTR(text,'UNV=''(.*?)''', 1, 1 ,NULL, 1) UNV
FROM (
SELECT '/* USER=''Administrator''; UNV=''Universe''; DOC=''WebIntellignceReport''; */' as text
FROM dual
) v
With UNV='(.*?)', you extract just what is between the closest pair of single quotes after UNV=.
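The capture-group approach also extends to pulling every key at once. The following Python sketch is illustrative only; it assumes keys are runs of word characters and values never contain a single quote, as in the sample string.

```python
import re

text = "/* USER='Administrator'; UNV='Universe'; DOC='WebIntellignceReport'; */"

# Capture-group form, mirroring UNV='(.*?)' with subexpression 1.
unv = re.search(r"UNV='(.*?)'", text).group(1)

# All pairs at once: key = word characters, value = anything up to the next quote.
pairs = dict(re.findall(r"(\w+)='([^']*)'", text))

print(unv)
print(pairs)
```

Python's `re` also accepts the original lookaround pattern, which is why it works in many online testers but not in Oracle.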
I think the easiest thing to do is just grab the whole key-value pair using REGEXP_SUBSTR, and then do another substr to pull out the value you want.
with v as (select '/* USER=''Administrator''; UNV=''Universe''; DOC=''WebIntellignceReport''; */' as text from dual)
select text, key_val, substr(key_val, instr(key_val, '''')+1, length(key_val)-instr(key_val, '''')-2) as val
from (
select text,
regexp_substr(text, ' UNV=''[^'']*'';') key_val
from v);
Output:
TEXT                                                                    KEY_VAL          VAL
----------------------------------------------------------------------- ---------------- --------
/* USER='Administrator'; UNV='Universe'; DOC='WebIntellignceReport'; */  UNV='Universe'; Universe