Oracle Regex remove duplicates - regex

I have a requirement to remove duplicate values from a comma separated string.
Input String: a,a,a,b,c,a,b
Expected output: a,b,c
What I have tried:
with ct(str) as(
select 'a,a,a,b,c,a,b' from dual
)
select REGEXP_REPLACE(str,'([^,]*)(,\1)+($|,)','\1\3') col from ct
Output: a,b,c,a,b
The above query can remove repetitive characters which are consecutive.
I know that the above requirement can be solved by creating a table out of the comma separated values and do a listagg on the distinct values.
Is it possible to achieve the above requirement using a single regex statement?.

This should give you the required result:
with borken as (SELECT distinct column_value as str,'1' cnt FROM
table(apex_string.split('a,a,a,b,c,a,b' ,',')) )
select listagg(str,',') within group (order by cnt) from borken;

Related

Extract all substrings bounded by the same characters

Given a name_loc column of text like the following:
{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}
I'm trying to extract the names, ideally separated by commas:
Charlie, Wrigley, Ana
I've gotten this far:
SELECT SUBSTRING(CAST(name_loc AS VARCHAR) from '"([^ –]+)')
FROM table;
which returns
Charlie
How can I extend this query to extract all names?
You can do this with a combination of regexp_matches (to extract the names), array_agg (to regroup all matches in a row) and array_to_string (to format the array as you'd like, e.g. with a comma separator):
WITH input(name_loc) AS (
VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
, ('{"Other - somewhere}') -- added this to show multiple rows do not get merged
)
SELECT array_to_string(names, ', ')
FROM input
CROSS JOIN LATERAL (
SELECT array_agg(name)
FROM regexp_matches(name_loc, '"(\w+)', 'g') AS f(name)
) AS f(names);
array_to_string
Charlie, Wrigley, Ana
Other
View on DB Fiddle
My two cents, though I'm rather new to postgreSQL and I had to copy the 1st piece from #Marth's his answer:
WITH input(name_loc) AS (
VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
, ('{"Other - somewhere"}')
)
SELECT REGEXP_REPLACE(name_loc, '{?(,)?"(\w+)[^"]+"}?','\1\2', 'g') FROM input;
regexp_replace
Charlie,Wrigley,Ana
Other
Your string literal happens to be a valid array literal.
(Maybe not by coincidence? And the column should be type text[] to begin with?)
If that's the reliable format, there is a safe and simple solution:
SELECT t.id, x.names
FROM tbl t
CROSS JOIN LATERAL (
SELECT string_agg(split_part(elem, ' – ', 1), ', ') AS names
FROM unnest(t.name_loc::text[]) elem
) x;
Or:
SELECT id, string_agg(split_part(elem, ' – ', 1), ', ') AS names
FROM (SELECT id, unnest(name_loc::text[]) AS elem FROM tbl) t
GROUP BY id;
db<>fiddle here
Steps
Unnest the array with unnest() in a LATERAL CROSS JOIN, or directly in the SELECT list.
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Take the first part with split_part(). I chose ' – ' as delimiter, not just ' ', to allow for names with nested space like "Anne Nicole". See:
Split comma separated column data into additional columns
Aggregate results with string_agg(). I added no particular order as you didn't specify one.
Concatenate multiple result rows of one column into one, group by another column

BigQuery regexp replace character between quotes

I'm trying to use the BigQuery function regexp_replace for the following scenario:
Given a string field with comma as a delimiter, I need to only remove the commas within double quotes.
I found the following regex to work in the website but it seems that the BigQuery function doesn't support Lookahead groups. Could you please help me find an equivalent expression that is supported by the Big Query function regexp_replace?
https://regex101.com/r/nxkqtb/3
Big Query example code not supported:
WITH tbl AS (
SELECT 'LINE_NR="1",TXT_FIELD="Some text",CID="0"' as text
UNION ALL
SELECT 'LINE_NR="2",TXT_FIELD=",,Some text",CID="0"' as text
UNION ALL
SELECT 'LINE_NR="3",TXT_FIELD="Some text ,",CID="0"' as text
UNION ALL
SELECT 'LINE_NR="4",TXT_FIELD=",Some ,text,",CID="0"' as text
)
SELECT
REGEXP_REPLACE(text, r'(?m),(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*$)', "")
FROM tbl;
Thank you
Consider below approach (assuming you know in advance keys within the text field)
select text,
( select string_agg(replace(kv, ',', ''), ',' order by offset)
from unnest(regexp_extract_all(text, r'((?:LINE_NR|TXT_FIELD|CID)=".*?")')) kv with offset
) corrected_text
from tbl;
if applied to sample data in your question - output is

RegEx SQL Select not matched

I have this regex
(?i)(sql.*[\s\S]select.*[\s\S]from[\s\S]*?\;)
This one is matched
SQL SELECT Distinct Field1,Field2
FROM Table1
;
But this one is not matched
SQL SELECT Distinct
Field,
Field2
FROM Table1
;
And this one also not:
SQL
SELECT Field,Field2
FROM Table1;
Why does this happen?
I changed my regex to
(?im)^sql[\s\S]*?^;$
and now the first and the second one are matched, but not the third one.
https://regex101.com/r/qLUbBh/3
(?im)^sql[\s\S]*?;$
This works.

Find a string with or without space in oracle using like or regex

I have a string which contains specific 'winner code' which needs to be matched exactly but in the database some records contains spaces and extra characters within 'winners code' and if I use 'like operator' it only returns the matching criteria. I want to use one simplified query which can return all the records if it contains the winner code.Please find below my query and details
Winner code - أ4 ب3 ج10
Records with spaces - أ4 ب 3 ج 10
Records with extra character - (أ(4)
ب(3)
ج(10
My Query -
SELECT COLUMN_NAME,
FROM TABLE_NAME
WHERE
((COLUMN_NAME LIKE '%أ4%ب3%ج10%') or(COLUMN_NAME LIKE '%أ 4%ب 3%ج 10%'))
The above query returns with and without space data as its matching the criteria.
Thanks
If I correctly understand your need, you may try :
with test(str) as (
select '10X3Y4Z' from dual union all
select '10 X 3 Y 4 Z' from dual union all
select '(10)X(3)Y(4)Z' from dual union all
select '10#X3Y4 Z' from dual union all
select '10 # X3Y4Z' from dual )
select str
from test
where regexp_instr(str, '10[ |\)]{0,1}X[ |\(]{0,1}3[ |\)]{0,1}Y[ |\(]{0,1}4[ |\)]{0,1}Z') != 0
This matches your "winner code" ( I used different characters to simplify my test) even if the numbers are surrounded by '()' or a single space.
This can be re-written in a more compact way, but I believe this form is clear enough; it uses regular expressions like [ |\)]{0,1} to match a space or a parenthesis, with zero or one occurrence.

How to compare Unicode characters in SQL server?

Hi I am trying to find all rows in my database (SQL Server) which have character é in their text by executing the following queries.
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\xE9]%',question) > 0;
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\u00E9]%',question) > 0;
But I found two problems: (a) Both of them are returning different number of rows and (b) They are returning rows which do not have the specified character.
Is the way I am constructing the regular expression and comparing the Unicode correct?
EDIT:
The question column is stored using datatype nvarchar.
The following query gives the correct result though.
SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%';
Why not use SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%'?
NB: Likeand patindex do not accept regular expressions.
In the SQL Server pattern syntax [\xE9] means match any single character within the specified set. i.e. match \, x, E or 9. So any of the following strings would match that pattern.
"Elephant"
"axis"
"99.9"