postgres - regex_replace in distinct clause? - regex

Ok... changing the question here... I'm getting an error when I try this:
SELECT COUNT ( DISTINCT mid, regexp_replace(na_fname, '\\s*', '', 'g'), regexp_replace(na_lname, '\\s*', '', 'g'))
FROM masterfile;
Is it possible to use regexp in a distinct clause like this?
The error is this:
WARNING: nonstandard use of \\ in a string literal
LINE 1: ...CT COUNT ( DISTINCT mid, regexp_replace(na_fname, '\\s*', ''...

select trim(regexp_replace(E'\tfoo \t bar baz ', E'\\s+', ' ', 'g'))
replaces all (due to the 'g' flag) whitespace (\s) sequences (+) with a single space, then trims it, returning:
"foo bar baz"
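The E prefix marks an escape string literal, in which backslash sequences such as \\ are interpreted. Without it, PostgreSQL emits the "nonstandard use of \\ in a string literal" warning shown in the question.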
With your new, edited question, you're probably looking for a query along the lines of:
select count(*) from (
select distinct
mid,
regexp_replace(na_fname, E'\\s*', '', 'g'),
regexp_replace(na_lname, E'\\s*', '', 'g')
from masterfile) as subquery;
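To sanity-check the two ideas above outside the database, here is a minimal Python sketch. It assumes Python's `re` treats `\s` closely enough to PostgreSQL for this purpose; the sample rows and the helper names `collapse_whitespace` and `strip_whitespace` are made up for illustration and are not part of the original answer.

```python
import re

def collapse_whitespace(text):
    # Same idea as trim(regexp_replace(..., E'\\s+', ' ', 'g'))
    return re.sub(r"\s+", " ", text).strip()

def strip_whitespace(text):
    # Same idea as regexp_replace(..., E'\\s*', '', 'g')
    return re.sub(r"\s+", "", text)

# Hypothetical rows standing in for (mid, na_fname, na_lname)
rows = [
    (1, "Mary Ann", "Smith"),
    (1, "MaryAnn", "Smith "),
    (2, "Bob", "Jones"),
]

# A set of tuples plays the role of SELECT DISTINCT in the subquery
distinct_keys = {(mid, strip_whitespace(f), strip_whitespace(l)) for mid, f, l in rows}

print(collapse_whitespace("\tfoo \t bar baz "))  # foo bar baz
print(len(distinct_keys))                        # 2
```

The first two rows collapse to the same key once whitespace is removed, which is exactly what the subquery counts.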

Related

Regular expression - Remove special characters except single white space

From stack overflow, I got the standard reg expression
to eliminate -
a) special characters
b) digits
c) more than 2 spaces to single space
to include -
d) - (hyphen)
e) ' (single quote)
SELECT ID, REGEXP_REPLACE(REGEXP_REPLACE(forenames, '[^A-Za-z-]', ' '),'\s{2,}',' ') , REGEXP_REPLACE(REGEXP_REPLACE(surname, '[^A-Za-z-]', ' '),'\s{2,}',' ') , forenames, surname from table1;
How can I get the same result with a single function call instead of two nested ones?
Also, escaping the single quote as \' in order to keep it does not work in regexp_replace.
Thanks.
Oracle Setup:
CREATE TABLE test_data ( id, value ) AS
SELECT 1, '123a45b£$- ''c45d#{e''' FROM DUAL
Query:
SELECT ID,
REGEXP_REPLACE(
value,
'[^a-zA-Z'' -]| +( )',
'\1'
)
FROM test_data
Output:
ID | REGEXP_REPLACE(VALUE,'[^A-ZA-Z''-]|+()','\1')
-: | :--------------------------------------------
1 | ab- 'cde'
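The same single-pattern trick can be tried in Python as a rough check. This is a sketch, not the Oracle behaviour itself: it assumes Python 3.5 or later, where an unmatched group in the replacement is substituted with an empty string, and the name `clean_name` is invented for the example.

```python
import re

# First alternative: any character that is not a letter, quote, space or hyphen.
# Second alternative: a run of spaces, keeping only the last one in group 1.
NAME_CLEANER = re.compile(r"[^a-zA-Z' -]| +( )")

def clean_name(value):
    # When the first alternative matches, group 1 is unset and \1 becomes ''
    # (Python 3.5+), so the character is simply dropped.
    return NAME_CLEANER.sub(r"\1", value)

print(clean_name("123a45b£$- 'c45d#{e'"))  # ab- 'cde'
print(clean_name("a   b"))                 # a b
```

One caveat of doing this in a single pass: spaces that only become adjacent after a character is removed (as in `'a 1 b'`) are not collapsed, because each match is decided on the original text.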

regex in Python to remove commas and spaces

I have a string with multiple commas and spaces as delimiters between words. Here are some examples:
ex #1: string = 'word1,,,,,,, word2,,,,,, word3,,,,,,'
ex #2: string = 'word1 word2 word3'
ex #3: string = 'word1,word2,word3,'
I want to use a regex to convert any of the above 3 examples to "word1, word2, word3" (note: no comma after the last word in the result).
I used the following code:
import re
input_col = 'word1 , word2 , word3, '
test_string = ''.join(input_col)
test_string = re.sub(r'[,\s]+', ' ', test_string)
test_string = re.sub(' +', ',', test_string)
print(test_string)
I get the output as "word1,word2,word3,". Whereas I actually want "word1, word2, word3". No comma after word3.
What kind of regex and re methods should I use to achieve this?
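You can use re.split to break the string into a list, then filter out the empty strings before joining:
import re
s = 'word1 , word2 , word3, '
r = re.split(r"[^a-zA-Z\d]+", s)  # split on anything that is not a letter or digit
ans = ', '.join([i for i in r if len(i) > 0])  # drop empty pieces, join with ", "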
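How about adding the following statement to the end of your program:
test_string = re.sub(',+$', '', test_string)
This removes the comma(s) at the end of the string. Note that re.sub returns a new string, so the result has to be assigned back.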
One approach is to first split on an appropriate pattern, then join the resulting list with commas:
import re
string = 'word1,,,,,,, word2,,,,,, word3,,,,,,'
parts = re.split(r"[,\s]+", string)  # split on runs of commas and/or whitespace
sep = ','
output = re.sub(',$', '', sep.join(parts))
print(output)
word1,word2,word3
Note that the final call to re.sub removes a possible trailing comma.
You can use [ ]+ to match extra spaces and ,\s*$ to match the last comma. Then substitute [ ]+,[ ]+ with ", " and the last comma with an empty string:
import re
input_col = 'word1 , word2 , word3, '
test_string = re.sub(r'[ ]+,[ ]+', ', ', input_col) # normalize spaces around commas
test_string = re.sub(r',\s*$', '', test_string) # remove last comma
print(test_string)
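Putting the answers above together, a minimal sketch that handles all three example inputs from the question could look like this. The function name `normalize_words` is made up for the example.

```python
import re

def normalize_words(text):
    # Split on any run of commas and/or whitespace, drop the empty pieces
    # left by leading or trailing delimiters, then join with ", ".
    words = [w for w in re.split(r"[,\s]+", text) if w]
    return ", ".join(words)

examples = [
    "word1,,,,,,, word2,,,,,, word3,,,,,,",
    "word1 word2 word3",
    "word1,word2,word3,",
]
for example in examples:
    print(normalize_words(example))  # word1, word2, word3
```

Filtering out the empty pieces before joining means no separate step is needed to remove a trailing comma.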

Oracle REGEXP_REPLACE - How to replace quote with backslash quote, newline with space, and backslash with double backslash

I have some fields that could have the following values:
"
\
\n <-- or any possible carriage return
I want to replace them with the following:
\"
\\
<-- this represents a space
Ideally, I would like to do it in a single pass using REGEXP_REPLACE or some other method.
I'm currently doing it like the following. It is inefficient because it has to make three passes.
SELECT replace(
replace(
replace(
'He\llo " I am \na \string\n pl"eas\ne fi"x me\n',
'\',
'\\'
),
'\n',
' '
),
'"',
'\"'
)
FROM DUAL;
The output should be
He\\llo \" I am \ a \\string\ pl\"eas\ e fi\"x me\
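For comparison only, this is how a single-pass version of the idea looks in Python, where `re.sub` accepts a function as the replacement. It is a sketch of the technique, not an Oracle solution; it assumes the fields contain real newline or carriage-return characters, and the names `REPLACEMENTS` and `escape_field` are invented for the example.

```python
import re

# One lookup table instead of three nested replace() calls.
REPLACEMENTS = {
    '"': '\\"',     # quote     -> backslash quote
    '\\': '\\\\',   # backslash -> double backslash
    '\r\n': ' ',    # CRLF      -> one space
    '\n': ' ',      # newline   -> space
    '\r': ' ',      # carriage return -> space
}

# CRLF is listed first so it is consumed as one unit.
PATTERN = re.compile(r'\r\n|["\\\r\n]')

def escape_field(text):
    # Each character is visited once; already-inserted backslashes are
    # never re-scanned, so the order of the rules does not matter.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)

print(escape_field('He\\llo " I am\na \\string'))
```

Because every match is replaced independently, a backslash introduced by one rule cannot be picked up by another rule, which is the main hazard of chaining several passes.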

Query to replace a string with matching pattern

I have a string like below.
'comp' as "COMPUTER",'ms' as "MOUSE" ,'keybr' as "KEYBOARD",'MONT' as "MONITOR",
Is it possible to write a query so that I get the following result?
'comp' ,'ms' ,'keybr' ,'MONT' ,
I can replace the string "as" with empty string using REPLACE query.
But how do I remove the string inside double quote?
Can anyone help me doing this?
Thanks in advance.
select replace(
regexp_replace('''comp'' as "COMPUTER"'
, '(".*")'
,null)
,' as '
,null)
from dual
The regex '(".*")' matches everything from the first double quote to the last one.
EDIT:
Because .* is greedy, the regex replaces the entire span between the first and the last double quote in the string. So we may need to tokenise the string first, using the comma as delimiter, apply the regex to each token, and join the results afterwards (LISTAGG).
WITH str_tab(str1, rn) AS
(SELECT regexp_substr(str, '[^,]+', 1, LEVEL), -- split on the comma delimiter
LEVEL
FROM (SELECT '''comp'' as "computer",''comp'' as "computer"' str
FROM dual) tab
CONNECT BY LEVEL <= LENGTH(str) - LENGTH(REPLACE(str, ',')) + 1)
SELECT listagg(replace(
regexp_replace(str1
, '(".*")'
,null)
,' as '
,null), ',') WITHIN GROUP (ORDER BY rn) AS new_text
FROM str_tab;
EDIT2:
A cleaner approach from @EatAPeach
with x(y) as (
select q'<'comp' as "COMPUTER",'ms' as "MOUSE" ,'keybr' as "KEYBOARD",'MONT' as "MONITOR",>'
from dual
)
select y,
regexp_replace(y, 'as ".*?"' ,null)
from x;
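The difference between the greedy pattern from the first attempt and the lazy `.*?` used here can be seen quickly in Python. This is only an illustration; it assumes Python's greedy and lazy quantifiers behave like Oracle's for this input.

```python
import re

text = """'comp' as "COMPUTER",'ms' as "MOUSE" ,'keybr' as "KEYBOARD",'MONT' as "MONITOR","""

# Greedy: .* runs from the first opening quote to the very last closing quote.
greedy = re.sub(r' as ".*"', '', text)

# Lazy: .*? stops at the nearest closing quote, so each alias is removed separately.
lazy = re.sub(r' as ".*?"', '', text)

print(greedy)  # 'comp',
print(lazy)    # 'comp','ms' ,'keybr','MONT',
```

With the lazy quantifier there is no need to split the string on commas first.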

regexp_substr skips over empty positions

With this code to return the nth value in a pipe delimited string...
regexp_substr(int_record.interfaceline, '[^|]+', 1, i)
it works fine when all values are present
Mike|Male|Yes|20000|Yes so the 3rd value is Yes (correct)
but if the string is
Mike|Male||20000|Yes, the 3rd value is 20000 (not what I want)
How can I tell the expression to not skip over the empty values?
TIA
Mike
The regexp_substr works this way:
If occurrence is greater than 1, then the database searches for the
second occurrence beginning with the first character following the
first occurrence of pattern, and so forth. This behavior is different
from the SUBSTR function, which begins its search for the second
occurrence at the second character of the first occurrence.
So the pattern [^|] will look for NON pipes, meaning it will skip consecutive pipes ("||") looking for a non-pipe char.
You might try:
select trim(regexp_substr(replace('A|test||string', '|', '| '), '[^|]+', 1, 4)) from dual;
This will replace a "|" with a "| " and allow you to match based on the pattern [^|]
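The skipping behaviour and the padding workaround can be reproduced in Python, as a rough illustration. It assumes `re.findall` enumerates matches the same way Oracle counts occurrences for this pattern.

```python
import re

record = "Mike|Male||20000|Yes"

# '[^|]+' needs at least one non-pipe character, so the empty field vanishes.
skipped = re.findall(r"[^|]+", record)
print(skipped)  # ['Mike', 'Male', '20000', 'Yes']

# Workaround from the answer: pad each separator with a space so every
# field is non-empty, then trim the padding afterwards.
padded = record.replace("|", "| ")
fields = [field.strip() for field in re.findall(r"[^|]+", padded)]
print(fields)   # ['Mike', 'Male', '', '20000', 'Yes']
```

The third field is now an empty string instead of being skipped.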
I had a similar problem with a CSV file, where my separator was the semicolon (;).
So I started with an expression like the following one:
select regexp_substr(';2;;4;', '[^;]+', 1, i) from dual
letting i iterate from 1 to 5.
And of course it didn't work either.
To get the empty parts, I say they can be at the beginning (^;), in the middle (;;), or at the end (;$). OR-ing all of this together gives:
select regexp_substr(';2;;4;', '[^;]+|^;|;;|;$', 1, i) from dual
Believe it or not, testing for i from 1 to 5, it works!
But let's not forget one last detail: with this approach you get ; (or ;;) for fields that were originally empty.
The next lines show how to get rid of them easily by replacing them with empty strings (nulls):
with stage1 as (
select regexp_substr(';2;;4;', '[^;]+|^;|;;|;$', 1, 2) as F from dual
)
select case when F like '%;' then '' else F end from stage1
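The alternation can be traced in Python to see what each occurrence returns. This is a sketch under the assumption that the matches come out in the same order as Oracle's occurrences for this input.

```python
import re

line = ";2;;4;"

# Same alternation as above: a real value, or one of the three
# "empty field" shapes (leading ';', middle ';;', trailing ';').
raw = re.findall(r"[^;]+|^;|;;|;$", line)
print(raw)     # [';', '2', ';;', '4', ';']

# Equivalent of: case when F like '%;' then '' else F end
fields = ["" if item.endswith(";") else item for item in raw]
print(fields)  # ['', '2', '', '4', '']
```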
OK. This should be the best solution for you.
SELECT
REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'^([^|]*\|){2}([^|]*).*$',
'\2' )
TEXT
FROM
DUAL;
So for your problem
SELECT
REGEXP_REPLACE ( INCOMINGSTREAMOFSTRINGS,
'^([^|]*\|){N-1}([^|]*).*$',
'\2' )
TEXT
FROM
DUAL;
--INCOMINGSTREAMOFSTRINGS is your complete string with delimiter
--You should pass n-1 to obtain nth position
ALTERNATE 2:
WITH T AS (SELECT 'Mike|Male||20000|Yes' X FROM DUAL)
SELECT
X,
REGEXP_REPLACE ( X,
'^([^|]*).*$',
'\1' )
Y1,
REGEXP_REPLACE ( X,
'^[^|]*\|([^|]*).*$',
'\1' )
Y2,
REGEXP_REPLACE ( X,
'^([^|]*\|){2}([^|]*).*$',
'\2' )
Y3,
REGEXP_REPLACE ( X,
'^([^|]*\|){3}([^|]*).*$',
'\2' )
Y4,
REGEXP_REPLACE ( X,
'^([^|]*\|){4}([^|]*).*$',
'\2' )
Y5
FROM
T;
ALTERNATE 3:
SELECT
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
1,
NULL,
2 )
AS FIRST,
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
2,
NULL,
2 )
AS SECOND,
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
3,
NULL,
2 )
AS THIRD,
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
4,
NULL,
2 )
AS FOURTH,
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
5,
NULL,
2 )
AS FIFTH
FROM
DUAL;
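The `{N-1}` idea from the REGEXP_REPLACE version above can be sketched in Python like this. The helper name `nth_field` is made up for the example, and it uses a capturing group with `re.match` rather than a replacement.

```python
import re

def nth_field(record, n):
    """Return the n-th (1-based) pipe-delimited field, keeping empty fields."""
    # Skip exactly n-1 "field plus pipe" chunks, then capture the next field.
    pattern = r"^(?:[^|]*\|){%d}([^|]*)" % (n - 1)
    match = re.match(pattern, record)
    return match.group(1) if match else None

record = "Mike|Male||20000|Yes"
print([nth_field(record, i) for i in range(1, 6)])
# ['Mike', 'Male', '', '20000', 'Yes']
```

If the record has fewer than n fields, the pattern does not match and the sketch returns None.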
You can use the following :
with l as (select 'Mike|Male||20000|Yes' str from dual)
select regexp_substr(str,'(".*"|[^|]*)(\||$)',1,level,null,1)
from dual,l
where level=3/*use any position*/ connect by level <= regexp_count(str,'([^|]*)(\||$)')
As a complement to @tbone's response...
Oddly, my Oracle didn't recognize the blank space character in this list: [^|]
Cases like this can be confusing, and it is hard to see what is going wrong.
Try this regex instead: ([^|]| )+. Also, to detect a possible empty first item, it is better to put the space before the separator rather than after it, i.e. replace '|' with ' |':
trim(regexp_substr(replace('A|test||string', '|', ' |'), '([^|]| )+', 1, 4))