How to simplify postgres regexp_replace - regex

Is there a way to simplify this query using only one regexp_replace?
select regexp_replace(regexp_replace('BL 081', '([^0-9.])', '', 'g'), '(^0+)', '', 'g')
the result should be 81
I'm trying to remove all non-numeric chars and leading 0's from the result

You can do this by capturing the digits you want (not including any leading zeros) and removing everything else:
select regexp_replace('BL 0081', '.*?([1-9][0-9]*)$', '\1')
Output
81
Note you don't need the g flag as you are only making one replacement.
Demo on dbfiddle

Why not just change the range from 0-9 to 1-9?
regexp_replace('BL 081', '(^[^1-9]+)', '', 'g')

This pattern should do: \D+|(?<=\s)0+
\D - matches characters that are not digits
(?<=\s)0+ - matches a run of zeros that immediately follows a whitespace character (the lookbehind checks for the whitespace without consuming it)
You can use 1 fewer regexp_replace:
select regexp_replace('BL 081', '\D+|(?<=\s)0+', '', 'g')
-- outputs 81
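Alternatively, if you are interested in the numeric value, you could use a simpler regex and then cast to an integer. The g flag is included so that non-digits anywhere in the string are removed, not just the first run.
select regexp_replace('BL 081', '\D+', '', 'g')::int
-- also outputs 81, but its type is int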
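As a rough cross-check of the answers above, the sketch below runs the same patterns through Python's `re` module on a few inputs. It rests on several assumptions: Python's engine is not Postgres's, `re.sub` replaces every match by default (like the g flag), and the helper names and the extra inputs 'BL081' and 'A1 B02' are made up for illustration. Under those assumptions, the variants agree on 'BL 081' but diverge once there is no space before the zeros or the digits are not all at the end.

```python
import re

def two_step(s):
    # The original query: drop everything except digits and dots, then leading zeros.
    return re.sub(r'^0+', '', re.sub(r'[^0-9.]', '', s))

def capture_tail(s):
    # Keep only the digit run at the end of the string, minus its leading zeros.
    return re.sub(r'.*?([1-9][0-9]*)$', r'\1', s)

def alternation(s):
    # Remove non-digits, plus zeros that directly follow whitespace.
    return re.sub(r'\D+|(?<=\s)0+', '', s)

def cast_int(s):
    # Remove non-digits and let the integer conversion drop leading zeros.
    return int(re.sub(r'\D+', '', s))

for s in ['BL 081', 'BL081', 'A1 B02']:
    print(repr(s), two_step(s), capture_tail(s), alternation(s), cast_int(s))
# 'BL 081' 81 81 81 81
# 'BL081' 81 81 081 81
# 'A1 B02' 102 2 102 102
```

Which variant is appropriate depends on what the real data looks like, so it is worth re-running the candidates in Postgres against representative rows.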

Related

How to remove all special characters and extra spaces between words with Regex in PostgreSQL

In migrating a text string from one database to another, I need to eliminate all special characters and keep only one space between words.
Unfortunately, the below code eliminates all spaces between words -- not what I want.
Here is the code I have. The "sig" field is wrong (it removes all spaces without leaving one space). Where did I go wrong?
TIA
WITH dbl_medications AS (
SELECT *
FROM dblink('select medname, sig, form from medications')
AS t1(medname text, sig text, form text)
ORDER BY medname, form, sig
)
INSERT INTO medications (medname, sig, form)
SELECT REGEXP_REPLACE(LOWER(REGEXP_REPLACE(medname,'[^a-zA-Z0-9 /-]','','g')), '^ +| +$| +(?= )', '', 'g'),
REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^0-9a-zA-Z:/]',' ','g')), '^ +| +$| +(?= )', '', 'g'),
LOWER(REGEXP_REPLACE(form,'[^a-zA-Z]','','g'))
FROM dbl_medications
ORDER BY 1,3,2
ON CONFLICT (medname, sig, form) DO NOTHING;
You can use
REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^[:alnum:][:space:]:/]+',' ','g')), '^[[:space:]]+|[[:space:]]+$|[[:space:]]+(?=[[:space:]])', '', 'g')
The first [^[:alnum:][:space:]:/]+ regex is used to replace chunks of one or more chars other than alphanumeric, whitespace, : and / chars with a single space.
The ^[[:space:]]+|[[:space:]]+$|[[:space:]]+(?=[[:space:]]) regex is used to remove leading (^[[:space:]]+) and trailing ([[:space:]]+$) whitespace, and to remove every whitespace character that is followed by another one ([[:space:]]+(?=[[:space:]])), so that each run collapses to a single character.
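The two-pass idea can be tried outside the database with a small Python sketch. The function name `clean_sig` and the sample string are hypothetical, and ASCII ranges plus `\s` stand in for the POSIX classes `[:alnum:]` and `[:space:]`, which is an approximation rather than an exact equivalent.

```python
import re

def clean_sig(sig):
    # Pass 1: replace each run of disallowed characters with a single space.
    step1 = re.sub(r'[^0-9A-Za-z\s:/]+', ' ', sig).lower()
    # Pass 2: trim both ends and drop any whitespace that is followed by more whitespace.
    return re.sub(r'^\s+|\s+$|\s+(?=\s)', '', step1)

print(clean_sig('  Take 1 tab,  twice/day!! '))
# take 1 tab twice/day
```

The lookahead in the last alternative is what leaves exactly one whitespace character per run: the match stops one character short of the end of the run.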

Oracle: Special characters filter with few exceptions

I need some quick help.
I want to filter the input string and remove special characters except space( ), period(.), comma(,), hyphen(-), ampersand(&) and apostrophe(').
I am using below but it's filtering out everything except period(.) and comma(,).
SELECT REGEXP_REPLACE('*Bruce*-*Martha*-&-*Thomas%* *Wyane''s* *Enterprises* ([#Pvt,Ltd.])', '[^0-9A-Za-z,.'' ]', '')
FROM dual;
Input String: *Bruce*-*Martha*-&-*Thomas%* *Wyane's* *Enterprises* ([#Pvt,Ltd.])
What I am expecting: Bruce-Martha-&-Thomas Wyane's Enterprises Pvt,Ltd.
What I am getting: BruceMarthaThomas Wyane's Enterprises Pvt,Ltd.
Thanks.
You may use
SELECT REGEXP_REPLACE('*Bruce*-*Martha*-&-*Thomas%* *Wyane''s* *Enterprises* ([#Pvt,Ltd.])', '[^&0-9A-Za-z,.'' -]+', '') FROM dual
See the regex demo
The [^&0-9A-Za-z,.'' -]+ pattern will match one or more occurrences of any char but &, ASCII letter, digit, comma, dot, single apostrophe, space and hyphen.
To support any whitespace, replace the literal space with [:space:]:
'[^&0-9A-Za-z,.''[:space:]-]+'
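The character class can be checked quickly in Python, as a sketch that assumes Python's `re` treats this particular class the same way Oracle does. The hyphen is placed last inside the brackets so that it is read as a literal character rather than as a range operator; the function name `keep_allowed` is hypothetical.

```python
import re

def keep_allowed(s):
    # Remove every run of characters that is not &, a digit, an ASCII letter,
    # a comma, a dot, an apostrophe, a space, or a hyphen.
    return re.sub(r"[^&0-9A-Za-z,.' -]+", '', s)

raw = "*Bruce*-*Martha*-&-*Thomas%* *Wyane's* *Enterprises* ([#Pvt,Ltd.])"
print(keep_allowed(raw))
# Bruce-Martha-&-Thomas Wyane's Enterprises Pvt,Ltd.
```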

negative look ahead on whole number but preceded by a character(perl)

I have text like this;
2500.00 $120.00 4500 12.00 $23.00 50.0989
I've written a regex:
/(?!$)\d+\.\d{2}/g
I want it to match only 2500.00 and 12.00, nothing else.
The requirement is to add a '$' sign to numeric values that have exactly two digits after the decimal point. With the current regex it adds an extra '$' to the ones that already have a '$' sign. The real task is longer, but I'm describing it briefly. I know I can use one regex to remove the '$' and then another regex to add '$' to all the desired numbers.
Any help would be appreciated, thanks!
To answer your question, you need to look before the pos where the first digit is.
(?<!\$)
But that alone is not going to work, as it will match 23.45 of $123.45 and change it into $1$23.45, and it will match 123.45 of 123.456 and change it into $123.456. You want to make sure there are no digits before or after what you match.
s/(?<![\$\d])(\d+\.\d{2})(?!\d)/\$$1/g;
Or the quicker
s/(?<![\$\d])(?=\d+\.\d{2}(?!\d))/\$/g;
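The difference between the bare lookbehind and the digit-guarded versions can be reproduced with Python's `re`, whose lookaround syntax is the same for these patterns. This is a sketch with hypothetical function names; in a Python replacement string `$` is literal, so no escaping is needed there.

```python
import re

def naive(s):
    # Only checks that the match is not directly preceded by '$'.
    return re.sub(r'(?<!\$)(\d+\.\d{2})', r'$\1', s)

def guarded(s):
    # Also refuses to start after a digit or to stop before a digit.
    return re.sub(r'(?<![$\d])(\d+\.\d{2})(?!\d)', r'$\1', s)

def guarded_zero_width(s):
    # Same conditions, but matches only the insertion point.
    return re.sub(r'(?<![$\d])(?=\d+\.\d{2}(?!\d))', '$', s)

print(naive('$123.45'), naive('123.456'))
# $1$23.45 $123.456
text = '2500.00 $120.00 4500 12.00 $23.00 50.0989'
print(guarded(text))
# $2500.00 $120.00 4500 $12.00 $23.00 50.0989
```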
This is tricky only because you are trying to include too many functionalities in your single regex. If you manipulate the string first to isolate each number, this becomes trivial, as this one-liner demonstrates:
$ perl -F"(\s+)" -lane 's/^(?=\d+\.\d{2}$)/\$/ for @F; print @F;'
2500.00 $120.00 4500 12.00 $23.00 50.0989
$2500.00 $120.00 4500 $12.00 $23.00 50.0989
The full code for this would be something like:
while (<>) { # or whatever file handle or input you read from
my @line = split /(\s+)/;
s/^(?=\d+\.\d{2}$)/\$/ for @line;
print @line; # or select your desired means of output
# my $out = join "", @line; # as string
}
Note that this split is non-destructive because we use parentheses to capture our delimiters. So for our sample input, the resulting list looks like this when printed with Data::Dumper:
$VAR1 = [
'2500.00',
' ',
'$120.00',
' ',
'4500',
' ',
'12.00',
' ',
'$23.00',
' ',
'50.0989'
];
Our regex here is simply anchored in both ends, and allowed to contain numbers, followed by a period . and two numbers, and nothing else. Because we use a look-ahead assertion, it will insert the dollar sign at the beginning, and keep everything else. Because of the strictness of our regex, we do not need to worry about checking for any other characters, and because we split on whitespace, we do not need to check for any such.
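For comparison, the same split-then-anchor idea can be sketched in Python. `re.split` also keeps the delimiters when the pattern contains a capture group, so joining the pieces restores the original spacing. The function name `add_dollar` is hypothetical.

```python
import re

def add_dollar(line):
    # The capture group keeps the whitespace runs as list elements.
    parts = re.split(r'(\s+)', line)
    # Each token is tested on its own, anchored at both ends.
    parts = [re.sub(r'^(?=\d+\.\d{2}$)', '$', p) for p in parts]
    return ''.join(parts)

print(add_dollar('2500.00 $120.00 4500 12.00 $23.00 50.0989'))
# $2500.00 $120.00 4500 $12.00 $23.00 50.0989
```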
You can use this pattern:
s/(?<!\S)\d+\.\d{2}(?!\S)/\$${^MATCH}/gp
or
s/(?<!\S)(?=\d+\.\d{2}(?!\S))/\$/g
I think this is the shortest way.
(?<!\S) not preceded by a non-whitespace character
(?!\S) not followed by a non-whitespace character
The main benefit of these double negations is that they automatically cover the beginning and the end of the string.
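The whitespace-bounded pattern is stricter than one that only excludes neighbouring digits and '$'. The sketch below, in Python with a made-up input, shows one case where the two disagree: a number wrapped in parentheses.

```python
import re

# Zero-width patterns: each matches only the position where '$' is inserted.
DIGIT_BOUNDED = re.compile(r'(?<![$\d])(?=\d+\.\d{2}(?!\d))')
SPACE_BOUNDED = re.compile(r'(?<!\S)(?=\d+\.\d{2}(?!\S))')

text = 'total (12.00) 7.50'
print(DIGIT_BOUNDED.sub('$', text))
# total ($12.00) $7.50
print(SPACE_BOUNDED.sub('$', text))
# total (12.00) $7.50
```

Which behaviour is wanted depends on whether punctuation next to a number should count as a boundary.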

Regex to remove special characters. Can't get rid of trailing ellipsis

In the context of a postgres query, this -
lower(regexp_replace('If...', '[^\w\s]', ''))
gives me this -
'if..' (quotes mine)
As you can see, only one of the three periods gets trimmed. Can someone tell me what I must add to my regexp to get rid of the other two or any other special characters that might be trailing in this way?
You are probably looking for the fourth, optional parameter of regexp_replace():
SELECT regexp_replace('If...', '[^\w\s]', '', 'g');
g .. for "globally", i.e. replace every match in the string, not just the first.
SELECT regexp_replace('If, stay real....', '[.]{2,}$', '.', 'g');
{m,} a sequence of m or more matches of the atom.
Two or more dots at the end of the string are replaced with a single dot.
further reference: https://www.postgresql.org/docs/current/functions-matching.html
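The two answers do different things: one strips all punctuation, the other only collapses a trailing run of dots. A short Python sketch of both follows. The function names are hypothetical, and Python's `\w` and `\s` are assumed to be close enough to Postgres's for these ASCII inputs.

```python
import re

def strip_punct(s):
    # Remove every character that is neither a word character nor whitespace.
    return re.sub(r'[^\w\s]', '', s).lower()

def collapse_trailing_dots(s):
    # Replace two or more dots at the very end with one dot.
    return re.sub(r'[.]{2,}$', '.', s)

print(strip_punct('If...'))
# if
print(collapse_trailing_dots('If, stay real....'))
# If, stay real.
```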

postgres regexp_replace want to allow only a-z and A-Z

In a table column in string we can have numbers/special chars/white spaces.
I want to replace numbers, special characters, and whitespace with an empty string. I see there is a function named regexp_replace, but there is not much user-friendly help available on how to use it. For example, I want to use the following string.
String = 'abc$wanto&toremove#special~chars'
I want to remove all special characters and numbers from the above string and allow only a-z and A-Z; all other characters should be replaced with ''. How do I do that?
SELECT regexp_replace('abc$wanto&toremove#special~chars', '[^a-zA-Z]', '', 'g');
regexp_replace
------------------------------
abcwantotoremovespecialchars
For me the following worked.
regexp_replace(code, '[^a-zA-Z0-9]+', '','g')
The 'g' flag makes the replacement global, so the pattern is applied to every match in the string rather than only the first.
Example,
SELECT regexp_replace('Well- This Did-Not work&*($%%)_', '[^a-zA-Z0-9]+', '')
Returns: "WellThis Did-Not work&*($%%)_"
SELECT regexp_replace('Well- This Did-Not work&*($%%)_', '[^a-zA-Z0-9]+', '','g')
Returns: "WellThisDidNotwork"
The second result has all of the unwanted characters removed.
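The first-match-only versus every-match behaviour can be mimicked in Python, where `re.sub` is global by default and `count=1` limits it to the first match. This is only an analogy to the Postgres flag, not the same API.

```python
import re

s = 'Well- This Did-Not work&*($%%)_'
pattern = r'[^a-zA-Z0-9]+'

# Comparable to calling regexp_replace without 'g': only the first run is removed.
print(re.sub(pattern, '', s, count=1))
# WellThis Did-Not work&*($%%)_

# Comparable to passing 'g': every run is removed.
print(re.sub(pattern, '', s))
# WellThisDidNotwork
```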
To make it simpler:
regexp_replace('abc$wanto&toremove#special~chars', '[^[:alpha:]]', '', 'g')
If you want to replace the char with the closest not special char, you can do something like this:
select
translate(
lower( name ), ' ''àáâãäéèëêíìïîóòõöôúùüûçÇ', '--aaaaaeeeeiiiiooooouuuucc'
) as new_name,
name
from cities;
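The character-for-character mapping that translate() performs has a close analogue in Python's `str.maketrans`, which likewise pairs the n-th source character with the n-th target character. The sketch below reuses the two strings from the query; the function name `slug` and the sample city name are made up.

```python
SRC = " 'àáâãäéèëêíìïîóòõöôúùüûçÇ"
DST = "--aaaaaeeeeiiiiooooouuuucc"

# maketrans raises ValueError if the two strings differ in length.
TABLE = str.maketrans(SRC, DST)

def slug(name):
    # Lowercase first, then map each listed character to its replacement.
    return name.lower().translate(TABLE)

print(slug("São João d'Água"))
# sao-joao-d-agua
```

Characters that are not listed in the source string pass through unchanged, so the mapping has to be extended for any other accented letters in the data.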
Should be (with the g flag, otherwise only the first run of unwanted characters is removed):
regexp_replace('abc$wanto&toremove#special~chars', '[^a-zA-Z]+', '', 'g')