Remove / Replace special characters, with exception - regex

I have a string, and I want to remove all special characters, including spaces. Except, I want to leave the colon if it exists in the string.
I was using this, and it sort of works, but it does not seem to replace parentheses, backslashes, or dashes.
TRIM(REGEXP_REPLACE(REPLACE(REGEXP_REPLACE(c.category_name,'[^:^0-9A-Za-z ]',''),' : ','|'), '\s+', '_', 'g'))
Please advise

You can add the colon into the list of allowed characters
[^0-9A-Za-z:]: anything that is not a number, a letter or a colon.
'g': apply the replacement as many times as needed (otherwise it stops at the first match)
select REGEXP_REPLACE('0.1[2]?ab cd:ef g&*(h)/ij;','[^0-9A-Za-z:]','','g');
regexp_replace
----------------
012abcd:efghij

You are missing the 'g' ("global") flag, so only the first unwanted character was being removed (replaced by blank).
Change:
REGEXP_REPLACE(c.category_name, '[^:^0-9A-Za-z ]', '')
to:
REGEXP_REPLACE(c.category_name, '[^:0-9A-Za-z ]', '', 'g')
Note: I removed the extra ^ from the regex, but leave it in if you want to keep ^ characters too.
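The effect of the missing 'g' flag can be reproduced outside the database. What follows is a minimal Python sketch, not PostgreSQL code: `re` with `count=1` stands in for REGEXP_REPLACE without 'g', and the name `strip_special` is made up for illustration.

```python
import re

# Everything except ASCII digits, ASCII letters and the colon.
NOT_ALLOWED = re.compile(r"[^0-9A-Za-z:]")

def strip_special(text: str, replace_all: bool = True) -> str:
    """Remove disallowed characters; replace_all=False mimics omitting 'g'."""
    # count=0 means "replace every match"; count=1 stops after the first one.
    return NOT_ALLOWED.sub("", text, count=0 if replace_all else 1)

sample = "0.1[2]?ab cd:ef g&*(h)/ij;"
print(strip_special(sample))                     # 012abcd:efghij
print(strip_special(sample, replace_all=False))  # 01[2]?ab cd:ef g&*(h)/ij;
```

The second call shows the symptom from the question: only the first unwanted character (the dot) is removed.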

Related

How to remove all special characters and extra spaces between words with Regex in PostgreSQL

In migrating a text string from one database to another, I need to eliminate all special characters and keep only one space between words.
Unfortunately, the below code eliminates all spaces between words -- not what I want.
Here is the code I have. The "sig" field is wrong (it removes all spaces without leaving one space). Where did I go wrong?
TIA
WITH dbl_medications AS (
SELECT *
FROM dblink('select medname, sig, form from medications')
AS t1(medname text, sig text, form text)
ORDER BY medname, form, sig
)
INSERT INTO medications (medname, sig, form)
SELECT REGEXP_REPLACE(LOWER(REGEXP_REPLACE(medname,'[^a-zA-Z0-9 /-]','','g')), '^ +| +$| +(?= )', '', 'g'),
REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^0-9a-zA-Z:/]',' ','g')), '^ +| +$| +(?= )', '', 'g'),
LOWER(REGEXP_REPLACE(form,'[^a-zA-Z]','','g'))
FROM dbl_medications
ORDER BY 1,3,2
ON CONFLICT (medname, sig, form) DO NOTHING;
You can use
REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^[:alnum:][:space:]:/]+',' ','g')), '^[[:space:]]+|[[:space:]]+$|[[:space:]]+(?=[[:space:]])', '', 'g')
The first [^[:alnum:][:space:]:/]+ regex is used to replace chunks of one or more chars other than alphanumeric, whitespace, : and / chars with a single space.
The ^[[:space:]]+|[[:space:]]+$|[[:space:]]+(?=[[:space:]]) regex is used to remove leading (^[[:space:]]+) and trailing ([[:space:]]+$) whitespaces, and remove excessive whitespace ([[:space:]]+(?=[[:space:]])).
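The two-step logic can be checked quickly in Python. This is only a sketch under assumptions: Python's `re` module has no [[:alnum:]] or [[:space:]] classes, so ASCII ranges and `\s` are used instead, and `normalize_sig` and the sample text are hypothetical.

```python
import re

def normalize_sig(sig: str) -> str:
    # Step 1: each run of disallowed characters becomes a single space.
    spaced = re.sub(r"[^0-9A-Za-z\s:/]+", " ", sig).lower()
    # Step 2: drop leading and trailing whitespace, plus any whitespace that
    # is itself followed by more whitespace, leaving one separator per gap.
    return re.sub(r"^\s+|\s+$|\s+(?=\s)", "", spaced)

print(normalize_sig("  Take  2 tabs, (daily)!! w/ food "))  # take 2 tabs daily w/ food
```

Because whitespace is in the allowed set of the first pattern, existing spaces survive step 1 and are only collapsed, never removed entirely, in step 2.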

Oracle: Special characters filter with few exceptions

I need some quick help.
I want to filter the input string and remove special characters except space( ), period(.), comma(,), hyphen(-), ampersand(&) and apostrophe(').
I am using below but it's filtering out everything except period(.) and comma(,).
SELECT REGEXP_REPLACE('*Bruce*-*Martha*-&-*Thomas%* *Wyane''s* *Enterprises* ([#Pvt,Ltd.])', '[^0-9A-Za-z,.'' ]', '')
FROM dual;
Input String: *Bruce*-*Martha*-&-*Thomas%* *Wyane's* *Enterprises* ([#Pvt,Ltd.])
What I am expecting: Bruce-Martha-&-Thomas Wyane's Enterprises Pvt,Ltd.
What I am getting: BruceMarthaThomas Wyane's Enterprises Pvt,Ltd.
Thanks.
You may use
SELECT REGEXP_REPLACE('*Bruce*-*Martha*-&-*Thomas%* *Wyane''s* *Enterprises* ([#Pvt,Ltd.])', '[^&0-9A-Za-z,.'' -]+', '') FROM dual
The [^&0-9A-Za-z,.'' -]+ pattern will match one or more occurrences of any char but &, ASCII letter, digit, comma, dot, single apostrophe, space and hyphen.
To support any whitespace, replace the literal space with [:space:]:
'[^&0-9A-Za-z,.''[:space:]-]+'
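As a quick check of the character class, here is a small Python sketch (not Oracle code; `clean_name` is a hypothetical name). The single apostrophe is written once because Python strings do not need the SQL '' escape.

```python
import re

def clean_name(text: str) -> str:
    # The hyphen is placed last in the class so it is a literal, not a range.
    return re.sub(r"[^&0-9A-Za-z,.' -]+", "", text)

raw = "*Bruce*-*Martha*-&-*Thomas%* *Wyane's* *Enterprises* ([#Pvt,Ltd.])"
print(clean_name(raw))  # Bruce-Martha-&-Thomas Wyane's Enterprises Pvt,Ltd.
```

The original pattern dropped the hyphens and the ampersand simply because neither was listed inside the negated class.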

Regex Find/Replace char on a line before a specific word

I hope this is the right place to ask this question.
I am preparing a script to import into a database using Notepad++.
I have a huge file that has rows like that:
(10496, '69055-230', 'Rua', '5', 'Manaus', 'Parque 10 de Novembro',
'AM'),
INSERT INTO dne id, cep, tp_logradouro, logradouro, cidade,
bairro, uf VALUES
Is there a way to use FIND/REPLACE to replace the ',' with ';' at the end of every line that comes right before an INSERT statement?
I am not sure how to match the end of the line before a specific word.
The result would be
(10496, '69055-230', 'Rua', '5', 'Manaus', 'Parque 10 de Novembro',
'AM');
INSERT INTO dne id, cep, tp_logradouro, logradouro, cidade,
bairro, uf VALUES
Find what: ,(?=\s*INSERT)
Replace with: ;
Description
, matches a literal comma
(?=\s*INSERT) is a lookahead that asserts (but does not consume)
\s* any number of white spaces (including newlines)
INSERT as literal
If you also want to replace any commas before the end of the file, use
,(?=\h*\R\h*INSERT|\s*\z)
Note both expressions would fail if you have another instance of a comma followed by INSERT that shouldn't be replaced, but in that case you should specify it in the question.
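The same lookahead idea can be tried outside Notepad++. The Python sketch below is an approximation, not the editor's engine: Python's `re` has no \h or \R, so plain \s is used, and \Z replaces \z for the end of the text. The function name and the sample dump are invented for illustration.

```python
import re

def terminate_statements(sql: str) -> str:
    # A comma is replaced only when optional whitespace and then INSERT
    # (or the end of the text) follows it; the lookahead consumes nothing.
    return re.sub(r",(?=\s*INSERT|\s*\Z)", ";", sql)

dump = "(1, 'Rua', 'AM'),\nINSERT INTO dne VALUES\n(2, 'Av', 'SP'),\n"
print(terminate_statements(dump))
```

Commas inside the value lists are left alone because what follows them is a value, not INSERT or the end of the text.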
You don't even need a regular expression for that.
Select Extended in Search Mode
Replace ,\nINSERT INTO with ;\nINSERT INTO
This matches , at the end of a line just before INSERT INTO at the beginning of the next line. Keep in mind that \n will match only in a Linux/Unix/Mac OS X file. For Windows use \r\n, for Mac OS Classic \r.
Using Sublime Text or Notepad++, press Ctrl+H and replace all ")INSERT," by ");INSERT"
I expect that the INSERT statements will all have the form:
INSERT INTO table col1, col2, col3, ...
VALUES (val1, val2, val3, ...),
^^ what you want to replace
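Assuming that the only place where ), appears is the end of the VALUES line, you can just do the following replacement:
Find: \),$
Replace: );
Do this replacement with the regex option enabled. The closing parenthesis is escaped because it is a regex metacharacter, and the $ anchor belongs only in the search pattern, not in the replacement.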

negative look ahead on whole number but preceded by a character(perl)

I have text like this:
2500.00 $120.00 4500 12.00 $23.00 50.0989
I've written a regex:
/(?!$)\d+\.\d{2}/g
I want it to match only 2500.00 and 12.00, nothing else.
The requirement is to add a '$' sign to numeric values that have exactly two digits after the decimal point. The current regex adds an extra '$' to the values that already have one. The real case is longer, but this is the short version. I know I could use one regex to remove the '$' and then another to add '$' to all the desired numbers.
Any help would be appreciated, thanks!
To answer your question, you need a lookbehind, which checks the text before the position of the first digit.
(?<!\$)
But that alone is not enough: it would match the 23.45 in $123.45 and turn it into $1$23.45, and it would match the 123.45 in 123.456 and turn it into $123.456. You also need to make sure there are no digits directly before or after what you match.
s/(?<![\$\d])(\d+\.\d{2})(?!\d)/\$$1/g;
Or the quicker
s/(?<![\$\d])(?=\d+\.\d{2}(?!\d))/\$/g;
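For readers who want to test the lookbehind and lookahead without Perl, here is a rough Python equivalent of the first substitution. It is a sketch only; `add_dollar` is a hypothetical name.

```python
import re

def add_dollar(text: str) -> str:
    # (?<![$\d]) : not preceded by '$' or by another digit
    # (?!\d)     : exactly two decimals, no third digit following
    return re.sub(r"(?<![$\d])(\d+\.\d{2})(?!\d)", r"$\1", text)

print(add_dollar("2500.00 $120.00 4500 12.00 $23.00 50.0989"))
# $2500.00 $120.00 4500 $12.00 $23.00 50.0989
```

The two problem cases named above stay untouched: `add_dollar("$123.45")` and `add_dollar("123.456")` both return their input unchanged.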
This is tricky only because you are trying to include too many functionalities in your single regex. If you manipulate the string first to isolate each number, this becomes trivial, as this one-liner demonstrates:
$ perl -F"(\s+)" -lane's/^(?=\d+\.\d{2}$)/\$/ for @F; print @F;'
2500.00 $120.00 4500 12.00 $23.00 50.0989
$2500.00 $120.00 4500 $12.00 $23.00 50.0989
The full code for this would be something like:
while (<>) { # or whatever file handle or input you read from
    my @line = split /(\s+)/;
    s/^(?=\d+\.\d{2}$)/\$/ for @line;
    print @line; # or select your desired means of output
    # my $out = join "", @line; # as string
}
Note that this split is non-destructive because we use parentheses to capture our delimiters. So for our sample input, the resulting list looks like this when printed with Data::Dumper:
$VAR1 = [
'2500.00',
' ',
'$120.00',
' ',
'4500',
' ',
'12.00',
' ',
'$23.00',
' ',
'50.0989'
];
Our regex here is simply anchored in both ends, and allowed to contain numbers, followed by a period . and two numbers, and nothing else. Because we use a look-ahead assertion, it will insert the dollar sign at the beginning, and keep everything else. Because of the strictness of our regex, we do not need to worry about checking for any other characters, and because we split on whitespace, we do not need to check for any such.
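The split-then-fix approach translates to other languages as well. Below is a minimal Python sketch of the same idea, assuming whitespace-separated tokens; `add_dollar_by_token` is an invented name, and `re.fullmatch` plays the role of the pattern anchored at both ends.

```python
import re

def add_dollar_by_token(line: str) -> str:
    # The capturing group makes re.split keep the whitespace separators,
    # so joining the pieces restores the original spacing.
    tokens = re.split(r"(\s+)", line)
    # Prefix '$' only to tokens that are digits, a dot and two digits.
    fixed = ["$" + t if re.fullmatch(r"\d+\.\d{2}", t) else t for t in tokens]
    return "".join(fixed)

print(add_dollar_by_token("2500.00 $120.00 4500 12.00 $23.00 50.0989"))
# $2500.00 $120.00 4500 $12.00 $23.00 50.0989
```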
You can use this pattern:
s/(?<!\S)\d+\.\d{2}(?!\S)/\$${^MATCH}/gp
or
s/(?<!\S)(?=\d+\.\d{2}(?!\S))/\$/g
I think this is the shorter way.
(?<!\S) not preceded by a non-whitespace character
(?!\S) not followed by a non-whitespace character
The main advantage of these double negations is that they automatically cover the beginning and the end of the string.
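The zero-width variant can be sketched in Python too (assuming Python 3.7 or later, where empty matches in `re.sub` behave as expected). `add_dollar_ws` is a hypothetical name.

```python
import re

def add_dollar_ws(text: str) -> str:
    # The pattern matches an empty position: one that is not preceded by a
    # non-whitespace character and is followed by a complete N.NN token.
    # Nothing is consumed, so '$' is simply inserted at that position.
    return re.sub(r"(?<!\S)(?=\d+\.\d{2}(?!\S))", "$", text)

print(add_dollar_ws("2500.00 $120.00 4500 12.00 $23.00 50.0989"))
# $2500.00 $120.00 4500 $12.00 $23.00 50.0989
```

Note that this version treats any adjacent non-whitespace character as a reason not to match, so a value such as (12.00) is left alone.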

Regex to remove special characters. Can't get rid of trailing ellipsis

In the context of a postgres query, this -
lower(regexp_replace('If...', '[^\w\s]', ''))
gives me this -
'if..' (quotes mine)
As you can see, only one of the three periods gets trimmed. Can someone tell me what I must add to my regexp to get rid of the other two or any other special characters that might be trailing in this way?
You are probably looking for the fourth, optional parameter of regexp_replace():
SELECT regexp_replace('If...', '[^\w\s]', '', 'g');
g .. for "globally", i.e. replace every match in the string, not just the first.
SELECT regexp_replace('If, stay real....', '[.]{2,}$', '.', 'g');
{m,} matches a sequence of m or more occurrences of the atom.
Here, two or more consecutive dots at the end of the string are replaced with a single dot.
further reference: https://www.postgresql.org/docs/current/functions-matching.html
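Both answers can be compared side by side in a short Python sketch. This is an illustration rather than PostgreSQL code: Python's `re.sub` replaces every match by default, which corresponds to passing 'g', and the function names are made up.

```python
import re

def strip_punctuation(text: str) -> str:
    # Every character that is neither a word character nor whitespace is removed.
    return re.sub(r"[^\w\s]", "", text).lower()

def squeeze_trailing_dots(text: str) -> str:
    # [.]{2,}$ : two or more literal dots anchored at the end of the string.
    return re.sub(r"[.]{2,}$", ".", text)

print(strip_punctuation("If..."))                  # if
print(squeeze_trailing_dots("If, stay real...."))  # If, stay real.
```

The first function removes all punctuation, while the second keeps a single closing dot and leaves dots elsewhere in the string untouched.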