I'm using regexp_replace in PostgreSQL and trying to strip out any character in a string that is not a letter or number. However, using this regex:
select * from regexp_replace('blink-182', '[^a-zA-Z0-9]*$', '')
returns 'blink-182'. The hyphen is not being removed and replaced with nothing ('') as I would expect.
How do I modify this regex to also replace the hyphen? I've tested with many other characters (!,.#) and they are all replaced correctly.
Any ideas?
You currently replace a run of non-alphanumeric characters at the end of the string only. I guess your tests were mainly strings of the form foobar!# which worked because the characters to remove were at the end of the string.
To replace every occurrence of such a character in the string remove the $ from the regex:
[^a-zA-Z0-9]+
(I also changed the * into a + to prevent zero-length replacements between every character.)
If you want to retain whitespace as well you need to add it to the character class:
[^a-zA-Z0-9 ]+
or possibly
[^a-zA-Z0-9\s]+
If the regex in the beginning was in fact correct in that you only want to remove non-alphanumeric characters from the end of the string but you also want to remove hyphen-minus in the middle of a string (while retaining other non-alphanumeric characters in the middle of the string), then the following should work:
[^a-zA-Z0-9]+$|-
maniek points out that you need to add a fourth argument, the 'g' flag, to regexp_replace so that it replaces more than one match:
regexp_replace('blink-182', '[^a-zA-Z0-9]+$|-', '', 'g')
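For comparison, here is a minimal Python sketch of the same patterns. It is only an illustration, not PostgreSQL code: Python's re.sub replaces every match by default, so count=1 is used to imitate a regexp_replace call without the 'g' flag. The sample string 'blink-182!#' is made up for the example.

```python
import re

NON_ALNUM_AT_END = r'[^a-zA-Z0-9]*$'    # the pattern from the question
NON_ALNUM_ANYWHERE = r'[^a-zA-Z0-9]+'   # $ removed, * changed to +
END_OR_HYPHEN = r'[^a-zA-Z0-9]+$|-'     # trailing junk, or a hyphen anywhere

# Anchored at the end: the hyphen in the middle is never part of a match.
anchored = re.sub(NON_ALNUM_AT_END, '', 'blink-182')

# No anchor: every run of non-alphanumeric characters is removed.
everywhere = re.sub(NON_ALNUM_ANYWHERE, '', 'blink-182')

# count=1 imitates a call without 'g': only the first match is replaced.
first_only = re.sub(END_OR_HYPHEN, '', 'blink-182!#', count=1)

# Default behaviour corresponds to passing 'g': all matches are replaced.
all_matches = re.sub(END_OR_HYPHEN, '', 'blink-182!#')

print(anchored, everywhere, first_only, all_matches)
```

The third result still ends in !# because only the hyphen, the first match, was replaced.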
I have a text file with URLs where space is + and it needs to be %20 to work.
For example:
http://myserver/abc/this+is+my+document.doc
I want it to be:
http://myserver/abc/this%20is%20my%20document.doc
How to replace + with %20, but only when the string starts with http://myserver/abc? Don't want to replace any other +'s in the document.
Thanks in advance!
You can use the following regex:
(?:http://myserver/abc|\G(?!\A))[^\s+]*\K\+
Replace with %20
How the regex works:
(?:http://myserver/abc|\G(?!\A)) matches either http://myserver/abc literally, or the previously matched location (\G is previously matched location or start of the string and (?!\A) prevents \G from matching the start of the string)
[^\s+]* matches any character except whitespace and + (literally) any number of times
\K resets the match. Any previously consumed characters are excluded from the final match
\+ matches the + character literally
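Python's built-in re module supports neither \G nor \K, so the regex above cannot be used there as written. A different technique gives the same result: match each whole URL that starts with the prefix, then replace the + characters inside that match with a callback. This is a sketch under that assumption; the function name and the sample text are made up for illustration.

```python
import re

PREFIX = 'http://myserver/abc'

def encode_plus_in_urls(text: str) -> str:
    """Replace + with %20, but only inside URLs that start with PREFIX."""
    # A URL is taken to be the prefix plus the following run of non-whitespace.
    url_pattern = re.compile(re.escape(PREFIX) + r'\S*')
    # The callback receives each matched URL and rewrites only that text.
    return url_pattern.sub(lambda m: m.group(0).replace('+', '%20'), text)

sample = 'a+b http://myserver/abc/this+is+my+document.doc c+d'
print(encode_plus_in_urls(sample))
```

The + characters in a+b and c+d are left alone because they are outside any matched URL.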
I need to remove the special characters from the beginning and the end of each word. But there are a few words where it gets tricky. Btw, I am working in Tableau.
Word:
;#Bank#;Server#;
I used
REGEXP_REPLACE([Category], "[^0-9a-zA-Z ]+", "")
Using this code, the word becomes BANKSERVER. I want a comma or a semicolon between Bank and server. How can I achieve this? Any possible leads would be greatly appreciated.
Actual result:
BANKSERVER
Expected result:
BANK,SERVER
or
BANK;SERVER
Alternatively, I can add a semicolon to the character class:
REGEXP_REPLACE([Category], "[^0-9a-zA-Z ;]+", "")
However, the output is then:
;BANK;SERVER;
You may use
REGEXP_REPLACE(REGEXP_REPLACE([Category], '[^0-9a-zA-Z ]+', ';'), '^(?:\s*;)+\s*|\s*(?:;+\s*)+$|\s*(?:(;)+\s*)+', '$1')
Details
REGEXP_REPLACE([Category], '[^0-9a-zA-Z ]+', ';') - replaces all chars but alphanumeric and space with ; chars
REGEXP_REPLACE(..., '^(?:\s*;)+\s*|\s*(?:;+\s*)+$|\s*(?:(;)+\s*)+', '$1') - removes leading/trailing semicolons and collapses each run of one or more semicolons into a single semicolon, also removing any surrounding whitespace.
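The two nested calls can be traced outside Tableau with a short Python sketch. This is only an approximation for checking the logic: Python's re module is not Tableau's regex engine, and a callback stands in for the '$1' replacement so that an unmatched group becomes an empty string. The function name is made up.

```python
import re

def clean_category(value: str) -> str:
    # Pass 1: each run of characters other than letters, digits and spaces
    # becomes a single ';'.
    step1 = re.sub(r'[^0-9a-zA-Z ]+', ';', value)
    # Pass 2: leading and trailing semicolons match the first two alternatives,
    # where group 1 is unset; inner runs match the third, where group 1 is ';'.
    pattern = r'^(?:\s*;)+\s*|\s*(?:;+\s*)+$|\s*(?:(;)+\s*)+'
    return re.sub(pattern, lambda m: m.group(1) or '', step1)

print(clean_category(';#Bank#;Server#;'))
```

For the sample word, the first pass yields ;Bank;Server; and the second pass drops the outer semicolons.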
Just add a semicolon to the character class:
REGEXP_REPLACE([Category], "[^0-9a-zA-Z; ]+", "")
// here __^
Remove All Special Characters at the End and the Beginning of String and then Replace Separators
This is similar to Wiktor's solution and borrows his approach to pass through the string twice.
First, clean up the string at the beginning and the end using greedy quantifiers with a minimum match of one.
'(^[^a-zA-Z0-9]+)|([^a-zA-Z0-9]+$)'
Second, replace each run of "separator" special characters with a , character, again using a greedy quantifier with a minimum of one.
'([^a-zA-Z0-9]+)'
Solution:
REGEXP_REPLACE (REGEXP_REPLACE([Word], '(^[^a-zA-Z0-9]+)|([^a-zA-Z0-9]+$)', '' ), '([^a-zA-Z0-9]+)' , ',')
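The same two passes, sketched in Python to make the intermediate value visible. The function name and the separator parameter are assumptions added for the example; the patterns are the ones given above without the redundant capture groups.

```python
import re

def strip_and_join(word: str, separator: str = ',') -> str:
    # Pass 1: drop special characters at the very beginning and the very end.
    trimmed = re.sub(r'^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$', '', word)
    # Pass 2: each remaining run of special characters becomes one separator.
    return re.sub(r'[^a-zA-Z0-9]+', lambda m: separator, trimmed)

print(strip_and_join(';#Bank#;Server#;'))        # comma-separated
print(strip_and_join(';#Bank#;Server#;', ';'))   # semicolon-separated
```

Note that, unlike the pattern in the question, this character class does not keep spaces, so a space inside a word would also be turned into a separator.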
So I can't use the $' variable.
But I need to find the pattern in a file that starts with the string “by: ” followed by any characters, then replace whatever characters come after “by: ” with an existing string $foo.
I'm using $^I and a while loop since I need to update multiple fields in a file.
I was thinking something along the lines of [s///]
s/(by\:[a-z]+)/$foo/i
I need help. Yes, this is an assignment question, but I'm 5 hours in and I've lost many brain cells in the process.
Some problems with your substitution:
You say you want to match by: (space after colon), but your regex will never match the space.
The pattern [a-z]+ means to match one or more occurrences of letters a to z. But you said you want to match "any characters". That might be zero characters, and it might contain non-letters.
You've replaced the match with $foo, but have lost by:. The entire matched string is replaced with the replacement.
No need to escape : in your pattern.
You're capturing the entire match in parentheses, but not using that anywhere.
I'm assuming you're processing the file line by line. You want "starts with the string by: followed by any characters". This is the regex:
/^by: .*/
^ matches the beginning of the line. Then by: matches exactly those characters. . matches any character except for a newline, and * means zero or more of the preceding item. So .* matches all the rest of the characters on the line.
"replace whatever characters that come after by: with an existing string $foo". I assume you mean the contents of the variable $foo and not the literal characters $foo. This is:
s/^by: .*/by: $foo/;
Since we matched by:, I repeated it in the replacement string because you want to preserve it. $foo will be interpolated in the replacement string.
Another way to write this would be:
s/^(by: ).*/$1$foo/
Here we've captured the text by: in the first set of parentheses. That text will be available in the $1 variable, so we can interpolate that into the replacement string.
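For readers more at home in Python, here is a rough equivalent of the capture-group version. It is a sketch, not a translation of the in-place editing loop: the function name and sample lines are made up, and a callback is used for the replacement so that backslashes in foo are inserted literally.

```python
import re

def replace_after_by(line: str, foo: str) -> str:
    # ^(by: ) captures the prefix; .* consumes the rest of the line,
    # stopping before the newline because . does not match it.
    return re.sub(r'^(by: ).*', lambda m: m.group(1) + foo, line)

print(replace_after_by('by: old author\n', 'Jane Doe'), end='')
print(replace_after_by('written by: someone\n', 'Jane Doe'), end='')
```

The second line is printed unchanged, because ^ requires by: to be at the start of the line.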
I can not use grep. In fact, I am in Notepad2. When I want to remove lines containing character "c", I am using the replace dialog (Ctrl+H):
Search string: ".*c.*"
Replace with: "" (nothing)
After that, I sort the lines and I get rid of the empty lines.
But now I need to empty all lines that actually do not contain character "c". Is it possible to do it in Notepad2?
If I can do it in Notepad2, then I can do it using JavaScript's String replace too, I guess.
Yes, you could anchor your pattern and use a negated character class.
Find: ^[^c]*$
Explanation:
^ # the beginning of the string
[^c]* # any character except: 'c' (0 or more times)
$ # before an optional \n, and the end of the string
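Since the question also mentions doing this outside the editor, here is a hedged Python sketch of the same idea applied to a whole multi-line string. One adjustment is needed and is an assumption about the target engine: when the text is processed as a single string, [^c] also matches newlines, so \n is added to the negated class to keep each match on one line. The sample text is made up.

```python
import re

def empty_lines_without_c(text: str) -> str:
    # MULTILINE makes ^ and $ match at every line boundary.
    # [^c\n]* matches a whole line only if it contains no 'c'.
    return re.sub(r'^[^c\n]*$', '', text, flags=re.MULTILINE)

sample = 'abc\ndef\ncat\nxyz'
print(repr(empty_lines_without_c(sample)))
```

Lines containing c are kept as they are; the others are emptied but their line breaks remain, so the line count does not change.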
I have a .org file with lines of this sort:
*
http://en.wikipedia.org/wiki/Qibla Qibla - Wikipedia, the free
As you can see, there is an asterisk, followed by a newline, followed by a URL, followed by one space, and then some extraneous useless text that I want to get rid of.
I would like to format this file to this structure:
*
http://en.wikipedia.org/wiki/Qibla
Or, put differently, strip all the characters after the end of the URL while maintaining the rest of the structure.
How can I do this in Emacs?
Assuming you're doing this interactively with query-replace-regexp, try using this regex to strip the junk off the end of the URLs:
^\(http[^ ]+\).*$
Replacement:
\1
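The effect of this replacement can be checked outside Emacs with a small Python sketch. Python's regex syntax differs from Emacs's (groups are written without backslashes), and \n is added to the negated class here so the URL match cannot run past the end of its line; both are adaptations for the example, and the function name is made up.

```python
import re

def strip_after_url(text: str) -> str:
    # Capture the URL (http followed by non-space characters) at the start
    # of a line, then discard whatever follows it on that line.
    return re.sub(r'^(http[^ \n]+).*$', r'\1', text, flags=re.MULTILINE)

org = '*\nhttp://en.wikipedia.org/wiki/Qibla Qibla - Wikipedia, the free\n'
print(strip_after_url(org), end='')
```

The asterisk line is left untouched, so the overall structure of the file is preserved.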
You can get rid of the asterisks easily enough: use this regex and replace with nothing:
*^J
Use control-Q followed by control-J to enter the newline.
Edit: Or, to do it in one step, replace
*^J\(http[^ ]+\) .*^J
With
\1^J
Where ^J is a literal newline inserted by typing control-Q followed by control-J.