Regex replace - match and empty all lines not containing a specific character

I cannot use grep; I am working in Notepad2. When I want to remove lines containing the character "c", I use the replace dialog (Ctrl+H):
Search string: ".*c.*"
Replace with: "" (nothing)
After that, I sort the lines and I get rid of the empty lines.
But now I need to empty all lines that do not contain the character "c". Is it possible to do this in Notepad2?
If I can do it in Notepad2, then I can do it using JavaScript's String replace too, I guess.

Yes, you could anchor your pattern and use a negated character class.
Find: ^[^c]*$
Explanation:
^ # the beginning of the string
[^c]* # any character except: 'c' (0 or more times)
$ # before an optional \n, and the end of the string
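
For anyone trying this outside the editor, here is a minimal sketch of the same idea in Python (my choice of language; the question only mentions Notepad2 and JavaScript). It assumes the whole file is held in one string, so the class is written as [^c\n] to stop it from running across line breaks, which a line-by-line editor search does not have to worry about. The sample text is made up.

```python
import re

# Hypothetical sample: five lines, two of which contain "c".
text = "abc\nxyz\n\ncat\ndog"

# re.MULTILINE makes ^ and $ anchor at every line, not just at the
# start and end of the whole string.
emptied = re.sub(r"^[^c\n]*$", "", text, flags=re.MULTILINE)

# Dropping the empty lines afterwards mirrors the "sort and delete" step.
kept = [line for line in emptied.split("\n") if line]

print(repr(emptied))  # 'abc\n\n\ncat\n'
print(kept)           # ['abc', 'cat']
```

Lines containing "c" survive untouched; every other line becomes empty.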

Related

Replace characters within a specific string

I have a text file with URLs where space is + and it needs to be %20 to work.
For example:
http://myserver/abc/this+is+my+document.doc
I want it to be:
http://myserver/abc/this%20is%20my%20document.doc
How to replace + with %20, but only when the string starts with http://myserver/abc? Don't want to replace any other +'s in the document.
Thanks in advance!
You can use the following regex:
(?:http://myserver/abc|\G(?!\A))[^\s+]*\K\+
Replace with %20
How does the regex work?
(?:http://myserver/abc|\G(?!\A)) matches either http://myserver/abc literally, or the previously matched location (\G is previously matched location or start of the string and (?!\A) prevents \G from matching the start of the string)
[^\s+]* matches any character except whitespace and + (literally) any number of times
\K resets the match. Any previously consumed characters are excluded from the final match
\+ matches a + character literally
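
Note that \G and \K are PCRE-style features; Python's built-in re module supports neither. If you are in such an environment, a different technique gets the same result: match each whole URL that starts with the prefix, then replace + inside that match with a callback. This is a rough sketch, not the regex above; fix_urls and PREFIX are names I made up for illustration.

```python
import re

PREFIX = "http://myserver/abc"

def fix_urls(text: str) -> str:
    """Replace '+' with '%20', but only inside URLs starting with PREFIX."""
    # The prefix followed by everything up to the next whitespace.
    pattern = re.compile(re.escape(PREFIX) + r"\S*")
    # The callback rewrites only the matched URL; other text is untouched.
    return pattern.sub(lambda m: m.group(0).replace("+", "%20"), text)

sample = "a+b http://myserver/abc/this+is+my+document.doc c+d"
result = fix_urls(sample)
print(result)
# a+b http://myserver/abc/this%20is%20my%20document.doc c+d
```

Like the original pattern, this stops at the first whitespace after the prefix, so a + elsewhere in the document is left alone.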

\1 not defined in the RE

In my script, I'm passing in a markdown file and, using sed, I'm trying to find lines that do not start with one or more # and are not empty, and then surround those lines with <p></p> tags.
My reasoning:
^[^#]+ At beginning of line, find lines that do not begin with 1 or more #
.\+ Then find lines that contain one or more character (aka not empty lines)
Then replace the matched line with <p>\1</p>, where \1 represents the matched line.
However, I'm getting "\1 not defined in the RE". Is my reasoning above correct and how do I fix this error?
BODY=$(sed -E 's/^[^#]+.\+/<p>\1</p>/g' "$1")
Backslash followed by a number is replaced with the match for the Nth capture group in the regexp, but your regexp has no capture groups.
If you want to replace the entire match, use &:
BODY=$(sed -E 's%^[^#].*%<p>&</p>%' "$1")
You don't need to use .+ to find non-empty lines -- the fact that it has a character at the beginning that doesn't match # means it's not empty. And you don't need + after [^#] -- all you care is that the first character isn't #. You also don't need the g modifier when the regexp matches the entire line -- that's only needed to replace multiple matches per line.
And since your replacement string contains /, you need to either escape it or change the delimiter to some other character.
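
To illustrate the difference between "whole match" and "capture group" outside sed, here is a small sketch in Python, where \g<0> plays the role of sed's &. The sample markdown is invented. One assumption to flag: sed sees one line at a time, while this works on the whole text, so the class becomes [^#\n] to keep it from matching the newline of an empty line.

```python
import re

# Hypothetical input: headings, paragraphs, and one empty line.
markdown = "# Title\nSome text\n\n## Sub\nMore text"

# \g<0> is the whole match, like & in sed. A numbered reference such as
# \1 would raise an error here too, since the pattern has no groups.
html = re.sub(r"^[^#\n].*", r"<p>\g<0></p>", markdown, flags=re.MULTILINE)

print(html)
# # Title
# <p>Some text</p>
#
# ## Sub
# <p>More text</p>
```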

Regex to allow only # in the string and block the special characters

I need to write a regex to allow the contents but block the special characters ' and -- in the string. I am working on a product that uses a regex to allow or block content. While goofing around with the product, I managed to find the pattern below:
^('|--|#|\\x27|\\x23)$
It is supposed to match --, ' and # in the string, but when I tested this pattern in an online regex tester, it did not highlight the string when it contained --, ' or #.
See Start of String and End of String Anchors at regular-expressions.info:
The caret ^ matches the position before the first character in the string. Applying ^a to abc matches a. ^b does not match abc at all, because the b cannot be matched right after the start of the string, matched by ^.
Similarly, $ matches right after the last character in the string. c$ matches c in abc, while a$ does not match at all.
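Also, \x27 matches a ' and \x23 matches a #, so there is no need to list both the escapes and the literal characters.
With ^ and $ around the group, the pattern only matches when the whole string is exactly one of the alternatives, so drop the anchors. You just need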
(--|\x27|\x23)
Or (using a non-capturing group):
(?:--|\x27|\x23)
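
A quick way to see the effect of the anchors is to run both patterns over a few strings. The sketch below uses Python and made-up sample strings. It also assumes the doubled backslashes in the question (\\x27) were string escaping, so the patterns here use a single backslash inside raw strings.

```python
import re

# Anchored: the entire string must be exactly one alternative.
anchored = re.compile(r"^('|--|#|\x27|\x23)$")
# Unanchored: one alternative anywhere in the string is enough.
unanchored = re.compile(r"(?:--|\x27|\x23)")

samples = ["#", "it's", "a -- b", "plain"]

anchored_hits = [s for s in samples if anchored.search(s)]
unanchored_hits = [s for s in samples if unanchored.search(s)]

print(anchored_hits)    # ['#']
print(unanchored_hits)  # ['#', "it's", 'a -- b']
```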

Regex to find an expression followed by whitespace, #, # or end of input

I want to find all instances of a word (say myword), with the added condition that the word is followed by whitespace, "#", "#", or the end of input.
Input string:
"myword# myword mywordrick myword# myword"
I want the regex to match everything except mywordrick:
myword#
myword
myword#
myword
I am able to match the first 3 with myword[##\s]
I thought myword[##\s\z] would match all 4, but I only get 3.
I tried myword[\z] and got no matches.
I tried myword\z and got 1 match.
I figure \z inside [] doesn't work because [] is character-based logic rather than position-based logic.
Is there a way to use a single regex to match the expressions I am interested in? I do not want to use both myword[##\s] and myword\z unless I really have to.
You could use:
myword(?:[##\s]|$)
It matches the string myword only when it is followed by #, #, whitespace (\s), or $, and the match includes that following character when there is one. $ means the end of the line.
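
Here is a small check of that alternation against the input string from the question, sketched in Python. The class is written with a single #, since repeating a character inside a class has no effect. The lookahead variant is my own addition for the case where the trailing character should not be part of the match.

```python
import re

text = "myword# myword mywordrick myword# myword"

# Alternation: either one character from the class, or the end of input.
consumed = re.findall(r"myword(?:[#\s]|$)", text)

# Lookahead variant: same condition, but the trailing character
# is only checked, not included in the match.
lookahead = re.findall(r"myword(?=[#\s]|$)", text)

print(consumed)   # ['myword#', 'myword ', 'myword#', 'myword']
print(lookahead)  # ['myword', 'myword', 'myword', 'myword']
```

Both find 4 matches and skip mywordrick, because a position such as $ can be combined with a character class through alternation but cannot live inside the class itself.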

regex_replace doesn't replace the hyphen/dash

I'm using regexp_replace in PostgreSQL and trying to strip out any character in a string that is not a letter or number. However, using this regex:
select * from regexp_replace('blink-182', '[^a-zA-Z0-9]*$', '')
returns 'blink-182'. The hyphen is not being removed and replaced with nothing ('') as I would expect.
How do I modify this regex to also replace the hyphen? I've tested with many other characters (!,.#) and they are all replaced correctly.
Any ideas?
You currently replace a run of non-alphanumeric characters at the end of the string only. I guess your tests were mainly strings of the form foobar!# which worked because the characters to remove were at the end of the string.
To replace every occurrence of such a character in the string remove the $ from the regex:
[^a-zA-Z0-9]+
(I also changed the * into a + to prevent zero-length replacements between every character.)
If you want to retain whitespace as well you need to add it to the character class:
[^a-zA-Z0-9 ]+
or possibly
[^a-zA-Z0-9\s]+
If the regex in the beginning was in fact correct in that you only want to remove non-alphanumeric characters from the end of the string but you also want to remove hyphen-minus in the middle of a string (while retaining other non-alphanumeric characters in the middle of the string), then the following should work:
[^a-zA-Z0-9]+$|-
maniek points out that you need to add an argument to regexp_replace (the 'g' flag) so it will replace more than one match:
regexp_replace('blink-182', '[^a-zA-Z0-9]+$|-', '', 'g')
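
If you want to experiment with these patterns outside the database, the sketch below replays them in Python. One difference to keep in mind: Python's re.sub replaces every match by default, so count=1 is used here to imitate regexp_replace without the 'g' flag. The input 'blink-182!' is a made-up variation that has both a hyphen in the middle and a trailing symbol.

```python
import re

# The original pattern only looks at the end of the string.
end_only = re.sub(r"[^a-zA-Z0-9]*$", "", "blink-182")
trailing = re.sub(r"[^a-zA-Z0-9]*$", "", "foobar!#")

# Without the $ anchor, every run of non-alphanumerics is removed.
everywhere = re.sub(r"[^a-zA-Z0-9]+", "", "blink-182")

# Combined pattern: first match only (like no 'g') vs. all matches.
combined = r"[^a-zA-Z0-9]+$|-"
first_only = re.sub(combined, "", "blink-182!", count=1)
all_matches = re.sub(combined, "", "blink-182!")

print(end_only)     # blink-182
print(trailing)     # foobar
print(everywhere)   # blink182
print(first_only)   # blink182!
print(all_matches)  # blink182
```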