Find lines without specified string and remove empty lines too - regex

So, I know from this question how to find all the lines that don't contain a specific string. But it leaves a lot of empty newlines when I use it, for example, in a text editor substitution (Notepad++, Sublime, etc).
Is there a way to also remove the empty lines left behind by the substitution in the same regex or, as it's mentioned on the accepted answer, "this is not something regex ... should do"?
Example, based on the example from that question:
Input:
aahoho
bbhihi
cchaha
sshede
ddhudu
wwhada
hede
eehidi
Desired output:
sshede
hede
[edit-1]
Let's try this again: what I want is a way to use regex replace to remove everything that does not contain hede on the text editor. If I try .*hede.* it will find all hede:
But it will not remove. On a short file, this is easy to do manually, but the idea here is to replace on a larger file, with over 1000+ lines, but that would contain anywhere between 20-50 lines with the desired string.
If I use ^((?!hede).)*$ and replace it with nothing, I end up with empty lines:
I thought it was a simple question, for people with a better understanding of regex than me: can a single regex replace also remove those empty lines left behind?

An alternative try
Find what: ^(?!.*hede).*\s?
Replace with: nothing
Explanation:
^ # start of a line
(?!) # a Negative Lookahead
. # matches any character (except for line terminators)
* # matches the previous token between zero and unlimited times,
hede # matches the characters hede literally
\s # matches any whitespace character (equivalent to [\r\n\t\f\v ])
? # matches the previous token between zero and one times,

Using Notepad++.
Ctrl+H
Find what: ^((?!hede).)*(?:\R|\z)
Replace with: LEAVE EMPTY
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
((?!hede).)* # tempered greedy token, make sure we haven't hede in the line
(?:\R|\z) # non capture group, any kind of line break OR end of file
Screenshot (before):
Screenshot (after):

Have you tried:
.*hede.*
I don't know why you are doing an inverse search for this.
You can use sed like:
sed -e '/.*hede.*/!d' input.txt

Related

How would you replace only a single character in the middle of text with duplicates?

How would you use the regex in Notepad++ to format replacing a single character that it finds in every line excepts for the duplicate ones in the certain line further?
test1:_|TEST:-TEST.|
test2:_|TEST:-TEST.|
test3:_|TEST:-TEST.|
As shown in the test code, there are two colons; I'm trying to replace the first colon with each line to a ; and NOT the second one found; the result of me doing the regex should equal to this:
test1;_|TEST:-TEST.|
test2;_|TEST:-TEST.|
test3;_|TEST:-TEST.|
Ctrl+H
Find what: ^.+?\K:
Replace with: ;
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.+? # 1 or more any character but newline, not greedy
\K # forget all we have seen until this position
: # colon
Screen capture (before):
Screen capture (after):
I'm guessing that maybe this expression,
(\w+)\s*(?::)(\s*_\s*\|\s*\w+\s*:\s*-\w+\.\|)
with a replacement of $1;$2 might work.
DEMO 1
Or with less boundaries, this expression:
([^:]+):(.*)
with the same replace.
DEMO 2
It's done like this
Find (?m)^[^:\r\n]*\K:
Replace ;
https://regex101.com/r/rT1vG9/1

\1 not defined in the RE

In my script, I'm in passing a markdown file and using sed, I'm trying to find lines that do not have one or more # and are not empty lines and then surround those lines with <p></p> tags
My reasoning:
^[^#]+ At beginning of line, find lines that do not begin with 1 or more #
.\+ Then find lines that contain one or more character (aka not empty lines)
Then replace the matched line with <p>\1</p>, where \1 represents the matched line.
However, I'm getting "\1 not defined in the RE". Is my reasoning above correct and how do I fix this error?
BODY=$(sed -E 's/^[^#]+.\+/<p>\1</p>/g' "$1")
Backslash followed by a number is replaced with the match for the Nth capture group in the regexp, but your regexp has no capture groups.
If you want to replace the entire match, use &:
BODY=$(sed -E 's%^[^#].*%<p>&</p>%' "$1")
You don't need to use .+ to find non-empty lines -- the fact that it has a character at the beginning that doesn't match # means it's not empty. And you don't need + after [^#] -- all you care is that the first character isn't #. You also don't need the g modifier when the regexp matches the entire line -- that's only needed to replace multiple matches per line.
And since your replacement string contains /, you need to either escape it or change the delimiter to some other character.

Cut lines using Notepad++ Regexp replace

I need to cut lines that have 6 or more characters, hyphen, then other characters or symbols. Hyphen and rest of line should be removed. Source text:
0402CS-2
0402CS-3
0402
7812-C
0603CS-1
0603CS-2
0603CS-3
As a result, I need this:
0402CS
0402CS
0402
7812-C
0603CS
0603CS
0603CS
To do that, I use Notepad++ regexp replace feature. Find pattern: ^([^\-]{6,})\-.+$ Replace pattern: \1
But there is no option "multiline", so, symbols "^" and "$" doesn't match ONLY beginning and end of the line and actually I have result:
0402CS
0402CS
0402
7812 <-- that's wrong!
0603CS
0603CS
0603CS
Please advice me how to fix find pattern? Or, maybe there is other handful and powerful free text editor that can do that?
^([^\n\-]{6,})\-.+$
^^
Just use \n as due to [^-] the regex can traverse to line below as use that line to make a match.
See demo.
https://regex101.com/r/BHO93c/1
for the input
0402
7812-C the regex matches both lines as 1 line and makes a match.
See demo if 0402 is not there.
https://regex101.com/r/BHO93c/2
That happens because the [^-] character class also matches a newline.
Add \n to it:
^([^\n-]{6,})-.+$
See the regex online demo (note the m multiline modifier (making ^ match the start of the line, and $ - the end of the line) and g modifier (enabling search for multiple occurrences) that is ON by default in Notepad++).
Note that escaping the hyphen is not necessary inside a character class when it is at the start/end of the class, and you never need to escape the hyphen outside the character class.

Add to end of line that contains a specific word and starts with x

I would like to add some custom text to the end of all lines in my document opened in Notepad++ that start with 10 and contain a specific word (for example "frog").
So far, I managed to solve the first part.
Search: ^(10)$
Replace: \1;Batteries (to add ;Batteries to the end of the line)
What I need now is to edit this regex pattern to recognize only those lines that also contain a specific word.
For example:
Before: 1050;There is this frog in the lake
After: 1050;There is this frog in the lake;Batteries
You can use the regex to match your wanted lines:
(^(10).*?(frog).*)
the .*? is a lazy quantifier to get the minimum until frog
and replace by :
$1;Battery
Hope it helps,
You should allow any characters between the number and the end of line:
^10.*frog.*
And replacement will be $0;Batteries. You do not even need a $ anchor as .* matches till the end of a line since . matches any character but a line break char.
NOTE: There is no need to wrap the whole pattern with capturing parentheses, the $0 placeholder refers to the whole match value.
More details:
^ - start of a line
10 - a literal 10 text
.* - zero or more chars other than line break chars as many as possible
frog - a literal string
.* - zero or more chars other than line break chars as many as possible
try this
find with: (^(10).*(frog).*)
replace with: $1;Battery
Use ^(10.*frog.*)$ as regex. Replace it with something like $1;Batteries

Notepad++ Replace all with an exception

I am attempting to edit a csv file, below is a sample line from this file.
|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
The beginning of the line |MIGRATE| needs to be modified without changing the second MIGRATE so the line would read
|MIGRATE|;|MIG_IN|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
There are 7700 or so lines so if I am forced to do this manually I will probably cry a little.
Thanks in advance!
Just replace all the ones you want not changed with another word temporarily, then replace the rest with what you want. I'm not sure what you're asking here, but from what I can guess this might help.
It seems like you could just search for Just search for:
^\|MIGRATE\|
And replace with:
|MIGRATE|;|MIG_IN|
Make sure you've checked 'Regular expression' in the 'Search Mode' options.
Explanation: The ^ is a begin anchor; it will match the beginning of the line, ensuring that it does not match the second |MIGRATE|. The \ characters are required to escape the | characters since they normally have special meaning in regular expressions, and you want to match a literal |.
You can use beginning of line anchors:
Find:
^(\|MIGRATE\|)
Replace with:
$1;|MIG_IN|
regex101 demo
Just make sure that you are using the regular expression mode of the Search&Replace.
If you want to be a bit fancier, you can use a positive lookbehind:
Find:
(?<=^\|MIGRATE\|)
Replace with:
;|MIG_IN|
^ Will match only at the beginning of a line.
( ... ) is called a capture group, and will save the contents of the match in variable you can use (in the first regex, I accessed the variable using $1 in the replace. The first capture gets stored to $1, the second to $2, etc.)
| is a special character meaning 'or' in regex (to match a character or group of characters or another, e.g. a|b matches a or b. As such, you need to escape it with a backslash to make a regex match a literal |.
In my second regex, I used (?<= ... ) which is called a positive lookbehind. It makes sure that the part to be matched has what's inside before it. For instance, (?<=a)b matches a b only if it has an a before it. So that the b in ab matches but not in bb.
The website I linked also explains the details of the regex and you can try out some regex yourself!