How to remove the text around a match via regex

How can we use regular expressions in Notepad++ to remove the unneeded text around a specific string? The string of digits itself must not be removed. The digits we need are always wrapped like this: "onRemoveVariable([0-9]*)".
Source:
<table>
<tr><td style="css">
del
edit
</td></tr>
<tr><td style="css">
del
edit
</td></tr>
Result:
12354
1231584
Does anybody have an idea?
Best regards
Mario

You could use this regex to delete everything except the numbers between the onRemoveVariable parts:
^.*?onRemoveVariable\((\d+)\).*$|.*
This will attempt to get the numbers first, and if not found, match the whole line.
Replacement string:
$1
If the number was matched, the replacement string puts only the number back. If not, $1 is empty and the result is an empty line.
If you now want to remove the multiple blank lines, you can use something like:
\R+
And replace with:
\r\n
Then remove manually any remaining empty lines (there can be at most 2 with this replace, one at the beginning and one at the end). \R matches any line break and \R+ thus matches multiple line breaks. The above thus replaces multiple line breaks with single line breaks.
^ # Beginning of line
.*? # Match everything until...
onRemoveVariable\( # Literal string onRemoveVariable( is matched
(\d+) # Store the digits
\) # Match literal )
.* # Match any remaining characters
$ # End of line
| # OR if no 'onRemoveVariable(' is found with digits and )...
.* # Match the whole line
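For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the two steps (replace each line with $1, then drop the empty lines). The sample markup is hypothetical, since the question's original HTML is not shown in full, and extract_numbers is just an illustrative name.

```python
import re

# Hypothetical sample input: the real markup is not shown in the question,
# so the <a onclick=...> lines below are an assumption.
SAMPLE = "\n".join([
    "<table>",
    '<tr><td style="css">',
    '<a href="#" onclick="onRemoveVariable(12354)">del</a>',
    "edit",
    "</td></tr>",
    '<a href="#" onclick="onRemoveVariable(1231584)">del</a>',
    "</table>",
])

# Same pattern as in the answer; MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(r"^.*?onRemoveVariable\((\d+)\).*$|.*", re.MULTILINE)


def extract_numbers(text):
    # Step 1: keep only the captured digits; every other line becomes empty.
    reduced = PATTERN.sub(lambda m: m.group(1) or "", text)
    # Step 2: drop the empty lines that step 1 leaves behind.
    return [line for line in reduced.splitlines() if line]


print(extract_numbers(SAMPLE))
```

The lambda stands in for the $1 replacement string and returns an empty string when the second alternative matched.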

You need to find all digits \d+ that have onRemoveVariable( before them and ) after them.
Use lookahead and lookbehind assertions.
(?<=onRemoveVariable\()(\d+)(?=\))
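A small Python sketch of the lookaround variant, assuming input lines shaped like the hypothetical ones below. The capturing group is dropped here because the lookarounds already restrict the match to the digits.

```python
import re

# Hypothetical input; the attribute layout is an assumption.
html = (
    '<a onclick="onRemoveVariable(12354)">del</a>\n'
    '<a onclick="onRemoveVariable(1231584)">del</a>\n'
)

# The lookbehind and lookahead assert the surrounding text without
# consuming it, so each match consists of the digits only.
LOOKAROUND = re.compile(r"(?<=onRemoveVariable\()\d+(?=\))")

numbers = LOOKAROUND.findall(html)
print(numbers)
```

Python's re module only accepts fixed-width lookbehinds, which is fine here because onRemoveVariable( has a fixed length.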

You can use this regex to match just the numbers you want:
/onRemoveVariable\((\d+)\)/g
Hope it helps.

Related

Find lines without specified string and remove empty lines too

So, I know from this question how to find all the lines that don't contain a specific string. But it leaves a lot of empty newlines when I use it, for example, in a text editor substitution (Notepad++, Sublime, etc).
Is there a way to also remove the empty lines left behind by the substitution in the same regex or, as mentioned in the accepted answer, is this "not something regex ... should do"?
Example, based on the example from that question:
Input:
aahoho
bbhihi
cchaha
sshede
ddhudu
wwhada
hede
eehidi
Desired output:
sshede
hede
[edit-1]
Let's try this again: what I want is a way to use a regex replace in the text editor to remove every line that does not contain hede. If I try .*hede.* it finds all the lines with hede.
But it will not remove the others. On a short file, this is easy to do manually, but the idea here is to replace in a larger file with 1000+ lines, of which only 20-50 contain the desired string.
If I use ^((?!hede).)*$ and replace it with nothing, I end up with empty lines.
I thought it was a simple question, for people with a better understanding of regex than me: can a single regex replace also remove those empty lines left behind?
An alternative try
Find what: ^(?!.*hede).*\s?
Replace with: nothing
Explanation:
^ # start of a line
(?!) # a Negative Lookahead
. # matches any character (except for line terminators)
* # matches the previous token between zero and unlimited times
hede # matches the characters hede literally
\s # matches any whitespace character (equivalent to [\r\n\t\f\v ])
? # matches the previous token between zero and one times
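To check the behaviour outside an editor, here is a rough Python equivalent of that find/replace, run on the input from the question. The function name drop_lines_without is made up for illustration.

```python
import re

TEXT = "\n".join([
    "aahoho", "bbhihi", "cchaha", "sshede",
    "ddhudu", "wwhada", "hede", "eehidi",
])


def drop_lines_without(word, text):
    # ^(?!.*word) rejects lines that contain the word; .* takes the rest of
    # the line and \s? swallows the line break that follows it.
    pattern = re.compile(r"^(?!.*" + re.escape(word) + r").*\s?", re.MULTILINE)
    return pattern.sub("", text)


result = drop_lines_without("hede", TEXT)
print(result.splitlines())
```

Because \s? consumes the line break together with the unwanted line, no empty lines are left behind.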
Using Notepad++.
Ctrl+H
Find what: ^((?!hede).)*(?:\R|\z)
Replace with: LEAVE EMPTY
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
((?!hede).)* # tempered greedy token: makes sure the line does not contain hede
(?:\R|\z) # non capture group, any kind of line break OR end of file
Have you tried:
.*hede.*
I don't know why you are doing an inverse search for this.
You can use sed like:
sed -e '/.*hede.*/!d' input.txt
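If a script is an option, the same filtering needs no regex at all. This is a plain substring test in Python, not the sed command itself, and keep_lines_containing is an illustrative name.

```python
def keep_lines_containing(word, lines):
    # Keep a line only if the word occurs somewhere in it.
    return [line for line in lines if word in line]


lines = ["aahoho", "bbhihi", "cchaha", "sshede",
         "ddhudu", "wwhada", "hede", "eehidi"]
kept = keep_lines_containing("hede", lines)
print("\n".join(kept))
```

To run it on a file, the list could come from something like open("input.txt").read().splitlines(), where input.txt is the file name used in the sed example.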

\1 not defined in the RE

In my script, I'm passing in a markdown file and using sed to find lines that do not start with one or more # and are not empty, and then surround those lines with <p></p> tags.
My reasoning:
^[^#]+ At beginning of line, find lines that do not begin with 1 or more #
.\+ Then find lines that contain one or more character (aka not empty lines)
Then replace the matched line with <p>\1</p>, where \1 represents the matched line.
However, I'm getting "\1 not defined in the RE". Is my reasoning above correct and how do I fix this error?
BODY=$(sed -E 's/^[^#]+.\+/<p>\1</p>/g' "$1")
Backslash followed by a number is replaced with the match for the Nth capture group in the regexp, but your regexp has no capture groups.
If you want to replace the entire match, use &:
BODY=$(sed -E 's%^[^#].*%<p>&</p>%' "$1")
You don't need to use .+ to find non-empty lines -- the fact that it has a character at the beginning that doesn't match # means it's not empty. And you don't need + after [^#] -- all you care about is that the first character isn't #. You also don't need the g modifier when the regexp matches the entire line -- that's only needed to replace multiple matches per line.
And since your replacement string contains /, you need to either escape it or change the delimiter to some other character.
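The same distinction exists in other regex flavours. As a sketch in Python rather than sed: \g<0> plays the role of &, and the text is processed line by line the way sed does. The function name and sample input are invented for the example.

```python
import re


def wrap_paragraphs(markdown):
    wrapped = []
    for line in markdown.splitlines():
        # \g<0> is the whole match (sed's &), so no capture group is needed.
        # Lines that are empty or start with # do not match and stay as-is.
        wrapped.append(re.sub(r"^[^#].*", r"<p>\g<0></p>", line))
    return "\n".join(wrapped)


doc = "# Title\n\nHello world\n## Section\nMore text"
print(wrap_paragraphs(doc))
```

The loop matters in Python because [^#] would also match a newline if the whole file were handled as one string.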

Match line only if next line is an empty Line

I'm very new to regex. What I'm trying to do is match a line only if the next line is an empty line.
For example:
This is some text
( empty line )
This is some text
This is some text
This is some text
( empty line )
This is some text
( empty line )
In the previous example, I would like to be able to select only lines 1, 5 and 7.
Is this possible with regex in Notepad++?
Thanks in advance.
You can use this regex,
(.*)\n\s*\n
and replace with
\1
It uses a capture group, so you can use \1 in the replacement to refer to the captured text, that is, the line before the empty line.
You could try the below positive lookahead based regex.
^.*?\S.*(?=\n[ \t]*$)
\S matches any non-space character. So .*?\S.* matches a line that has at least one non-space character, and the following (?=\n[ \t]*$) asserts that the match must be followed by a newline character and then by zero or more space or tab characters.
OR
^.*?\S.*(?=\n\n)
If by empty line you mean a line that does not contain even a single space or tab character, then you could use the above regex. (?=\n\n) asserts that the match must be followed by a blank line.
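A quick way to check the first lookahead pattern is a short Python sketch. The numbered sample lines are made up so that the matches are easy to tell apart.

```python
import re

# Lines 2, 6 and 8 are empty, as in the question's example.
text = "\n".join([
    "line 1", "", "line 3", "line 4", "line 5", "", "line 7", "",
])

# MULTILINE lets ^ and $ anchor at every line, not only at the ends of
# the whole string.
pattern = re.compile(r"^.*?\S.*(?=\n[ \t]*$)", re.MULTILINE)

matches = pattern.findall(text)
print(matches)
```

Only the lines directly followed by an empty (or whitespace-only) line are returned.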
This should do the trick:
/(.)*\n\n/
If you're looking for an easy way to test or verify a regex, Rubular is pretty good.

Add to end of line that contains a specific word and starts with x

I would like to add some custom text to the end of all lines in my document opened in Notepad++ that start with 10 and contain a specific word (for example "frog").
So far, I managed to solve the first part.
Search: ^(10)$
Replace: \1;Batteries (to add ;Batteries to the end of the line)
What I need now is to edit this regex pattern to recognize only those lines that also contain a specific word.
For example:
Before: 1050;There is this frog in the lake
After: 1050;There is this frog in the lake;Batteries
You can use this regex to match the lines you want:
(^(10).*?(frog).*)
The .*? is a lazy quantifier that matches as little as possible up to frog.
Then replace with:
$1;Batteries
Hope it helps,
You should allow any characters between the number and the end of line:
^10.*frog.*
And replacement will be $0;Batteries. You do not even need a $ anchor as .* matches till the end of a line since . matches any character but a line break char.
NOTE: There is no need to wrap the whole pattern with capturing parentheses, the $0 placeholder refers to the whole match value.
More details:
^ - start of a line
10 - a literal 10 text
.* - zero or more chars other than line break chars as many as possible
frog - a literal string
.* - zero or more chars other than line break chars as many as possible
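The pattern can be tried quickly in Python, where \g<0> is the counterpart of the $0 placeholder. The second and third sample lines are invented to show lines that must stay untouched.

```python
import re

text = "\n".join([
    "1050;There is this frog in the lake",
    "1051;No amphibians here",
    "2050;A frog, but the wrong prefix",
])

# MULTILINE makes ^ match at the start of every line.
pattern = re.compile(r"^10.*frog.*", re.MULTILINE)

# \g<0> re-inserts the whole matched line before the appended text.
updated = pattern.sub(r"\g<0>;Batteries", text)
print(updated)
```

Only the first line both starts with 10 and contains frog, so it is the only one that gets ;Batteries appended.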
Try this:
Find what: (^(10).*(frog).*)
Replace with: $1;Batteries
Use ^(10.*frog.*)$ as the regex. Replace it with something like $1;Batteries.

Multiline selection of blocks with ID at the end of each block with regular expression

I have regular expression:
BEGIN\s+\[([\s\S]*?)END\s+ID=(.*)\]
which selects the multiline text and the ID from the text below. I would like to select only IDs with the prefix X_, but if I change ID=(.*) to ID=(X_.*), the match begins at the second pair, not at the third as I need. Could someone help me get the correct expression, please?
text example:
BEGIN [
text a
END ID=X_1]
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
text xxx
BEGIN [
text bbb
END ID=X_3]
It isn't the .* that's gobbling everything up as people keep saying, it's the [\s\S]*?. .* can't do it because (as the OP said) the dot doesn't match newlines.
When the END\s+ID=(X_.*)\] part of your regex fails to match the last line of the second block, you're expecting it to abandon that block and start over with the third one. That's what it would have to do to make the shortest match.
In reality, it backtracks to the beginning of the line and lets [\s\S]*? consume it instead. And it keeps on consuming until it finds a place where END\s+ID=(X_.*)\] can match, which happens to be the last line of the third block.
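The effect is easy to reproduce. Here is a short Python sketch that runs the question's regex (with X_ added) on the question's example text; the variable names are my own.

```python
import re

TEXT = """BEGIN [
text a
END ID=X_1]
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
text xxx
BEGIN [
text bbb
END ID=X_3]
"""

NAIVE = re.compile(r"BEGIN\s+\[([\s\S]*?)END\s+ID=(X_.*)\]")

matches = list(NAIVE.finditer(TEXT))
for m in matches:
    print(m.group(2), repr(m.group(1)))

# The second match starts at the Y_1 block and runs on into the X_2 block.
leaked = "ID=Y_1]" in matches[1].group(1)
print(leaked)
```

The match reported for X_2 begins at the BEGIN of the Y_1 block, which is exactly the behaviour described in the question.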
The following regex avoids that problem by matching line by line, checking each one to see if it starts with END. This effectively confines the match to one block at a time.
(?m)^BEGIN\s+\[[\r\n]+((?:(?!END).*[\r\n]+)*)END\s+ID=(X_.*)\]
Note that I used ^ to anchor each match to the beginning of a line, so I used (?m) to turn on multiline mode. But I did not--and you should not--turn on single-line/DOTALL mode.
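As a sanity check, here is a Python sketch that applies this regex to the example text from the question. Python's re accepts the same (?m) inline flag; the names LINEWISE and blocks are only for illustration.

```python
import re

TEXT = """BEGIN [
text a
END ID=X_1]
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
text xxx
BEGIN [
text bbb
END ID=X_3]
"""

LINEWISE = re.compile(
    r"(?m)^BEGIN\s+\[[\r\n]+((?:(?!END).*[\r\n]+)*)END\s+ID=(X_.*)\]"
)

# Pair each ID with the body lines of its own block.
blocks = [(m.group(2), m.group(1).splitlines()) for m in LINEWISE.finditer(TEXT)]
for block_id, body in blocks:
    print(block_id, body)
```

The Y_1 block produces no match at all, and the X_2 match contains only its own two lines.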
Assuming there aren't any empty lines inside a block and the BEGIN/END statements are the first non-space text on their line, I'd write the regex like this (Perl notation; change the delimiters and remove the comments, whitespace and the /x modifier if you use a different engine):
m{
\n \s* BEGIN \s+ \[ # match the beginning
( (?!\n\s*\n) .)*? # match anything that isn't an empty line
# checking with a negative look-ahead (?!PATTERN)
\n \s* END \s+ ID=X_[^\]]* \] # the ID may not contain "]"
}sx # /x: use extended syntax, /s: "." matches newlines
If the content may be anything, it might be best to create a list of all blocks, and then grep through them. This regex matches any block:
m{ (
BEGIN \s+ \[
.*? # non-greedy matching is important here
END \s+ ID=[^\]]* \] # greedy matching is safe here
) }xs
(add newlines if wanted)
Then only keep those matches that match this regex:
/ID = X_[^\]]* \] $/x # anchor at end of line
If we don't do this, backtracking may prevent a correct match ([\s\S]*? can contain END ID=X_). Your regex would put anything inside the blocks until it sees an X_.*.
So using BEGIN\s+\[([\s\S]*?)END\s+ID=(X_.*?)\] (note the extra question mark), one match would be:
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
…instead of failing at the Y_. A greedy match (your unchanged regex) should result in the whole file being matched: Your (.*) eats up all characters (until the end of file) and then goes back until it finds a ].
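The two-step approach described above (match every block, then keep only those whose ID starts with X_) could be sketched in Python like this. It uses re.DOTALL in place of Perl's /s, and the variable names are my own.

```python
import re

TEXT = """BEGIN [
text a
END ID=X_1]
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
text xxx
BEGIN [
text bbb
END ID=X_3]
"""

# Step 1: match every block, whatever its ID is.
ANY_BLOCK = re.compile(r"BEGIN\s+\[.*?END\s+ID=[^\]]*\]", re.DOTALL)
# Step 2: keep only the blocks that end in an X_ ID.
WANTED_ID = re.compile(r"ID=X_[^\]]*\]$")

all_blocks = ANY_BLOCK.findall(TEXT)
x_blocks = [block for block in all_blocks if WANTED_ID.search(block)]

for block in x_blocks:
    print(block)
    print("---")
```

Because every block is consumed in step 1, the Y_1 block can no longer be merged into the X_2 match.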
EDIT:
Should you be using Perl's regex engine, we can use the (*FAIL) verb:
/BEGIN\s+\[(.*?)END\s+ID=(X_[^\]]*|(*FAIL))\]/s
"Either have an ID starting with X_ or the match fails". However, this does not solve the problem with END ID=X_1]-like statements inside your data.
Change your .* to a [^\]]* (i.e. match non-]s), so that your matches can't spill over past an END block, giving you something like BEGIN\s+\[([^\]]*?)END\s+ID=(X_[^\]]*)\]