\1 not defined in the RE - regex

In my script, I'm passing in a markdown file and using sed. I'm trying to find lines that do not begin with one or more # and are not empty, and then surround those lines with <p></p> tags.
My reasoning:
^[^#]+ At beginning of line, find lines that do not begin with 1 or more #
.\+ Then find lines that contain one or more character (aka not empty lines)
Then replace the matched line with <p>\1</p>, where \1 represents the matched line.
However, I'm getting "\1 not defined in the RE". Is my reasoning above correct and how do I fix this error?
BODY=$(sed -E 's/^[^#]+.\+/<p>\1</p>/g' "$1")

Backslash followed by a number is replaced with the match for the Nth capture group in the regexp, but your regexp has no capture groups.
If you want to replace the entire match, use &:
BODY=$(sed -E 's%^[^#].*%<p>&</p>%' "$1")
You don't need .+ to find non-empty lines: a line whose first character is something other than # can't be empty. (With -E, \+ is also a literal plus sign, so .\+ would require the line to contain a literal +.) And you don't need + after [^#]; all you care about is that the first character isn't #. You also don't need the g modifier when the regexp matches the entire line; it's only needed to replace multiple matches per line.
And since your replacement string contains /, you need to either escape it or change the delimiter to some other character.
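The same idea can be sketched with Python's re module if you want to experiment outside sed. This is a swapped-in illustration, not the sed answer itself. wrap_paragraphs is a hypothetical helper, \g<0> stands in for sed's &, and re.MULTILINE makes ^ match at every line start. Python sees the whole text at once rather than one line at a time, so the character class also excludes \n to keep empty lines from matching.

```python
import re


def wrap_paragraphs(markdown: str) -> str:
    """Wrap each non-empty line that does not start with '#' in <p></p>."""
    # ^[^#\n] : the line has a first character, and it is neither '#' nor a newline,
    #           so headings and empty lines are skipped.
    # .*      : the rest of the line ('.' never crosses a newline).
    # \g<0>   : the entire match, playing the role of '&' in sed.
    return re.sub(r"^[^#\n].*", r"<p>\g<0></p>", markdown, flags=re.MULTILINE)


sample = "# Title\n\nSome text\n## Sub\nMore text\n"
print(wrap_paragraphs(sample))
```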

Related

Replace Certain Line Breaks with Equivalent of Pressing delete key on Keyboard NotePad++ Regex

I'm using Notepad++ Find and Replace, and I have a regex that looks for [^|]\r, which finds the end of the line that starts with 8778.
8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBase
See Also 15990TT|
I want to basically merge that line with the one below it, so it becomes this:
8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBase See Also 15990TT|
I've tried replacing it with a blank space, but it's grabbing the last character on that line (an e in this case) and replacing that with a space, so it's making it:
8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBas
See Also 15990TT|
Is there any way to make it essentially merge the two lines?
\r only matches a carriage return symbol. To match a line break, you need \R, which matches any line break sequence.
To keep a part of a pattern after replacement, capture that part with parentheses, and then use a backreference to that group.
So you may use
([^|\r])\R
Replace with $1, or with $1 plus a trailing space if you need to append a space.
Details
([^|\r]) - Capturing group 1 (in the replacement pattern, $1 is the backreference to this group's value): any char other than | and CR
\R - any line break char sequence, LF, CR or CRLF.
See the regex demo and the Notepad++ demo with settings:
The issue is that you're using [^|] to match anything that's not a pipe character before the carriage return, so on replacement that character is removed too (which is why you're losing an e).
If it's imperative that you match only carriage returns that follow non-pipe characters, capture the preceding character ([^|])\r$ and then put it back in the replacement using $1.
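You're also missing a \n in your regex, which is why the replacement isn't concatenating the two lines. So your search should be ([^|])\r\n and your replace should be $1. Leave off a trailing $: after \r\n the position is already at the start of the next line, where $ would only match if that line were empty.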
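Both answers above rely on the same trick: capture the last character of the line and put it back in the replacement. Here is a rough Python sketch of that trick, assuming Python's re is an acceptable stand-in for Notepad++'s engine. Python's re has no \R, so the line break is spelled out as (?:\r\n|\r|\n). merge_unterminated is a hypothetical helper name.

```python
import re

# Python's re has no \R; (?:\r\n|\r|\n) is used instead, with \r\n first so
# a CRLF pair is consumed as a single line break.
LINE_BREAK = r"(?:\r\n|\r|\n)"


def merge_unterminated(text: str, joiner: str = " ") -> str:
    """Join every line that does not end in '|' to the line after it."""
    # ([^|\r\n]) captures the line's last character so it can be put back;
    # \n is excluded as well so an empty line is never treated as content.
    pattern = r"([^|\r\n])" + LINE_BREAK
    # A function replacement inserts group 1 plus joiner literally.
    return re.sub(pattern, lambda m: m.group(1) + joiner, text)


data = ("8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBase\r\n"
        "See Also 15990TT|\r\n")
print(repr(merge_unterminated(data)))
```

Passing joiner="" corresponds to replacing with a plain $1. The default single space corresponds to $1 followed by a space.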
Find
(\r\n)+
For "Replace", leave the field empty (not even a space).

Find lines without specified string and remove empty lines too

So, I know from this question how to find all the lines that don't contain a specific string. But it leaves a lot of empty lines when I use it, for example, in a text editor substitution (Notepad++, Sublime, etc.).
Is there a way to also remove the empty lines left behind by the substitution in the same regex or, as it's mentioned on the accepted answer, "this is not something regex ... should do"?
Example, based on the example from that question:
Input:
aahoho
bbhihi
cchaha
sshede
ddhudu
wwhada
hede
eehidi
Desired output:
sshede
hede
[edit-1]
Let's try this again: what I want is a way to use regex replace in the text editor to remove every line that does not contain hede. If I try .*hede.* it will find all the hede lines:
But it won't remove the other lines. On a short file this is easy to do manually, but the idea is to run the replacement on a larger file with 1000+ lines, of which only 20-50 contain the desired string.
If I use ^((?!hede).)*$ and replace it with nothing, I end up with empty lines:
I thought it was a simple question, for people with a better understanding of regex than me: can a single regex replace also remove those empty lines left behind?
An alternative try
Find what: ^(?!.*hede).*\s?
Replace with: nothing
Explanation:
^ # start of a line
(?!) # a Negative Lookahead
. # matches any character (except for line terminators)
* # matches the previous token between zero and unlimited times,
hede # matches the characters hede literally
\s # matches any whitespace character (equivalent to [\r\n\t\f\v ])
? # matches the previous token between zero and one times,
Using Notepad++.
Ctrl+H
Find what: ^((?!hede).)*(?:\R|\z)
Replace with: LEAVE EMPTY
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
((?!hede).)* # tempered greedy token, makes sure the line doesn't contain hede
(?:\R|\z) # non capture group, any kind of line break OR end of file
Screenshot (before):
Screenshot (after):
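A Python sketch of the tempered-greedy-token approach, using Python's re as a stand-in for Notepad++. Python has no \R and writes end-of-string as \Z rather than \z, so both are adapted. The group is made non-capturing since nothing refers to it. keep_hede_lines is a hypothetical name.

```python
import re

# Python's re has no \R and spells end-of-string as \Z (not \z),
# so the line-break alternatives are written out explicitly.
# (?:(?!hede).)* is the tempered greedy token: it consumes characters only while
# "hede" does not start at the current position, so it can only reach a line
# break (or the end of the text) on lines that do not contain hede.
NOT_HEDE_LINE = re.compile(r"^(?:(?!hede).)*(?:\r\n|\r|\n|\Z)", re.MULTILINE)


def keep_hede_lines(text: str) -> str:
    """Delete every line that does not contain 'hede', together with its line break."""
    return NOT_HEDE_LINE.sub("", text)


sample = "\n".join(["aahoho", "bbhihi", "cchaha", "sshede",
                    "ddhudu", "wwhada", "hede", "eehidi"])
print(keep_hede_lines(sample), end="")
```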
Have you tried:
.*hede.*
I don't know why you are doing an inverse search for this.
You can use sed like:
sed -e '/.*hede.*/!d' input.txt
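The sed command keeps matching lines rather than deleting the others with a substitution. Here is a comparable Python sketch; grep_lines is a hypothetical helper that filters lines with re.search. A line address only has to match somewhere in the line, so .*hede.* and plain hede select the same lines.

```python
import re


def grep_lines(text: str, pattern: str) -> str:
    """Keep only the lines in which pattern matches, like sed '/pattern/!d'."""
    # keepends=True preserves each line's own line break in the output.
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if re.search(pattern, line))


sample = "\n".join(["aahoho", "bbhihi", "cchaha", "sshede",
                    "ddhudu", "wwhada", "hede", "eehidi"])
print(grep_lines(sample, "hede"), end="")
```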

Perl: How to substitute the content after pattern

So I can't use the $' variable.
But I need to find the pattern in a file that starts with the string “by: ” followed by any characters, then replace whatever characters come after “by: ” with an existing string $foo.
I'm using $^I and a while loop since I need to update multiple fields in a file.
I was thinking something along the lines of [s///]
s/(by\:[a-z]+)/$foo/i
I need help. Yes, this is an assignment question, but I'm 5 hours in and I've lost many brain cells in the process.
Some problems with your substitution:
You say you want to match by: (space after colon), but your regex will never match the space.
The pattern [a-z]+ means to match one or more occurrences of letters a to z. But you said you want to match "any characters". That might be zero characters, and it might contain non-letters.
You've replaced the match with $foo, but have lost by:. The entire matched string is replaced with the replacement.
No need to escape : in your pattern.
You're capturing the entire match in parentheses, but not using that anywhere.
I'm assuming you're processing the file line by line. You want "starts with the string by: followed by any characters". This is the regex:
/^by: .*/
^ matches the beginning of the line. Then by: matches exactly those characters. . matches any character except a newline, and * means zero or more of the preceding item. So .* matches all the rest of the characters on the line.
"Replace whatever characters come after by: with an existing string $foo." I assume you mean the contents of the variable $foo and not the literal characters $foo. This is:
s/^by: .*/by: $foo/;
Since we matched by:, I repeated it in the replacement string because you want to preserve it. $foo will be interpolated in the replacement string.
Another way to write this would be:
s/^(by: ).*/$1$foo/
Here we've captured the text by: in the first set of parentheses. That text will be available in the $1 variable, so we can interpolate that into the replacement string.
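For illustration only, here is the second substitution translated into Python's re.sub; the assignment itself is in Perl. replace_after_by is a hypothetical helper. A function is passed as the replacement so that the contents of foo are inserted literally, much as Perl interpolates $foo without re-reading it as regex syntax.

```python
import re


def replace_after_by(line: str, foo: str) -> str:
    """Replace everything after a leading 'by: ' with foo, keeping the prefix."""
    # ^(by: ) captures the prefix (group 1); .* consumes the rest of the line.
    # The lambda returns group 1 + foo, the Python analogue of Perl's $1$foo.
    return re.sub(r"^(by: ).*", lambda m: m.group(1) + foo, line)


print(replace_after_by("by: Jane Doe", "John Smith"))
print(replace_after_by("title: Report", "John Smith"))
```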

How to remove via Regex text around

How can we use regular expressions in Notepad++ to remove the unneeded text around a specific string? The string with numbers must not be removed. The numbers we need are always surrounded by "onRemoveVariable([0-9]*)".
Source:
<table>
<tr><td style="css">
del
edit
</td></tr>
<tr><td style="css">
del
edit
</td></tr>
Result:
12354
1231584
Does anybody have an idea?
Best regards
Mario
You could use this regex to delete everything except the numbers between the onRemoveVariable parts:
^.*?onRemoveVariable\((\d+)\).*$|.*
This will attempt to get the numbers first, and if not found, match the whole line.
Replacement string:
$1
If the number was matched, the replacement string puts only the number back. If not, $1 is empty and the result is an empty line.
regex101 demo
If you now want to remove the multiple blank lines, you can use something like:
\R+
And replace with:
\r\n
Then manually remove any remaining empty lines (there can be at most 2 after this replace, one at the beginning and one at the end). \R matches any line break, so \R+ matches a run of line breaks; the replacement above therefore turns each run of line breaks into a single line break.
^ # Beginning of line
.*? # Match everything until...
onRemoveVariable\( # Literal string onRemoveVariable( is matched
(\d+) # Store the digits
\) # Match literal )
.* # Match any remaining characters
$ # End of line
| # OR, if no onRemoveVariable( followed by digits and ) is found...
.* # Match the whole line
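The two Notepad++ steps can be sketched in Python as follows. The input HTML is hypothetical: the question's markup lost its attributes, so these lines only assume that each number appears as onRemoveVariable(<digits>). Python's re has no \R, so (?:\r\n|\r|\n)+ stands in for \R+. The sketch also joins with \n rather than \r\n and uses strip() in place of the manual clean-up.

```python
import re

# Hypothetical input: only assumes each wanted number appears as onRemoveVariable(<digits>).
html = "\n".join([
    "<table>",
    '<tr><td style="css">',
    '<a href="#" onclick="onRemoveVariable(12354)">del</a>',
    "edit",
    "</td></tr>",
    '<tr><td style="css">',
    '<a href="#" onclick="onRemoveVariable(1231584)">del</a>',
    "edit",
    "</td></tr>",
])

# Step 1: lines containing onRemoveVariable(<digits>) become just the digits;
# every other line is matched by the .* alternative and emptied.
# group(1) is None when the second alternative matched, so fall back to "".
step1 = re.sub(r"^.*?onRemoveVariable\((\d+)\).*$|.*",
               lambda m: m.group(1) or "",
               html, flags=re.MULTILINE)

# Step 2: collapse runs of line breaks (stand-in for \R+) into one '\n',
# then strip() drops the possible leading and trailing empty line.
result = re.sub(r"(?:\r\n|\r|\n)+", "\n", step1).strip()

print(result)
```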
You need to find all digits \d+ with onRemoveVariable( before them and ) after them.
Use lookahead and lookbehind assertions.
(?<=onRemoveVariable\()(\d+)(?=\))
You can use this regex to match just the numbers you want:
/onRemoveVariable\((\d+)\)/g
DEMO (Look at the match information on the right panel)
Hope it helps.
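Both of the last two answers can be tried in Python with re.findall, which plays the role of the /g flag here. The input line is hypothetical, since the question's markup around each call was lost. In the lookaround version the digits are not wrapped in a group, because the zero-width assertions already leave only the digits in the match. This relies on Python's fixed-width lookbehind, which onRemoveVariable\( satisfies.

```python
import re

# Hypothetical input: the attribute markup around each call is assumed.
line = ('<a onclick="onRemoveVariable(12354)">del</a> '
        '<a onclick="onRemoveVariable(1231584)">edit</a>')

# Lookarounds are zero-width, so the whole match is just the digits.
via_lookarounds = re.findall(r"(?<=onRemoveVariable\()\d+(?=\))", line)

# With one capturing group, findall returns the group's text instead of the full match.
via_group = re.findall(r"onRemoveVariable\((\d+)\)", line)

print(via_lookarounds)
print(via_group)
```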

In Notepad++ replace all lines except ones that match to given expression, using regex

I've seen some answers here that might help if I combined them, but I can't figure out how to do it properly.
Let's assume we have the following text file:
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaa
[a]
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa[h]
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[a]
aaaaaaaaaaaaaaaaaaaaaaaa[h]
aaaaaaaaaaaaaaaaaaaaaaaaaaa
Where:
"a" means literally any character (or set of characters), including special symbols, unicode characters etc.
"h" is a fixed latin character
brackets mean brackets
blank line is a blank line
Then:
How do I keep only the lines with [h] at the end, replacing everything else with blank lines (i.e., the line breaks remain)?
How do I keep the same lines but also remove [h]?
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa[h]
aaaaaaaaaaaaaaaaaaaaaaaa[h]
As title says, I guess what I need can be also described as: replace any line except the line that matches to the given expression.
Find what:
^.*$(?<!\[h\])
Replace with nothing. Make sure to uncheck . matches newline.
How does it work?
^ # matches the beginning of a line (after the line break)
.* # matches as many non-line-break characters as possible (an entire line)
$ # matches the end of a line (before the line break)
(?<! # a negative lookbehind; if its contents match left of the current
# position, it causes the pattern to fail
\[h\] # match [h] literally
) # end of lookbehind
Note that lookarounds are not part of the match. So ^.*$ simply makes sure that you are matching entire lines, not parts of them and not multiple ones. The lookbehind then ensures that the matched line does not end with [h].
You can then remove the [h] with an additional step:
Find what: \[h\]$
Replace with nothing.
EDIT: Because the regex engine traverses the file from beginning to end and matches can never overlap, you can actually put both patterns into one:
^.*$(?<!\[h\])|\[h\]$
By the time an [h] at the end of the line is removed, the engine will not look at that line again, so you're only left with the lines that used to have an [h] at the end.
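A Python sketch of both the one-step pattern and the combined one, using Python's re as a stand-in for Notepad++. The sample lines are made up in the shape of the question's example. The sketch assumes LF line endings: in Python '.' also matches '\r', so on CRLF text the lookbehind would see "h]\r" instead of "[h]".

```python
import re

# Assumes LF line endings (see the lead-in).
# ^.*$(?<!\[h\]) : a whole line that does not end in [h]; it is replaced by
#                  nothing while its line break is left in place.
BLANK_OTHER_LINES = re.compile(r"^.*$(?<!\[h\])", re.MULTILINE)
# Same pattern plus the alternative \[h\]$, which strips the marker itself.
BLANK_AND_STRIP = re.compile(r"^.*$(?<!\[h\])|\[h\]$", re.MULTILINE)

lines = ["aaaa", "[a]", "aaaaa[h]", "aaaa", "[a]", "aaa[h]", "aaaaa"]
text = "\n".join(lines)

print(BLANK_OTHER_LINES.sub("", text))
print(BLANK_AND_STRIP.sub("", text))
```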