Regex: how to remove the last, empty line? - regex

I want to remove the very last, empty line from a file using regex search-replace. Matching a new line with an end-of-line marker:
\n$
seems to be a step in the right direction, but it simply matches every empty line (a newline character followed by an empty line, to be precise).
I'm using Sublime on Windows, in case the line-ending convention and the regex engine matter.

You can use \s*\Z to select all whitespace, including newlines, at the end of the input (\Z marks the end of input), and replace it with an empty string.
This removes all newlines at the end of the text (one or more), even when the trailing lines contain spaces that are not easily visible. That is usually helpful, because in general we want to get rid of useless extra lines at the end of a file.
If you want to remove ONLY ONE line from the end of the file, use \n\Z instead of \s*\Z.

The following regex should help you achieve it
\n\s*$(?!\n)
In the demo, the match begins at the end of line 6 and covers everything on line 7, which is then deleted.
Basically, it searches for a line break, optionally followed by whitespace, that ends a line and is not followed by another line break.
Demo 1
Look closely and you'll see that line 7 has disappeared after the replacement.
Demo 2 (in Visual Studio Code): before and after
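If you want to check the behaviour of \n\s*$(?!\n) without an editor, a rough Python sketch could look like this. It assumes multiline mode (so $ can match before any newline), which may differ from your editor's default; the sample text is invented.

```python
import re

# Multiline mode is an assumption here: $ may match before any newline.
pattern = re.compile(r"\n\s*$(?!\n)", re.MULTILINE)

# Hypothetical sample with one empty line in the middle and a trailing newline.
with_blank_tail = "line 5\n\nline 6\n"

cleaned = pattern.sub("", with_blank_tail)
print(repr(cleaned))
```

Only the final newline is removed; the empty line in the middle survives, because the lookahead rejects a candidate that is followed by another newline.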

Related

Bug in Notepad++ / BOOST or bug in my regular expression?

I have a file which is structured like this:
Line
foo Änderbar: PM baz
Line
Line
foo Änderbar: OM baz
Line
Line
foo Änderbar: ++ baz
Line
Line
foo Änderbar: -- baz
Line
So the file consists of "blocks" which are separated by a newline (I have converted the file to Unix line endings). Each block can have an arbitrary number of lines. Each line of a block contains at least one character which is not a newline, and is finished by a newline character. The lines which separate the blocks consist of exactly one newline character.
In each block, there is exactly one line in the following format:
at least one character which is not newline, followed by
the literal string 'Änderbar: ', followed by
exactly one of the literal strings '++', '--', 'OM', 'PM', followed by
at least one character which is not newline, followed by
the line-terminating newline character
There is always at least one other non-empty line in the same block above this special line and one other non-empty line below this special line.
I need an effective method to find (and thereby select) all blocks where the literal after Änderbar: is -- (find / select one block after another, each one after hitting Find Next again, i.e. not selecting all of those blocks at the same time).
Normally, I have fun solving such problems with Notepad++. However, in that case, it seems that I either get more and more stupid as I get older, or that there is a bug in Notepad++'s regex handling engine.
Notepad++ uses BOOST (and supports PCRE expressions via BOOST). Since this is in wide use, I consider that problem important enough to post it here, just in case that BOOST really is the reason for the misbehavior.
Having said this: I loaded that file into Notepad++, fired up the Search and Replace dialog, ticked . matches newline, ticked Regular Expression and entered the following regex in the Find What: textbox:
\n([^\n]+\n)+[^\n]+(Änderbar\:\ --[^\n]+\n)([^\n]+\n)+
I was quite surprised that this made Notepad++ behave weirdly: When the cursor was placed in the empty line immediately before a block with Änderbar: --, hitting Find Next found / selected that block as expected. But when the cursor was at another place, hitting Find Next made Notepad++ find / select the whole rest of the file, i.e. all blocks below the cursor position.
I then have tested if it would find the blocks having ++ after Änderbar:, i.e. I changed my regex to
\n([^\n]+\n)+[^\n]+(Änderbar\:\ \+\+[^\n]+\n)([^\n]+\n)+
Guess what: This worked reliably in every situation. The same is true for the remaining two:
\n([^\n]+\n)+[^\n]+(Änderbar\:\ PM[^\n]+\n)([^\n]+\n)+
\n([^\n]+\n)+[^\n]+(Änderbar\:\ OM[^\n]+\n)([^\n]+\n)+
So Notepad++ / PCRE seems to have a problem with the correct interpretation of - under certain circumstances, or I have a subtle bug in my regex which only triggers when I am searching for -- (instead of ++, OM or PM) at the respective place.
Please note that I have already tried omitting the \ in front of the space character (which could only make the situation worse, but I tried it just in case) and that I have also tried \-\- instead of -- (although the latter should be fine). That did not alter the (mis-)behavior in any way.
So what is the problem here? Is there a bug in my regex, or is there a bug in Notepad++?
UPDATE
I have stripped down the actual file in question and have uploaded it to https://pastebin.com/w62E57U5. To reproduce the problem, please do the following:
Download the file from the link above and save it somewhere on your HDD (do not copy the text directly into Notepad++).
Load the file into Notepad++. The cursor now is in the topmost line, and nothing is selected.
This is essential: Click Edit -> EOL Conversion -> Unix (LF).
Verify that the cursor is still in the topmost line (which is empty) and that nothing is selected.
Open the Find dialog and choose the settings and enter the search string as described above.
Click "Find Next".
Note that now the complete text is found / selected.
Keeping the Find window open, delete the third line of the file (it reads "Funktionspaket(e): ML"). Do not just empty that line, but really delete it so that no empty line remains between the line before and the line after.
Again, place the cursor in the topmost line (which is still empty) and make sure nothing is selected.
Click "Find Next".
Note that the regular expression now works as expected.
Obviously, somebody is trying to make a fool of me, right?
I think the key is: you need to begin your regex with ^ (beginning of line).
Your original regex becomes:
^\n([^\n]+\n)+[^\n]+(Änderbar\:\ --[^\n]+\n)([^\n]+\n)+
But you can simplify it with:
^\R(?:.+\R)+.+Änderbar: --.+\R(?:.+(?:\R|\z))+
Note: untick ". matches newline"; otherwise .+ runs across line breaks and the match is no longer confined to one block.
Where:
\R matches any kind of line break, so there is no need to change the EOL.
\z matches the end of the file; without it, you can't match the last line of the file if it has no trailing line break.
(?:...) is a non-capturing group, which is more efficient (if you don't need to capture, of course).
Both work fine with your 2 sample files.
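To see what the leading ^ buys you, here is a rough Python translation of the simplified regex. It is only a sketch: Python's re module has no \R or \z, so the line break is spelled out and \Z (end of string in Python) stands in for \z. The names NL, LINE and BODY and the sample text are made up, and passing a start offset to search() is just a crude stand-in for "Find Next" from a cursor position.

```python
import re

NL = r"(?:\r\n|\n)"   # stand-in for \R (assumption: only CRLF and LF occur)
LINE = r"[^\r\n]+"    # stand-in for .+ with ". matches newline" unticked

BODY = (
    rf"{NL}(?:{LINE}{NL})+"           # separator line, then the lines above
    rf"{LINE}Änderbar: --{LINE}{NL}"  # the special line
    rf"(?:{LINE}(?:{NL}|\Z))+"        # lines below; the last may lack a newline
)
unanchored = re.compile(BODY)
anchored = re.compile("^" + BODY, re.MULTILINE)

text = "\nLine A\nLine B\nfoo Änderbar: -- baz\nLine C"

partial = unanchored.search(text, 1)  # search starting inside the block
whole = anchored.search(text, 0)      # search starting on the empty line
inside = anchored.search(text, 1)     # anchored search starting inside the block

print(repr(partial.group(0)))
print(whole.group(0) == text)
print(inside)
```

Without the anchor, the search that starts inside the block happily begins at the newline that ends "Line A" and returns only part of the block. With ^, the match can only start on an empty separator line, so the same search finds nothing rather than a partial block.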
It's not a bug. You're just forgetting something very important - with Windows line endings, your lines have a \r before the \n, so the \n([^\n]+\n)+ part of your RegEx will also match your blank lines which is why clicking "Find Next" matches everything from the cursor position instead of from the start of the block.
Go to Edit > EOL Conversion > Unix (LF) and you'll see that it works now. If you want to support Windows and Unix line endings you'll have to change every [^\n] to [^\r\n] and every \n to \r?\n.
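The \r effect is easy to reproduce outside Notepad++. The following Python sketch uses an invented two-block sample with Windows line endings and compares the original LF-only pattern with the variant suggested above (every [^\n] changed to [^\r\n], every \n to \r?\n). The escapes in front of ":" and the space are dropped because they are not needed.

```python
import re

# Hypothetical two-block sample with CRLF line endings.
crlf_text = (
    "Line\r\n"
    "foo Änderbar: PM baz\r\n"
    "Line\r\n"
    "\r\n"
    "Line\r\n"
    "foo Änderbar: -- baz\r\n"
    "Line\r\n"
)

lf_only = re.compile(r"\n([^\n]+\n)+[^\n]+(Änderbar: --[^\n]+\n)([^\n]+\n)+")
portable = re.compile(
    r"\r?\n([^\r\n]+\r?\n)+[^\r\n]+(Änderbar: --[^\r\n]+\r?\n)([^\r\n]+\r?\n)+"
)

leaky = lf_only.search(crlf_text).group(0)
exact = portable.search(crlf_text).group(0)

print("PM" in leaky)   # the LF-only match spills over from the first block
print(repr(exact))
```

With the LF-only pattern, the "empty" line is really a lone \r, which [^\n]+ accepts as a non-empty line, so the match runs straight through the block separator. The portable pattern stops at the separator and returns only the -- block.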

Deleting lines with specific words in multiple files in Notepad++

I'm trying to remove a specific line from many files I'm working on with Notepad++.
Upon searching, I found Notepad++ Remove line with specific word in multiple files within a directory but somehow the regex provided (^.*(?:YOURSTRINGHERE).*\r\n$) from the answers doesn't work for me (screenshot: https://cdn.discordapp.com/attachments/311547963883388938/407737068475908096/unknown.png).
I read on some other questions/answers that certain regex doesn't work in newer/older Notepad++ versions. I was using Notepad++ 5.x.x then updated to the latest 7.5.4, but neither worked with the regex provided in the question above.
At the moment I can work around it by replacing that line with nothing, twice (because there are only 2 variants that I need to remove from those files) but that leaves an empty line at the end of the files. So I have to do another step further to remove that empty line.
I'm hoping someone can offer help that allows me to remove that line without leaving an empty line/space behind.
The regex you attempt to use will only match your line, if it is followed by an empty line and Windows linebreaks (CR LF) are used. This is due to \r\n$ which matches a linebreak sequence followed by the end of the line.
Instead you might want to use
^.*(?:YOURSTRINGHERE).*\R?
To match the line containing your string and optionally a following line break sequence to remove the line instead of emptying it out. This will leave you with a trailing newline, if your word is contained in the last line of a file. You can use
(\R)?.*(?:YOURSTRINGHERE).*(?(1)|\R)
To avoid this. It uses a conditional to either match the previous linebreak, or the following if there is none.
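The difference between the two patterns can be tried out in Python. This is a sketch under a few assumptions: Python's re has no \R, so the line break is spelled out as NL, and instead of the conditional I swapped in a plain alternation that does the same job (take the preceding line break if there is one, otherwise the following one). WORD, NL and the sample text are placeholders.

```python
import re

WORD = re.escape("YOURSTRINGHERE")  # placeholder search string
NL = r"(?:\r\n|\n)"                 # stand-in for \R

# First pattern: the line plus an optional following line break.
simple = re.compile(rf"^[^\r\n]*{WORD}[^\r\n]*{NL}?", re.MULTILINE)

# Alternation instead of the conditional (not the answer's exact regex).
tidy = re.compile(
    rf"{NL}[^\r\n]*{WORD}[^\r\n]*|^[^\r\n]*{WORD}[^\r\n]*{NL}?",
    re.MULTILINE,
)

text = "keep 1\nhas YOURSTRINGHERE inside\nkeep 2\nYOURSTRINGHERE last"

after_simple = simple.sub("", text)
after_tidy = tidy.sub("", text)

print(repr(after_simple))
print(repr(after_tidy))
```

The first result still ends with a newline because the removed last line had no line break of its own to take along; the second result does not.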

Regex to remove the first 2 lines of a text file

I am trying to delete only the first 2 lines of a text file.
I tried using \A.*, but this gets the first line and deletes the rest.
Is there a way to do the inverse?
It is maybe not the most convenient way, but it is possible with Regex:
^.*\n.*\n([\s\S]*)$
With default settings (neither single-line nor multi-line modifiers), '.' matches everything except a newline. Therefore, .*\n matches one line, including the newline character. Repeat it twice, and we are at the beginning of the third line. Now capture all characters, including newlines ([\s\S] is a nice workaround for this behavior), until the end of the file ($).
Then substitute by the first capturing group
\1
and you have everything but the first 2 lines.
How you write the substitution string depends on your regex engine. Depending on the platform or the newline convention of the file, you might need to replace \n with \r\n or \r, or with an expression that matches them all: (\r\n?|\n).
Here is a working Demo.
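As one concrete case of "depends on your regex engine", this is roughly how the substitution looks in Python, where the replacement is written r"\1". The sample text is invented, and the second, shorter variant is my own alternative rather than part of the answer.

```python
import re

text = "first\nsecond\nthird\nfourth\n"

# Same idea as above: consume two lines, capture the rest, keep group 1.
rest = re.sub(r"^.*\n.*\n([\s\S]*)$", r"\1", text, count=1)

# Shorter variant (an alternative, not from the answer): delete the first two
# lines directly instead of capturing everything after them.
rest_short = re.sub(r"\A(?:.*\n){2}", "", text, count=1)

print(repr(rest))
print(repr(rest_short))
```

Both leave the text starting at the third line. The second form avoids copying the whole remainder of the file through a capture group.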

How to read this command to remove all blanks at the end of a line

I happened across this page full of super useful and rather cryptic vim tips at http://rayninfo.co.uk/vimtips.html. I've tried a few of these and I understand what is happening enough to be able to parse it correctly in my head so that I can possibly recreate it later. The ones I'm having a hard time wrapping my head around, though, are the following two commands to remove all spaces from the end of every line:
:%s= *$== : delete end of line blanks
:%s= \+$== : Same thing
I'm interpreting %s as string replacement on every line in the file, but after that I am getting lost in what looks like some gnarly variation of :s and regex. I'm used to seeing and using :s/regex/replacement. But the above is super confusing.
What do those above commands mean in english, step by step?
The regex delimiters don't have to be slashes, they can be other characters as well. This is handy if your search or replacement strings contain slashes. In this case I don't know why they use equal signs instead of slashes, but you can pretend that the equals are slashes:
:%s/ *$//
:%s/ \+$//
Does that make sense? As shown here, the first one searches for zero or more spaces (with two spaces before the *, which whitespace collapsing may have swallowed, it would be a space followed by zero or more spaces), and the second one searches for one or more spaces. Each one is anchored at the end of the line with $. And then the replacement string is empty, so the spaces are deleted.
I understand your confusion, actually. If you look at :help :s you have to scroll down a few pages before you find this note:
*E146*
Instead of the '/' which surrounds the pattern and replacement string, you
can use any other character, but not an alphanumeric character, '\', '"' or
'|'. This is useful if you want to include a '/' in the search pattern or
replacement string. Example:
:s+/+//+
I do not know vim syntax, but it looks to me like these are sed-style substitution operators. In sed, the / (in s/REGEX/REPLACEMENT/) can be uniformly replaced with any other single character. Here it appears to be =. So if you mentally replace = with /, you'll get
:%s/ *$//
:%s/ \+$//
which should make more sense to you.
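If it helps to see the same substitution outside vim, here is a rough Python counterpart of :%s/ \+$//. It is only an analogy: vim's \+ is written + in Python, and re.MULTILINE plays the role of applying the command to every line (%). The sample text is made up.

```python
import re

text = "alpha  \nbeta\n gamma \n"

# One or more spaces immediately before the end of a line, replaced by nothing.
trimmed = re.sub(r" +$", "", text, flags=re.MULTILINE)

print(repr(trimmed))
```

Leading spaces (as in " gamma") are untouched, because the pattern is anchored to the end of the line.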

Regex: remove lines not starting with a digit

I have been fighting this problem with the help of a RegEx cheat sheet, trying to figure out how to do this, but I give up... I have this lengthy file open in Notepad++ and would like to remove all lines that do not start with a digit (0..9). I would use the Find/Replace functionality of N++. I am only mentioning this as I am not sure what Regex implementation is N++ using... Thank you
Example. From the following text:
1hello
foo
2world
bar
3!
I would like to extract
1hello
2world
3!
not:
1hello

2world

3!
by doing a find/replace on a regular expression.
You can clear those lines with ^[^0-9].* but it will leave blank lines.
Notepad++ uses Scintilla and its regex engine to match those.
\r and \n are never matched because in
Scintilla, regular expression searches
are made line per line (stripped of
end-of-line chars).
http://www.scintilla.org/SciTERegEx.html
To clear those blank lines, the only way is to choose Extended mode and replace \n\n with \n. If the file uses Windows line endings, replace \r\n\r\n with \r\n.
[^0-9] is a regular expression that matches pretty much anything, except digits. If you say ^[^0-9] you "anchor" it to the start of the line, in most regular expression systems. If you want to include the rest of the line, use ^[^0-9].+.
^[^\d].* marks a whole line whose first character is not a digit. Check if there are really no whitespaces in front of the digits. Otherwise you'd have to use a different expression.
UPDATE:
You will have to do it in two steps. First empty the lines that do not start with a digit. Then remove the empty lines in extended mode.
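For comparison, in an engine that can match line breaks the two steps collapse into one. A minimal Python sketch, using the sample from the question (this is not something the Scintilla engine described above can do):

```python
import re

text = "1hello\nfoo\n2world\nbar\n3!"

# One pass: match a line whose first character is neither a digit nor a line
# break, together with its trailing line break, so no blank line is left behind.
no_blank_lines = re.sub(r"^[^\d\r\n].*(?:\r?\n)?", "", text, flags=re.MULTILINE)

# The same result without a regex substitution, by filtering lines.
filtered = "\n".join(line for line in text.splitlines() if line[:1].isdigit())

print(no_blank_lines)
print(filtered)
```

The character class excludes \r and \n on purpose: a bare \D would also match a newline and could swallow the line after an empty one. Already-empty lines are left alone by the regex version, while the filter version drops them.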
One could also use the technique of bookmarking in Notepad++. I started benefiting from this feature (long time present but only more recently made somewhat more visible in the UI) not very long ago.
Simply bring up the find dialogue, type regex for lines not starting with digit ^\D.*$ and select Mark All. This will place blue circles, like marbles, in the left gutter - these are line bookmarks. Then just select from main menu Search -> Bookmark -> Remove bookmarked lines.
Bookmarks are cool, you could extract these lines by simply selecting to copy bookmarked lines, opening new document and pasting lines there. I sometimes use this technique when reviewing log files.
I'm not sure what you are asking, but the regex for finding the lines with a digit at the beginning would be
^\d.*
You can remove all the lines that match the above, or alternately keep all the lines that match this expression:
^[^\d].*