Why is :%s/^$//g not equivalent to :g/^$/d in Vim?

I want to delete the blank lines in a file using gVim for Windows (not the Vim from Cygwin).
:g/^$/d # deletes all blank lines correctly
:%s/^$//g # does not delete the blank lines
The problem is how the end of a line is expressed.
I checked my file in detail with the command :%!xxd
and found 0d0a at the end of every line.
When the end of a line is expressed by the special character $,
does it include the 0d0a?
Is the end of a line expressed differently in the g command and the s command?
:%s/^$\n//g # deletes all blank lines correctly
I am confused about whether ^$ includes the special characters \r\n. Maybe ^$ in the s command does not include \r\n, but ^$ in the g command does.
Which position does the special character $ point at: after the \r\n or before it?

No, the two commands do different things. :g/^$/d says "find empty lines and delete them", while :s/^$//g means "in this (current) line, replace all occurrences of nothing-on-a-line with nothing (still on the same line)". The latter, as you noticed, does not make sense.
EDIT for your edit:
The pattern ^$ does not contain anything. It literally says "start of line, then immediately end of line", where "line" is detected by Vim. Vim knows that \r\n or \r or \n (depending on the file, on which OS it was created, and on Vim's options) marks the end of a line, and treats it accordingly. The separator is not part of what ^$ matches; it lies between one line's $ and the next line's ^.
It's like when you say "the rooms in my house" - this (for me at least) does not include walls. When you say ^$\n, you're saying "the empty room, and also the wooden wall next to it". s/^$// is "empty the (already empty) room"; s/^$\n// is "empty the empty room and break down its wall".
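The room-and-wall picture can be tried out in Python's re module. This is only an analogy, under the assumption that the text is one string with explicit \n separators; Vim itself stores lines without their terminators, and the variable names are made up for illustration.

```python
import re

text = "a\n\nb\n"  # three lines; the middle one is empty

# "Empty the (already empty) room": the match between ^ and $ has zero
# width, so replacing it with nothing leaves the text as it was.
same = re.sub(r"^$", "", text, flags=re.MULTILINE)

# "Empty the room and break down its wall": the \n is part of the match,
# so the separator after the empty line is removed as well.
joined = re.sub(r"^$\n", "", text, flags=re.MULTILINE)

print(repr(same))
print(repr(joined))
```

Only the second substitution changes the number of lines, because it is the only one whose match includes a separator.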
In contrast, for g, again it does not say anything about \n. It finds the empty row (not caring about any separators), and then does a command on it; in your case, the command is d: delete a row. It deletes a full row (along with any newlines). For example, if you write :g/DELETEME/d, it would delete any rows that have DELETEME in it, anywhere: it does not care about any newlines in the match part, it just deletes the matched rows.
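As a rough sketch of the difference, assume the buffer is modelled as a Python list of lines without terminators. The helper names global_delete and substitute_all are hypothetical stand-ins for :g/pat/d and :%s/pat/rep/g, not anything Vim provides.

```python
import re

def global_delete(lines, pattern):
    """Rough analogue of :g/pattern/d: drop every line that matches."""
    return [line for line in lines if not re.search(pattern, line)]

def substitute_all(lines, pattern, replacement):
    """Rough analogue of :%s/pattern/replacement/g: edit inside each line."""
    return [re.sub(pattern, replacement, line) for line in lines]

buffer_lines = ["first", "", "DELETEME here", "", "last"]

# Removing lines changes the length of the list.
print(global_delete(buffer_lines, r"^$"))
print(global_delete(buffer_lines, r"DELETEME"))

# Substituting inside lines never changes the length of the list.
print(substitute_all(buffer_lines, r"^$", ""))
```

In this model the substitution can only change what a line contains, so an empty line stays in the list as an empty line.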

d means delete the lines that are matched. s is just a substitution. Essentially the second expression means "substitute lines that match ^$ with an empty string," but that does not delete them. You are substituting nothing with nothing. ^ and $ are zero-width assertions and cannot be replaced.

Both @Amadan and @Explosion Pills are right, but I think you really need an explanation of "how vim thinks" about what you are editing.
For one thing, vim is based on vi, which was a front end to the ex editor. Although vim has come a long way, it is still a line-based editor. Do not look for the magic character at the end of the line, because there is none! (When I say that vim has come a long way, I remember when the 'whichwrap' option was added.)
We usually do not bother to make the distinction, but the file on your hard disk is different from the buffer that you edit in vim (:help buffers). The file is a sequence of characters (or bytes). When vim reads the file, it splits them up into lines depending on the 'fileformat' option (short form 'ff'). Since you work on Windows, the default is 'ff'=dos, which means that 0d0a (a.k.a. CRLF or \r\n) in the file is used to separate lines. I have 'ff'=unix, so I see only 0a when I filter through xxd. (Before OS X, Mac used 0d, so there were three standards!)
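The file-versus-buffer split can be sketched as follows. This is a simplified model, not Vim's actual reading code: the function split_lines is hypothetical, and it ignores details such as automatic format detection.

```python
def split_lines(data, fileformat):
    """Split raw file bytes into buffer lines, without the terminators."""
    separators = {"dos": b"\r\n", "unix": b"\n", "mac": b"\r"}
    lines = data.split(separators[fileformat])
    # A terminator after the last line does not start another line.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines

raw = b"one\r\n\r\ntwo\r\n"  # bytes on disk, as xxd would show them

print(split_lines(raw, "dos"))
print(split_lines(raw, "unix"))
```

Read as dos, the middle line is truly empty. Read as unix, every line keeps a trailing \r, so the "blank" line is not empty and ^$ would not match it in this model.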
It is a good thing that the magical EOL character is simply not there, because that makes vim portable between systems. Being a line-based editor, vim lets you do all sorts of things like find the first pattern match on each line and do something to it. That is not always what you want, so some people get in the habit of adding /g at the end of every :s command, or even set 'gdefault'.
Coming back to your original question, removing a line from the buffer is very different from removing all the characters in the line. Ex commands (remember the lineage) act on lines, and :d is the command to delete one. The :s command changes a line; do not be confused by the common usage :%s, which invokes :s on every line in the buffer (:help :range).

Related

Bug in Notepad++ / BOOST or bug in my regular expression?

I have a file which is structured like this:
Line
foo Änderbar: PM baz
Line
Line
foo Änderbar: OM baz
Line
Line
foo Änderbar: ++ baz
Line
Line
foo Änderbar: -- baz
Line
So the file consists of "blocks" which are separated by a newline (I have converted the file to Unix line endings). Each block can have an arbitrary number of lines. Each line of a block contains at least one character which is not a newline, and is finished by a newline character. The lines which separate the blocks consist of exactly one newline character.
In each block, there is exactly one line in the following format:
at least one character which is not newline, followed by
the literal string 'Änderbar: ', followed by
exactly one of the literal strings '++', '--', 'OM', 'PM', followed by
at least one character which is not newline, followed by
the line-terminating newline character
There is always at least one other non-empty line in the same block above this special line and one other non-empty line below this special line.
I need an effective method to find (and thereby select) all blocks where the literal after Änderbar: is -- (find / select one block after another, each one after hitting Find Next again, i.e. not selecting all of those blocks at the same time).
Normally, I have fun solving such problems with Notepad++. However, in this case, it seems that either I am getting more and more stupid as I get older, or there is a bug in Notepad++'s regex engine.
Notepad++ uses BOOST (and supports PCRE expressions via BOOST). Since this is in wide use, I consider that problem important enough to post it here, just in case that BOOST really is the reason for the misbehavior.
Having said this: I loaded that file into Notepad++, fired up the Search and Replace dialog, ticked . matches newline, ticked Regular Expression and entered the following regex in the Find What: textbox:
\n([^\n]+\n)+[^\n]+(Änderbar\:\ --[^\n]+\n)([^\n]+\n)+
I was quite surprised that this made Notepad++ behave weirdly: When the cursor was placed in the empty line immediately before a block with Änderbar: --, hitting Find Next found / selected that block as expected. But when the cursor was at another place, hitting Find Next made Notepad++ find / select the whole rest of the file, i.e. all blocks below the cursor position.
I then have tested if it would find the blocks having ++ after Änderbar:, i.e. I changed my regex to
\n([^\n]+\n)+[^\n]+(Änderbar\:\ \+\+[^\n]+\n)([^\n]+\n)+
Guess what: This worked reliably in every situation. The same is true for the remaining two:
\n([^\n]+\n)+[^\n]+(Änderbar\:\ PM[^\n]+\n)([^\n]+\n)+
\n([^\n]+\n)+[^\n]+(Änderbar\:\ OM[^\n]+\n)([^\n]+\n)+
So Notepad++ / PCRE seems to have a problem with the correct interpretation of - under certain circumstances, or I have a subtle bug in my regex which only triggers when I am searching for -- (instead of ++, OM or PM) at the respective place.
Please note that I have already tried omitting the \ in front of the space character (which could only make the situation worse, but I tried it just in case) and that I have also tried using \-\- instead of -- (although the latter should be fine). That did not alter the (mis-)behavior in any way.
So what is the problem here? Is there a bug in my regex, or is there a bug in Notepad++?
UPDATE
I have stripped down the actual file in question and have uploaded it to https://pastebin.com/w62E57U5. To reproduce the problem, please do the following:
Download the file from the link above and save it somewhere on your HDD (do not copy the text directly into Notepad++).
Load the file into Notepad++. The cursor now is in the topmost line, and nothing is selected.
This is essential: Click Edit -> EOL Conversion -> Unix (LF).
Verify that the cursor is still in the topmost line (which is empty) and that nothing is selected.
Open the Find dialog and choose the settings and enter the search string as described above.
Click "Find Next".
Note that now the complete text is found / selected.
Keeping the Find window open, delete the third line of the file (it reads "Funktionspaket(e): ML"). Do not just empty that line, but really delete it so that no empty line remains between the line before and the line after.
Again, place the cursor in the topmost line (which is still empty) and make sure nothing is selected.
Click "Find Next".
Note that the regular expression now works as expected.
Obviously, somebody is trying to make a fool of me, right?
I think the key is: you need to begin your regex with ^ (beginning of line).
Your original regex becomes:
^\n([^\n]+\n)+[^\n]+(Änderbar\:\ --[^\n]+\n)([^\n]+\n)+
But you can simplify it with:
^\R(?:.+\R)+.+Änderbar: --.+\R(?:.+(?:\R|\z))+
Note: tick . matches newline
Where:
\R matches any kind of linebreak, so there is no need to change the EOL.
\z matches the end of the file; if you don't use it, you can't match the last line of the file when it has no linebreak.
(?:...) is a non-capturing group, which is more efficient (if you don't need to capture, of course).
Both work fine with your 2 sample files.
It's not a bug. You're just forgetting something very important - with Windows line endings, your lines have a \r before the \n, so the \n([^\n]+\n)+ part of your RegEx will also match your blank lines which is why clicking "Find Next" matches everything from the cursor position instead of from the start of the block.
Go to Edit > EOL Conversion > Unix (LF) and you'll see that it works now. If you want to support Windows and Unix line endings you'll have to change every [^\n] to [^\r\n] and every \n to \r?\n.
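The effect described in this answer can be reproduced in a small sketch with Python's re module. It is not Boost, so it only illustrates how the character classes behave on CRLF text, not Notepad++ itself, and the sample text and variable names are made up for the demonstration.

```python
import re

# Two blocks with Windows (CRLF) line endings, separated by a blank line.
text = (
    "\r\n"
    "A1\r\n"
    "foo Änderbar: ++ baz\r\n"
    "A3\r\n"
    "\r\n"
    "B1\r\n"
    "foo Änderbar: -- baz\r\n"
    "B3\r\n"
)

# Pattern in the style of the question: [^\n]+ also accepts the lone \r of
# a "blank" CRLF line, so the match can run across block boundaries.
naive = re.compile(r"\n([^\n]+\n)+[^\n]+(Änderbar: --[^\n]+\n)([^\n]+\n)+")

# Variant following the advice above: [^\r\n] for line content and
# \r?\n for the line break.
tolerant = re.compile(
    r"\r?\n([^\r\n]+\r?\n)+[^\r\n]+(Änderbar: --[^\r\n]+\r?\n)([^\r\n]+\r?\n)+"
)

naive_match = naive.search(text).group(0)
tolerant_match = tolerant.search(text).group(0)

print(repr(naive_match))
print(repr(tolerant_match))
```

On this sample the first pattern swallows the ++ block together with the -- block, while the second one selects only the -- block.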

Why does this particular Vim RegEx string work?

I had spent a while trying to narrow down a way of retrieving only web links from a few thousand lines that ended with either jpg or png.
If I use
%s/\(http.*\(jpg\|png\)\)\=\(.*\|\_s\)/\1/g|%s/\n\=
I can grab links just fine. The some thousands of lines are removed and replaced by only matching links. But if I remove the first \=, like here
%s/\(http.*\(jpg\|png\)\)\(.*\|\_s\)/\1/g|%s/\n\=
nothing in the file is changed or removed, and all the text is highlighted as a match.
If I remove it from the end of the pattern string, it concatenates every match onto a single line. I understand the basic reason for why this happens (being used by itself). That said, I am lost as to why it does not happen the same way when used in this specific case. (Meaning, the links do not get piled onto one line.)
My questions are:
Why do the links remain unchanged in the first example rather than replace the entire file or be removed entirely?
Why does specifying \n as an optional element not remove the nulls when the meaning of \= is "match 0 OR 1"?
Starting from the end of your regexp, with
%s/\n\=
You're substituting 0 or 1 \n with nothing in every line. Since you're not using the g flag, in any line that begins with anything but a \n there will be a match of the 0 part, and nothing will be substituted with nothing: i.e. the line remains the same. (Led Zeppelin quote)
It's equivalent to:
:%s/^\n
If you remove the \=, the first \n actually found in every line will be removed, that's why empty lines and the newlines at the end of your non empty lines get removed.
Now, here:
%s/\(http.*\(jpg\|png\)\)\=\(.*\|\_s\)/\1/g
The \= makes it so that any string with 0 or 1 \(http.*\(jpg\|png\)\) patterns followed by anything (since you have \(.*\|\_s\)) will be replaced by the first saved pattern.
Basically, you're matching your whole file and preventing only this pattern: \(http.*\(jpg\|png\)\) from being removed.
When you remove \=, the 0 part of the match drops, and only in the lines that actually have the \(http.*\(jpg\|png\)\) pattern there will be a substitution of the matched pattern with itself from http up to jpg/png with anything after that being removed.
On a side note, if you save a pattern but don't use it in the substitution string, you're losing that pattern anyway.
If you actually only want to keep the http..jpg/png lines and remove the others, you can use the g! or v command:
:v/http.*\(jpg\|png\)/d
deletes all the lines that don't have the matched pattern.
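A rough Python analogue of "keep only the lines that contain a link" is a filter over a list of lines. The sample lines and the example.com URLs are made up. One detail carries over between the two regex dialects: alternation binds more loosely than concatenation, so the extensions are grouped here.

```python
import re

lines = [
    "see http://example.com/a.jpg for details",
    "a png file is mentioned here, but there is no link",
    "http://example.com/b.png",
    "nothing relevant",
]

# Grouped alternation: "http", anything, then jpg or png.
link = re.compile(r"http.*(?:jpg|png)")
kept = [line for line in lines if link.search(line)]

# Ungrouped alternation reads as ("http.*jpg") or ("png"), so a line that
# merely mentions png is kept too.
loose = re.compile(r"http.*jpg|png")
kept_loose = [line for line in lines if loose.search(line)]

print(kept)
print(kept_loose)
```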

Find newline in Visual Studio 2013

I have a C++ source file containing many functions.
I want to find the beginning of every function quickly.
How can I form an expression for )newline{newline?
The newline symbol can be either one of the following:
\n
\r
\n\r
\r\n
Presumably, the same symbol is used all across the file, so instead of a single expression for all options combined, I need a single expression for each option.
I assume that a regular-expression can be used, but I'm not sure how.
Thanks
Barak, before we look at individual options, for all options, this will do it:
\)[\r\n]+{[\r\n]+
The [\r\n] is a character class that allows either of \r or \n. It is quantified with a + which means we are looking for one or more of these characters.
You said you want individual options, so this can be turned to:
\)\r\n{\r\n
\)\r{\r
\)\n{\n
\)\n\r{\n\r (this sequence of newlines is quite surprising)
If you simply want to use the regex search in VS to find the beginning of each function then this should work for you:
\)\r?\n\s*{\r?\n
Although that assumes the { is always on the next line with no white space before the line break.
This would be less strict where white space is concerned, but still expect the { to be on the next line and to be followed by a line break:
\)\s*\r?\n\s*{\s*\r?\n
And this would basically just look for the 2 brackets even if they're on the same line:
\)\s*\r?\n?\s*{
And if you expect there could be several line breaks between the 2 brackets:
\)\s*(\r?\n\s*)*{
Last example should find anything that could resemble the beginning of a method. But not sure how strict you want your search to be.
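To check how one of these patterns behaves with different line endings, here is a small sketch using Python's re module rather than the Visual Studio search box; the two engines may differ in details. The sample source text is made up, and the brace is escaped here to be explicit.

```python
import re

# ")" then optional whitespace, a line break, optional whitespace, "{",
# optional whitespace, and another line break.
FUNCTION_START = re.compile(r"\)\s*\r?\n\s*\{\s*\r?\n")

unix_source = "int f(int x)\n{\n    return x;\n}\n\nvoid g()\n{\n}\n"
windows_source = unix_source.replace("\n", "\r\n")

unix_hits = FUNCTION_START.findall(unix_source)
windows_hits = FUNCTION_START.findall(windows_source)

print(len(unix_hits), len(windows_hits))
```

Because the line break is written as \r?\n, the same expression finds both function openings whether the file uses LF or CRLF.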

Delete one and two letter lines in VIM?

I have a data source that contains a bunch of city names, but mixed in are quite a few state abbreviations that shouldn't be there.
Is there a way in VIM to delete each line that contains two characters or less?
Here it is:
:g/^\a\{1,2}$/d
Explanation
delete each line that ... → that calls for a :global command (which defaults to % range, the entire buffer); the executed command is :delete.
two characters or less → in a regular expression, any character is matched by ., to restrict this to 1 or 2 the \{n,m} multi is used. This still needs to be anchored via ^ and $ to the beginning and end of the line, so that additional characters don't make this match. Oh, and if you also want to remove completely empty lines, change this to .\{,2}. See :help /\{ for details.
more robust "characters": . will match any character, i.e. also whitespace. To avoid unwanted matches, it's best to restrict this as much as possible. If your state abbreviations are only alphabetic, you can use the \a atom instead of . The available character classes start in the help at :help /\i.
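For comparison, the same filter can be sketched outside Vim in Python, assuming the data is already a list of lines without terminators. The function name and the sample data are hypothetical; [A-Za-z] plays the role of \a, and fullmatch plays the role of the ^ and $ anchors.

```python
import re

def drop_short_alpha_lines(lines):
    """Drop lines that consist of exactly one or two ASCII letters."""
    short = re.compile(r"[A-Za-z]{1,2}")
    return [line for line in lines if not short.fullmatch(line)]

cities = ["Austin", "TX", "Reno", "NV", "Y", "", "St. Paul"]
print(drop_short_alpha_lines(cities))
```

As with \a\{1,2}, the empty line survives, because the pattern requires at least one letter.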

How to read this command to remove all blanks at the end of a line

I happened across this page full of super useful and rather cryptic vim tips at http://rayninfo.co.uk/vimtips.html. I've tried a few of these and I understand what is happening well enough to parse it correctly in my head, so that I can possibly recreate it later. One that I'm having a hard time wrapping my head around, though, is the following pair of commands to remove all spaces from the end of every line:
:%s= *$== : delete end of line blanks
:%s= \+$== : Same thing
I'm interpreting %s as string replacement on every line in the file, but after that I am getting lost in what looks like some gnarly variation of :s and regex. I'm used to seeing and using :s/regex/replacement. But the above is super confusing.
What do those above commands mean in english, step by step?
The regex delimiters don't have to be slashes, they can be other characters as well. This is handy if your search or replacement strings contain slashes. In this case I don't know why they use equal signs instead of slashes, but you can pretend that the equals are slashes:
:%s/ *$//
:%s/ \+$//
Does that make sense? The first one is meant to search for a space followed by zero or more spaces (that is, two spaces before the *; with a single space before the *, the pattern matches zero or more spaces), and the second one searches for one or more spaces. Each one is anchored at the end of the line with $. The replacement string is empty, so the spaces are deleted.
I understand your confusion, actually. If you look at :help :s you have to scroll down a few pages before you find this note:
*E146*
Instead of the '/' which surrounds the pattern and replacement string, you
can use any other character, but not an alphanumeric character, '\', '"' or
'|'. This is useful if you want to include a '/' in the search pattern or
replacement string. Example:
:s+/+//+
I do not know vim syntax, but it looks to me like these are sed-style substitution operators. In sed, the / (in s/REGEX/REPLACEMENT/) can be uniformly replaced with any other single character. Here it appears to be =. So if you mentally replace = with /, you'll get
:%s/ *$//
:%s/ \+$//
which should make more sense to you.
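The "one or more spaces anchored at the end of the line, replaced with nothing" reading can be checked in a small Python sketch. Python has no choice of delimiter, so only the pattern and the empty replacement carry over, and the function name is made up.

```python
import re

def strip_trailing_spaces(line):
    """Remove one or more spaces at the end of a single line."""
    return re.sub(r" +$", "", line)

print(repr(strip_trailing_spaces("city name   ")))
print(repr(strip_trailing_spaces("no trailing blanks")))
```

Spaces inside the line are left alone because the $ anchor only lets the match end at the end of the line.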