I have a C++ source file containing many functions.
I want to find the beginning of every function quickly.
How can I form an expression for )newline{newline?
The newline symbol can be either one of the following:
\n
\r
\n\r
\r\n
Presumably, the same symbol is used all across the file, so instead of a single expression for all options combined, I need a single expression for each option.
I assume that a regular-expression can be used, but I'm not sure how.
Thanks
Barak, before we look at the individual options, this one expression covers all of them:
\)[\r\n]+{[\r\n]+
The [\r\n] is a character class that allows either of \r or \n. It is quantified with a + which means we are looking for one or more of these characters.
You said you want individual options, so this can be turned to:
\)\r\n{\r\n
\)\r{\r
\)\n{\n
\)\n\r{\n\r (this sequence of newlines is quite surprising)
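To try these patterns outside an editor, here is a minimal Python sketch. The sample text and the variable names are made up for illustration, and the { is written as \{ because some engines treat a bare { as the start of a quantifier.

```python
import re

# One expression covering every newline convention from the question.
ANY_NEWLINE = re.compile(r"\)[\r\n]+\{[\r\n]+")

# One expression per convention, for files that use a single style throughout.
PER_STYLE = {
    "\\r\\n": re.compile(r"\)\r\n\{\r\n"),
    "\\r": re.compile(r"\)\r\{\r"),
    "\\n": re.compile(r"\)\n\{\n"),
    "\\n\\r": re.compile(r"\)\n\r\{\n\r"),
}

# Hypothetical sample: two functions in a file with Windows line endings.
sample = "int f()\r\n{\r\n    return 1;\r\n}\r\nvoid g()\r\n{\r\n}\r\n"

any_hits = [m.start() for m in ANY_NEWLINE.finditer(sample)]
style_hits = {name: len(p.findall(sample)) for name, p in PER_STYLE.items()}

print(any_hits)    # offsets of the ")" that ends each function header
print(style_hits)  # only the \r\n variant finds anything in this sample
```

The combined expression is the convenient choice when you do not know the file's line endings; the per-style ones only help if you want to confirm which convention a file uses.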
If you simply want to use the regex search in VS to find the beginning of each function then this should work for you:
\)\r?\n\s*{\r?\n
Note that this assumes the { is always on the next line, with no whitespace before either line break.
This would be less strict where whitespace is concerned, but still expects the { to be on the next line and to be followed by a line break:
\)\s*\r?\n\s*{\s*\r?\n
And this would basically just look for the 2 brackets even if they're on the same line:
\)\s*\r?\n?\s*{
And if you expect there could be several line breaks between the 2 brackets:
\)\s*(\r?\n\s*)*{
The last example should find anything that could resemble the beginning of a method, but I'm not sure how strict you want your search to be.
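To get a feel for how strict each variant is, here is a small Python sketch comparing the first and the last pattern. The sample source is invented, the { is escaped as \{, and the optional group is made non-capturing; Visual Studio's own engine may differ in details.

```python
import re

# Strictest variant: ")" directly before the line break, "{" directly before the next one.
STRICT = re.compile(r"\)\r?\n\s*\{\r?\n")
# Loosest variant: any whitespace and any number of line breaks between ")" and "{".
LOOSE = re.compile(r"\)\s*(?:\r?\n\s*)*\{")

# Hypothetical sample with three brace styles.
source = (
    "int a()\n{\n}\n"                 # brace on the next line
    "int b() {\n}\n"                  # brace on the same line
    "int c()  \r\n\r\n  {\r\n}\r\n"   # trailing spaces, blank line, indented brace
)

strict_hits = [m.start() for m in STRICT.finditer(source)]
loose_hits = [m.start() for m in LOOSE.finditer(source)]

print(len(strict_hits), len(loose_hits))
```

Keep in mind that the loose pattern will also hit things like if (x) { that are not function headers, so it trades precision for coverage.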
How can I match one line of text with a regex and follow it with a line of dashes, exactly as many as there are characters in the initial match, to achieve text-only underlining? I intend to use this with the search-and-replace function (likely within a macro) inside an editor, probably, but not necessarily, Visual Studio Code.
This is a heading
should turn into
This is a heading
-----------------
I believe I read an example of this years ago but can't find it; nor do I seem to be able to formulate a search query that gets anything useful out of Google (including variations of the question's title). If you are able to, I'd be interested in that, too.
The best I can come up with is this:
^(.)(?=(.*\n?))|.
Substitution
$1$2-
syntax          note
^(.)            match the first character of a line, capture it in group 1
(?=(.*\n?))     then look ahead for the rest of this line and capture it in group 2, including a line break if there's any
|.              or a normal character
But the text must have a line break after it; otherwise the underline stays on the same line.
Not sure if it is of any use, but here are the test cases.
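Here is the same search and replace as a minimal Python sketch, in case you want to test it outside the editor. It assumes Python 3.5 or later, where an unmatched group in the replacement expands to an empty string; the function name is made up.

```python
import re

# Same pattern as above; MULTILINE lets ^ match at the start of every line.
UNDERLINE = re.compile(r"^(.)(?=(.*\n?))|.", re.MULTILINE)


def underline(text: str) -> str:
    # For the first character, \1\2 re-emits the whole line plus one dash.
    # For every later character both groups are empty, leaving a single dash.
    return UNDERLINE.sub(r"\1\2-", text)


with_newline = underline("This is a heading\n")
without_newline = underline("This is a heading")

print(with_newline)
print(without_newline)
```

The second call shows the limitation mentioned above: without a trailing line break the dashes end up on the same line as the heading.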
I have a file which is structured like this:
Line
foo Änderbar: PM baz
Line
Line
foo Änderbar: OM baz
Line
Line
foo Änderbar: ++ baz
Line
Line
foo Änderbar: -- baz
Line
So the file consists of "blocks" which are separated by a newline (I have converted the file to Unix line endings). Each block can have an arbitrary number of lines. Each line of a block contains at least one character which is not a newline, and is finished by a newline character. The lines which separate the blocks consist of exactly one newline character.
In each block, there is exactly one line in the following format:
at least one character which is not newline, followed by
the literal string 'Änderbar: ', followed by
exactly one of the literal strings '++', '--', 'OM', 'PM', followed by
at least one character which is not newline, followed by
the line-terminating newline character
There is always at least one other non-empty line in the same block above this special line and one other non-empty line below this special line.
I need an effective method to find (and thereby select) all blocks where the literal after Änderbar: is -- (find / select one block after another, each one after hitting Find Next again, i.e. not selecting all of those blocks at the same time).
Normally, I have fun solving such problems with Notepad++. However, in that case, it seems that I either get more and more stupid as I get older, or that there is a bug in Notepad++'s regex handling engine.
Notepad++ uses BOOST (and supports PCRE expressions via BOOST). Since this is in wide use, I consider that problem important enough to post it here, just in case that BOOST really is the reason for the misbehavior.
Having said this: I loaded that file into Notepad++, fired up the Search and Replace dialog, ticked . matches newline, ticked Regular Expression and entered the following regex in the Find What: textbox:
\n([^\n]+\n)+[^\n]+(Änderbar\:\ --[^\n]+\n)([^\n]+\n)+
I was quite surprised that this made Notepad++ behave weirdly: When the cursor was placed in the empty line immediately before a block with Änderbar: --, hitting Find Next found / selected that block as expected. But when the cursor was at another place, hitting Find Next made Notepad++ find / select the whole rest of the file, i.e. all blocks below the cursor position.
I then have tested if it would find the blocks having ++ after Änderbar:, i.e. I changed my regex to
\n([^\n]+\n)+[^\n]+(Änderbar\:\ \+\+[^\n]+\n)([^\n]+\n)+
Guess what: this worked reliably in every situation. The same is true for the remaining two:
\n([^\n]+\n)+[^\n]+(Änderbar\:\ PM[^\n]+\n)([^\n]+\n)+
\n([^\n]+\n)+[^\n]+(Änderbar\:\ OM[^\n]+\n)([^\n]+\n)+
So Notepad++ / PCRE seems to have a problem with the correct interpretation of - under certain circumstances, or I have a subtle bug in my regex which only triggers when I am searching for -- (instead of ++, OM or PM) at the respective place.
Please note that I have already tried omitting the \ in front of the space character (which could only make the situation worse, but I tried it just in case) and that I have also tried \-\- instead of -- (although the latter should be fine). That did not alter the (mis-)behavior in any way.
So what is the problem here? Is there a bug in my regex, or is there a bug in Notepad++?
UPDATE
I have stripped down the actual file in question and have uploaded it to https://pastebin.com/w62E57U5. To reproduce the problem, please do the following:
Download the file from the link above and save it somewhere on your HDD (do not copy the text directly into Notepad++).
Load the file into Notepad++. The cursor now is in the topmost line, and nothing is selected.
This is essential: Click Edit -> EOL Conversion -> Unix (LF).
Verify that the cursor is still in the topmost line (which is empty) and that nothing is selected.
Open the Find dialog and choose the settings and enter the search string as described above.
Click "Find Next".
Note that now the complete text is found / selected.
Keeping the Find window open, delete the third line of the file (it reads "Funktionspaket(e): ML"). Do not just empty that line, but really delete it so that no empty line remains between the line before and the line after.
Again, place the cursor in the topmost line (which is still empty) and make sure nothing is selected.
Click "Find Next".
Note that the regular expression now works as expected.
Obviously, somebody is trying to make a fool of me, right?
I think the key is: you need to begin your regex with ^ (beginning of line).
Your original regex becomes:
^\n([^\n]+\n)+[^\n]+(Änderbar\:\ --[^\n]+\n)([^\n]+\n)+
But you can simplify it with:
^\R(?:.+\R)+.+Änderbar: --.+\R(?:.+(?:\R|\z))+
Note: untick . matches newline (each .+ must stay within a single line)
Where:
\R matches any kind of linebreak, so there is no need to change the EOL.
\z matches the end of the file; without it, you can't match the last line of the file if it has no trailing linebreak.
(?:...) is a non-capturing group, which is more efficient (if you don't need to capture, of course).
Both work fine with your 2 sample files.
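To illustrate what the leading ^ changes, here is a minimal sketch using Python's re module as a stand-in for Notepad++'s engine (an assumption: the two engines are not identical, and Python's re has no \R, so the sketch sticks to \n). The sample block and the cursor offset are hypothetical.

```python
import re

BODY = r"\n([^\n]+\n)+[^\n]+(Änderbar: --[^\n]+\n)([^\n]+\n)+"
UNANCHORED = re.compile(BODY)
ANCHORED = re.compile("^" + BODY, re.MULTILINE)

# One block preceded by an empty line.
text = "\nA\nB\nfoo Änderbar: -- baz\nLine\n"
cursor = 1  # hypothetical cursor position: just past the empty line

# Without ^ the match may begin at the newline that ends line "A",
# so only part of the block is selected.
partial = UNANCHORED.search(text, cursor)

# With ^ the leading \n has to be an empty line of its own.
whole = ANCHORED.search(text)
from_inside = ANCHORED.search(text, cursor)

print(repr(partial.group()))
print(repr(whole.group()))
print(from_inside)
```

With the anchor, a search that starts inside a block finds nothing in that block instead of a fragment of it.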
It's not a bug. You're just forgetting something very important - with Windows line endings, your lines have a \r before the \n, so the \n([^\n]+\n)+ part of your RegEx will also match your blank lines which is why clicking "Find Next" matches everything from the cursor position instead of from the start of the block.
Go to Edit > EOL Conversion > Unix (LF) and you'll see that it works now. If you want to support Windows and Unix line endings you'll have to change every [^\n] to [^\r\n] and every \n to \r?\n.
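The effect is easy to reproduce outside Notepad++. Below is a minimal Python sketch; the two-block sample is invented, and the portable pattern is just the substitution described above ([^\n] becomes [^\r\n], \n becomes \r?\n) applied to the original expression.

```python
import re

ORIGINAL = re.compile(r"\n([^\n]+\n)+[^\n]+(Änderbar: --[^\n]+\n)([^\n]+\n)+")
PORTABLE = re.compile(
    r"\r?\n([^\r\n]+\r?\n)+[^\r\n]+(Änderbar: --[^\r\n]+\r?\n)([^\r\n]+\r?\n)+"
)

# Hypothetical file with two blocks, converted to Windows line endings.
blocks = (
    "\nLine\nfoo Änderbar: PM baz\nLine\n"
    "\nLine\nfoo Änderbar: -- baz\nLine\n"
)
crlf_text = blocks.replace("\n", "\r\n")

# With CRLF, an "empty" line is "\r\n", and "\r" counts as a non-newline character.
blank_line_matches = re.fullmatch(r"[^\n]+\n", "\r\n") is not None

merged = ORIGINAL.search(crlf_text).group()   # runs across the blank line
single = PORTABLE.search(crlf_text).group()   # stops at the block boundary

print(blank_line_matches)
print(repr(merged))
print(repr(single))
```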
This regex correctly finds lines that begin with an exclamation mark and do not contain a colon (:):
^!([^:\n]*)$
In addition, I need it to match only lines that also contain the word "spelling". I tried the code below, but it does not work:
^!([^:\n]spelling*)$
You could do this:
^![^:\n]*spelling[^:\n]*$
If you are looping through a file line by line, as is typical, there is no need to exclude the newlines from the match:
^![^:]*spelling[^:]*$
Another option to consider when you have complex requirements is breaking the match down into multiple steps. This makes for simpler, easier to understand code that is less error-prone:
if (/^!/ and /spelling/ and not /:/)
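For comparison, here are both approaches as a minimal Python sketch. The sample lines and the helper name are made up; the single regex and the multi-step check should agree on them.

```python
import re

SINGLE = re.compile(r"^![^:\n]*spelling[^:\n]*$", re.MULTILINE)

# Hypothetical sample lines.
lines = [
    "!fix spelling here",
    "!spelling: wrong",
    "spelling without bang",
    "!nothing relevant",
]
text = "\n".join(lines)


def wanted(line: str) -> bool:
    # Multi-step version: three simple checks instead of one dense pattern.
    return line.startswith("!") and "spelling" in line and ":" not in line


regex_hits = SINGLE.findall(text)
stepwise_hits = [line for line in lines if wanted(line)]

print(regex_hits)
print(stepwise_hits)
```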
spelling*
matches
spellin
spelling
spellingg
spellinggg
etc. You were trying for
^([^:\n]*spelling[^\n]*)$
aka
^([^:\n]*spelling.*)$ # Assuming /s isn't used
But that would allow : after spelling, so you really want
^([^:\n]*spelling[^:\n]*)$
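A quick way to convince yourself of both points is this minimal Python sketch (the sample lines are invented): the * binds only to the final g, and the .* variant lets a colon through after "spelling".

```python
import re

# "spelling*" means "spellin" followed by zero or more "g".
candidates = ["spellin", "spelling", "spellinggg"]
quantifier_hits = [s for s in candidates if re.fullmatch(r"spelling*", s)]

LOOSE = re.compile(r"^([^:\n]*spelling.*)$", re.MULTILINE)
FIXED = re.compile(r"^([^:\n]*spelling[^:\n]*)$", re.MULTILINE)

# Hypothetical sample lines.
text = "!check spelling\n!spelling: no\n!spellin only"

loose_hits = LOOSE.findall(text)
fixed_hits = FIXED.findall(text)

print(quantifier_hits)
print(loose_hits)
print(fixed_hits)
```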
What about ^([^:\n]*spelling.*)$ ?
Adding .* allows any character (except newline) to be present after 'spelling'
The question and answers here cover in detail how the following vim command collapses a series of empty lines into a single line:
:g/^$/,/./-j
However, I want to do the same but also treat lines with only whitespace in them as blank. The following command is what I tried, but it doesn't work:
:g/^\s*$/,/./-j
As far as I can tell, that should find the lines that are empty and have only whitespace on them, but not all lines are being collapsed.
You're halfway there.
Remember that the initial command consisted of a search part and an action part. The search part :g/^$/ found all empty lines and the action part ,/./-j was executed for each (well, each that hadn't already been deleted by a previous j).
The modification you made to the search part of the string is correct in that it will now find lines that are either empty or contain only whitespace.
However, it's the action that you're executing after that that's causing you grief. The original action to be executed on the found line was ,/./-j which basically means execute a join j over the range from this line to the one before the next 'real' character. More detail on how this works can be found in the question you linked to.
The first 'real' character that it finds in your case actually includes whitespace so, while the search bit will find whitespace lines and act on them, the range of the join in the action will not be what you want.
What you need to specify for the end of the range in the action is the line previous to the next one that has something other than whitespace (rather than just a line with any 'real' character). A line with a non-whitespace character is simply one that matches the regex \S (the backslash with uppercase S denotes a non-whitespace character).
So, in the end, what you're looking for is:
:g/^\s*$/,/\S/-j
Having said that, keep in mind that the line that remains behind is (I think) the first from the range. So, it's not necessarily empty, it may contain white-space.
If you wish to ensure all whitespace-only lines are made empty, just execute:
:g/^\s*$/s/.*//
after the collapsing command above. Or, you can combine both into a single command using | as an action separator:
:g/^\s*$/,/\S/-j|s/.*//
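If you ever need the same end result outside Vim, here is a rough Python analogue. It is a different technique (one regex substitution rather than :g with a join range), it treats only spaces and tabs as whitespace, and the function name is made up.

```python
import re

# A run of one or more lines that are empty or hold only spaces/tabs.
BLANK_RUN = re.compile(r"(?:^[ \t]*\n)+", re.MULTILINE)


def collapse_blank_lines(text: str) -> str:
    # Replace each run with a single, truly empty line.
    return BLANK_RUN.sub("\n", text)


sample = "a\n\n  \n\t\nb\n \nc\n"
collapsed = collapse_blank_lines(sample)

print(repr(collapsed))
```

Like the combined Vim command, this both collapses each run and strips the leftover whitespace from the line that remains.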
I have a latex file in which I want to get rid of the last \\ before a \end{quoting}.
The section of the file I'm working on looks similar to this:
\myverse{some text \\
some more text \\}%
%
\myverse{again some text \\
this is my last line \\}%
\footnote{possibly some footnotes here}%
%
\end{quoting}
over several hundred lines, covering maybe 50 quoting environments.
I tried with :%s/\\\\}%\(\_.\{-}\)\\end{quoting}/}%\1\\end{quoting}/gc but unfortunately the non-greedy quantifier \{-} is still too greedy.
It matches from the second line of my example to the end of the quoting environment; I guess the greedy quantifier would match up to the last \end{quoting} in the file. Is there any possibility of doing this with search and replace, or should I write a macro for this?
EDIT: my expected output would look something like this:
this is my last line }%
\footnote{possibly some footnotes here}%
%
\end{quoting}
(I should add that I've by now solved the task by writing a small macro, still I'm curious if it could also be done by search and replace.)
I think you're trying to match from the last occurrence of \\}% prior to end{quoting}, up to the end{quoting}, in which case you don't really want any character (\_.), you want "any character that isn't \\}%" (yes I know that's not a single character, but that's basically it).
So, simply (ha!) change your pattern to use \%(\%(\\\\}%\)\@!\_.\)\{-} instead of \_.\{-}; this means that the match cannot contain multiple \\}% sequences, thus achieving your aims (as far as I can determine them).
This uses the negative zero-width look-ahead \@! so that each "any character" match is refused wherever the text we want to avoid begins (other than that, anything else still matches). See :help /zero-width for more of these.
I.e. your final command would be:
:%s/\\\\}%\(\%(\%(\\\\}%\)\@!\_.\)\{-}\)\\end{quoting}/}%\1\\end{quoting}/g
(I note your "expected" output does not contain the first few lines for some reason, were they just omitted or was the command supposed to remove them?)
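The same idea carries over to other regex flavours. Here is a minimal Python sketch of it, where (?!...) plays the role of Vim's negative look-ahead; the sample text mirrors the excerpt from the question and the names are made up.

```python
import re

# Hypothetical sample mirroring the LaTeX excerpt from the question.
text = "\n".join([
    r"\myverse{some text \\",
    r"some more text \\}%",
    r"%",
    r"\myverse{again some text \\",
    r"this is my last line \\}%",
    r"\footnote{possibly some footnotes here}%",
    r"%",
    r"\end{quoting}",
]) + "\n"

# The inner group consumes one character at a time, but never a character
# that begins another \\}% sequence, so only the last \\}% can start a match.
LAST_BREAK = re.compile(
    r"\\\\\}%((?:(?!\\\\\}%).)*?)\\end\{quoting\}", re.DOTALL
)


def drop_last_break(source: str) -> str:
    # A function replacement sidesteps backslash handling in a template string.
    return LAST_BREAK.sub(
        lambda m: "}%" + m.group(1) + r"\end{quoting}", source
    )


result = drop_last_break(text)
print(result)
```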
You’re on the right track using the non-greedy multi. The Vim help files state that,
"\{-}" is the same as "*" but uses the shortest match first algorithm.
However, the very next line warns of the issue that you have encountered.
BUT: A match that starts earlier is preferred over a shorter match: "a\{-}b" matches "aaab" in "xaaab".
To the best of my knowledge, your best solution would be to use the macro.
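The "earlier start wins" behaviour quoted from the help is not peculiar to Vim. A minimal Python sketch of the same example, where *? is Python's spelling of the lazy quantifier:

```python
import re

# The lazy quantifier only shortens a match from a given starting position;
# the engine still takes the leftmost position at which any match succeeds.
lazy = re.search(r"a*?b", "xaaab")
greedy = re.search(r"a*b", "xaaab")

print(lazy.group())    # aaab
print(greedy.group())  # aaab
```

Both forms return the same text here, which is why non-greediness alone cannot skip an earlier \\}% in the question's file.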