Vim Regex - How to invert a regex which matches multiple lines? - regex

My goal is the regex to match everything but
the multi-line matches from the original regex.
This is the regex which I'm trying to invert:
\vselect\_.{-};
It matches correctly on the bellow sample code.
The sql statements are included and not the capitalized text.
THIS TEXT IS WHAT I WANT TO
MATCH ONCE THE ABOVE
REGEX IS CORRECTLY INVERTED.
select *
from foo
where bar;
THIS TEXT IS WHAT I WANT TO
MATCH ONCE THE ABOVE
REGEX IS CORRECTLY INVERTED.
select *
from bar
where foo;
THIS TEXT IS WHAT I WANT TO
MATCH ONCE THE ABOVE
REGEX IS CORRECTLY INVERTED.
Following the vim documentation on how to do
invert (negate) a regex, this is what I come up with:
\v^(select\_.{-};)#!.*
This regex matches on everything but on the first lines
from the sql statements. I want all lines from the
sql statements to be excluded and not just the first lines.
I also tried not to use the \v very magic directive and
escaped everything, but no luck, getting the same result.
What am I doing wrong?
Also if anyone can think of another way to accomplish what I'm trying to do would be great, the answer does not have to follow the way I'm trying it.

After thinking it over, I think I got what you meant...
This line should work for you:
\v;\n\n\zs\_.{-}\ze\n\_^select
Update according to the comment:
And this vimregex works if you have leading/ending TEXT blocks:
\v(;\n\n\zs\_.{-}|%^(select)#!\_.{-})\ze(\n\_^select|\n?%$)

Related

Regex: Replace double double quotes (solved), but only in lines that contain a special string (subcondition unsolved)

1. Summary of the problem
I have a csv file where I want to replace normal quotes in text with typographic ones.
It was hard (because HTML is also included), but I have meanwhile created a good regex expression that does just the right thing: in three "capturing groups" I find the left and right quotation marks and the text inside. Replacing then is a piece of cake.
2. Regex engine
I can use the regex engine of Notepad++ (boost) or PCRE2 comaptible, for developping and testing purposes I have used https://regex101.com.
3. What I'm having a hard time with and just can't get right, where I need your help is here:
I want to add a sub condition, in order to find the text in quotes only in certain lines, want to identify these lines by the language, e.g. ENGLISH or FRENCH (see also example in the screenshot).
Screenshot of a sample
The string indicating the language is always in the same line before the text to be found, BUT only the text in quotes (main condition) should be marked after matching the sub condition, so that I will be able to replace them.
It is about a few thousand records in the csv file, in the worst case I could also replace it manually. But I'm pretty sure that this should also work via regex.
4. What I have tried
Different approaches with look arounds and non-capturing groups didn't lead me to the desired result - possibly because I didn't really understand how they work.
An example can be found here: https://regex101.com/r/ketwwm/1
The example can be found here, it only contains the regex expression to match and mark the (three) groups WITHOUT the searched subcondition:
("")([^<>]*?)("")(?=(?:[^>]*?(?:<|$)))
Hopefully anyone in the community could help? (Hopefully I have not missed anything, it's my first post here )
5. Update 03/18/2022: Almost resolved with two slightly different approaches (thank you all!) What is still unsolved ..
Solution of #Thefourthbird (see answer 1)
^(?!.?"ENGLISH")[^"]".*(SKIP)(F)|("")([^<>]?)("")(?=(?:[^>]?(?:<|$)))
Nearly perfect, just missing matches in an HTML section. HTML sections in the csv file are always enclosed by double quotes and may have line feeds (LF). https://regex101.com/r/x5shnx/1
Solution of #Wiktor Stribiżew (see in comments below)
^.?"ENGLISH".?\K("")([^<>]?)("")(?=(?:[^>]?(?:<|$)))
The same with matches in HTML sections, see above. Plus: Doesn't match text in double double quotes if more than one such entry occurs within a text. https://regex101.com/r/I4NTdb/1
Screenshot (only to illustrate)
If you want to match multiple occasions, you can use SKIP matching all lines that do not start with FRENCH:
^"(?!FRENCH")[^"]*".*(*SKIP)(*F)|("")([^<>]*?)("")(?=(?:[^>]*?(?:<|$)))
The pattern matches:
^ Start of string
" Match literally
(?!FRENCH") Negative lookhead, assert not FRENCH" directly to the right
[^"]*" Match any char except " and match "
.*(*SKIP)(*F) Match the rest of the line and skip it
| Or
("")([^<>]*?)("")(?=(?:[^>]*?(?:<|$))) Your current pattern
Regex demo

How to replace all lines based on previous lines in Notepad++?

I have an XML code:
<Line1>Matched_text Other_text</Line1>
<Line2>Text_to_replace</Line2>
How to tell Notepad++ to find Matched_text and replace Text_to_replace to Replaced_text? There are several similar blocks of code, with one exactly Matched _text and different Other_text and Text_to_replace. I want to replace all in once.
My idea is to put
Matched_text*<Line2>*</Line2>
in the Find field, and
Matched_text*<Line2>Replaced_text</Line2>
in the Replace field. I know that \1 in regex might be useful, but I don't know where to start.
The actual code is:
<Name>Matched_text, Other_text</Name>
<IsBillable>false</IsBillable>
<Color>-Text_to_replace</Color>
The regex you're looking for is something like the following.
Find: (Matched_text[\w,\s<>\/]*<Color>-).*(</Color>)
Replace: \1Replaced_text\2
Broken down:
`()` is how you tell regex that you want to keep things (for use in /1, /2, etc.), these are called capture groups in regex land.
`Matched_text[\w,\s<>\/]*` means you want your anchor `Matched_text` and everything after it up till the next part of the expression.
`<Color>-).*(</Color>)` Select everything between <Color>- and </Color> for replacement.
If you have any questions about the expression, I highly recommend looking at a regex cheatsheet.

Deleting every 2nd line from a file using Notepad++

I am looking for some regex help.
I have a textfile, nothing super important but I would like to delete every second line from it - I have tried following this guide: Delete every other line in notepad++
However I just can't get it to work, is the regex I am using ok? I am noob with regex
Find:
([^\n]*\n)[^\n]*\n
Replace with:
$1
No matter what I try (mouse position at the beginning, ctrl+a and Replace All) I just can't get it to work. I appreciate any help.
I've put the regex into here: http://regexpal.com/ and if I remove the final \n it highlights the individual rows.
Make sure you select regular expression for the search mode...
Also, you may want to make that final newline optional. In the case that there are an even number of lines and you do not have a trailing newline, it won't remove the last line.
([^\n]*\n)[^\n]*\n?
Update:
See how Windows handle new lines with \r\n instead of just \n. Try updating the expression to take this into account:
([^\r\n]*[\r\n]+)[^\r\n]*[\r\n]*
Final Update:
Thanks to #zx81, I now know that N++ uses PCRE so \R can be used for unicode newline characters. However [^\R] won't work (this looks for anything except R literally), so you will need to keep [^\r\n]. This can be simplified as:
([^\r\n]*\R)[^\r\n]*\R?

Regular expression to find a text as being a whole word

I am using the ABAP statement READ REPORT and I want to use FIND ALL OCCURRENCES OF REGEX. Let's say for example I want to search for SELECT but when I do FIND ALL OCCURRENCES OF REGEX 'SELECT', the return table gets lines that have SELECT-OPTIONS, SELECTION-SCREEN and SELECT.
How do I use regex to get only those lines with SELECT, discarding the other 2 possible matches in the example above?
Just go for `SELECT `
Note the extra space and the use of grave quotes (grave quotes so that the trailing space is considered). This simple solution is feasible because it's highly unlikely that there's a new line right after SELECT.
Your requirement is so simple that you don't need to use a regular expression.
http://sapignite.com/regex-in-abap/
OR
Download a PDF from this Link
http://www.google.co.in/url?sa=t&rct=j&q=how%20do%20i%20use%20regex%20in%20abap%20to%20search%20for%20a%20specific%20string%3F&source=web&cd=1&ved=0CCMQFjAA&url=http%3A%2F%2Fwww.sdn.sap.com%2Firj%2Fscn%2Findex%3Frid%3D%2Flibrary%2Fuuid%2F902ce392-dfce-2d10-4ba9-b4f777843182%26overridelayout%3Dtrue&ei=AsFxT9bJNdDqrQfdoe3hDQ&usg=AFQjCNHTHvQXYtYosCLPwj98Za-LMJbo7w&cad=rja
use
\bselect\b
\b stands for word boundary. It will not match aselect or selected
look at a good regex reference at mozila.org and try your regex at regexpal
There is a very cool playground for testing regular expressions: Run Report DEMO_REGEX_TOY with SE38 or SE80.

Regex: remove lines not starting with a digit

I have been fighting this problem with the help of a RegEx cheat sheet, trying to figure out how to do this, but I give up... I have this lengthy file open in Notepad++ and would like to remove all lines that do not start with a digit (0..9). I would use the Find/Replace functionality of N++. I am only mentioning this as I am not sure what Regex implementation is N++ using... Thank you
Example. From the following text:
1hello
foo
2world
bar
3!
I would like to extract
1hello
2world
3!
not:
1hello
2world
3!
by doing a find/replace on a regular expression.
You can clear up those line with ^[^0-9].* but it will leave blank lines.
Notepad++ use scintilla, and also using its regex engine to match those.
\r and \n are never matched because in
Scintilla, regular expression searches
are made line per line (stripped of
end-of-line chars).
http://www.scintilla.org/SciTERegEx.html
To clear up those blank lines, only way is choose extended mode, and replace \n\n to \n, If you are in windows mode change \r\n\r\n to \r\n
[^0-9] is a regular expression that matches pretty much anything, except digits. If you say ^[^0-9] you "anchor" it to the start of the line, in most regular expression systems. If you want to include the rest of the line, use ^[^0-9].+.
^[^\d].* marks a whole line whose first character is not a digit. Check if there are really no whitespaces in front of the digits. Otherwise you'd have to use a different expression.
UPDATE:
You will have to do ot in two steps. First empty the lines that do not start with a digit. Then remove the empty lines in extended mode.
One could also use the technique of bookmarking in Notepad++. I started benefiting from this feature (long time present but only more recently made somewhat more visible in the UI) not very long ago.
Simply bring up the find dialogue, type regex for lines not starting with digit ^\D.*$ and select Mark All. This will place blue circles, like marbles, in the left gutter - these are line bookmarks. Then just select from main menu Search -> Bookmark -> Remove bookmarked lines.
Bookmarks are cool, you could extract these lines by simply selecting to copy bookmarked lines, opening new document and pasting lines there. I sometimes use this technique when reviewing log files.
I'm not sure what you are asking. but the reg exp for finding the lines with a digit at the beginning would be
^\d.*
you can remove all the lines that match the above or alternatly keep all the lines that match this expression:
^[^\d].*