Regex to replace multiple blank lines

Why does the following pattern not match only two or more consecutive blank lines?
(Regex flag included: multiline)
/(^\s*$){2,}/m
Using Regex101, I see that it matches (for example) the first single blank line of the example below. (Note: I used ALT-255 for the first character in the block quote below just to represent a starting blank line; remove it if you copy the example text.)
 
some text after the first blank line
more text
// comment after a space
// comment after 2 blank lines
text
// comment
// comment
How can I tweak this to match 2 or more blank lines only?

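The regex you should be using is ^\n{1,}$ (with the multiline flag).
Since $ only matches right before a line break or at the end of the text, a single blank line between two lines of text is skipped; the pattern only matches where 2 or more blank lines follow each other (assuming \n line endings).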

^(\n{2,})
Here is the working DEMO
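Both suggested patterns can also be checked outside Regex101 with a short Python sketch. It assumes LF line endings, uses a made-up `sample` string (not the text from the question), and relies on `re.MULTILINE` as the counterpart of the `/m` flag.

```python
import re

# Hypothetical sample: one blank line after "line 1", two blank lines after "line 2".
sample = "line 1\n\nline 2\n\n\nline 3\n"

# The two patterns suggested in the answers, compiled with the multiline flag.
first = re.compile(r"^\n{1,}$", re.MULTILINE)
second = re.compile(r"^(\n{2,})", re.MULTILINE)

def spans(pattern, text):
    """Return the (start, end) offsets of every match of pattern in text."""
    return [m.span() for m in pattern.finditer(text)]

first_spans = spans(first, sample)
second_spans = spans(second, sample)

print(first_spans)   # offsets inside the run of two blank lines only
print(second_spans)  # offsets covering both blank lines of that run
```

In this sample neither pattern fires on the single blank line at offset 7; both only match inside the run of two blank lines that starts at offset 15. Files with CRLF line endings would probably need \r?\n in place of \n.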

Related

Regex with lazy match (for first occurrence only) in Notepad++

I have this regex to find all lines starting from the word "chapter" till the first blank line: ^chapter.*^\s*$. However, I want it to match the first occurrence only, so I tried adding '.?' or '(.+?)' to the end. But I am not sure how to implement the lazy quantifier here.
Example text:
Chapter 1: some text
more than one line,
next line.
Chapter 2: text text text
other text
Chapter 3: more text
more lines
more lines
So the regex should match from the first word "Chapter" till the blank line before the next chapter, etc.
You can use /Chapter((?!\n\n).)*/s, or on Windows Chapter((?!\R\R)(.|\R))*\R?
Chapter matches the literal text at the beginning of a chapter
((?!\n\n).)* matches any character as long as the next two characters are not both newlines (due to the negative lookahead (?!\n\n))
Note the s option, which makes the dot . match a newline. If you don't have that option in Notepad++, you can use Chapter((?!\n\n)(.|\n))*, and on Windows Chapter((?!\R\R)(.|\R))*\R? to match the newline.
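As a rough illustration of the lookahead approach, here is a minimal Python sketch. The `text` sample is hypothetical (the example from the question with a blank line between chapters, LF line endings assumed), and `re.DOTALL` stands in for the s option. Python's `re` module has no \R, so only the \n variant is shown.

```python
import re

# Hypothetical sample with a blank line separating the chapters.
text = (
    "Chapter 1: some text\n"
    "more than one line,\n"
    "next line.\n"
    "\n"
    "Chapter 2: text text text\n"
    "other text\n"
)

# re.DOTALL plays the role of the /s option, so "." also matches a newline.
# The lookahead (?!\n\n) stops the repetition right before a blank line.
pattern = re.compile(r"Chapter((?!\n\n).)*", re.DOTALL)

# search() returns only the first match, i.e. the first chapter.
first_chapter = pattern.search(text).group(0)
print(first_chapter)
```

The match ends right after "next line." because the next two characters are both newlines.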
You may use this regex to select the first set of lines starting with Chapter:
\AChapter.*\r?\n(?:.*?\S.*\r?\n)*
RegEx Details:
\A: Start anchor (matches once per document)
Chapter.*\r?\n: Match text Chapter followed by any text till line break
(?:.*?\S.*\r?\n)*: Match 0 or more following lines containing at least one non-space character
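The details above can be tried in Python, whose `re` module also supports \A. This is only a sketch on a hypothetical `text` sample (LF line endings, blank line between chapters); Notepad++ itself may behave differently in edge cases.

```python
import re

# Hypothetical sample with a blank line separating the chapters.
text = (
    "Chapter 1: some text\n"
    "more than one line,\n"
    "next line.\n"
    "\n"
    "Chapter 2: text text text\n"
    "other text\n"
)

# \A anchors the match to the very start of the text, so at most one match exists.
pattern = re.compile(r"\AChapter.*\r?\n(?:.*?\S.*\r?\n)*")

block = pattern.search(text).group(0)
print(block)

# If the text does not start with "Chapter", \A prevents any match at all.
no_match = pattern.search("intro\n" + text)
```

Here `block` holds the three lines of the first chapter including their line breaks, and `no_match` is `None` because of the \A anchor.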

Regex for matching text between two regex patterns

I am looking for a way to capture text and its paragraph title from a text document.
Text File:
paraTitle-1
--------
Lines and words
empty....
more lines
still part of paraTitle-1
paraTitle-2
--------
Lines and words
empty....
more lines
still part of paraTitle-2
I want to capture both the titles and the text below them.
array = [paraTitle-1: <text below paraTitle-1>,
paraTitle-2: <text below paraTitle-2>]
I made a few attempts with pattern (?<=(.*))\n----*\n(?=(.*)) to no avail. Any guidance would be awesome.
The following regex will do:
(?!--------\R)(.*)\R--------\R((?:\R?(?!.*\R--------\R).*)+)
See regex101.
The title separator line (--------) can also be specified as -{8}, which is easier to adjust to variable length if needed, e.g. instead of exactly 8 dashes, it could be 6 or more: -{6,}
Explanation:
Capture a line of text (paragraph title):
(.*)\R
The . doesn't match line break characters
\R matches line breaks, including the Windows CRLF pair. If your regex engine doesn't support \R, use \r?\n as a simple alternative.
Make sure the captured text is not the title separator line:
(?!--------\R)
Skip the mandatory title separator line:
--------\R
Capture the paragraph text, as a repeating group of lines:
((?:xxx)+)
A line has an optional leading line break (first line doesn't have one):
\R?.*
But make sure the line is not the title of the next paragraph, i.e. it's not a line followed by the title separator line.
(?!.*\R--------\R)
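For readers who want to try the pattern in code, here is a minimal Python sketch. Python's `re` module does not support \R, so every \R is swapped for \r?\n as suggested above; the variable names (`text`, `sections`) are made up for the example.

```python
import re

# The sample file from the question, joined with LF line breaks.
text = "\n".join([
    "paraTitle-1",
    "--------",
    "Lines and words",
    "empty....",
    "more lines",
    "still part of paraTitle-1",
    "paraTitle-2",
    "--------",
    "Lines and words",
    "empty....",
    "more lines",
    "still part of paraTitle-2",
])

# Same pattern as above, with \R replaced by \r?\n.
pattern = re.compile(
    r"(?!--------\r?\n)(.*)\r?\n--------\r?\n"
    r"((?:(?:\r?\n)?(?!.*\r?\n--------\r?\n).*)+)"
)

# findall() yields (title, body) tuples, which dict() turns into a mapping.
sections = dict(pattern.findall(text))

for title, body in sections.items():
    print(title, "->", body.splitlines())
```

Each title maps to the block of lines below its separator, up to (but not including) the next title line.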

REGEX - How can I select/mark 3 words delimited by tabs on consecutive lines?

Happy New Year !
I have a problem. I don’t know how to mark/select some words delimited by tabs on consecutive lines: Recent, Comments and Tags.
Please see this screenshot:
I can easily use the | sign, like Recent|Comments|Tags, but this will select every occurrence of those words in the file, and I want only those 3 on those lines.
What I want is one regex to remove all text before those 3 words, and another regex to remove everything after those 3 words.
I tried something like ((?s)((^.*)^.*Recente.*$|^.*Coments.*$|^.*Tags.*^))(.*$) but it does not work well. I also have to be careful, because those words can be repeated in the text files, so I have to select/mark exactly those 3, on those 3 consecutive lines (which don't contain any other words).
Since you mentioned in a comment that you want to do this in Notepad++ (a fact that should have been mentioned in the question text), and since the screenshot shows a single space after the first two words, you might try this regular expression:
.*\n([ \t]+Recente\s+Coments\s+Tags).*
It will match everything, but capture the 3 words, including the whitespace between them and the whitespace preceding the first word on the same line.
If you then replace with $1, everything not in the capture group will be removed.
Actually, the spaces after the first two words don't matter to this regex.
Could you please try this in perl:
perl -0777 -ne 'while(m/((\s|\t)+)Recent\n\1Comments\n\1Tags/g){print "$&\n";}' /path/to/file
To break it down:
Start with 1 or more whitespace characters, tabs included (first capture group)
Then "Recent" followed by a new line
The same whitespace as the capture group (\1), then "Comments" and a new line
The same whitespace again (\1), then "Tags"
By the way, is the "tab" really a tab, or multiple consecutive whitespace characters (\s+)?
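The backreference idea from the Perl one-liner can be sketched in Python as well. The `text` sample is hypothetical, and the capture group is narrowed to `[ \t]+` (an assumption, so that the captured indentation cannot include a line break); LF line endings are assumed.

```python
import re

# Hypothetical file content: the three target lines share the same leading tab,
# while "Recent" and "Tags" also occur elsewhere in the text.
text = (
    "Recent posts are listed here\n"
    "\tRecent\n"
    "\tComments\n"
    "\tTags\n"
    "Tags appear again later\n"
)

# ([ \t]+) captures the indentation of the first line; \1 requires exactly the
# same indentation at the start of the next two lines.
pattern = re.compile(r"([ \t]+)Recent\n\1Comments\n\1Tags")

match = pattern.search(text)
block = match.group(0)

# Keeping only the matched block removes everything before and after it.
print(repr(block))
```

Only the three indented lines are matched; the other occurrences of "Recent" and "Tags" are ignored because they are not followed by the expected lines.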

Notepad++ Regex Issue - Remove Number in Line Replace with HTML

I'm a regex newbie so this has been a lot of trial and error, but for some reason I can only get this to work sometimes and I'm not sure why. Let me lay out what I'm doing. I have a text file that looks like this:
1.Some Text Here
A paragraph of words here.
2.Some More Text Here
A paragraph of words here.
I use this code to find the lines with a number at the beginning:
^[0-9]+.([^.]*)$
Then I replace it with this:
<h2>$1</h2>\r\r
The problem I'm running into is that it usually grabs just the line starting with the number, but for some reason it will sometimes grab the line with the number and the paragraph below it. So instead of putting the </h2> at the end of the line, it puts it at the end of the paragraph below.
I displayed all symbols to see if it had something to do with carriage/line returns but everything looks identical from line to line. The paragraph is on its own line and I see CRLF at the end of each line.
The expression [^.] (i.e. not a literal dot) matches newlines.
Don't match newlines in your capture:
^[0-9]+\.([^.\r\n]*)
Note that I also escaped the dot following the numbers, making it match a literal dot (a naked dot matches any character).
Use \2 instead of $2, and check "Wrap around". Tested on Notepad++ 5.9.3 (UNICODE).
Not sure what version of Notepad++ you're using, but your version of the regex works fine for the example that you have ... I use 6.7.9.2.
I can reproduce with the following text. Notice the paragraph for line 1 doesn't end in a period.
1.Some Text Here[CR][LF]
A paragraph of words here[CR][LF]
2.Some Text Here[CR][LF]
A paragraph of words here.[CR][LF]
Your regex matches text that begins at a line starting with digits and runs to the end of a line; the part after the number can't contain a period but can contain line breaks, so the match can span more than one line. I would recommend this regex: ^[0-9]+\.([^\r\n]*)\r\n
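The difference between the original pattern and the newline-excluding fix ^[0-9]+\.([^.\r\n]*) can be reproduced with a small Python sketch. It uses the reproduction text from this thread with CRLF line endings; Python's `re` module is only a stand-in for Notepad++ here, and `\1` replaces `$1` because that is Python's replacement syntax.

```python
import re

# Reproduction text from the thread: the first paragraph has no trailing period.
text = (
    "1.Some Text Here\r\n"
    "A paragraph of words here\r\n"
    "2.Some Text Here\r\n"
    "A paragraph of words here.\r\n"
)

original = re.compile(r"^[0-9]+.([^.]*)$", re.MULTILINE)
fixed = re.compile(r"^[0-9]+\.([^.\r\n]*)", re.MULTILINE)

# With the original pattern, group 1 leaks into the paragraph below the heading.
leaky = original.search(text).group(1)

# The fixed pattern excludes \r and \n, so each heading stays on its own line.
result = fixed.sub(r"<h2>\1</h2>", text)

print(repr(leaky))
print(result)
```

In this sample `leaky` contains the heading text plus the following paragraph, while `result` wraps only the two numbered lines in h2 tags.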

regex in Notepad++ to remove blank lines

I have multiple HTML files, and some of them have runs of blank lines. I need a regex that collapses each run of blank lines into a single blank line. So it should remove anything beyond one blank line, and leave alone places with one blank line or none (none meaning lines that have text in them).
It also needs to consider lines that are not totally blank, since some lines could contain spaces or tabs (characters that don't show), so these lines should also be removed whenever there is more than one of them in a row.
Search for
^([ \t]*)\r?\n\s+$
and replace with
\1
Explanation:
^ # Start of line
([ \t]*) # Match any number of spaces or tabs, capture them in group 1
\r?\n # Match one linebreak
\s+ # Match any following whitespace
$ # until the last possible end of line.
\1 will then contain the first line of whitespace characters, so when you use that as the replacement string, only the first line of whitespace will be preserved (excluding the linebreak at the end).
This worked for me on Notepad++ v6.5.1 (UNICODE), Windows 7:
Search for: ^[ \t]*\r\n
Replace with: nothing, leave blank
Search mode: Regular expression.
Search for (\r?\n(\t| )*){3,}, replace with \r\n\r\n, and check "Regular expression" and ". matches newline".
Tested with Notepad++ 6.2.
This replaces successive blank lines, whether or not they contain whitespace, with a single blank line.
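The same search and replace can be sketched in Python for batch use over several files. The function name `collapse_blank_lines` and its `newline` parameter are made up for this example, and the pattern is the one above with a character class instead of (\t| ).

```python
import re

# 3 or more line breaks, each optionally followed by spaces or tabs,
# i.e. a run of 2 or more blank (or whitespace-only) lines.
BLANK_RUN = re.compile(r"(?:\r?\n[ \t]*){3,}")

def collapse_blank_lines(text, newline="\n"):
    """Replace every run of 2+ blank lines with exactly one blank line."""
    return BLANK_RUN.sub(newline * 2, text)

print(repr(collapse_blank_lines("a\n\n\n\nb")))
print(repr(collapse_blank_lines("a\n \n\t\nb")))
print(repr(collapse_blank_lines("a\r\n\r\n\r\n\r\nb", newline="\r\n")))
```

A single blank line is left untouched because it only produces two line breaks in a row. One caveat of this pattern: the [ \t]* after the last line break also swallows any indentation at the start of the next non-blank line.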
Search for
(\s*\r?\n){3,}
replace with
\r\n
You can work out for yourself what you need to replace with:
\n\n or \n\r\n or \r\n\r\n etc. You can even modify your regular expression ^([ \t]*)\r?\n\s+$ according to your needs.
I tested all of the above suggestions, and each one deleted either too little or too much: either no blank line was left where there had been at least one before, or not enough was deleted (whitespace was left behind, etc.). Unfortunately I cannot write comments yet. I tested with both 6.1.5 and, after updating, 6.2. Depending on how many files there are, I would suggest using
Edit->Blank Operations->Trim trailing whitespace
Followed by Ctrl+A and
TextFX -> TextFX Edit -> Delete surplus blank lines
A macro I tried to record didn't work. There's even a macro just for removing trailing whitespace (Alt+Shift+S, see Settings | Shortcut Mapper... | Macros). There's also
Edit->Blank Operations->Remove unnecessary EOL and whitespace
but that deletes every EOL and puts everything in a single line.
In notepad++ v8.4.7 there is the option:
Edit > Line Operations > Remove Empty Lines (Containing Blank characters)
or
Edit > Line Operations > Remove Empty Lines
So there is no need to use a regular expression for this. But this only works for one file at a time.
I searched for ^\r\n and clicked "Replace All" with nothing (empty) in the "Replace with" textbox.