Remove all spaces from lines starting with specific word - regex

Using Regex find/replace in Notepad++, how can I remove all spaces from a line if the line starts with 'CHAPTER'?
Example Text:
CHAPTER A B C
Once upon a time.
What I want to end up with:
CHAPTERABC
Once upon a time.
Incorrect code is something like:
(?<=CHAPTER)( )(?<=\r\n)
So 'CHAPTER' needs to stay and the search should stop at the first line break.

You may use a \G-based regex that starts matching only on a line beginning with CHAPTER, then matches consecutive non-whitespace and whitespace chunks up to the line break, dropping the non-whitespace chunks from the match so that only the horizontal whitespace is removed (replace with an empty string):
(?:^CHAPTER|(?!^)\G)\S*\K\h+
Details:
(?:^CHAPTER|(?!^)\G) - CHAPTER at the start of a line (^CHAPTER) or (|) the end of the previous successful match ((?!^)\G; since \G can also match the start of a line, we use the restricting negative lookahead)
\S* - zero or more non-whitespace symbols
\K - a match reset operator forcing the regex engine to omit the text matched so far (thus, we do not remove CHAPTER or any of the non-whitespace chunks)
\h+ - horizontal whitespace (1 or more occurrences) only
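For readers working outside Notepad++, here is a minimal Python sketch of the same idea. Python's re module does not support \G, \K or \h, so this version swaps in a different technique: it matches each whole line that starts with CHAPTER and strips spaces and tabs inside a callback. The function name strip_chapter_spaces is hypothetical.

```python
import re

def strip_chapter_spaces(text: str) -> str:
    """Remove spaces and tabs from every line that starts with CHAPTER."""
    return re.sub(
        r"^CHAPTER.*$",                                # a whole line starting with CHAPTER
        lambda m: re.sub(r"[ \t]+", "", m.group(0)),   # drop horizontal whitespace only
        text,
        flags=re.MULTILINE,
    )

sample = "CHAPTER A B C\nOnce upon a time."
print(strip_chapter_spaces(sample))
```

Lines that do not start with CHAPTER are never matched, so they pass through unchanged.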

Related

notepad regex - lines without character occurring n times

I'm looking for the correct regex to find lines with fewer than n occurrences of the TAB (\t) character.
I tried this one but it finds nothing:
^.*(?:\t.*){0,20}\r\n
Your pattern contains a .* at the start (after ^, start of string/line anchor), and it matches any zero or more chars other than line break chars, as many as possible. So, it can match any amount of tabs. Then, (?:\t.*){0,20} matches zero, one ... twenty occurrences of a tab and then again any zero or more chars other than line break chars as many as possible.
In the end, the regex does not restrict the amount of tabs on a line at all.
To match lines having no more than N tabs, you need
^(?!(?:[^\t\r\n]*\t){N+1}).*
where N is your occurrence count. So, if you want to match (and later remove, since you have \r\n at the end of the regex) lines having no more than 20 tabs, you can use
^(?!(?:[^\t\r\n]*\t){21}).*\R?
See the regex demo.
Details:
^ - start of string/line
(?!(?:[^\t\r\n]*\t){21}) - a negative lookahead that fails the match if there are twenty-one occurrences of zero or more chars other than CR, LF and TAB followed with a TAB char immediately to the right of the current location
.* - the rest of the line
\R? - an optional line break sequence (CRLF, LF or CR).
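The lookahead part of the pattern also works in Python's re module, so the counting logic can be tried outside Notepad++. The sketch below is an illustration only: the function name lines_with_at_most_n_tabs is hypothetical, the \R? part is left out because Python's re does not support \R, and the sample uses plain \n line endings.

```python
import re

def lines_with_at_most_n_tabs(text: str, n: int) -> list:
    """Return the lines of text that contain no more than n TAB characters."""
    # The negative lookahead fails the line as soon as n + 1 tabs can be found on it.
    pattern = re.compile(r"^(?!(?:[^\t\r\n]*\t){%d}).*" % (n + 1), re.MULTILINE)
    return pattern.findall(text)

sample = "a\tb\nno tabs\na\tb\tc\td"
print(lines_with_at_most_n_tabs(sample, 1))
```

With n = 1 the third sample line is skipped because it contains three tabs.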

Regex remove last newline

Given the following ; delimited string
a;; z
toy;d;hh
toy
;b;;jj
z;
d;23
d;23td
;;io;
b y;b;12
z
a;b;bb;;;34
z
and this regex
^(?!(?:(a|d))(?:;|$)).*(\s*\z|$)\R*
I am looking to match the full lines whose first column is not a or d and have those matching lines removed, to get this after substituting with an empty string:
a;; z
d;23
d;23td
a;b;bb;;;34
Please see the demo
In the Substitution panel, there is a 5th empty line, which needs to be removed.
I have used \s*\z in the past for this purpose. As implemented here, it does not seem to work.
Any help is appreciated
I think your regex won't remove the last newline because that newline sits at the end of the last line you want to keep, so you can't remove it without matching it.
So I rewrote the regex to match the line you want to keep, but also to include everything above and below the match that is not another match.
The key difference is using a conditional to only match the newline of the group you want to keep if it is followed by another match.
regex (linebreaks for readability):
((?!(a|d)).*(\s*\z|$)\R*)*
(^(a|d).*(?(?=\R*(.*\s*\R+)*(a|b))\R))
((?!(a|d)).*(\s*\z|$)\R*)*
replace with $4 -->
a;; z
d;23
d;23td
a;b;bb;;;34
For readability I removed some of the non-capturing and string separator logic you had, if they are necessary you can add them back in.
Logic breakdown of the parts:
(?(?=\R*(.*\s*\R+)*(a|b))\R) is the conditional, it only matches the newline \R if (?) it is followed by (?=) any non-matching lines (.*\s*\R+)* that end in a newline followed by (a|b).
The middle part (^(a|d).*(?(?=\R*(.*\s*\R+)*(a|b))\R)) containing this ends up as the replacing group $4. It thus matches lines starting with (a|d), and all but the last match also match the newline at the end of their line.
The beginning and end of the regex, ((?!(a|d)).*(\s*\z|$)\R*)*, are exactly the same and match all the unneeded stuff so that it gets removed.
You could match what you want to remove, and capture in a group what you want to keep.
To avoid removing the newline sequences between captured groups, you could use an if clause (? that only matches 0+ unicode newline sequences when no later line starts with [ad];
In the replacement use group 1 $1
^(?:(?![ad];).*\R*)*|^([ad];.*(?:\R[ad];.*)*)(?(?![\s\S]*\R[ad];)\R*)
Explanation
^ Start of line
(?: Non capture group
(?![ad];) If the line does not start with a or d followed by ;
.*\R* Match the whole line and 0+ times a unicode newline sequence
)* Close group and repeat 0+ times to match all consecutive lines
| Or
^ Start of line
( Capture group 1
[ad];.* Match a or d followed by ; and the rest of the line
(?: Non capture group
\R[ad];.* Match newline, a or d followed by ; and the rest of the line
)* Close group and repeat 0+ times to match all consecutive lines
) Close group 1
(? If clause, only match a unicode newline sequence if the [ad]; pattern does not occur anymore
(?! Negative lookahead, assert what follows is not
[\s\S]*\R[ad]; Match the [ad]; pattern
) Close lookahead.
\R* If the assertion is true, Match 0+ unicode newline sequences
) Close if clause
See a Regex demo
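As a cross-check of the expected result, here is a small Python sketch. It is not a translation of the regexes above: Python's re module supports neither \R nor lookahead conditionals, so the sketch swaps in a plain line filter and rejoins the kept lines, which leaves no trailing newline. The function name keep_a_or_d_rows is hypothetical.

```python
import re

def keep_a_or_d_rows(text: str) -> str:
    """Keep only the lines whose first ;-delimited column is exactly a or d."""
    first_column = re.compile(r"[ad](?:;|$)")
    kept = [line for line in text.splitlines() if first_column.match(line)]
    # Joining with "\n" means no newline is left after the last kept line.
    return "\n".join(kept)

sample = "\n".join([
    "a;; z", "toy;d;hh", "toy", ";b;;jj", "z;", "d;23",
    "d;23td", ";;io;", "b y;b;12", "z", "a;b;bb;;;34", "z",
])
print(keep_a_or_d_rows(sample))
```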

How to capture everything until another capture group

I have the following template :
1251 Left Random Text I want to fill
It can go through multiple lines
As you can see
9841 Right Again we see a lot of random text with 3115 numbers
And this also goes
To multiple lines
0121 Right
5151 Right This one is just one line
I was wrong
9731 Left This one is just a line
5123 NA Instruction 5151 was wrong
4113 Right Instr 9841 was correct
We checked
I want to have 3 groups:
1251
Left
Random Text I want to fill
It can go through multiple lines
As you can see
I'm using
(\d+)\s(\w+)\s(.*)
but it stops at the current line only (so I get only Random Text I want to fill in group 3, although I want including As you can see)
If I'm using Single line flag I get only 1 match for each group, group 3 almost being all
Here is live : https://regex101.com/r/W3x0mH/4
You could use a repeating group matching all the following lines while asserting that the next line does not start with a digit (which would mark the start of a new record):
(\d+)\s(\w+)\s(.*(?:\r?\n(?!\d).*)*)
Explanation
(\d+)\s(\w+)\s Match the first 2 groups
( Third capturing group
.* Match 0+ times any char except a newline
(?: Non capturing group
\r?\n(?!\d).* Match newline, assert what is on the right is not a digit
)* Close non capturing group and repeat 0+ times
) Close capturing group
Regex demo
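The repeating-group pattern above uses only syntax that Python's re module also supports, so it can be tried as in the sketch below. The sample text is a shortened version of the template from the question with \n line endings, and the variable names are assumptions made for the example.

```python
import re

# Same pattern as above: the third group keeps consuming following lines
# as long as the next line does not start with a digit.
record = re.compile(r"(\d+)\s(\w+)\s(.*(?:\r?\n(?!\d).*)*)")

text = (
    "1251 Left Random Text I want to fill\n"
    "It can go through multiple lines\n"
    "As you can see\n"
    "9841 Right Again we see a lot of random text with 3115 numbers\n"
    "And this also goes\n"
    "To multiple lines\n"
    "9731 Left This one is just a line"
)

records = record.findall(text)
for number, side, body in records:
    print(number, side, repr(body))
```

Each tuple holds the number, the second word, and the free text with its continuation lines.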
You may use this regex with a lookahead:
^(\d+)\s(\w+)\s(.*?)(?=\n\d|\z)
with DOTALL and MULTILINE modifiers.
Updated Regex Demo
RegEx Details:
^: Line start
(\d+): Match and capture 1+ digits in group #1
\s: match a whitespace
(\w+): Match and capture 1+ word characters in group #2
\s: match a whitespace
(.*?): Match 0 or more of any character (non-greedy), provided the next lookahead assertion is satisfied
(?=\n\d|\z): Lookahead assertion to assert that we have a newline followed by a digit or there is end of input
Faster Regex:
If you are using this regex on a long string then you should also keep overall performance in mind as a regex with DOTALL modifier will tend to get slow on a large size text. For that I suggest using this regex that doesn't need DOTALL modifier:
^(\d+)\s(\w+)\s(.*(?:\n.*)*?)(?=\n\d|\z)
RegEx Demo 2
On regex101 demo this regex takes just 181 steps as compared to first one that takes 1300 steps.
For the third group, repeat any character while using negative lookahead for ^\d, which would indicate the start of a new match:
(\d+)\s(\w+)\s((?:(?!^\d)[\s\S])*)
https://regex101.com/r/W3x0mH/5
You may try with this regex:
^(\d+)\s+(\w+)\s+(.*?)(?=^\d|\z)
^(\d+)\s+ where ^\d+ matches a line that begins with digits, followed by one or more whitespace characters (\s+)
(\w+)\s+ where \w+ matches one or more word characters (left, right, na or something else), followed by one or more whitespace characters (\s+)
(.*?) matches everything until it finds a line beginning with a digit or \z, the end of the string.
I think it fits your requirement....
Regex101

Find all lines using regular expression

There is a text like this (many lines)
1. sdfsdf werwe werwemax45 rwrwerwr
2. 34348878 max max44444445666 sdf
3. 4353424 23423eedf max55 dfdg dfgdf
4. max45
5. 4324234234sdfsdf maxx34534
Using regular expressions I need to find all lines and include a word max<digits> (containing digits instead of literally <digits>) into a matching group.
So I've tried this regular expression:
^.*?\b(max\d+)\b.*?$
But it finds only lines containing max... and ignores others.
Then I’ve tried
^.*?\b(max\d+)?\b.*?$
It finds all lines but without matching group containing max....
The issue can be "debugged" with a slightly modified pattern, ^(.*?)\b(max\d+)?\b(.*?)$, with the rest of the pattern wrapped into separate capturing groups. You can see that the lines are all matched by the Group 3 pattern, the last .*?. It happens because the first .*? is skipped (since it is a lazy pattern), then (max\d+)? matches an empty string at the start of the line (none begins with max + digits, but if any line started with that pattern, you would get it captured), and the last .*? captures the whole line.
You can fix it by wrapping the first part into an optional non-capturing group and capturing max\d+ in an obligatory capturing group inside it:
^(?:.*?\b(max\d+)\b)?.*?$
Or even without ?$ at the end since .* will match greedily up to the end of the line:
^(?:.*?\b(max\d+)\b)?.*
See the regex demo
Details
^ - start of string (with m option, start of a line)
(?:.*?\b(max\d+)\b)? - an optional non-capturing group:
.*? - any 0+ chars, other than line break chars as few as possible
\b - a word boundary
(max\d+) - Group 1 (obligatory, will be tried once): max and 1+ digits
\b - a word boundary
.* - rest of the line
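The fixed pattern also works in Python's re module. The sketch below runs it over the sample lines from the question; the variable names are assumptions made for the example. Group 1 is None for lines that have no standalone max<digits> word.

```python
import re

pattern = re.compile(r"^(?:.*?\b(max\d+)\b)?.*", re.MULTILINE)

text = (
    "1. sdfsdf werwe werwemax45 rwrwerwr\n"
    "2. 34348878 max max44444445666 sdf\n"
    "3. 4353424 23423eedf max55 dfdg dfgdf\n"
    "4. max45\n"
    "5. 4324234234sdfsdf maxx34534"
)

# Every line is matched; the capture is filled only when max<digits> is a whole word.
captures = [m.group(1) for m in pattern.finditer(text)]
print(captures)
```

Lines 1 and 5 are still matched as lines, but werwemax45 has no word boundary before max and maxx34534 has no digits right after max, so nothing is captured for them.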

Match everything besides an empty line or lines containing only whitespaces

What is the easiest way to match all lines which follow these rules:
The line is not empty
The line does not only contain whitespace
I've found an expression which only matches empty lines or those that contain only whitespace, but I am not able to invert it. This is what I have found: ^\s*[\r\n].
Is it possible to simply invert a regular expression?
Thank you very much!
To match non-empty lines, you can use the following regex with multiline mode ON (thanks #Casimir for the character class correction):
^[^\S\r\n]*\S.*$
The end of line is consumed with .* that matches any characters but a newline.
See demo
To just check if the line is not whitespace (but not match it), use a simplified version:
^[^\S\r\n]*\S
See another demo
The [^\S\r\n]* matches 0 or more whitespace characters other than carriage return and line feed (that is, any character that is not a non-whitespace character, CR or LF). The \S matches a non-whitespace character.
And by the way, if you code in C#, you do not need a regex to check if a string is whitespace, as there is String.IsNullOrWhiteSpace, just split the multiline string with str.Split(new[] {"\r\n"}, StringSplitOptions.None).
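As a quick illustration, the multiline pattern ^[^\S\r\n]*\S.*$ can be tried in Python, whose re module supports every construct it uses. The sample text and variable names below are assumptions made for the example.

```python
import re

non_blank = re.compile(r"^[^\S\r\n]*\S.*$", re.MULTILINE)

text = "first\n   \n\n\tsecond line\n"

# Only lines that contain at least one non-whitespace character are returned;
# leading horizontal whitespace is part of the match.
matches = non_blank.findall(text)
print(matches)
```

The line of three spaces and the empty line are skipped because \S cannot match a line break.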
Just verify that there is at least one non-whitespace character:
^.*\S.*$
See it in action
Explanation:
From start (^) until end ($)
.* - any amount of any characters
\S - one non-whitespace character