Match everything besides an empty line or lines containing only whitespaces - regex

What is the easiest way to match all lines which follow these rules:
The line is not empty
The line does not only contain whitespace
I've found an expression that matches only empty lines or lines containing only whitespace, but I am not able to invert it. This is what I have found: ^\s*[\r\n].
Is it possible to simply invert a regular expression?
Thank you very much!

To match non-empty lines, you can use the following regex with multiline mode ON (thanks @Casimir for the character class correction):
^[^\S\r\n]*\S.*$
The rest of the line is consumed by .*, which matches any character except a newline.
To only check that a line is not blank (without matching the whole line), use a simplified version:
^[^\S\r\n]*\S
The [^\S\r\n]* matches 0 or more whitespace characters other than carriage return and line feed (that is, horizontal whitespace). The \S matches a non-whitespace character.
And by the way, if you code in C#, you do not need a regex to check if a string is whitespace: split the multiline string with str.Split(new[] {"\r\n"}, StringSplitOptions.None) and test each line with String.IsNullOrWhiteSpace.
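For readers who want to try the pattern outside a regex tester, here is a minimal Python sketch. The sample text is made up for illustration, and re.MULTILINE plays the role of "multiline mode ON":

```python
import re

# Hypothetical sample input: two blank lines sit between real content.
text = "first\n\n   \n  second  \nthird"

# re.MULTILINE makes ^ and $ anchor at every line, as the answer requires.
NON_BLANK = re.compile(r"^[^\S\r\n]*\S.*$", re.MULTILINE)

non_blank_lines = NON_BLANK.findall(text)
print(non_blank_lines)  # ['first', '  second  ', 'third']
```

Note that a plain ^\s*\S.*$ would not behave the same way here: \s* can run across line breaks, so the preceding blank lines would be pulled into the match.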

Just verify that there is at least one non-whitespace character:
^.*\S.*$
Explanation:
From start (^) to end ($) of the line:
.* - any number of any characters (except newline)
\S - one non-whitespace character
.* - again any number of any characters (except newline)
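A small Python sketch of this idea, assuming multiline mode and a made-up sample text. It also compares the result against a regex-free check with str.strip(), which is one possible alternative and not part of the answer above:

```python
import re

# Hypothetical sample: one empty line and one line of spaces and a tab.
text = "first\n\n \t \n  second  \nthird"

AT_LEAST_ONE = re.compile(r"^.*\S.*$", re.MULTILINE)
by_regex = AT_LEAST_ONE.findall(text)

# Regex-free equivalent: strip() leaves an empty (falsy) string for blank lines.
by_strip = [line for line in text.split("\n") if line.strip()]

print(by_regex)
print(by_regex == by_strip)
```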

Related

Regex - match any characters and allow any number of single spaces. Break match on a double space

I am looking to create a match for the following:
"Adam Lambert"
"Mr. Adam Lambert"
"adam#test.com"
But not match the following:
"Adam  Lambert"
"Adam Lambert "
Rules:
Any alphanumeric character should be matched
A single space at any point should be matched
Any number of single spaces can be matched
Double spaces are not matched
A single space at the end of a string is not matched
EDIT
I also need to match the following. Sorry I missed this.
name:((\w+(?:\S\w+)*|\s(?:\w+\S)*)\S)*
I need to match:
name:
name:A
name:Adam Lambert
The above regex matches from "name:Ad..." onward, but it will not match "name:A".
I would generalize a solution to matching a sequence of non-space characters followed by optional groups of non-space characters following a single space only, since your only hard criterion seems to be the number of spaces. For example:
^\S+(?: \S+)*$
^(?:\S+(?:\s\S+)*|\s(?:\S+\s)*)\S$
Meaning (for the second pattern):
^ start of the line
(?: start of a non-capturing group
\S+ one or more non-whitespace characters
(?:\s\S+)* zero or more groups of a single whitespace character followed by one or more non-whitespace characters
| or
\s one whitespace character
(?:\S+\s)* zero or more groups of one or more non-whitespace characters followed by one whitespace character
) end of the non-capturing group
\S one non-whitespace character
$ end of the line
In your third example the # won't be matched with \w but it will if you change it to \S (any non-whitespace character)
See it in action here: regexr.com/50lp2
edit: I can't type
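To check the first (simpler) pattern against the examples from the question, here is a rough Python sketch. The NAME_FIELD pattern is a hypothetical adaptation for the "name:" case from the EDIT, not something the answer proposes:

```python
import re

# First pattern from the answer: words separated by exactly one space.
SINGLE_SPACED = re.compile(r"^\S+(?: \S+)*$")

should_match = ["Adam Lambert", "Mr. Adam Lambert", "adam#test.com"]
should_fail = ["Adam  Lambert", "Adam Lambert "]  # double space, trailing space

for s in should_match + should_fail:
    print(repr(s), bool(SINGLE_SPACED.search(s)))

# Hypothetical adaptation: the value after "name:" is optional,
# so "name:" and "name:A" are accepted as well.
NAME_FIELD = re.compile(r"^name:(?:\S+(?: \S+)*)?$")

for s in ["name:", "name:A", "name:Adam Lambert"]:
    print(repr(s), bool(NAME_FIELD.search(s)))
```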

Regular Expression to Match Past Label Including Empty String

Using a regular expression, I'm trying to match a label, in this case "Business Unit:", followed by one or more spaces, then match everything in a submatch after that to the end of that line. The problem is that when there are no characters after the label on the line, it grabs the next line instead.
For example, here's some test data:
Business Unit:(space)(space)BU1(space)
This is Line 2
Business Unit:(space)(space)
This is Line 4
So I want to grab just "BU1" from the first line, and that works. It should match an empty string from the third line, but it matches the contents of the fourth line instead, in this case "This is Line 4".
Here is my expression:
Business Unit:\s+(.+)
I thought the dot character is not supposed to match a newline, but it seems like it does.
What's the correct regular expression in this case?
The real problem here is that \s matches any whitespace, including newlines, and \s+ is greedy, so it runs on into the next line and then .+ captures the rest.
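The behaviour is easy to reproduce. Below is a minimal Python sketch, with the test data reconstructed from the question (each "(space)" written out as a real space, which is an assumption about the original input):

```python
import re

# Hypothetical reconstruction of the question's four test lines.
data = "Business Unit:  BU1 \nThis is Line 2\nBusiness Unit:  \nThis is Line 4\n"

# Original expression: \s+ is free to consume the newline on line 3.
captured = re.findall(r"Business Unit:\s+(.+)", data)
print(captured)  # ['BU1 ', 'This is Line 4']
```

The dot itself never crosses a line break here; it is \s+ that moves the match position onto the next line before .+ starts.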
This should meet your requirements.
The pattern is ^Business Unit: *([\S]*)
This is assuming of course your business unit won't contain any spaces. If it does, then I can modify the pattern.
It depends a bit on the context in which you are using the regex, because multi-line handling may vary, but here is a start:
/^Business Unit: +([^ ]*) *$/
^ starts from the beginning of the line,
Business Unit: matches the literal label,
(space)+ matches 1 or more spaces,
([^ ]*) captures any run of non-space characters,
(space)*$ matches any trailing spaces up to the end of the line.
Again, depending on your context, you may need to specify the line end as \n:
/^Business Unit: +([^ ]*) *\n/
The \n character is part of \s. That is why you get a match onto the following line.
You can do:
/^Business Unit:[ \t]*([^\n]*?)[ \t]*$/m
If you want to exclude the leading horizontal spaces and not match if blank:
/^Business Unit:[ \t]+(\S+)[ \t]*$/m
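As a quick check of both variants, here is a Python sketch. The test data is a reconstruction of the question's sample with real spaces, and re.MULTILINE stands in for the /m flag:

```python
import re

# Hypothetical reconstruction of the question's four test lines.
data = "Business Unit:  BU1 \nThis is Line 2\nBusiness Unit:  \nThis is Line 4\n"

# Variant 1: [ \t] cannot cross a line break, so an empty value stays empty.
LABEL = re.compile(r"^Business Unit:[ \t]*([^\n]*?)[ \t]*$", re.MULTILINE)
values = LABEL.findall(data)
print(values)  # ['BU1', '']

# Variant 2: requires a non-blank value, so line 3 is skipped entirely.
STRICT = re.compile(r"^Business Unit:[ \t]+(\S+)[ \t]*$", re.MULTILINE)
strict_values = STRICT.findall(data)
print(strict_values)  # ['BU1']
```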
Use a character class subtraction for whitespace except newlines (the && intersection syntax works in Java, for example):
Business Unit:[\s&&[^\n]]*(\S*)
The expression [\s&&[^\n]] is the subtraction (any whitespace except a newline); the capture group then takes 0 or more non-whitespace characters (your target).
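Python's re module has no && class intersection, so the same idea is usually written there as a double negative, [^\S\n] ("not non-whitespace and not a newline"). A minimal sketch, assuming the question's data with real spaces:

```python
import re

# Hypothetical reconstruction of the question's four test lines.
data = "Business Unit:  BU1 \nThis is Line 2\nBusiness Unit:  \nThis is Line 4\n"

# [^\S\n] is whitespace minus the newline, the counterpart of [\s&&[^\n]].
values = re.findall(r"Business Unit:[^\S\n]*(\S*)", data)
print(values)  # ['BU1', '']
```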
In your example you capture the last line because \s also matches a newline.
What you could do is replace \s+ with a literal space followed by + and capture any character zero or more times with (.*) in a group.
You might use a word boundary \b at the start.
\bBusiness Unit: +(.*)
Update
Based on the comments, to avoid capturing whitespace at the end of the line, you could match one or more non-whitespace characters with \S+, followed by a repeated group that matches a space or a tab [ \t] and then one or more non-whitespace characters, and make the whole capture group optional with ?
\bBusiness Unit: +(\S+(?:[ \t]\S+)*)?
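A short Python sketch of the updated pattern. The sample data is made up and uses a two-word value ("BU 1") to show that inner single spaces are kept while the trailing space is dropped:

```python
import re

# Hypothetical data: a two-word value with a trailing space, then an empty value.
data = "Business Unit:  BU 1 \nThis is Line 2\nBusiness Unit:  \nThis is Line 4\n"

PATTERN = re.compile(r"\bBusiness Unit: +(\S+(?:[ \t]\S+)*)?")

# findall reports '' for an optional group that did not take part in the match.
values = PATTERN.findall(data)
print(values)  # ['BU 1', '']
```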

regexp print line by line and remove last word

I am trying to remove the last word from each line if the line contains more than one word.
If a line has only one word, print it as is; there is no need to delete it.
Say the lines are:
address 34 address
value 1 value
valuedescription
size 4 size
From the lines above, I want to remove the last word of each line using a regexp, except for the 3rd line, since it has only one word.
I tried the regexp below, but it also removes the word from single-word lines:
$_ =~ s/\s*\S+\s*+$//;
Need your help for the same.
You can use:
$_ =~ s/(?<=\w)\h+\w+$//m;
Explanation:
(?<=\w): Lookbehind to assert that we have at least one word char before last word
\h+: Match 1+ horizontal whitespaces
\w+: match a word with 1+ word characters
$: End of line
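The same substitution can be sketched in Python. Python's re module does not know \h, so [ \t] is used here as a stand-in for horizontal whitespace (an assumption that only spaces and tabs occur), and the lines are processed one at a time:

```python
import re

lines = ["address 34 address", "value 1 value", "valuedescription", "size 4 size"]

# [ \t] stands in for \h; the lookbehind demands a word character before
# the whitespace, so a line with a single word is left untouched.
LAST_WORD = re.compile(r"(?<=\w)[ \t]+\w+$")

trimmed = [LAST_WORD.sub("", line) for line in lines]
print(trimmed)  # ['address 34', 'value 1', 'valuedescription', 'size 4']
```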
Try this regex:
^(?=(?:\w+ \w+)).*\K\b\w+
Replace each match with a blank string
OR
^((?=(?:\w+ \w+)).*\b)\w+
and replace each match with \1
Explanation(1st Regex):
^ - asserts the start of the line
(?=(?:\w+ \w+)) - positive lookahead to check if the string has 2 words present in it
.* - if the above condition is satisfied, match 0+ occurrences of any character (except newline) until the end of the line
\K - forget everything matched so far
\b - backtrack to find the last word boundary
\w+ - matches the last word
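Python's built-in re module has no \K, so only the second (capture group) form carries over directly. A rough sketch, applied line by line, which also shows a side effect worth knowing about: the space before the removed word stays in the result.

```python
import re

lines = ["address 34 address", "value 1 value", "valuedescription", "size 4 size"]

# Second regex from the answer; group 1 is everything before the last word.
KEEP_HEAD = re.compile(r"^((?=(?:\w+ \w+)).*\b)\w+")

raw = [KEEP_HEAD.sub(r"\1", line) for line in lines]
print(raw)  # ['address 34 ', 'value 1 ', 'valuedescription', 'size 4 ']

# Optional clean-up of the trailing space that is left behind.
cleaned = [line.rstrip() for line in raw]
print(cleaned)
```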
A single word with no whitespace matches your regex since you've used \s* both before and after the \S+, and \s* matches an empty string.
You could use $_ =~ s/^(.*\S)\s+(\S+)$/$1/;
[Explanation: Match the RegEx if the line contains some number of characters ending with a non-whitespace (stored in $1), followed by 1 or more white-space characters, followed by 1 or more non-white-space characters. If there is a match, replace it all with the first part ($1).]
Though you might want to trim leading/trailing whitespace if you think it might contain any - depends on what you want to happen in those cases.
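For comparison, here is the same substitution as a Python sketch, applied per line. The last two sample lines are made up to illustrate the leading/trailing whitespace caveat mentioned above:

```python
import re

lines = ["address 34 address", "value 1 value", "valuedescription", "size 4 size"]

# Group 1 keeps everything up to the last non-whitespace character
# before the final whitespace-separated word.
DROP_LAST = re.compile(r"^(.*\S)\s+\S+$")

trimmed = [DROP_LAST.sub(r"\1", line) for line in lines]
print(trimmed)  # ['address 34', 'value 1', 'valuedescription', 'size 4']

# Hypothetical edge cases: trailing whitespace prevents any match,
# so these lines come back unchanged unless they are stripped first.
edge_cases = [DROP_LAST.sub(r"\1", s) for s in ["a b ", "  lonely  "]]
print(edge_cases)
```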

Remove all spaces from lines starting with specific word

Using Regex find/replace in Notepad++, how can I remove all spaces from a line if the line starts with 'CHAPTER'?
Example Text:
CHAPTER A B C
Once upon a time.
What I want to end up with:
CHAPTERABC
Once upon a time.
Incorrect code is something like:
(?<=CHAPTER)( )(?<=\r\n)
So 'CHAPTER' needs to stay and the search should stop at the first line break.
You may use a \G based regex to only match a line that starts with CHAPTER and then match only consecutive non-whitespace and whitespace chunks up to the linebreak while omitting the matched non-whitespace chunks and removing only the horizontal whitespace:
(?:^CHAPTER|(?!^)\G)\S*\K\h+
Details:
(?:^CHAPTER|(?!^)\G) - CHAPTER at the start of a line (^CHAPTER) or (|) the end of the previous successful match ((?!^)\G; since \G can also match at the start of a line, we use the restricting negative lookahead)
\S* - zero or more non-whitespace symbols
\K - a match reset operator forcing the regex engine to omit the text matched so far (thus, we do not remove CHAPTER or any of the non-whitespace chunks)
\h+ - horizontal whitespace (1 or more occurrences) only
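Outside Notepad++, engines without \G and \K need a different route. Python's re module is one of them, so the sketch below swaps in another technique: match each whole CHAPTER line, then strip spaces and tabs inside it with a replacement function. The helper name strip_spaces is made up for this example.

```python
import re

text = "CHAPTER A B C\nOnce upon a time."

def strip_spaces(match: re.Match) -> str:
    # Remove horizontal whitespace inside the matched CHAPTER line only.
    return re.sub(r"[ \t]+", "", match.group(0))

# ^CHAPTER.*$ with re.MULTILINE selects whole lines that start with CHAPTER;
# . does not match a newline, so the match stops at the line break.
result = re.sub(r"^CHAPTER.*$", strip_spaces, text, flags=re.MULTILINE)
print(result)
```

With the example text from the question, this prints CHAPTERABC followed by the untouched second line.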

Regex to Match All Whitespace After Word

I have strings like this:
"2015/08/this filename has whitespace .jpg"
I need to match the whitespace characters in those strings. They all start with "2015/08/ and end with ".
I'm using Sublime Text 2 to search and replace in a SQL DB dump. I'm at a loss on how to do the match. I know I can match whitespace with \s, but I have no clue how to restrict the match to those strings.
As per my comment, this expression should work for a string that has the same number of opening/closing double quotes:
\s+(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)
The look-ahead checks that an odd number of double quotes remains between the current position and the end of the file.
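The look-ahead pattern uses no engine-specific features, so it can be tried in Python as well. A minimal sketch, where the surrounding INSERT statement and the table name files are made up for illustration:

```python
import re

# Hypothetical one-line excerpt of a SQL dump with balanced double quotes.
dump = 'INSERT INTO files VALUES ("2015/08/this filename has whitespace .jpg", 1);'

# Whitespace followed by an odd number of quotes lies inside a quoted string.
INSIDE_QUOTES = re.compile(r'\s+(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)')

encoded = INSIDE_QUOTES.sub("%20", dump)
print(encoded)
```

Only the whitespace inside the quoted string is replaced; the spaces in the SQL keywords around it are followed by an even number of quotes (or none) and are left alone. Note that this pattern affects every double-quoted string, not just those starting with 2015/08/.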
Another approach is to define the boundary with \G and trim the beginning of the match with \K:
(?:"\d{4}\/\d{2}\/|(?!^)\G)[^"\s]*\K\s(?=[^"]*")
The regex finds a match:
(?:"\d{4}\/\d{2}\/|(?!^)\G) - when a substring starts with numbers like 2015/12/ or after a successful match
[^"\s]*\K - matches all characters that are not whitespace or " and omits them due to \K operator
\s - here it matches a whitespace symbol
(?=[^"]*") - a look-ahead checking we are presumably inside double quotes.
Replacing the spaces with, say, %20 results in: