Regex to Match All Whitespace After Word - regex

I have strings like this:
"2015/08/this filename has whitespace .jpg"
I need to match the whitespace characters in those strings. They all start with "2015/08/ and end with ".
I'm using Sublime Text 2 to search and replace in a SQL DB dump. I'm at a loss on how to do the match. I know I can match whitespace with \s, but I have no clue how to restrict the match to those quoted strings.

As per my comment, this expression should work for a string that has the same number of opening/closing double quotes:
\s+(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)
The look-ahead checks that an odd number of double quotes remains between the whitespace and the end of the file.
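If you want to try the first expression outside the editor, here is a minimal sketch using Python's re module (my choice for illustration only; the question is about Sublime Text). The sample line and the names INSIDE_QUOTES_WS and dump_line are made up, and the quotes are assumed to be balanced.

```python
import re

# Whitespace followed by an odd number of double quotes up to the end of input,
# i.e. whitespace that sits inside a quoted string.
INSIDE_QUOTES_WS = re.compile(r'\s+(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)')

# Hypothetical line from a SQL dump; quotes are assumed to be balanced.
dump_line = 'INSERT INTO files VALUES ("2015/08/this filename has whitespace .jpg", "x");'

fixed = INSIDE_QUOTES_WS.sub("%20", dump_line)
print(fixed)
```

Keep in mind that this expression touches whitespace inside any double-quoted string, not only the ones that begin with 2015/08/.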
Another approach is to define the boundary with \G and trim the beginning of the match with \K:
(?:"\d{4}\/\d{2}\/|(?!^)\G)[^"\s]*\K\s(?=[^"]*")
The regex finds a match:
(?:"\d{4}\/\d{2}\/|(?!^)\G) - matches either a " followed by a date-like prefix such as 2015/12/, or the position right after the previous successful match
[^"\s]*\K - matches all characters that are not whitespace or " and drops them from the match because of the \K operator
\s - here it matches a whitespace symbol
(?=[^"]*") - a look-ahead checking we are presumably inside double quotes.
Replacing the spaces with, say, %20 results in the following for the sample string:
"2015/08/this%20filename%20has%20whitespace%20.jpg"

Regex for no single quote and newline character in between single quotes

So far I have this: '.[^ \n']*'(?!') with a negative lookahead after the last quote.
Unfortunately, this does allow ''' (three single quotes).
The regex should match these strings
'abc'
'abc##$%^xyz'
The regex shouldn't match these strings
'\n'
'abc#'#$%^xyz'
'''
'
My current regex uses a negative lookahead for a single quote. I am trying to find a way to make it more general, so that it doesn't match if the string has an odd number of single quotes.
If your patterns always occur alone on a line, you could use this:
^'[^\n']*'$
If you want to find matching pairs of single quotes in a bigger text, I think regex is not the solution for you.
You could use:
^'[^\n']*(?:'[^\n']*')*[^\n']*'$
Explanation
^ Start of string
' Match a single quote
[^\n']* Match 0+ chars other than a newline or a single quote
(?: Non capture group to repeat as a whole part
'[^\n']*' Match from ' to ' without matching newlines in between
)* Close the non capture group and optionally repeat it
[^\n']* Match 0+ chars other than a newline or a single quote
' Match a single quote
$ End of string
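To check the pattern against the examples from the question, here is a small sketch in Python (my choice for illustration; the names QUOTED, should_match and should_not_match are made up). I am assuming the '\n' example means a real newline between the quotes.

```python
import re

# Pattern from the answer above, unchanged.
QUOTED = re.compile(r"^'[^\n']*(?:'[^\n']*')*[^\n']*'$")

should_match = ["'abc'", "'abc##$%^xyz'"]
# "'\n'" here contains a real newline character between the quotes (assumption).
should_not_match = ["'\n'", "'abc#'#$%^xyz'", "'''", "'"]

for s in should_match + should_not_match:
    print(repr(s), bool(QUOTED.search(s)))
```

Note that the optional group lets the pattern accept any even number of single quotes on the line, so a string like 'a'b'c' matches as well.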

Regex lookahead with unknown number of spaces

I am trying to capture a string that can contain any character but must always be followed by ';'
I want to capture it and trim the white space around it. I've tried using positive lookahead but that does not seem to exclude the whitespace.
Example:
this is a match ;
this is not a match
regex:
.+(?=\s*;)
result:
"this is a match " gets captured with trailing white space behind.
expected result:
"this is a match" (without whitespace)
You have to make sure the first and last characters of your match are not whitespace. To do that, put a non-whitespace token (\S) before and after the any-character run (.*). The run between them may be empty, so use * instead of +.
\S.*\S(?=\s*;)
If the string can start with space use .*\S(?=\s*;).
Thanks to #CarySwoveland for improving the answer.
You can match
.*(?<!\s)(?=\s*;)
provided the regex engine supports negative lookbehinds.
Note that this returns an empty string if the string is " ;".
You can make the dot non greedy and start the match with a non whitespace character:
\S.*?(?=\s*;)
If the non whitespace character itself should also not be a semicolon:
[^\s;].*?(?=\s*;)
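The variants above can be compared side by side with a short sketch in Python (chosen for illustration; the question does not name an engine). The helper capture and the list PATTERNS are hypothetical names.

```python
import re

PATTERNS = [
    r"\S.*\S(?=\s*;)",      # non-whitespace at both ends
    r".*(?<!\s)(?=\s*;)",   # negative lookbehind variant
    r"\S.*?(?=\s*;)",       # lazy dot
    r"[^\s;].*?(?=\s*;)",   # lazy dot, first char not a semicolon
]

def capture(pattern, text):
    """Return the first match of pattern in text, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

for p in PATTERNS:
    print(p, repr(capture(p, "this is a match ;")), repr(capture(p, "this is not a match")))
```

On these two sample lines all four patterns should give the same answer; they differ on edge cases such as " ;", where the lookbehind variant returns an empty string.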

Regex match last word in string ending in

I want to regex match the last word in a string where the string ends in ... The match should be the word preceding the ...
Example: "Do not match this. This sentence ends in the last word..."
The match would be word. This gets close: \b\s+([^.]*). However, I don't know how to make it work with only matching ... at the end.
This should NOT match: "Do not match this. This sentence ends in the last word."
Using \s+ requires at least one whitespace character before the word, so the pattern will not match a string that consists only of word...
If you want to use the negated character class, you could also use
([^\s.]+)\.{3}$
( Capture group 1
[^\s.]+ Match 1+ times any char except a whitespace char or dot
) Close group
\.{3} Match 3 dots
$ End of string
You can anchor your regex to the end with $. To match a literal period you will need to escape it as it otherwise is a meta-character:
(\S+)\.\.\.$
\S matches everything but whitespace characters. Exactly what that covers depends on your regex flavor, but it usually excludes spaces, tabs, newlines and a set of Unicode spaces.
You can play around with it here:
https://regex101.com/r/xKOYa4/1
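Both patterns can also be checked with a quick sketch in Python (used here only for illustration; the function names are hypothetical):

```python
import re

NEGATED_CLASS = re.compile(r"([^\s.]+)\.{3}$")
NON_SPACE = re.compile(r"(\S+)\.\.\.$")

def last_word(pattern, text):
    """Return capture group 1 if the text ends in ..., else None."""
    m = pattern.search(text)
    return m.group(1) if m else None

yes = "Do not match this. This sentence ends in the last word..."
no = "Do not match this. This sentence ends in the last word."

for pattern in (NEGATED_CLASS, NON_SPACE):
    print(pattern.pattern, last_word(pattern, yes), last_word(pattern, no))
```

One difference worth knowing: with four trailing dots, the \S+ version still matches and captures word. (including one dot), while the negated-class version does not match at all.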

notepad++ remove text between two string using regular expression

I want to remove text between two strings using regular expression in notepad++. Here is my full string
[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',
I want final string like this
[insertedOn]) VALUES (N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',
Here I removed 1, from string. 1,2,3 is in incremental order.
I tried a lot of expressions, but none of them worked. Here is one of them: (VALUES ()(?s)(.*)(, N')
How can I remove this?
You may use
(VALUES \().*?,\s*(N')
and replace with $1$2. Note that if the part of the string to be removed can contain line breaks, you need to enable the ". matches newline" option. If the N and VALUES must be matched only when in ALLCAPS, make sure the Match case option is checked.
Pattern details
(VALUES \() - Group 1 (later referred with $1 from the replacement pattern): a literal substring VALUES (
.*? - any 0+ chars, as few as possible, up to the leftmost occurrence of the subsequent subpatterns
,\s* - a comma and 0+ whitespaces (use \h instead of \s to only match horizontal whitespace chars)
(N') - Group 2 (later referred with $2 from the replacement pattern): a literal substring N'.
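The same search and replace can be sketched outside Notepad++, for example in Python (my choice for illustration). The variable names are made up, and note that Python writes the backreferences as \1 and \2 where Notepad++ uses $1 and $2.

```python
import re

row = "[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',"

# Keep group 1 ("VALUES (") and group 2 ("N'"), drop whatever sits between them.
cleaned = re.sub(r"(VALUES \().*?,\s*(N')", r"\1\2", row)
print(cleaned)
```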
First, escape the literal ( after VALUES: \(
Even with that fixed, the greedy .* together with the (?s) (DOTALL) flag makes the engine match up to the end of the input string and then backtrack to the last occurrence of , N', which produces unexpected matches.
To improve your regex you should 1) make .* lazy, 2) remove (?s), and 3) escape the (:
(VALUES \().*?, (N')
To be more precise in matching you'd better search for:
VALUES \(\K\d+, *(?=N')
and replace with nothing.
Breakdown:
VALUES \( Match VALUES ( literally
\K Reset match
\d+, * Match digits preceding a comma and optional spaces
(?=N') Followed by N'
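If you want to test this idea in an engine without \K, a fixed-width lookbehind can stand in for it. Below is a rough sketch in Python, whose built-in re module does not support \K; this is a substitute technique, not the exact expression above, and the sample rows and names are made up.

```python
import re

# Python's re has no \K, so a fixed-width lookbehind stands in for "VALUES \(\K".
ID_PREFIX = re.compile(r"(?<=VALUES \()\d+, *(?=N')")

# Hypothetical rows with incrementing ids, modelled on the question.
rows = [
    "[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',",
    "[insertedOn]) VALUES (2, N'AAAA',",
    "[insertedOn]) VALUES (10,N'BBBB',",
]

stripped = [ID_PREFIX.sub("", r) for r in rows]
for r in stripped:
    print(r)
```

Because only digits, a comma and optional spaces are allowed between VALUES ( and N', a row that does not have that exact shape is left alone.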

Match everything besides an empty line or lines containing only whitespaces

What is the easiest way to match all lines which follow these rules:
The line is not empty
The line does not only contain whitespace
I've found an expression which only matches empty lines or lines that contain only whitespace, but I am not able to invert it. This is what I have found: ^\s*[\r\n].
Is it simply possible to invert regular expressions?
Thank you very much!
To match non-empty lines, you can use the following regex with multiline mode ON (thanks #Casimir for the character class correction):
^[^\S\r\n]*\S.*$
The end of line is consumed with .* that matches any characters but a newline.
To just check if the line is not whitespace (but not match it), use a simplified version:
^[^\S\r\n]*\S
The [^\S\r\n]* matches 0 or more whitespace characters other than carriage return and line feed, such as spaces and tabs, so it cannot run over into the next line. The \S matches a non-whitespace character.
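Here is a minimal sketch of both expressions in Python (used only for illustration; multiline mode is re.MULTILINE there). The sample text and the names are made up, and I am assuming plain \n line endings.

```python
import re

# Sample text with empty and whitespace-only lines (assumes \n line endings).
text = "first\n\n   \n  second  \n\t\nthird"

# Full-line match: needs multiline mode so ^ and $ work per line.
NON_BLANK_LINE = re.compile(r"^[^\S\r\n]*\S.*$", re.MULTILINE)
non_blank = NON_BLANK_LINE.findall(text)
print(non_blank)

# Check-only version, applied to one line at a time.
HAS_CONTENT = re.compile(r"^[^\S\r\n]*\S")
flags = [bool(HAS_CONTENT.search(line)) for line in text.split("\n")]
print(flags)
```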
And by the way, if you code in C#, you do not need a regex to check if a string is whitespace, as there is String.IsNullOrWhiteSpace, just split the multiline string with str.Split(new[] {"\r\n"}, StringSplitOptions.None).
Just verify that there is at least one non-whitespace character:
^.*\S.*$
Explanation:
From start (^) until end ($)
.* - any amount of any characters
\S - one non-whitespace character
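A quick way to convince yourself that this shorter expression picks the right lines is to compare it with a plain strip-based filter. The sketch below uses Python for illustration; the sample text and names are made up, and \n line endings are assumed.

```python
import re

# Sample text with empty and whitespace-only lines (assumes \n line endings).
text = "alpha\n\n   \n  beta  \n\t\ngamma"

# ^ and $ must work per line, hence re.MULTILINE.
AT_LEAST_ONE_NON_SPACE = re.compile(r"^.*\S.*$", re.MULTILINE)
by_regex = AT_LEAST_ONE_NON_SPACE.findall(text)

# Non-regex cross-check: keep lines that are not empty after stripping whitespace.
by_strip = [line for line in text.split("\n") if line.strip()]

print(by_regex)
print(by_regex == by_strip)
```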