Regex for no single quote or newline character between single quotes

So far I have '.[^ \n']*'(?!'), with a negative lookahead after the last quote.
Unfortunately, this does allow ''' (three single quotes).
The regex should match these strings:
'abc'
'abc##$%^xyz'
The regex shouldn't match these strings:
'\n'
'abc#'#$%^xyz'
'''
'
My current regex relies on a negative lookahead for a single quote. I am trying to find a more general approach, so that it doesn't match if the string has an odd number of single quotes.

If each pattern always occurs alone on a line, you could use this:
^'[^\n']*'$
If you want to find matching pairs of single quotes in a bigger text, I think regex is not the solution for you.
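A minimal sketch for checking this line-anchored pattern, assuming Python's re module (the question does not name a language or tool). The two lists reuse the strings from the question, with '\n' taken to mean a real newline between the quotes.

```python
import re

# Answer pattern: a quote, then no quotes or newlines, then a quote,
# anchored to the start (^) and end ($) of the string.
PATTERN = re.compile(r"^'[^\n']*'$")

should_match = ["'abc'", "'abc##$%^xyz'"]
should_not_match = ["'\n'", "'abc#'#$%^xyz'", "'''", "'"]  # "\n" is a real newline

for s in should_match + should_not_match:
    # Caveat: Python's $ also matches just before a single trailing newline;
    # re.fullmatch(r"'[^\n']*'", s) is stricter if that matters.
    print(repr(s), bool(PATTERN.search(s)))
```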

You could use:
^'[^\n']*(?:'[^\n']*')*[^\n']*'$
Explanation
^ Start of string
' Match a single quote
[^\n']* Match 0+ chars other than a newline or a single quote
(?: Non-capturing group, so the part inside can be repeated as a whole
'[^\n']*' Match from ' to ' without matching newlines in between
)* Close the non-capturing group and repeat it 0 or more times
[^\n']* Match 0+ chars other than a newline or a single quote
' Match a single quote
$ End of string
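The same kind of check for the repeated-pair pattern, again assuming Python's re. The extra string 'a'b'c' is a made-up example showing that, unlike the line-anchored ^'[^\n']*'$ pattern, this one accepts an even number of inner quotes while still rejecting the odd-count cases from the question.

```python
import re

# Opening quote, any number of '...' pairs, then the closing quote.
# Every quote except the first and last comes in a pair, so the total is even.
PATTERN = re.compile(r"^'[^\n']*(?:'[^\n']*')*[^\n']*'$")

cases = {
    "'abc'": True,
    "'abc##$%^xyz'": True,
    "'a'b'c'": True,          # made-up: four quotes, accepted
    "'\n'": False,            # real newline inside the quotes
    "'abc#'#$%^xyz'": False,  # three quotes
    "'''": False,
    "'": False,
}

for text, expected in cases.items():
    assert bool(PATTERN.search(text)) == expected, repr(text)
print("all cases behave as expected")
```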

Related

Regex lookahead with unknown number of spaces

I am trying to capture a string that can contain any character but must always be followed by ';'.
I want to capture it and trim the whitespace around it. I've tried using a positive lookahead, but that does not seem to exclude the whitespace.
Example:
this is a match ;
this is not a match
regex:
.+(?=\s*;)
result:
"this is a match " is captured, including the trailing whitespace.
expected result:
"this is a match" (without whitespace)
You have to make sure the first and the last characters of your match are not whitespace. Thus we use the non-whitespace character match (\S) before and after the any-character match (.*). The part between the first and last non-whitespace characters may be empty, so the any-character match (.) must be optional; thus we use * instead of +.
\S.*\S(?=\s*;)
If the string can start with a space, use .*\S(?=\s*;).
Thanks to @CarySwoveland for improving the answer.
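A quick check of these two patterns, assuming Python's re (any engine with lookahead should behave the same here). The one-letter input "a ;" is a made-up example illustrating that \S.*\S needs at least two non-whitespace characters, while the .*\S variant still finds a single character.

```python
import re

# Non-whitespace at both ends of the match; anything (possibly nothing) between.
TRIMMED = re.compile(r"\S.*\S(?=\s*;)")
# Variant from the answer: only the last character must be non-whitespace.
FALLBACK = re.compile(r".*\S(?=\s*;)")

print(TRIMMED.search("this is a match ;").group(0))  # this is a match
print(TRIMMED.search("this is not a match"))         # None: no ';' follows

# \S.*\S cannot match a one-character value; the variant can.
print(TRIMMED.search("a ;"))                # None
print(FALLBACK.search("a ;").group(0))      # a
```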
You can match
.*(?<!\s)(?=\s*;)
provided the regex engine supports negative lookbehinds.
Note that this returns an empty string if the string is " ;".
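Python's re accepts this negative lookbehind because \s has a fixed width. A minimal sketch, assuming Python, showing both the normal case and the empty match for " ;":

```python
import re

# Greedy .* backs off until (1) the character just before the current position
# is not whitespace and (2) optional whitespace followed by ';' comes next.
PATTERN = re.compile(r".*(?<!\s)(?=\s*;)")

normal = PATTERN.search("this is a match ;").group(0)
edge = PATTERN.search(" ;").group(0)

print(repr(normal))  # 'this is a match'
print(repr(edge))    # '' (empty match at position 0)
```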
You can make the dot non-greedy and start the match with a non-whitespace character:
\S.*?(?=\s*;)
If the non-whitespace character itself should also not be a semicolon:
[^\s;].*?(?=\s*;)
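A sketch comparing the two lazy patterns, assuming Python's re. The input " ; x ;" is made up to show the difference: \S may start the match on a semicolon, while [^\s;] cannot.

```python
import re

LAZY = re.compile(r"\S.*?(?=\s*;)")
NO_SEMICOLON_START = re.compile(r"[^\s;].*?(?=\s*;)")

basic = LAZY.search("this is a match ;").group(0)

# Made-up input with a stray semicolon before the real value.
text = " ; x ;"
lazy_result = LAZY.search(text).group(0)                  # starts on the first ';'
strict_result = NO_SEMICOLON_START.search(text).group(0)  # skips it

print(repr(basic))          # 'this is a match'
print(repr(lazy_result))    # '; x'
print(repr(strict_result))  # 'x'
```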

Regex match last word in string ending in ...

I want to regex match the last word in a string where the string ends in "...". The match should be the word preceding the "...".
Example: "Do not match this. This sentence ends in the last word..."
The match would be "word". This gets close: \b\s+([^.]*). However, I don't know how to make it match only when the string ends in "...".
This should NOT match: "Do not match this. This sentence ends in the last word."
If you use \s+, there must be at least one whitespace character before the word, so it will not match a string that is only "word...".
If you want to use the negated character class, you could also use
([^\s.]+)\.{3}$
( Capture group 1
[^\s.]+ Match 1+ times any char except a whitespace char or dot
) Close group
\.{3} Match 3 dots
$ End of string
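A minimal sketch in Python's re (the question does not name a language), running the pattern over the question's two example sentences plus a made-up one-word string:

```python
import re

# Group 1: a run of characters that are neither whitespace nor '.',
# followed by exactly three dots at the end of the string.
PATTERN = re.compile(r"([^\s.]+)\.{3}$")

samples = [
    "Do not match this. This sentence ends in the last word...",
    "Do not match this. This sentence ends in the last word.",
    "word...",  # made-up: no whitespace before the word
]
results = []
for text in samples:
    m = PATTERN.search(text)
    results.append(m.group(1) if m else None)
    print(repr(text), "->", results[-1])
```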
You can anchor your regex to the end with $. To match a literal period you need to escape it, as it is otherwise a metacharacter:
(\S+)\.\.\.$
\S matches everything except whitespace characters. Exactly what it matches depends on your regex flavor, but it usually excludes spaces, tabs, newlines and a set of Unicode spaces.
You can play around with it here:
https://regex101.com/r/xKOYa4/1
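Because \S also matches dots, this pattern behaves differently from ([^\s.]+)\.{3}$ when the last token itself contains a period. A sketch in Python's re (an assumption); "see end.word..." is a made-up input:

```python
import re

ANCHORED = re.compile(r"(\S+)\.\.\.$")

first = ANCHORED.search("Do not match this. This sentence ends in the last word...").group(1)
no_match = ANCHORED.search("Do not match this. This sentence ends in the last word.")
dotted = ANCHORED.search("see end.word...").group(1)  # made-up input

print(first)     # word
print(no_match)  # None
print(dotted)    # end.word  (\S also matched the inner '.')
```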

RegEx: Negated lookahead a back reference many times

How do we lookahead until there is no back reference of a character in RegEx?
Given:
We are looking for phrases within quotes and it can be multiline "check we have a return here
but this line is still part of previous one 'a' string".
It breaks once we have another 'testing with single quotes "surrounding" double quotes';
How do we match text in double quotes or single quotes, up to the point where the same kind of quote closes it?
I tried this pattern, but it's not working:
/(['"])[^$1]+\1/g
If your strings have no escape sequences, it is as easy as using a tempered greedy token like
/(['"])(?:(?!\1)[\s\S])+\1/g
The (?:(?!\1)[\s\S])+ part matches one or more characters ([\s\S]), each of which is not the value captured into Group 1 (either ' or "). To also match empty strings ("" or ''), replace the + (1 or more occurrences) with the * quantifier (0 or more occurrences).
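To check this against the question's sample text, here is a sketch using Python's re (an assumption: the /.../g notation above is JavaScript, but Python also supports a backreference inside a lookahead):

```python
import re

# Group 1 captures the opening quote; the tempered token (?:(?!\1)[\s\S])+
# consumes any character, newlines included, that is not that same quote.
QUOTED = re.compile(r"""(['"])(?:(?!\1)[\s\S])+\1""")

text = (
    'We are looking for phrases within quotes and it can be multiline "check we have a return here\n'
    "but this line is still part of previous one 'a' string\".\n"
    "It breaks once we have another 'testing with single quotes \"surrounding\" double quotes';"
)

matches = [m.group(0) for m in QUOTED.finditer(text)]
for found in matches:
    print(repr(found))
```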
If your strings may contain escape sequences, use
/(['"])(?:\\[\s\S]|(?!\1)[^\\])*?\1/g
Pattern details:
(['"]) - Group 1 capturing a ' or "
(?:\\[\s\S]|(?!\1)[^\\])*? - 0+ (but as few as possible) occurrences of
\\[\s\S] - any escape sequence
| - or
(?!\1)[^\\] - any char other than \ and the one captured into Group 1
\1 - the value kept in Group 1.
NOTE: In JS, [\s\S] matches any char, including line break chars. A JS-only construct that matches all chars is [^]; it is preferable from a performance point of view, but it is not advised because other regex flavors do not support it (i.e. it is not portable).
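A sketch of the escape-aware version in Python's re, compared with the simpler tempered token on a made-up string containing escaped double quotes. It keeps the portable [\s\S] rather than the JS-only [^]:

```python
import re

# A backslash and the character after it are consumed together, so an
# escaped quote (\" or \') never closes the string.
ESCAPED = re.compile(r"""(['"])(?:\\[\s\S]|(?!\1)[^\\])*?\1""")
SIMPLE = re.compile(r"""(['"])(?:(?!\1)[\s\S])+\1""")

s = 'say "a \\"quoted\\" word" now'  # the text is: say "a \"quoted\" word" now

print(ESCAPED.search(s).group(0))  # "a \"quoted\" word"
print(SIMPLE.search(s).group(0))   # "a \"   (stops at the escaped quote)
```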

Pattern for apostrophe inside quotes

I am looking for a pattern that can find apostrophes that are inside single quotes. For example the text
Foo 'can't' bar 'don't'
I want to find and replace the apostrophes in can't and don't, but I don't want to find the single quotes.
I have tried something like
(.*)'(.*)'(.*)'
and apply the replace on the second matching group. But for text that has 2 words with apostrophes this pattern won't work.
Edit: to clarify, the text could contain single quotes with no apostrophes inside them, which should be preserved as-is. For example:
'foo' 'can't' bar 'don't'
I am still looking only for apostrophes, so the single quotes around foo should not match.
I believe you need to require "word" characters to appear before and after a ' symbol, which can be done with word boundaries:
\b'\b
To match only an apostrophe between letters, use one of these:
(?<=\p{L})'(?=\p{L})
(?<=[[:alpha:]])'(?=[[:alpha:]])
(?U)(?<=\p{Alpha})'(?=\p{Alpha}) # Java, double the backslashes in the string literal
Or, ASCII only:
(?<=[a-zA-Z])'(?=[a-zA-Z])
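A sketch of the replacement in Python's re. Python's built-in re module has no \p{L}, so the last call swaps in [^\W\d_] (a word character that is not a digit or underscore), which approximates "any Unicode letter". The replacement character U+2019 (a typographic apostrophe) is an arbitrary choice, since the question does not say what to replace the apostrophe with, and "l'été" is a made-up example.

```python
import re

text = "'foo' 'can't' bar 'don't'"
APOSTROPHE = "\u2019"  # arbitrary replacement: typographic apostrophe

# \b'\b: a quote with a word character on both sides.
word_boundary = re.sub(r"\b'\b", APOSTROPHE, text)

# ASCII-only letters on both sides.
ascii_only = re.sub(r"(?<=[a-zA-Z])'(?=[a-zA-Z])", APOSTROPHE, text)

# Stand-in for \p{L}: [^\W\d_] approximates a Unicode letter in Python's re.
unicode_letters = re.sub(r"(?<=[^\W\d_])'(?=[^\W\d_])", APOSTROPHE, "l'été")

print(word_boundary)
print(ascii_only)
print(unicode_letters)
```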
You can use the following regular expression:
'[^']+'\s|'[^']+(')[^' ]+'
It returns 3 matches; if capture group 1 participated in a match, it holds the apostrophe inside the word:
'foo'
'can't'
'don't'
How it works:
'[^']+'\s
' match an apostrophe
[^']+ followed by at least one character that isn't an apostrophe
' followed by an apostrophe
\s followed by a space
| or
'[^']+(')[^' ]+'
' match an apostrophe
[^']+ followed by at least one character that isn't an apostrophe
(') followed by an apostrophe, and capture it in capture group 1
[^' ]+ followed by at least one character that is not an apostrophe or a space
' followed by an apostrophe
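To use the capture group for the replacement, a callback can rebuild each match and swap only group 1. A minimal sketch in Python's re; the replacement character (U+2019) is an arbitrary choice and the function name fix is hypothetical:

```python
import re

PATTERN = re.compile(r"'[^']+'\s|'[^']+(')[^' ]+'")
text = "'foo' 'can't' bar 'don't'"

matches = [m.group(0) for m in PATTERN.finditer(text)]
for m in PATTERN.finditer(text):
    # m.start(1) is -1 when group 1 did not participate (the 'foo' branch).
    print(repr(m.group(0)), m.start(1))

def fix(m):
    """Hypothetical helper: replace only the apostrophe captured in group 1."""
    if m.group(1) is None:
        return m.group(0)           # quoted word without an apostrophe: keep it
    i = m.start(1) - m.start()      # offset of group 1 inside the match
    return m.group(0)[:i] + "\u2019" + m.group(0)[i + 1:]

result = PATTERN.sub(fix, text)
print(result)
```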

Regex to Match All Whitespace After Word

I have strings like this:
"2015/08/this filename has whitespace .jpg"
I need to match the whitespace characters in those strings. They will all start with "2015/08/ and end with ".
I'm using Sublime Text 2 to search and replace in a SQL DB dump. I'm at a loss on how to do the match. I know I can match whitespace with \s, but I have no clue how to restrict the match to those strings.
As per my comment, this expression should work for a string that has the same number of opening/closing double quotes:
\s+(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)
The lookahead checks that an odd number of double quotes follows, up to the end of the file, i.e. that the whitespace sits inside a quoted string.
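The same substitution outside Sublime Text, as a minimal sketch with Python's re (an assumption). The sample line is a made-up stand-in for a line of the SQL dump:

```python
import re

# Whitespace followed by an odd number of double quotes before the end of the
# text, i.e. whitespace that sits inside a "..." pair.
INSIDE_QUOTES_WS = re.compile(r'\s+(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)')

line = 'x = "2015/08/this filename has whitespace .jpg";'  # made-up dump line
result = INSIDE_QUOTES_WS.sub("%20", line)
print(result)  # x = "2015/08/this%20filename%20has%20whitespace%20.jpg";
```

The spaces around = are left alone because two double quotes (an even number) follow them.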
Another approach is to define the boundary with \G and trim the beginning of the match with \K:
(?:"\d{4}\/\d{2}\/|(?!^)\G)[^"\s]*\K\s(?=[^"]*")
The regex finds a match:
(?:"\d{4}\/\d{2}\/|(?!^)\G) - when a substring starts with a date-like prefix such as "2015/12/, or right after the previous successful match
[^"\s]*\K - matches all characters that are neither whitespace nor " and then drops them from the match with the \K operator
\s - here it matches a whitespace symbol
(?=[^"]*") - a lookahead checking that a " follows, i.e. that we are presumably inside double quotes.
Replacing the spaces with, say, %20 results in: