RegEx: Grabbing semicolon enclosed by quotation marks and at least one character

I need a regular expression (that works in notepad++) that grabs semicolons enclosed by quotation marks and where at least one character is between the quotation mark and the semicolon.
This semicolon should be matched: "asdf;a3"
This semicolon should not be matched: ";"
Until now I have the following regex: \"(.*?)\"
However, this matches everything between the quotation marks. I only need the semicolon as a match.
Thanks for your help.

You could use a capturing group for the semicolon and negated character classes that match anything except the listed characters:
"[^";]+(;)[^;"]+"
Regex demo
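As a rough illustration of the capturing-group variant outside Notepad++, here is a minimal Python sketch. The function name semicolon_index and the sample strings are made up for this example; the pattern is the one shown above, and group 1 is the semicolon.

```python
import re

# Pattern from the answer above: group 1 holds the semicolon itself.
PATTERN = re.compile(r'"[^";]+(;)[^;"]+"')


def semicolon_index(text):
    """Return the index of the captured semicolon, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.start(1) if match else None


print(semicolon_index('x "asdf;a3" y'))  # index of the semicolon inside the quotes
print(semicolon_index('";"'))            # None: no character before the semicolon
```

In a search-and-replace you would then reference group 1, or rebuild the match around it.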
Or make use of \K to forget what was matched and a positive lookahead:
"[^;"]+\K;(?=[^;"]+")
Regex demo
To match multiple semicolons between double quotes, you could make use of \G:
(?:"|\G(?!^))[^";]+\K;(?=[^"]+")
Explanation
(?: Non capturing group
" Match "
| Or
\G(?!^) Assert position at the end of the previous match, not at the start
) Close non capturing group
[^";]+ Match 1+ times not " or ;
\K; Forget what was matched and match ;
(?=[^"]+") Positive lookahead, assert that what is on the right is 1+ times not " and then match "
Regex demo
Note: if you don't want to match newlines you could add that to the character class [^;"\r\n]
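Python's built-in re module supports neither \G nor \K, so the pattern above cannot be used there as is. A swapped-in technique that gets a similar result is a two-pass approach: find each quoted span first, then handle the semicolons inside it. The sketch below is only an approximation of the \G pattern (it treats every semicolon that is neither the first nor the last character inside the quotes as a hit), and the function name, the replacement character and the sample input are assumptions made for illustration.

```python
import re

# A quoted span on one line; group 1 is the text between the quotes.
QUOTED = re.compile(r'"([^"\r\n]*)"')


def replace_inner_semicolons(text, replacement=","):
    """Replace semicolons that have at least one character on both sides
    inside a pair of double quotes. Hypothetical helper, not from the answer."""

    def fix(match):
        inner = match.group(1)
        if len(inner) < 3:
            # Too short to hold a semicolon with a character on each side.
            return match.group(0)
        # Keep the first and last character untouched, replace in between.
        middle = inner[1:-1].replace(";", replacement)
        return '"' + inner[0] + middle + inner[-1] + '"'

    return QUOTED.sub(fix, text)


print(replace_inner_semicolons('"asdf;a3";";";"a;b;c"'))
```

Semicolons outside the quotes and the lone ";" stay as they are, which is the behaviour the question asks for.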

Try this regex:
/.+?\;.+?/g
Here is a link that will help you understand the flow of this regex.
Here is a link displaying the demo of this regex.

Related

Regex for no single quote and newline character in between single quotes

So far I have this '.[^ \n']*'(?!') with a negative lookahead after the last quote.
Unfortunately, this does allow ''' (three single quotes).
The regex should match these strings
'abc'
'abc##$%^xyz'
The regex shouldn't match these strings
'\n'
'abc#'#$%^xyz'
'''
'
My current regex uses a negative lookahead for a single quote. I am trying to find a way to make it more general, so that it doesn't match if the string has an odd number of single quotes.
If your patterns occur always alone in a line, you could use this:
^'[^\n']*'$
If you want to find matching pairs of single quotes in a bigger text, I think regex is not the solution for you.
You could use:
^'[^\n']*(?:'[^\n']*')*[^\n']*'$
Explanation
^ Start of string
' Match a single quote
[^\n']* Match 0+ chars other than a newline or a single quote
(?: Non capture group to repeat as a whole part
'[^\n']*' Match from ' to ' without matching newlines in between
)* Close the non capture group and optionally repeat it
[^\n']* Match 0+ chars other than a newline or a single quote
' Match a single quote
$ End of string
See a regex101 demo.
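To compare the two patterns above against the strings from the question, here is a small Python sketch. The names SIMPLE and PAIRED are made up for this example, and '\n' from the question is assumed to mean a real newline character between the quotes.

```python
import re

# First answer: exactly one pair of quotes on the line.
SIMPLE = re.compile(r"^'[^\n']*'$")
# Second answer: additional quotes are allowed as long as they come in pairs.
PAIRED = re.compile(r"^'[^\n']*(?:'[^\n']*')*[^\n']*'$")

samples = ["'abc'", "'abc##$%^xyz'", "'\n'", "'abc#'#$%^xyz'", "'''", "'", "'a'b'c'"]

for sample in samples:
    print(repr(sample), bool(SIMPLE.search(sample)), bool(PAIRED.search(sample)))
```

The last sample is an extra one with four quotes; it is the only case where the two patterns disagree, since the second pattern accepts an even number of quotes and the first accepts exactly two.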

RegEx matching only within a match / restrict matching to part of string

Is there a way to use a single regular expression to match only within another match? For example, if I want to remove spaces from a string, but only within parentheses:
source : "foobar baz blah (some sample text in here) and some more"
desired: "foobar baz blah (somesampletextinhere) and some more"
In other words, is it possible to restrict matching to a specific part of the string?
In PCRE a combination of \G and \K can be used:
(?:\G(?!^)|\()[^)\s]*\K\s+
\G continues where the previous match ended
\K resets beginning of the reported match
[^)\s] matches any character not in the set
See demo at regex101
The idea is to chain matches to an opening parenthesis. The chain-links are either [^)\s]* or \s+. To only get spaces, \K is used to reset before them. This solution does not require a closing ).
In other regex flavors that support \G but not \K, capturing groups can help out. Eg Search for
(\G(?!^)|\()([^)\s]*)\s+
and replace with captures of the 2 groups (depending on lang: $1$2 or \1\2) - Regex101 demo
Further there is (*SKIP)(*F), a PCRE feature for skipping over certain parts. It is often used together with The Trick. The idea is simple: skip this(*SKIP)(*F)|match that - Regex101 demo. This can also be worked around with capture groups. Eg replace ([^)(]*\(|\)[^)(]*)|\s with $1
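Python's re module has no \G, \K or (*SKIP)(*F), but the capture-group workaround from the last sentence carries over. A minimal sketch, assuming a made-up function name and using a callback so an unmatched group 1 becomes an empty string:

```python
import re

# Group 1 swallows everything outside the parentheses (plus the parenthesis
# itself); the bare \s alternative only gets a chance inside them.
PATTERN = re.compile(r'([^)(]*\(|\)[^)(]*)|\s')


def strip_spaces_in_parens(text):
    """Put back group 1 when it matched, drop the whitespace otherwise."""
    return PATTERN.sub(lambda m: m.group(1) or '', text)


source = "foobar baz blah (some sample text in here) and some more"
print(strip_spaces_in_parens(source))
```

One thing to watch with this pattern: a string that contains no parentheses at all never reaches the first alternative, so all of its whitespace would be removed.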
One idea is to replace any space between parentheses using a lookahead pattern (note that the pattern starts with a literal space):
 (?=([^\s\(]+ )*\S*\))(?!\S*\s*\()
The lookahead will attempt to match the last space before the closed parenthesis (\S*\)) and any optional space before ([^\s\(]+ )* (if found).
Detailed Regex Explanation:
: space
(?=([^\s\(]+ )*\S*\)): positive lookahead (zero-width, consumes nothing)
([^\s\(]+ )*: any combination characters not including the open parenthesis and the space characters + space (this group is optional)
\S*\): any non-space character + closed parenthesis
(?!\S*\s*\(): what lookahead should not be
\S*: any non space character (optional), followed by
\s*: any space character (optional), followed by
\(: the open parenthesis
Check the demo here.
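Since this approach only needs lookaheads, it also runs in Python's re module. A minimal sketch, with a made-up function name and the sample string from the question; the pattern starts with a literal space followed by the two lookaheads described above:

```python
import re

# A literal space, then the positive and the negative lookahead from the answer.
SPACE_IN_PARENS = re.compile(r' (?=([^\s\(]+ )*\S*\))(?!\S*\s*\()')


def remove_spaces_in_parens(text):
    """Delete each space that the lookaheads place inside parentheses."""
    return SPACE_IN_PARENS.sub('', text)


source = "foobar baz blah (some sample text in here) and some more"
print(remove_spaces_in_parens(source))
```

This was only checked against the single-parenthesis example from the question; input with several or nested groups may behave differently.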

Regular Expression to prevent Email Name Spoofing

I want to match everything where .com or my\s?example appears in the display name of a From header and where the From email address is not .*#myexample.com.
It's easy when the display name is enclosed by quotation marks, but fails when the quotation marks are absent.
"(.*?(my\s?example|\.com).*?)"(?!\s?\<.*?\#myexample\.com\>)
Please see here:
https://regexr.com/5im6l
Everything works as desired except for the last line in the input field, where the double quotes are missing. I would like it to also match for this.
If an if clause is supported, and you want to capture what is between the double quotes if they are both there or capture the whole string if there are no double quotes at the start and end, you might use:
\bFrom:\s(")?(.*?\b(my\s?example|\.com)\b.*?)(?(1)")\s+<(?!\s?[^\r\n<>]*#myexample\.com>)
The pattern matches:
\bFrom:\s(")? A word boundary, match From: and optionally capture " in group 1
(.*?\b(my\s?example|\.com)\b.*?) Capture group 2, match a part that contains either myexample or .com where the alternatives are in group 3
(?(1)") If clause, if group 1 exists, match " so it is not part of the capture group
\s+< Match 1+ whitespace chars and <
(?! Negative lookahead, assert that what is at the right is not
\s?[^\r\n<>]*#myexample\.com> Match #myexample\.com between the brackets
) Close lookahead
Group 2 contains the whole match, and group 3 contains a part with either Myexample or .com using a case insensitive match.
Regex demo
If \K is supported to forget what has been matched so far, and you want only the display name as the match instead of a capture group:
\bFrom:\s"?\K.*?\b(?:my\s?example|\.com)\b.*?(?="?\s<(?![^<>]*#myexample\.com>))
Regex demo
Note that you don't have to escape \< \> and \#
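The conditional (?(1)") is also supported by Python's re module, so the first pattern can be tried there. The following is a minimal sketch: the function name and the sample headers are invented for illustration, the headers use # as the separator exactly as the patterns above do, and re.IGNORECASE stands in for the case-insensitive flag mentioned in the answer.

```python
import re

# First pattern from the answer, split over two lines for readability.
SPOOF = re.compile(
    r'\bFrom:\s(")?(.*?\b(my\s?example|\.com)\b.*?)(?(1)")\s+<'
    r'(?!\s?[^\r\n<>]*#myexample\.com>)',
    re.IGNORECASE,
)


def spoofed_display_name(header):
    """Return the suspicious display name (group 2), or None if the header is fine."""
    match = SPOOF.search(header)
    return match.group(2) if match else None


headers = [
    'From: "My Example Support" <attacker#evil.org>',     # quoted, foreign domain
    'From: My Example Support <attacker#evil.org>',       # unquoted, foreign domain
    'From: "My Example Support" <support#myexample.com>', # legitimate sender
    'From: John Smith <john#example.org>',                # unrelated display name
]
for header in headers:
    print(header, '->', spoofed_display_name(header))
```

Only the first two headers are reported, with and without the surrounding quotes.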

RegEx: don't capture match, but capture after match

There are a thousand regular expression questions on SO, so I apologize if this is already covered. I did look first.
I have string:
Name Subname 11X22 88X620 AB33(20) YA5619 77,66
I need to capture this string: YA5619
What I am doing is just finding AB33(20) and after this I am capturing until first white space. But AB33(20) can be AB-33(20) or AB33(-20) or AB33(-1).
My preg_match regex is: (?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)
Why am I getting an error when I change \d{2} to \d+?
For the final result I thought this regex would work, but it doesn't:
(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)
Any ideas what I am doing wrong?
With most regex flavors, lookbehind needs to evaluate to a fixed-length sequence, so you can't use variable quantifiers like * or + or even {1,2}.
Instead of using lookaround, you can simply match your marker pattern and then forget it with \K.
AB-?\d+(?:\(-?\d+\))? \K[^ ]+
demo: https://regex101.com/r/8XXngH/1
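The fixed-length restriction is easy to reproduce in Python, whose re module also rejects variable-width lookbehinds. Python has no \K either, so this sketch swaps it for a capturing group; the names lookbehind_ok and value_after_marker are made up for the example.

```python
import re

# The lookbehind from the question fails to compile in Python's re module
# because \d+ and -? make its width variable.
try:
    re.compile(r'(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Same marker as in the answer, with a capturing group in place of \K.
MARKER = re.compile(r'AB-?\d+(?:\(-?\d+\))? ([^ ]+)')


def value_after_marker(text):
    """Return the token that follows the AB marker, or None."""
    match = MARKER.search(text)
    return match.group(1) if match else None


print(lookbehind_ok)
print(value_after_marker("Name Subname 11X22 88X620 AB33(20) YA5619 77,66"))
```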
It depends on the language. In .NET, for example, it matches because .NET supports variable-length lookbehind.
Another solution might be to use a character class containing the characters you would allow to match. Then match a whitespace character, reset the match with \K, and match \S+, which matches 1+ non-whitespace characters.
\bAB[()\d-]+\s\K\S+
Explanation
\bAB Match literally prepended with word boundary to prevent AB being part of a larger match.
[()\d-]+ Match 1+ times any of the listed character in the character class
\s Match a whitespace char (or \s+ to match 1 or more)
\K Reset the starting point of the reported match (forget what was matched)
\S+ Match 1+ times a non-whitespace character
Regex demo | Php demo
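To check the character-class idea against the marker variants listed in the question, here is a short Python sketch. Since Python's re module lacks \K, a capturing group around \S+ is used instead; the name MARKER and the loop are just for illustration.

```python
import re

# Character-class version of the marker; group 1 is the token after it.
MARKER = re.compile(r'\bAB[()\d-]+\s(\S+)')

variants = ["AB33(20)", "AB-33(20)", "AB33(-20)", "AB33(-1)"]
results = []
for variant in variants:
    line = f"Name Subname 11X22 88X620 {variant} YA5619 77,66"
    match = MARKER.search(line)
    results.append(match.group(1) if match else None)

print(results)
```

The class is deliberately loose: it would also accept markers such as AB))--, which may or may not matter for the data at hand.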

Match everything to the first unescaped (with \) character

I have the following input:
!foo\[bar[bB]uz\[xx/
I want to match everything from the start up to the first unescaped [, including any escaped bracket \[ and omitting the first characters if they are in the [!#\s] group
Expected output:
foo\[bar
I've tried with:
(?![!#\s])[^/\s]+\[
But it returns:
foo\[bar[bB]uz\[
Java: Use Lookbehind
(?<=!)(?:\\\[|[a-z])+
See the regex demo
Explanation
The lookbehind (?<=!) asserts that what precedes the current position is the character !
The non-capture group (?:\\\[|[a-z]) matches \[ OR | a letter between a and z
The + causes the group to be matched one or more times
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
You can use this regex:
!((?:[^[\\]*\\\[)*[^[]*)
Online Regex Demo
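The pattern above is not Java-specific. As a quick check, here is a minimal Python sketch using the input from the question; the only change is that the [ inside the character classes is escaped, which is a stylistic choice on my part and should not alter the meaning.

```python
import re

# Group 1: any number of "text + escaped bracket" chunks, then text up to
# the first unescaped [.
PATTERN = re.compile(r'!((?:[^\[\\]*\\\[)*[^\[]*)')

match = PATTERN.search(r'!foo\[bar[bB]uz\[xx/')
captured = match.group(1) if match else None
print(captured)
```

Note that this version hard-codes the leading ! and would need adjusting for the other prefix characters (# or whitespace).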
Add a ? after [^/\s]+ to catch the shortest group possible
Add \w+ to the end to catch the first group of alphanumeric characters after \[
Result :
(?![!#\s])[^\/\s]+?\[\w+
Try it
You can try this pattern:
(?<=^[!#\s]{0,1000})(?:[^!#\s\\\[]|\\.)(?>[^\[\\]+|\\.)*(?=\[)
pattern details:
The beginning is a lookbehind and means: preceded by zero or more forbidden characters at the start of the string
(?:[^!#\s\\\[]|\\.) ensures that the first character is an allowed character or an escaped character.
(?>[^\[\\]+|\\.)* describes the content: all that is not a [ or a \, or an escaped character. (note that this subpattern can be written like that too: (?:[^\[\\]|\\.)*)
(?=\[) checks that the next character is a literal opening square bracket. (since all escaped characters are matched by the preceding group, you can be sure that this one is not escaped)
link to fiddle (push the Java button)
Use a negated character class for the start (i.e. the match must not start with a special char), then a reluctant quantifier (which stops at the first hit), with a negative lookbehind to skip over escaped brackets:
[^!#\s].*?(?<!\\)\[
See live demo
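One detail of this last pattern is that the match includes the closing [ itself. The Python sketch below shows that, next to a variant that swaps the final \[ for a lookahead so the bracket is left out; that variant is my own adjustment, not part of the answer, and the names are made up.

```python
import re

text = r'!foo\[bar[bB]uz\[xx/'

# Pattern from the answer: the unescaped [ is part of the match.
WITH_BRACKET = re.compile(r'[^!#\s].*?(?<!\\)\[')
# Assumed variant: assert the [ with a lookahead instead of consuming it.
WITHOUT_BRACKET = re.compile(r'[^!#\s].*?(?<!\\)(?=\[)')

with_bracket = WITH_BRACKET.search(text).group()
without_bracket = WITHOUT_BRACKET.search(text).group()
print(with_bracket)
print(without_bracket)
```

The second result corresponds to the expected output from the question.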