Regular Expression to prevent Email Name Spoofing


I want to match everything where .com or my\s?example appears in the display name of a From header and where the From email address is not .*#myexample.com.
It's easy when the display name is enclosed by quotation marks, but fails when the quotation marks are absent.
"(.*?(my\s?example|\.com).*?)"(?!\s?\<.*?\#myexample\.com\>)
Please see here:
https://regexr.com/5im6l
Everything works as desired except for the last line in the input field, where the double quotes are missing. I would like it to also match for this.

If conditionals are supported, and you want to capture what is between the double quotes when both are present, or the whole display name when there are no double quotes at the start and end, you might use:
\bFrom:\s(")?(.*?\b(my\s?example|\.com)\b.*?)(?(1)")\s+<(?!\s?[^\r\n<>]*#myexample\.com>)
The pattern matches:
\bFrom:\s(")? A word boundary, match From: and optionally capture " in group 1
(.*?\b(my\s?example|\.com)\b.*?) Capture group 2, match a part that contains either myexample or .com where the alternatives are in group 3
(?(1)") If clause, if group 1 exists, match " so it is not part of the capture group
\s+< Match 1+ whitespace chars and <
(?! Negative lookahead, assert that what is at the right is not
\s?[^\r\n<>]*#myexample\.com> Match #myexample\.com between the brackets
) Close lookahead
Group 2 contains the display name, and group 3 contains the part that matched either myexample or .com. Use a case-insensitive match so that Myexample is matched as well.
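As a rough illustration, the conditional pattern above can be tried with Python's built-in re module, which supports the (?(1)...) syntax. The function name and the sample headers are made up for this sketch, the re.IGNORECASE flag is an assumption, and the sample addresses use # as the separator only because the pattern above does.

```python
import re

# Pattern copied from the answer above; re.IGNORECASE is an assumption.
SPOOF_RE = re.compile(
    r'\bFrom:\s(")?(.*?\b(my\s?example|\.com)\b.*?)(?(1)")\s+'
    r'<(?!\s?[^\r\n<>]*#myexample\.com>)',
    re.IGNORECASE,
)

def suspicious_display_name(header):
    """Return the display name (group 2) if the header looks spoofed, else None."""
    match = SPOOF_RE.search(header)
    return match.group(2) if match else None

# Hypothetical sample headers, with and without quotes around the display name.
print(suspicious_display_name('From: "My Example Support" <attacker#evil.org>'))
print(suspicious_display_name('From: MyExample Billing <attacker#evil.org>'))
print(suspicious_display_name('From: "My Example Support" <team#myexample.com>'))
```

The first two calls should return the display name, and the third should return None because the address ends in #myexample.com.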
If \K is supported to forget what has been matched so far, and you want the display name as the match itself rather than in a capture group, another option is:
\bFrom:\s"?\K.*?\b(?:my\s?example|\.com)\b.*?(?="?\s<(?![^<>]*#myexample\.com>))
Note that you don't have to escape <, > and # (written as \<, \> and \# in the original pattern).

Related

Match group followed by group with different ending

For example, let's say I have a list of words:
words.txt
accountable
accountant
accountants
accounted
I want to match "accountant\naccountants"
I've tried /(\n\w+){2}s/, but \w+ seems to be perfectly matching different things.
My RegEx also matches the following undesirable texts:
action
actionables
actionable
actions
Am I reaching out too far in what regex can do?

You could for example use a capture group, and match a newline followed by a backreference to the same captured text and an s char.
If the first word can also be at the start of the string, instead of being preceded by a newline, you can use an anchor ^ instead.
^(\w+)\n\1s$
^ Start of string
(\w+) Capture group 1, match 1+ word chars
\n\1s Match a newline, backreference \1 to match the same text as group 1 and an s char
$ End of string
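A small sketch of the backreference idea in Python, assuming the word list is read as one multi-line string. The re.MULTILINE flag is an assumption added here so that ^ and $ match at every line, and the function name is hypothetical.

```python
import re

# re.MULTILINE lets ^ and $ match at every line, not only at the ends of the text.
PAIR_RE = re.compile(r'^(\w+)\n\1s$', re.MULTILINE)

def singular_plural_pairs(text):
    """Return each pair of consecutive lines where the second is the first plus 's'."""
    return [m.group(0) for m in PAIR_RE.finditer(text)]

words = "accountable\naccountant\naccountants\naccounted"
others = "action\nactionables\nactionable\nactions"

print(singular_plural_pairs(words))   # ['accountant\naccountants']
print(singular_plural_pairs(others))  # []
```

The second list gives no match because no line there is directly followed by the same word plus s.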

Match global, but only if line starts with a specific string

I feel like this should be very simple, and I'm missing a single, important bit.
Example: https://regex101.com/r/lXh5Vj/1
Regex, using /m/g flags:
^GROUPS.*?"(?<name>[^:]+):(?<id>\d+)"
Test string:
GROUPS: ["group1:44343", "group2:23324", "group3:66567"]
USERS: ["user1:44343", "user2:23324", "user3:66567"]
My current regex will only match group1, because only that group is directly preceded by "GROUPS". I interpret this as "Global matching" meaning it will only start to check the string again after the first match. As there is no "GROUPS" between group1 and group2, group2 is not a match. If I alter the test string and add "GROUPS" before group2, this will also match, supporting my suspicion. But I do not know how to alter global matching handling to always consider the start of the line GROUPS.
The regex should match all three names and all three ids in the first line, and none in the second. If I remove the "GROUPS" part from the regex, the groups are matched just fine, but then the second line matches as well, which I do not want.

If you want to match GROUPS: [" at the start of the string, and the key:value parts in named groups, you can make use of the \G anchor.
(?:^GROUPS:\h*\["(?=[^][]*])|\G(?!^),\h*")(?<name>[^:]+):(?<id>\d+)"
(?: Non capture group
^GROUPS:\h*\[ Start of string, Match GROUPS: optional spaces and [
"(?=[^][]*]) Match " and assert a closing ] at the right
| Or
\G(?!^),\h*" Assert the position at the end of the previous match (to get consecutive groups) and match a comma, optional spaces and "
) Close non capture group
(?<name>[^:]+) Named group name Match 1+ times any char except :
: Match literally
(?<id>\d+) Named group id, match 1+ digits
" Match literally
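Python's built-in re module supports neither \G nor \h, so the pattern above will not compile there as written. A different technique that should give the same pairs is to do it in two steps: first match the GROUPS line, then collect the name:id pairs inside its brackets. This is only a sketch of that alternative, and the function and constant names are hypothetical.

```python
import re

# Step 1: a line that starts with GROUPS: followed by a bracketed list.
LINE_RE = re.compile(r'^GROUPS:[ \t]*\[([^\]\[]*)\]', re.MULTILINE)
# Step 2: each "name:id" pair (Python spells named groups as (?P<name>...)).
PAIR_RE = re.compile(r'"(?P<name>[^:]+):(?P<id>\d+)"')

def group_pairs(text):
    """Return (name, id) tuples found on lines that start with GROUPS:."""
    pairs = []
    for line in LINE_RE.finditer(text):
        for pair in PAIR_RE.finditer(line.group(1)):
            pairs.append((pair.group("name"), pair.group("id")))
    return pairs

sample = (
    'GROUPS: ["group1:44343", "group2:23324", "group3:66567"]\n'
    'USERS: ["user1:44343", "user2:23324", "user3:66567"]'
)
print(group_pairs(sample))
# [('group1', '44343'), ('group2', '23324'), ('group3', '66567')]
```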

Regex modify capturing group

I have this Regex
^(?!.*\b(?:https?:\/\/|www\.))\w+(?:\.\w+)*\.\w{2,}(?:,\w+(?:\.\w+)*\.\w{2,})+$
that captures multiple URLs separated by commas.
It captures google.com,facebook.com but not URLs with extra characters like google.com/home.php?,facebook.com/pages/#ref=?

Assuming your URLs won't contain a comma, you can add another optional non-capturing group in your regex like this:
^(?!.*\b(?:https?:\/\/|www\.))\w+(?:\.\w+)*\.\w{2,}(?:\/[^,]*)?(?:,\w+(?:\.\w+)*\.\w{2,}(?:\/[^,]*)?)*$
Note the addition of an optional non-capturing group in the regex:
(?:\/[^,]*)? matches text starting with / followed by 0 or more of any character except a comma. The trailing ? makes this group optional.
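To check the extended pattern quickly, here is a minimal Python sketch. The function name is made up, and the / is written without a backslash because Python does not need it escaped; otherwise the pattern is the one from the answer.

```python
import re

URL_LIST_RE = re.compile(
    r'^(?!.*\b(?:https?://|www\.))'             # reject lists containing http(s):// or www.
    r'\w+(?:\.\w+)*\.\w{2,}(?:/[^,]*)?'          # first host, optional /path
    r'(?:,\w+(?:\.\w+)*\.\w{2,}(?:/[^,]*)?)*$'   # further comma-separated hosts
)

def is_url_list(text):
    """Return True if text is a comma-separated list of bare hosts with optional paths."""
    return URL_LIST_RE.match(text) is not None

print(is_url_list("google.com,facebook.com"))                         # True
print(is_url_list("google.com/home.php?,facebook.com/pages/#ref=?"))  # True
print(is_url_list("http://google.com,facebook.com"))                  # False
```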

RegEx: Grabbing semicolon enclosed by quotation marks and at least one character

I need a regular expression (that works in notepad++) that grabs semicolons enclosed by quotation marks and where at least one character is between the quotation mark and the semicolon.
This semicolon should be matched: "asdf;a3"
This semicolon should not be matched: ";"
Until now I have the following regex: \"(.*?)\"
However, this matches everything between the quotation marks. I only need the semicolon as a match.
Thanks for your help.

You could use a capturing group and a negated character class that does not match any of the listed characters:
"[^";]+(;)[^;"]+"
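Outside Notepad++, the capturing-group variant can be checked with a few lines of Python. This is only a sketch; the function name is hypothetical, and it reports where group 1 (the semicolon) sits in the input.

```python
import re

# "..." with at least one non-quote, non-semicolon char on each side of the ;
SEMI_RE = re.compile(r'"[^";]+(;)[^;"]+"')

def semicolon_positions(text):
    """Return the index of each semicolon captured in group 1."""
    return [m.start(1) for m in SEMI_RE.finditer(text)]

print(semicolon_positions('"asdf;a3"'))  # [5]
print(semicolon_positions('";"'))        # []
```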
Or make use of \K to forget what was matched and a positive lookahead:
"[^;"]+\K;(?=[^;"]+")
To match multiple semicolons between double quotes, you could make use of \G:
(?:"|\G(?!^))[^";]+\K;(?=[^"]+")
Explanation
(?: Non capturing group
" Match "
| Or
\G(?!^) Assert position at the end of the previous match, not at the start
) Close non capturing group
[^";]+ Match 1+ times not " or ;
\K; Forget what was matched and match ;
(?=[^"]+") Positive lookahead, assert that what is on the right is 1+ times not " and then match "
Note: if you don't want to match newlines you could add that to the character class [^;"\r\n]

Try this regex:
/.+?\;.+?/g

Regex ignore part of the string in matches

Suppose I have a tags object as such:
["warn-error-fatal-failure-exception-ok","parsefailure","anothertag","syslog-warn-error-fatal-failure-exception-ok"]
I would like to be able to use regex to match on "failure" but exclude "warn-error-fatal-failure-exception-ok".
So in the above case if I used my regex to search for failure it should only match failure on parsefailure and ignore the rest.
How can this be accomplished using regex?
NOTE: The regex has to exclude the whole string "warn-error-fatal-failure-exception-ok"

EDIT
After documenting the answer below, I realized that maybe what you are looking for is:
(?<!warn-error-fatal-)failure(?!-exception-ok)
So I'm adding it here in case that it fits what you are looking for better. This regex is just looking for "failure" but using a Negative Lookbehind and a Negative Lookahead to specify that "failure" may not be preceded by "warn-error-fatal-" or followed by "-exception-ok".
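A quick way to try the lookaround version is the Python sketch below, using the tags from the question. The variable names are made up for the example.

```python
import re

# "failure" not preceded by "warn-error-fatal-" and not followed by "-exception-ok".
FAILURE_RE = re.compile(r'(?<!warn-error-fatal-)failure(?!-exception-ok)')

tags = [
    "warn-error-fatal-failure-exception-ok",
    "parsefailure",
    "anothertag",
    "syslog-warn-error-fatal-failure-exception-ok",
]

matching = [tag for tag in tags if FAILURE_RE.search(tag)]
print(matching)  # ['parsefailure']
```

Note that the syslog- tag is skipped as well, since its "failure" is also preceded by "warn-error-fatal-".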
ANSWER DEVELOPED FROM COMMENTS:
The following regex captures the "failure" substring in the "parsefailure" tag, and it puts it in Group 1:
^.*"(?![^"]*warn-error-fatal-failure-exception-ok[^"]*)[^"]*(failure)[^"]*".*$
DETAIL
I will break the regex in parts, and I'll explain each. First, let's forget about everything in between the first set of parentheses, and let's just look at the rest.
^.*"[^"]*(failure)[^"]*".*$
The important part of the regex is what we are trying to capture in the group, which is the word "failure" which itself is a part of a tag surrounded by double-quotes. The regular expression above matches the whole test string, but it focuses on a tag surrounded by double-quotes and containing the substring "failure".
^.*" matches any character from the beginning of the string to a quote
"[^"]*(failure)[^"]*" matches a tag surrounded by double-quotes and containing the substring "failure". Literally: a double-quote, followed by zero or more characters that are not double-quotes, followed by "failure", followed by zero or more characters that are not double-quotes, followed by a double-quote. The parentheses capture the word "failure" in group 1.
".*$ matches any character from the double-quote to the end of the test string
Because [^"]*(failure)[^"]* matches all tags containing the substring "failure", ^.*"[^"]*(failure)[^"]*".*$ will capture the substring "failure" from the first tag containing the string. In other words, it will capture "failure" from the warn-error-fatal-failure-exception-ok tag, which is not what we want, so we must exclude the warn-error-fatal-failure-exception-ok tag from being a possible match to the tag portion of the regex: [^"]*(failure)[^"]*. This is achieved with a Negative Lookahead:
(?![^"]*warn-error-fatal-failure-exception-ok[^"]*)
This Negative Lookahead basically means: "The regular expression following the Negative Lookahead can't match [^"]*warn-error-fatal-failure-exception-ok[^"]*". The (?! and ) are just part of the syntax.
MORE BREAKDOWN
^ matches the beginning of the test string
.* matches any character zero or more times
" matches a double-quote character
[^"]* matches any character other than the double-quote character zero or more times
(failure) matches the word "failure", and since it is in parentheses, it will capture it in a group; in this case, it will be captured in group 1 because there is only one set of capturing parentheses. The parentheses of the Negative Lookahead are non-capturing.
$ matches the end of the test string
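The full regex can be run against the tags string from the question with a short Python sketch. The helper name is hypothetical, and the lookup of the surrounding tag is added here only to show which tag group 1 came from.

```python
import re

TAG_RE = re.compile(
    r'^.*"(?![^"]*warn-error-fatal-failure-exception-ok[^"]*)[^"]*(failure)[^"]*".*$'
)

def failure_tag(text):
    """Return the tag whose "failure" substring lands in group 1, or None."""
    match = TAG_RE.search(text)
    if match is None:
        return None
    start, end = match.span(1)
    tag_start = text.rfind('"', 0, start) + 1  # just after the opening quote
    tag_end = text.find('"', end)              # the closing quote
    return text[tag_start:tag_end]

text = ('["warn-error-fatal-failure-exception-ok","parsefailure",'
        '"anothertag","syslog-warn-error-fatal-failure-exception-ok"]')
print(failure_tag(text))  # parsefailure
```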

Regular expression: [A-Za-z-]*(?<!("warn-error-fatal-))failure
This recognizes parsefailure and "syslog-warn-error-fatal-failure-exception-ok", but not the failure in "warn-error-fatal-failure-exception-ok".
