Regex fix for GSuite content compliance

I have a regex that I am trying to use to detect phishing emails.
The emails come in like:
Principal-joe smith <officeemailxyz@gmail.com>
I need to identify any email that has
principal*@gmail.com or @hotmail.com or @yahoo.com.
This is my regex:
(\W|^)(?i)pr[i!1]nc[i!1]p[a@]l@(yahoo|hotmail|gmail)\.com(\W|$)
(\W|^)(?i)pr[i!1]nc[i!1]p[a@]l---WHAT GOES HERE---@(yahoo|hotmail|gmail)\.com(\W|$)
Or is there a better way to do this?

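First, I think you have to make the search case-insensitive with the /i option.
Then, you should allow any character that is valid in an email address, plus the space, if I understand your example correctly.
I ran a couple of tests and the following seems to catch all cases. The domain list is written as a group, (?:yahoo|hotmail|gmail), not as a character class: [yahoo|hotmail|gmail]* would accept any run of those letters (and |), such as @ghoo.com.
/^[\w\s]*principal[<+\s*a-zA-Z0-9._-]*?@(?:yahoo|hotmail|gmail)\.com>?$/gmi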
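A minimal Java sketch of the check, assuming the input is a single From header like the one in the question. It combines the question's look-alike classes ([i!1], [a@]) with the answer's approach. java.util.regex is only a convenient way to exercise the pattern here; confirm the syntax in the G Suite rule editor before deploying. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PrincipalPhishCheck {
    // (?i) makes the whole pattern case-insensitive.
    // [i!1] and [a@] catch look-alike spellings such as "Pr1nc1p@l".
    // [\w\s<.+\-]*? lazily skips the rest of the display name and the mailbox name.
    // (?:yahoo|hotmail|gmail) is a group, so it matches whole words, not single letters.
    private static final Pattern SUSPICIOUS = Pattern.compile(
            "(?i)^[\\w\\s]*pr[i!1]nc[i!1]p[a@]l[\\w\\s<.+\\-]*?@(?:yahoo|hotmail|gmail)\\.com>?$");

    // Returns true when a (look-alike) "principal" precedes a yahoo/hotmail/gmail .com address
    static boolean isSuspicious(String fromHeader) {
        return SUSPICIOUS.matcher(fromHeader.trim()).matches();
    }
}
```

For example, isSuspicious("Principal-joe smith <officeemailxyz@gmail.com>") returns true, while "Principal Jane <jane@school.org>" is not flagged.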

Related

HTML5 Pattern Attribute: Exclude Keywords

I'm trying to write an HTML5 pattern to prevent users from entering free email accounts. So far I have this...
<input name="email"
placeholder="Work Email"
required
type="email"
title="Enter a valid work email address (No free email services)"
pattern="^((?!hotmail)(?!gmail)(?!ymail)(?!googlemail)(?!live)(?!gmx)(?!yahoo)(?!outlook)(?!msn)(?!icloud)(?!facebook)(?!aol)(?!zoho)(?!mail)(?!yandex)(?!hushmail)(?!lycox)(?!lycosmail)(?!inbox)(?!myway)(?!aim)(?!fastmail)(?!goowy)(?!juno)(?!shortmail)(?!atmail)(?!protonmail).)*$"/>
It is close, but is missing two key rules...
Should only look at what comes between the '@' and the '.'
Should be case insensitive.
Any ideas on how to get this working?
UPDATE
To avoid unhelpful comments about whether I should be doing this, consider other use cases where an input field should not contain any of a list of known keywords. A similar use case could be swear words, or an ID prefix where multiple prefixes exist but you want to block just one of them: an ID should never contain IXT, and the user enters... WIN-09880, IXT-2342, NTS-23422.

This is a really bad idea, but in the spirit of answering your question, here is my answer.
You can use:
pattern="^.+@((?!hotmail)(?!gmail)(?!ymail)(?!googlemail)(?!live)(?!gmx)(?!yahoo)(?!outlook)(?!msn)(?!icloud)(?!facebook)(?!aol)(?!zoho)(?!mail)(?!yandex)(?!hushmail)(?!lycox)(?!lycosmail)(?!inbox)(?!myway)(?!aim)(?!fastmail)(?!goowy)(?!juno)(?!shortmail)(?!atmail)(?!protonmail).)+\..+$"
to only look between the '@' and the '.'. HTML5 doesn't support the i flag for case-insensitivity, so you will either need to use JavaScript or hardcode case-insensitivity into the pattern.
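A short Java sketch of this answer's pattern, with the banned list cut down to four entries for readability (an assumption; the full list behaves the same way). Pattern.CASE_INSENSITIVE stands in for the JavaScript i flag that the pattern attribute cannot use. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class FreeMailPatternDemo {
    // After the @ there must be a non-empty run of characters in which no banned
    // word starts, then a dot and at least one more character.
    private static final String BODY =
            "^.+@((?!hotmail)(?!gmail)(?!yahoo)(?!outlook).)+\\..+$";

    // Like the HTML5 pattern attribute: no flags, so matching is case-sensitive
    static final Pattern CASE_SENSITIVE = Pattern.compile(BODY);

    // CASE_INSENSITIVE plays the role of the JavaScript i flag
    static final Pattern CASE_INSENSITIVE = Pattern.compile(BODY, Pattern.CASE_INSENSITIVE);

    // matches() requires the whole address to match, as the pattern attribute does
    static boolean allowed(Pattern pattern, String email) {
        return pattern.matcher(email).matches();
    }
}
```

Without the flag, joe@GMAIL.com slips through. Also note that only the first label after the @ is checked, so joe@corp.gmail.com passes even with the flag.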

You can't:
I've found a non-exhaustive list of free email providers which contains 2840 entries.
You'll block users who work at Microsoft, Google, Facebook, Yahoo, ProtonMail, Free, Orange, SFR and a lot of others.
What will you do with users who have their own domain names?

Here's a corrected (and shortened) version of your regex that targets only the domain portion of the address:
^(?!.*@(?:hotmail|gmail|ymail|googlemail|live|gmx|yahoo|outlook|msn|icloud|facebook|aol|zoho|mail|yandex|hushmail|lycox|lycosmail|inbox|myway|aim|fastmail|goowy|juno|shortmail|atmail|protonmail)\.\w+$).*$
You can shorten it further if you need to:
^(?!.*@(?:live|gmx|yahoo|outlook|msn|icloud|facebook|aol|zoho|yandex|lycox|inbox|myway|aim|goowy|juno|(?:hot|[gy]|google|short|at|proton|hush|lycos|fast)?mail)\.\w+$).*$
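A quick Java check that the shortened alternation accepts and rejects the same addresses as the full one. java.util.regex is used as a stand-in for the browser's engine (an assumption worth re-checking in the browser); the sample addresses and class name are made up for illustration.

```java
import java.util.regex.Pattern;

class ShortenedRegexCheck {
    // Full list: the negative lookahead rejects "@<banned>.<tld>" at the end of the input
    static final Pattern LONG = Pattern.compile(
            "^(?!.*@(?:hotmail|gmail|ymail|googlemail|live|gmx|yahoo|outlook|msn|icloud|facebook"
            + "|aol|zoho|mail|yandex|hushmail|lycox|lycosmail|inbox|myway|aim|fastmail|goowy|juno"
            + "|shortmail|atmail|protonmail)\\.\\w+$).*$");

    // Shortened list: every "...mail" name shares one optional-prefix group
    static final Pattern SHORT = Pattern.compile(
            "^(?!.*@(?:live|gmx|yahoo|outlook|msn|icloud|facebook|aol|zoho|yandex|lycox|inbox"
            + "|myway|aim|goowy|juno|(?:hot|[gy]|google|short|at|proton|hush|lycos|fast)?mail)"
            + "\\.\\w+$).*$");

    static boolean allowed(Pattern pattern, String email) {
        return pattern.matcher(email).matches();
    }
}
```

Both versions are case-sensitive, so joe@Gmail.com is allowed by either one.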
You can't make it case insensitive because the JavaScript regex flavor, very annoyingly, doesn't support inline modifiers. But do you have to use a regex for this? I would prefer a code solution using an updatable list of banned domains.
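A minimal sketch of that code-based approach in Java, assuming the banned list holds full domain names such as gmail.com (the answer does not specify the list format). Domains are stored lowercased, so the check is case-insensitive without any regex, and parent domains are checked too. BannedDomainCheck and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class BannedDomainCheck {
    // Hypothetical in-memory list; in practice it could be loaded from a file or database
    private final Set<String> banned = new HashSet<>();

    BannedDomainCheck(List<String> domains) {
        for (String d : domains) {
            ban(d);
        }
    }

    // Adds a domain at runtime; stored lowercased so lookups ignore case
    void ban(String domain) {
        banned.add(domain.trim().toLowerCase(Locale.ROOT));
    }

    // Returns false when the address's domain, or any parent of it, is banned
    boolean isAllowed(String email) {
        int at = email.lastIndexOf('@');
        if (at <= 0 || at == email.length() - 1) {
            return false; // not a usable address
        }
        String domain = email.substring(at + 1).toLowerCase(Locale.ROOT);
        // Walk up the labels: corp.gmail.com -> gmail.com -> com
        while (true) {
            if (banned.contains(domain)) {
                return false;
            }
            int dot = domain.indexOf('.');
            if (dot < 0) {
                return true;
            }
            domain = domain.substring(dot + 1);
        }
    }
}
```

Updating the list is then a data change (a call to ban(...) or a reload) rather than an edit to a long pattern.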

Google Form Validate a specific URL Regex

I am creating a Google Form and trying to add a regex to one of the fields, because I need respondents to enter a profile link from a specific website. I'm a beginner with regex and this is what I have come up with:
/^(http:\/\/)?(steamcommunity\.com\/id\/)*\/?$/
But when I enter a test link such as http://steamcommunity.com/id/bagzli, the validation fails. I don't understand what is wrong with it.

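You missed a dot (meaning any character) after the (steamcommunity\.com\/id\/) group; without it, nothing but an optional slash can follow "id/". Try this, where the . before * is the added character:
/^(http:\/\/)?(steamcommunity\.com\/id\/).*\/?$/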
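A quick Java check of both patterns against the test link from the question. The /…/ delimiters are JavaScript literal syntax and are left out here; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class SteamProfileUrl {
    // Original: after "id/" only an optional "/" may follow, so a profile name never matches
    static final Pattern ORIGINAL =
            Pattern.compile("^(http:\\/\\/)?(steamcommunity\\.com\\/id\\/)*\\/?$");

    // Corrected: ".*" lets any profile name follow "id/"
    static final Pattern CORRECTED =
            Pattern.compile("^(http:\\/\\/)?(steamcommunity\\.com\\/id\\/).*\\/?$");

    static boolean matches(Pattern pattern, String url) {
        return pattern.matcher(url).matches();
    }
}
```

Both patterns accept only http://. Changing the first group to (https?:\/\/)? would also accept https:// links, and using .+ instead of .* would reject an empty profile name.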

The ultimate goal of what I was trying to accomplish was to ensure that certain text was entered in the box. I thought I had to use Regex to accomplish that, but Google Forms also has a "Text Contains" feature, which I used to solve my problem. The regex by Zoff Dino did not work; I am not sure why, as it seems completely correct.
I will mark this as resolved as I managed to get my answer, even if it was not via regex.

Filter by regex example

Could anyone provide an example of a regex filter for the Google Chrome Developer toolbar?
I especially need exclusion. I've tried many regexes, but somehow they don't seem to work:

It turned out that Google Chrome actually didn't support this until early 2015, see Google Code issue. With newer versions it works great, for example excluding everything that contains banners:
/^(?!.*?banners)/
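A small Java sketch of how an exclusion regex like this behaves as a filter: a line is kept when the pattern finds a match and hidden otherwise. That keep/hide rule is an assumption meant to mirror the console filter; the sample lines and class name are made up.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BannerExclusionFilter {
    // Matches the empty string at the start of any line that does NOT contain "banners":
    // the lazy .*? scans ahead for the word, and the negative lookahead fails if it is found
    static final Pattern NO_BANNERS = Pattern.compile("^(?!.*?banners)");

    // Keeps the lines the pattern matches, i.e. drops the ones mentioning "banners"
    static List<String> visible(List<String> lines) {
        return lines.stream()
                .filter(line -> NO_BANNERS.matcher(line).find())
                .collect(Collectors.toList());
    }
}
```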

It's possible -- at least in Chrome 58 Dev. You just need to wrap your regex with forward-slashes: /my-regex-string/
For example, this is one I'm currently using: /^(.(?!fallback font))+$/
It successfully filters out any messages that contain the substring "fallback font".
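The two exclusion styles differ in one edge case, which this Java sketch makes visible in a plain regex engine. Because ^(.(?!fallback font))+$ only checks what follows each consumed character, a line that begins with "fallback font" still matches, whereas a single lookahead at the start, ^(?!.*fallback font), rejects the phrase anywhere. The class and method names are hypothetical; a match means the line would be shown.

```java
import java.util.regex.Pattern;

class FallbackFontFilter {
    // Each consumed character must not be followed by "fallback font",
    // so the phrase is only caught when at least one character precedes it
    static final Pattern PER_CHAR = Pattern.compile("^(.(?!fallback font))+$");

    // One lookahead from position 0 rejects the phrase anywhere in the line
    static final Pattern FROM_START = Pattern.compile("^(?!.*fallback font)");

    static boolean shownPerChar(String line) {
        return PER_CHAR.matcher(line).find();
    }

    static boolean shownFromStart(String line) {
        return FROM_START.matcher(line).find();
    }
}
```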
EDIT
Something else to note is that if you want to use the ^ (caret) symbol to search from the start of the log message, you have to first match the "fileName.js?someUrlParam:lineNumber " part of the string.
That is to say, the regex is matching against not just the log message, but also the stack-entry for the line which made the log.
So this is the regex I use to match all log messages where the actual message starts with "Dog":
/^.+?:[0-9]+ Dog/
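A Java sketch of that pattern against sample entries, assuming the filtered text has the fileName.js?someUrlParam:lineNumber prefix described above. The sample strings and class name are made up for illustration.

```java
import java.util.regex.Pattern;

class DogMessageFilter {
    // .+? lazily consumes the "file.js?param" part, :[0-9]+ the line number,
    // and " Dog" requires the actual message to start with "Dog"
    static final Pattern DOG = Pattern.compile("^.+?:[0-9]+ Dog");

    static boolean matches(String entry) {
        return DOG.matcher(entry).find();
    }
}
```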

The negative or exclusion case is much easier to write and think about when using DevTools' native syntax. To get the exclusion logic you need, simply use this:
-/app/ -/some\sother\sregex/
The "-" prior to the regex makes the result negative.

Your expression should not contain the forward slashes and /s; these are not needed for crafting a filter.
I believe your regex should finally read:
!(appl)
Depending on what exactly you want to filter.
The regex above will filter out all lines without the string "appl" in them.
edit: apparently exclusion is not supported?

Exim filters lookahead assertions to deal with outbound spam

I am attempting to create some rules to help deal with the outbound spam we've seen lately from our customers being compromised. To do this I'm using an Exim filter and checking the subject or content against some common themes.
I believe the best way to handle this would be to use lookahead assertions. If I put the lookahead assertion in quotes it fails to work.
So for example:
$header_subject: matches "^(?=.*WORD1)(?=.*WORD2)(?=.*WORD3)"
I've found examples of lookahead use in the Exim config however I have not found it in use as part of a filter which requires the quotes.
Maybe it's just not possible to use lookahead as part of a filter, or maybe there is even a better way to accomplish what I'm doing.

There is no real need for lookahead assertions here; they are only required if you don't want to include the words in the match. Your basic regex is sort of correct, but it will only match if the words are in order.
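A Java sketch contrasting the two approaches. java.util.regex stands in for a Perl-compatible engine like Exim's, and since the answer's own ordered regex is not shown, WORD1.*WORD2.*WORD3 is an assumed example of it. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AllWordsCheck {
    // Each lookahead starts at position 0 and scans for one word, so order does not matter
    static final Pattern ANY_ORDER = Pattern.compile("^(?=.*WORD1)(?=.*WORD2)(?=.*WORD3)");

    // Assumed "basic" alternative: the words must appear left to right in this order
    static final Pattern IN_ORDER = Pattern.compile("WORD1.*WORD2.*WORD3");

    static boolean hits(Pattern pattern, String subject) {
        return pattern.matcher(subject).find();
    }
}
```

So the lookahead form is what lets the words appear in any order; the plain sequence is simpler but order-sensitive.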

How to handle one specific symptom of compromised accounts being abused by botnets:
https://github.com/Exim/exim/wiki/DetectSMTPAuthAbuse

Regex PatternRepository pattern on BlackBerry 5 - how to ignore case

I hope this title makes sense - I need case-insensitive regex matching on BlackBerry 5.
I have a regular expression defined as:
public static final String SMS_REG_EXP = "(?i)[(htp:/w\\.)]*cobiinteractive\\.com/[\\w|\\%]+";
It is intended to match "cobiinteractive.com/" followed by some text. The preceding (htp:w.) is just there because on my device I needed to override the internal link-recognition that the phone applies (shameless hack).
The app loads at start-up. The idea is that I want to pick up links to my site from sms & email, and process them with my app.
I add it to the PatternRepository using:
PatternRepository.addPattern(
ApplicationDescriptor.currentApplicationDescriptor(),
GlobalConstants.SMS_REG_EXP,
PatternRepository.PATTERN_TYPE_REGULAR_EXPRESSION,
applicationMenu);
On the OS 4.5 / 4.7 simulators and on a Curve 8900 device (running 4.5), this works.
On the OS 5 simulators and the Bold 9700 I tested, the app fails to compile the pattern with an IllegalArgumentException("unrecognized character after (?").
I have also tried (naively) to set the pattern to "/rockstar/i" but that only matches the exact string - this is possibly the correct direction to take, but if so, I don't know how to implement it on the BB.
How would I modify my regex in order to pick up case insensitive patterns using the PatternRepository as above?
PS: would the "correct" way be to use the [Cc][Oo][Bb][Ii]2... etc pattern? This is ok for a short string, but I am hoping for a more general solution if possible?

Well, not a real solution for the general problem, but this workaround is easy, safe and performant:
As you're dealing here with URLs, and the host name in a URL is not case-sensitive
(it doesn't matter if we write google.com or GooGLE.COm or whatever),
the simplest solution (we all love the KISS principle) is to first lowercase (or uppercase, if you like) the input and then do the regex match. At that point it no longer matters whether the matching is case-sensitive, because we know exactly what we are dealing with.
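A minimal sketch of the lowercase-then-match idea in standard Java. It uses java.util.regex, which is not assumed to exist on BlackBerry 5, so it illustrates the technique rather than the PatternRepository call. The class name is hypothetical, and [\w%]+ drops the literal | that the question's [\w|\%] class also allowed.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class LowercaseThenMatch {
    // Written entirely in lowercase, with no (?i) modifier
    private static final Pattern LINK = Pattern.compile("cobiinteractive\\.com/[\\w%]+");

    // Lowercase the input first so "CobiInteractive.COM" becomes "cobiinteractive.com";
    // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i
    static boolean containsLink(String text) {
        return LINK.matcher(text.toLowerCase(Locale.ROOT)).find();
    }
}
```

Note that the path is lowercased too, which matters if the server treats paths case-sensitively.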

Since nobody else has answered this question relating to the PatternRepository class, I will self-answer so I can close it.
One way to do this would be to use a pattern like: [Cc][Oo][Bb][Ii]2[Nn][Tt][Ee][Rr][Aa][Cc][Tt][Ii][Vv][Ee]... etc where for each letter in the string, you put 2 options. Fortunately my string is short.
This is not an elegant solution, but it works. Unfortunately I don't know of a way to modify the string passed to PatternRepository and I think the crash when using the (?i) modifier is a bug in BB.
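The per-letter classes do not have to be typed by hand. A hypothetical helper like the one below builds them from a literal string and escapes regex metacharacters. It sticks to StringBuffer and Character so that it should also compile on a CLDC-based BlackBerry (an assumption, not tested on a device). Its output has the same shape as the hand-written pattern reported to work above.

```java
class CaseFoldPattern {
    // Hypothetical helper: turns a literal such as "cobi.com" into
    // "[Cc][Oo][Bb][Ii]\.[Cc][Oo][Mm]" so the classes need not be typed by hand
    static String ignoreCase(String literal) {
        StringBuffer out = new StringBuffer(); // StringBuffer rather than StringBuilder for older Java ME
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            char upper = Character.toUpperCase(c);
            char lower = Character.toLowerCase(c);
            if (upper != lower) {
                // Letter: accept either case
                out.append('[').append(upper).append(lower).append(']');
            } else if ("\\^$.|?*+()[]{}".indexOf(c) >= 0) {
                // Regex metacharacter: escape it so it is matched literally
                out.append('\\').append(c);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, ignoreCase("cobiinteractive.com/") produces a pattern that matches CobiInteractive.COM/, and the string can be concatenated with the rest of the expression before it is passed to PatternRepository.addPattern.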

Use the port of the jakarta regex library:
https://code.google.com/p/regexp-me/
If you use Unicode support, it's going to eat memory, but if you just want case-insensitive matching, you simply need to pass the RE.MATCH_CASEINDEPENDENT flag when you compile your regex.
new RE("yourCaseInsensitivePattern", RE.MATCH_CASEINDEPENDENT | OTHER_FLAGS)
