PCRE Regex to RE2 Regex - regex

In a previous question, I got an answer with a regular expression that accepts all email addresses from a certain domain except two addresses from that same domain.
For example:
BAD:
test#testdomain.com
tes2#testdomain.com
GOOD:
notest#testdomain.com
test23#testdomain.com
Here is the regular expression from that answer:
^(?!test#|tes2#)[A-Za-z0-9._%+-]+#testdomain\.com$
However, my application specifically requires RE2 syntax, so I cannot use this expression as it is.
What steps should I take to convert this PCRE expression to RE2 syntax?
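RE2 does not support look-arounds, so the (?!...) part has to be expressed some other way. One possible approach, sketched below, is to spell out "any local part except exactly test or tes2" as a plain alternation. This is only a sketch: it runs on Python's re module (which is not RE2) and merely restricts itself to constructs that RE2 should accept (character classes, non-capturing groups, bounded repetition). The names C, ORIGINAL and REWRITTEN are made up for illustration.

```python
import re

# Local-part character class, copied from the original pattern.
C = r"[A-Za-z0-9._%+-]"

# Original PCRE pattern (uses a negative lookahead, which RE2 rejects).
ORIGINAL = re.compile(r"^(?!test#|tes2#)[A-Za-z0-9._%+-]+#testdomain\.com$")

# Lookahead-free rewrite: every local part except exactly "test" or "tes2".
REWRITTEN = re.compile(
    r"^(?:"
    + C + r"{1,3}"                           # fewer than 4 characters
    + r"|" + C + r"{5,}"                     # more than 4 characters
    + r"|[A-Za-su-z0-9._%+-]" + C + r"{3}"   # 4 chars, first is not "t"
    + r"|t[A-Za-df-z0-9._%+-]" + C + r"{2}"  # "t", then not "e"
    + r"|te[A-Za-rt-z0-9._%+-]" + C          # "te", then not "s"
    + r"|tes[A-Za-su-z0-13-9._%+-]"          # "tes", then not "t" or "2"
    + r")#testdomain\.com$"
)

# Compare both patterns on the examples from the question.
for address in ["test#testdomain.com", "tes2#testdomain.com",
                "notest#testdomain.com", "test23#testdomain.com"]:
    accepted = bool(REWRITTEN.match(address))
    agrees = accepted == bool(ORIGINAL.match(address))
    print(address, accepted, "agrees with original:", agrees)
```

The rewritten pattern grows quickly as more addresses are excluded, so if the application allows a second check outside the regex, filtering the two addresses separately may be easier to maintain.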

Related

Regex to search multiple strings spam

I have a problem: we are receiving many spam emails containing the following strings:
(vi#gra, v1agra, v1#gra, v!#gr#)
I have already created a regular expression for each of the words, but I don't know how to combine them into just one:
^v[0-9]+agra$
You may use the following regex pattern:
v[i1!][a#]gr[a#]
Note that this pattern also matches viagra, in addition to the four viagra variants which you gave in your question.
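As a quick check of the pattern above, here is a minimal Python sketch that runs it against the four strings from the question plus two extra samples. The names SPAM, samples and flagged, and the sample "vinegar", are illustrative additions and not part of the original answer.

```python
import re

# Pattern from the answer: "v", then i/1/!, then a/#, then "gr", then a/#.
SPAM = re.compile(r"v[i1!][a#]gr[a#]")

samples = ["vi#gra", "v1agra", "v1#gra", "v!#gr#", "viagra", "vinegar"]

# Keep every sample in which the pattern is found somewhere.
flagged = [s for s in samples if SPAM.search(s)]
print(flagged)
```

The four variants from the question and the plain spelling are flagged; "vinegar" is not.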

Why does this regex not match in Python?

I have the regular expression
(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*\\r\\n\\r\\n)(\S+)?
which I'm trying to match against HTTP GET and HTTP POST requests. I'm using the helpful regex101.com website to format my regular expression, and according to it, the regular expression should match both the formats I'm seeking.
Here's my regular expression on regex101.com.
However, when I put it into Python and call re.split() on my input strings, it doesn't split the POST request; it only splits the GET request. I thought it had something to do with the way regex101 parses \r\n (CRLF) versus how Python does it, so I double-checked that in Python I actually type \r\n inside the regex, and not \\r\\n as I did in regex101. Yet it still doesn't work.
How can I get the regular expression to work inside Python?
You're just missing an additional \r\n after HTTP/1.0. This will work:
'POST /api/gettime HTTP/1.0\r\n\r\nContent-Length: 13\r\n\r\n100000+200000'
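The reason is that . does not match a line feed by default, so .* cannot skip over a header line to reach the \r\n\r\n that the pattern requires. The sketch below illustrates this. The string without_blank_line is only a guess at what the failing input looked like (the question does not show it), and the variable names are illustrative.

```python
import re

# The asker's pattern, written with real CR/LF escapes as described for Python.
PATTERN = re.compile(r"(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*\r\n\r\n)(\S+)?")

# Assumed failing input: only one CRLF between the request line and the header.
without_blank_line = (
    "POST /api/gettime HTTP/1.0\r\nContent-Length: 13\r\n\r\n100000+200000"
)

# Input from the answer: an additional CRLF right after HTTP/1.0.
with_blank_line = (
    "POST /api/gettime HTTP/1.0\r\n\r\nContent-Length: 13\r\n\r\n100000+200000"
)

no_match = PATTERN.search(without_blank_line)
match = PATTERN.search(with_blank_line)

print(no_match)                   # the pattern cannot cross the first "\n"
print(match.groups()[:3])         # method, path and protocol version
```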

online tool available to validate regex in firestore?

There are tools available to validate the regex used in JavaScript / prolong etc., but I am writing rules in google-cloud-firestore. I would like a tool to check my regex.
Please suggest one.
If you read my original answer, ignore it.
You can use the matches comparison.
matches
Performs a regular expression match, returns true if the whole
string matches the given regular expression. Uses Google RE2 syntax.
The full list of string validation rules available for Cloud Firestore is shown here.

Regular expression not working in google analytics

I'm trying to build a regular expression to capture URLs which contain a certain parameter 7136D38A-AA70-434E-A705-0F5C6D072A3B
I've set up a simple regex to capture a URL with anything before and anything after this parameter (that is, all URLs which contain this parameter). I've tested this on an online checker, http://scriptular.com/, and it seems to work fine. However, Google Analytics says it is invalid when I try to use it. Any idea what is causing this?
Url will be in the format
/home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd
So I just want to capture URLs that contain that specific "z" parameter.
regex
^.+(?=7136D38A-AA70-434E-A705-0F5C6D072A3B).+$
You just need
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B.+$
Or (a bit safer):
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&.+$)
And I think you can even use
=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&)
Your regex is invalid because GA regex flavor does not support look-arounds (and you have a (?=...) positive look-ahead in yours).
Here is a good GA regex cheatsheet.
To match /home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd you can use:
\S*7136D38A-AA70-434E-A705-0F5C6D072A3B\S*
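To see how the lookahead-free suggestions above differ from each other, here is a small Python sketch. Python's re is not the GA regex flavor, so this only illustrates what each pattern accepts; the helper found and the two extra URLs (the value as the last parameter, and a longer value that merely starts with the same characters) are made up for illustration.

```python
import re

GUID = "7136D38A-AA70-434E-A705-0F5C6D072A3B"

# The four lookahead-free patterns suggested in the answers.
LOOSE = r"^.+=" + GUID + r".+$"
SAFER = r"^.+=" + GUID + r"($|&.+$)"
SHORT = r"=" + GUID + r"($|&)"
ANYWHERE = r"\S*" + GUID + r"\S*"

def found(pattern: str, url: str) -> bool:
    """Return True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None

# URL from the question, plus two hypothetical variants.
middle = "/home/index?x=23908123890123&y=kjdfhjhsfd&z=" + GUID + "&p=kljdaslkjasd"
last = "/home/index?z=" + GUID
longer = "/home/index?z=" + GUID + "XYZ"

for name, pattern in [("loose", LOOSE), ("safer", SAFER),
                      ("short", SHORT), ("anywhere", ANYWHERE)]:
    print(name, found(pattern, middle), found(pattern, last), found(pattern, longer))
```

All four match the URL from the question. The "safer" and "short" forms also accept the value as the last parameter and reject a longer value that only begins with the same characters.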

CAtlRegExp for a regular expression that matches 4 characters max

Short version:
How can I get a regex that matches a#a.aaaa but not a#a.aaaaa using CAtlRegExp ?
Long version:
I'm using CAtlRegExp http://msdn.microsoft.com/en-us/library/k3zs4axe(VS.80).aspx to try to match email addresses. I want to use the regex
^[A-Z0-9._%+-]+#(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$
extracted from here.
But the syntax that CAtlRegExp accepts is different than the one used there. This regex returns the error REPARSE_ERROR_BRACKET_EXPECTED, you can check for yourself using this app: http://www.codeproject.com/KB/string/mfcregex.aspx
Using said app, I created this regex:
^[a-zA-Z0-9\._%\+\-]+#([a-zA-Z0-9-]+\.)+[a-zA-Z]$
But the problem is that this matches a#a.aaaaa as valid; I need it to match 4 characters maximum for the top-level domain.
So, how can I get a regex that matches a#a.aaaa but not a#a.aaaaa?
Try: ^[a-zA-Z0-9\._%\+\-]+#([a-zA-Z0-9-]+\.)+\c\c\c?\c?$
This expression replaces the [A-Z]{2,4} sequence which CAtlRegExp doesn't support with \c\c\c?\c?
\c serves as an abbreviation of [a-zA-Z]. The question marks after the 3rd and 4th \c's indicate they can match either zero or one characters. As a result, this portion of the expression matches 2, 3 or 4 characters, but neither more nor less.
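The "two required, two optional" trick can be checked outside ATL. The sketch below is a Python emulation, not CAtlRegExp itself: Python's re has no \c abbreviation, so it is spelled out as [a-zA-Z]. The names ALPHA, TLD and PATTERN are illustrative.

```python
import re

# CAtlRegExp's \c is described above as [a-zA-Z]; spell it out for Python.
ALPHA = "[a-zA-Z]"

# Two mandatory letters followed by two optional ones: 2, 3 or 4 letters.
TLD = ALPHA + ALPHA + ALPHA + "?" + ALPHA + "?"

PATTERN = re.compile(r"^[a-zA-Z0-9._%+\-]+#([a-zA-Z0-9-]+\.)+" + TLD + "$")

for address in ["a#a.aa", "a#a.aaaa", "a#a.aaaaa", "a#a.a"]:
    print(address, bool(PATTERN.match(address)))
```

Addresses with a 2 to 4 letter top-level domain are accepted, while a#a.aaaaa and a#a.a are rejected.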
You are trying to match email addresses, a very widely used, critical element of internet communication.
For that reason, I would say this job is best done with the most widely used, most correct regex available.
Since email address format rules are described by RFC822, it seems useful to do internet searches for something like "RFC822 email regex".
For Perl the answer seems to be easy: use Mail::RFC822::Address: regexp-based address validation
RFC 822 Email Address Parser in PHP
Thus, to achieve the most correct handling of email addresses, one should either locate the most precise regex that there is out somewhere for the particular toolkit (ATL in your case) or - in case there's no suitable existing regex yet - adapt a very precise regex of another toolkit (Perl above seems to be a very complete albeit difficult candidate).
If you're trying to match a specific sub part of email addresses (as seems to be the case given your question), then it probably still makes sense to start with the most up-to-date/correct/universal regex and specifically limit it to the parts that you require.
Perhaps I stated the obvious, but I hope it helped.