Regex to replace the domain in users' email addresses

I am trying to solve an email domain co-existence problem with Exchange Online. When a message is sent to one tenant (domain.com) and forwarded to another tenant (newdomain.com), I need the To and/or CC headers rewritten with the destination (newdomain.com) email addresses before the message is delivered to its final destination.
For example:
1) A Gmail (or any other) user sends an email to sally.sue#domain.com. The MX record for that domain is looked up, and the message is delivered to the Office 365 tenant for domain.com.
2) That same Office 365 tenant is set to forward emails to sally.sue#newdomain.com (a different tenant).
3) When the message arrives for Sally Sue at newdomain.com and she hits "Reply All", both the original sender AND her old address (sally.sue#domain.com) are added to the To: line of the reply.
The way to fix that is to use header replacement with Proofpoint, which, as mentioned below, works on a per-user basis. The rest of this question is me trying to get it to work with a regex (as that's the only solution) for a large number of users.
I need to convert each user's email address as follows:
username#domain.com to username#newdomain.com
This has to be done using Proofpoint, which is a cloud-hosted MTA. They have been able to provide some sort of an answer, but it's not working.
Proofpoint support has suggested using this:
Header Name : To
Find Value : domain\.com$
Replace : newdomain\.com$ or just newdomain.com
Neither of the above works. In both cases the values are simply ignored.
This seems to find the values:
Header Name : To
Find Value : \b[A-Z0-9._%-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b
Replace : $1#fake.com
But the above simply replaces the entire To: line (in the email) with the literal string: $1#fake.com
I would also need to match lowercase letters and numbers in email addresses. I believe the above example only matches capitals.
I need it to do the following:
Header Name : To
Find Value : \b[A-Z0-9._%-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b (find users email address, domain)
Replace : user.name#newdomain.com
This is for a large number of users, so there is no way to manually update or create separate rules for each user.
If I do create an individual rule, it works as expected, but as stated that requires manually typing out each user's To: address and their new desired To: address.
This solution here almost worked: Regex to replace email address domains?

I have a couple of observations from general experience, although I have not worked with Office365 specifically.
First, a regex used for replacement usually needs to have a "capture group". This is often expressed with parentheses, as in:
match : \b([A-Z0-9._%-]+)#domain.com$
replacement : $1#newdomain.com
The idea is that the $1 in the replacement pattern is replaced with whatever was found within the () in the matching pattern.
Note that some regex engines use a different symbol for the replacement, so it might be \1#newdomain.com or some such. Note also that some regex engines need the parentheses escaped, so the matching pattern might be something like \b\([A-Z0-9._%-]+\)#domain.com$
Second, if you want to include - inside a "character class" set (that is, inside square brackets []), then the - should be first (or last); anywhere else it is ambiguous, because - is also used for a range of characters. The regex engine in question might not care, but I suggest writing your matching pattern as:
\b([-A-Z0-9._%]+)#domain.com$
This way, the first - is unambiguously itself, because there is nothing before it to indicate the start of a range.
Third, for lowercase letters, it's easiest to just expand your character class set to include them, like so:
[-A-Za-z0-9._%]
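The three points above can be tried together in a minimal JavaScript sketch. This is an assumption-laden illustration, not Proofpoint's actual behavior: its regex dialect may differ, `rewriteDomain` is a hypothetical helper name, and the `#` in the addresses above is assumed to stand in for `@`. JavaScript happens to use `$1` for the captured group and does not need the parentheses escaped.

```javascript
// Hypothetical helper: swap domain.com for newdomain.com, keeping the user name.
// Assumes the "#" in the addresses above stands in for "@".
function rewriteDomain(address) {
  // ([-A-Za-z0-9._%]+) captures the user name; "-" comes first in the class,
  // and the dot in the domain is escaped so it only matches a literal ".".
  const pattern = /\b([-A-Za-z0-9._%]+)@domain\.com$/;
  // $1 in the replacement is substituted with whatever the group captured.
  return address.replace(pattern, "$1@newdomain.com");
}

console.log(rewriteDomain("sally.sue@domain.com"));  // sally.sue@newdomain.com
console.log(rewriteDomain("someone@elsewhere.org")); // unchanged: someone@elsewhere.org
```

Because of the `$` anchor, this only rewrites a value that ends with the address. A header value such as `Sally Sue <sally.sue@domain.com>` ends with `>` and would be left untouched.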

Related

Regex: account#domain (active directory)

Something like testuser#test.local should be a match. The regex I use is:
\w[\w\.\- ]*$/#/^[a-zA-Z][a-zA-Z0-9\-\.]{0,61}[a-zA-Z]
What did I miss?
It's not like an email address, because the domain can have many sublevels, e.g. test.local.country.city.street
You should also define where in the text that address can appear. But if the address is on a separate line, then:
^[\w]+\#[\w]+\.[^\d\W]+$
The test is here.
If you want to accept complex names and domains of level above second, too, then:
^[\w]+(\.[\w]+)*\#[\w]+(\.[\w]+)*\.[^\d\W]+$
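A quick way to check the second pattern is to run it against a few sample accounts. The sketch below uses JavaScript, assumes the `#` in the pattern stands in for `@`, and drops the redundant brackets around `\w`. The sample names are made up for illustration.

```javascript
// Second pattern from above, with "#" read as "@" (an assumption).
const accountPattern = /^\w+(\.\w+)*@\w+(\.\w+)*\.[^\d\W]+$/;

// Hypothetical sample accounts.
const samples = [
  "testuser@test.local",                       // true
  "first.last@test.local.country.city.street", // true
  "testuser@local",                            // false: no dot in the domain part
  "testuser@test.123",                         // false: last label contains digits
];

for (const sample of samples) {
  console.log(sample, accountPattern.test(sample));
}
```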

Regex to differentiate APIs

I need to create a regex to help determine the number of times an API is called. We have multiple APIs, and this API has the following format:
/foo/bar/{barId}/id/{id}
The above endpoint also supports query parameters so the following requests would be valid:
/foo/bar/{barId}/id/{id}?start=0&limit=10
The following requests are also valid:
/foo/bar/{barId}/id/{id}/
/foo/bar/{barId}/id/{id}
We also have the following endpoints:
/foo/bar/{barId}/id/type/
/foo/bar/{barId}/id/name/
/foo/bar/{barId}/id/{id}/price
My current regex to extract calls made only to /foo/bar/{barId}/id/{id} looks something like this:
\/foo\/bar\/(.+)\/id\/(?!type|name)(.+)
But the above regex also includes calls made to /foo/bar/{barId}/id/{id}/price endpoint.
I can check if the string after {id}/ isn't price and exclude calls made to price but it isn't a long term solution since if we add another endpoint we may need to update the regex.
Is there a way to filter calls made only to:
/foo/bar/{barId}/id/{id}
/foo/bar/{barId}/id/{id}/
/foo/bar/{barId}/id/{id}?start=0&limit=10
Such that /foo/bar/{barId}/id/{id}/price isn't also pulled in?
\/foo\/bar\/(.+)\/id\/(?!type|name)(.+)
The cause of the problem is in your regex: "(.+)" matches every character that follows, including "/". Replace it with "([^/]+)" and append "/?(?!.+)", so that nothing may follow the optional trailing slash. This works for me:
/foo/bar/([^/]+)/id/(?!type|name)([^/]+)/?(?!.+)
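To see how the pattern behaves when counting calls, here is a small JavaScript sketch. The request paths are hypothetical stand-ins for whatever the access log contains, and the pattern is built from a string so the slashes need no escaping.

```javascript
// The pattern from above; built from a string so "/" needs no escaping.
const endpoint = new RegExp("/foo/bar/([^/]+)/id/(?!type|name)([^/]+)/?(?!.+)");

// Hypothetical request paths, e.g. pulled from an access log.
const requests = [
  "/foo/bar/12/id/34",
  "/foo/bar/12/id/34/",
  "/foo/bar/12/id/34?start=0&limit=10",
  "/foo/bar/12/id/34/price",
  "/foo/bar/12/id/type/",
  "/foo/bar/12/id/name/",
];

// Keep only the calls made to /foo/bar/{barId}/id/{id} itself.
const hits = requests.filter((path) => endpoint.test(path));
console.log(hits.length); // 3
```

Two limitations are worth knowing: `(?!type|name)` also rejects ids that merely begin with "type" or "name", and a trailing slash followed by a query string (`/34/?start=0`) is not matched.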

URL general format

I have written a C++ program that allows URLs to be posted onto YouTube. It works by taking in the URL as input either from you typing it into the program or from direct input, and then it will replace every '/', '.' in the string with '*'. This modified string is then put on your clipboard (this is solely for Windows-users).
Of course, before I can even call the program usable, it has to go back: I will need to know when '.', '/' are used in URLs. I have looked at this article: http://en.wikipedia.org/wiki/Uniform_Resource_Locator , and know that '.' is used when dealing with the "master website" (in the case of this URL, "en.wikipedia.org"), and then '/' is used afterwards, but I have been to other websites, http://msdn.microsoft.com/en-us/library/windows/desktop/ms649048%28v=vs.85%29.aspx , where this simply isn't the case (it even replaced '(', ')' with "%28", "%29", respectively!)
I also seemed to have requested a .aspx file, whatever that is. Also, there is a '.' inside the parentheses in that URL. I have even tried to view the regular expressions (I don't quite fully understand those yet...) regarding URLs. Could someone tell me (or link me to) the rules regarding the use of '.', '/' in URLs?
Can you explain why you are doing this convoluted thing? What are you trying to achieve? It may be that you don't need to know as much as you think, once you answer that question.
In the meantime, here is some information. A URL is made up of a number of sections:
http: - the "scheme" or protocol used to access the resource. "HTTP", "HTTPS",
    "FTP", etc. are all examples of a scheme. There are many others.
// - separates the protocol from the host (server) address.
myserver.org - the host. The host name is looked up against a DNS (Domain Name System)
    service and resolved to an IP address - the "phone number" of the machine
    which can serve up the resource (like "98.139.183.24" for www.yahoo.com).
www.myserver.org - the host with a prefix. Sometimes the same domain (`myserver.org`)
    connects multiple servers (or ports) and you can be sent straight to the
    right server with the prefix (mail., www., ftp., ... up to the
    administrators of the domain). Conventionally, a server that serves content
    intended for viewing with a browser has a `www.` prefix, but there's no rule
    that says this must be the case.
:8080/ - sometimes you see a colon followed by up to five digits after the domain.
    This indicates the PORT on the server where you are accessing data.
    Some servers allow certain specific services on just a particular port:
    they might have a "public access" website on port 80, and another one on 8080.
    The https:// protocol defaults to port 443; there are ports for telnet, ftp,
    etc. Add these things only if you REALLY know what you are doing.
/the/pa.th/ - this is the path relative to DOCUMENTROOT on the server where the
    resource is located. `.` characters are legal here, just as they are in
    directory structures.
file.html
file.php
file.asp
etc. - usually the resource being fetched is a file. The file may have
    any of a great number of extensions; some of these indicate to the server that
    instead of sending the file straight to the requester,
    it has to execute a program or other instructions in this file,
    and send the result of that.
    Examples of extensions that indicate "active" pages include
    (this is not nearly exhaustive - just "for instance"):
        .php = contains a PHP program
        .py = contains a Python program
        .js = contains a JavaScript program
              (usually called from inside an .htm or .html)
        .asp = "active server page" associated with a
               Microsoft Internet Information Server
?something=value&somethingElse=%23othervalue%23
    - parameters that are passed to the server can be shown in the URL.
    This can be used to pass parameters, entries in a form, etc.
    Any character might be passed here - including '.', '&', '/', ...
    But you can't just write those characters in your string...
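The sections listed above can be inspected programmatically. As a sketch (in JavaScript rather than the asker's C++), the standard `URL` class splits a URL into these parts. The URL below is a hypothetical one assembled from the example pieces in the list.

```javascript
// Hypothetical URL assembled from the example parts listed above.
const url = new URL(
  "http://www.myserver.org:8080/the/pa.th/file.html?something=value&somethingElse=%23othervalue%23"
);

console.log(url.protocol); // "http:"
console.log(url.hostname); // "www.myserver.org"
console.log(url.port);     // "8080"
console.log(url.pathname); // "/the/pa.th/file.html"
console.log(url.search);   // "?something=value&somethingElse=%23othervalue%23"

// searchParams decodes the escapes: %23 stands for "#".
console.log(url.searchParams.get("somethingElse")); // "#othervalue#"
```

Note that '.' appears in the host, the path and the file name, so its position alone does not tell you which section you are in. The '/' and '?' delimiters do.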
Now comes the fun part.
URLs cannot contain certain characters (quite a few, actually). To get around this, there is a mechanism called "escaping" a character. Typically this means replacing the character with its hexadecimal value, prefixed with a % sign. Thus, you frequently see a space character represented as %20, for example. You can find a handy list here
There are many functions available for converting "illegal" characters in a URL automatically to a "legal" value.
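As one example of such functions, JavaScript ships `encodeURIComponent` and `decodeURIComponent`. The sketch below is only an illustration in that language; the input strings are made up, apart from the file name taken from the MSDN URL in the question.

```javascript
// encodeURIComponent escapes characters that are not allowed in a URL component.
console.log(encodeURIComponent("two words")); // "two%20words"
console.log(encodeURIComponent("a/b#c"));     // "a%2Fb%23c"

// '.' is legal as-is, so it is left alone.
console.log(encodeURIComponent("file.html")); // "file.html"

// decodeURIComponent reverses the escaping, e.g. for the MSDN file name above.
console.log(decodeURIComponent("ms649048%28v=vs.85%29.aspx")); // "ms649048(v=vs.85).aspx"
```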
To learn about exactly what is and isn't allowed, you really need to go back to the original specifications. See for example
http://www.ietf.org/rfc/rfc1738.txt
http://www.ietf.org/rfc/rfc2396.txt
http://www.ietf.org/rfc/rfc3986.txt
I list them in chronological order - the last one being the most recent.
But I repeat my question -- what are you really trying to do here, and why?

How to create Gmail filter searching for text only at start of subject line?

We receive regular automated build messages from Jenkins build servers at work.
It'd be nice to ferret these away into a label, skipping the inbox.
Using a filter is of course the right choice.
The desired identifier is the string [RELEASE] at the beginning of a subject line.
Attempting to specify any of the following regexes causes emails with the string release in any case anywhere in the subject line to be matched:
\[RELEASE\]*
^\[RELEASE\]
^\[RELEASE\]*
^\[RELEASE\].*
From what I've read subsequently, Gmail doesn't have standard regex support, and from experimentation it seems, as with google search, special characters are simply ignored.
I'm therefore looking for a search parameter which can be used, maybe something like atstart:mystring in keeping with their has:, in: notations.
Is there a way to force the match only if it occurs at the start of the line, and only in the case where square brackets are included?
Sincere thanks.
Regex is not on the list of search features, and it was (more or less, as "Better message search functionality (i.e. wildcard and partial word search)") on the list of pre-canned feature requests, so the answer is "you cannot do this via the Gmail web UI" :-(
There are no current Labs features which offer this. SIEVE filters would be another way to do this, but those were not supported either, and there no longer seems to be any definitive statement on SIEVE support in the Gmail help.
Updated for link rot: the pre-canned list of feature requests was, er, canned. The original is on archive.org dated 2012; now you just get redirected to a dumbed-down page telling you how to give feedback. Lack of SIEVE support was covered in answer 78761, "Does Gmail support all IMAP features?"; since some time in 2015 that answer silently redirects to the answer about IMAP client configuration, but archive.org has a copy dated 2014.
With the current search facility, brackets of any form, () {} [], are used for grouping; they have no observable effect if there's just one term within. (aaa|bbb) and [aaa|bbb] are equivalent and will both find the words aaa or bbb. Most other punctuation characters, including \, are treated as a space or a word separator; + - : and " do have special meaning though, see the help.
As of 2016, only the form "{term1 term2}" is documented for this, and is equivalent to the search "term1 OR term2".
You can do regex searches on your mailbox (within limits) programmatically via Google docs: http://www.labnol.org/internet/advanced-gmail-search/21623/ has source showing how it can be done (copy the document, then Tools > Script Editor to get the complete source).
You could also do this via IMAP as described here:
Python IMAP search for partial subject
and script something to move messages to a different folder. The IMAP SEARCH verb only supports substrings, not regex (and Gmail search is further limited to complete words, not substrings), so further processing of the matches would be needed to apply a regex.
For completeness, one last workaround is: Gmail supports plus addressing, if you can change the destination address to youraddress+jenkinsrelease#gmail.com it will still be sent to your mailbox where you can filter by recipient address. Make sure to filter using the full email address to:youraddress+jenkinsrelease#gmail.com. This is of course more or less the same thing as setting up a dedicated Gmail address for this purpose :-)
Using Google Apps Script, you can use this function to filter email threads by a given regex:
function processInboxEmailSubjects() {
  // Change this to whatever regex you want; this one should cover OP's scenario.
  const regex = /^\[RELEASE\]/;
  // addLabel() expects a GmailLabel object, not a string; the label must already exist.
  const label = GmailApp.getUserLabelByName("customLabel");
  const threads = GmailApp.getInboxThreads();
  for (let i = 0; i < threads.length; i++) {
    const subject = threads[i].getFirstMessageSubject();
    if (regex.test(subject)) {
      Logger.log(subject);
      // Now do what you want with the thread. For example, skip the inbox and add the existing label:
      threads[i].moveToArchive().addLabel(label);
    }
  }
}
As far as I know, unfortunately there isn't a way to trigger this with every new incoming email, so you have to create a time trigger like so (feel free to change it to whatever interval you think best):
function createTrigger(){ //you only need to run this once, then the trigger executes the function every hour in perpetuity
ScriptApp.newTrigger('processInboxEmailSubjects').timeBased().everyHours(1).create();
}
The only option I have found is to find some exact wording and put that under the "Has the words" option. It's not the best option, but it works.
I was wondering how to do this myself; it seems Gmail has since silently implemented this feature. I created the following filter:
Matches: subject:([test])
Do this: Skip Inbox
And then I sent a message with the subject
[test] foo
And the message was archived! So it seems all that is necessary is to create a filter for the subject prefix you wish to handle.

preg match email and name from to

I want to extract the name and email address from the following formats (also, if you know of any other format that mail applications use when sending emails, please mention it in a comment :))
How can I get the name and email from strings in the following formats (it's a single string and can be in any of these formats):
- jon435#hotmail.com
- james jon435#hotmail.com
- "James Jordan" <jon435#hotmail.com> (gmail format)
- janne - jon44#hotmail.com (possible format)
The answer is straightforward, at least for the email portion. The rest can be special-cased away.
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")#(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Proof I'm not insane.
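The "special-cased away" part could be sketched roughly as follows, here in JavaScript rather than PHP's preg functions. Everything in it is an assumption for illustration: `parseRecipient` is a hypothetical helper, the `#` in the sample addresses is read as `@`, and the email pattern is deliberately much simpler (and looser) than the full one above.

```javascript
// Simplified address pattern (an assumption; far looser than the full pattern above).
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Hypothetical helper: returns { name, email }, or null if no address is found.
function parseRecipient(input) {
  const match = EMAIL.exec(input);
  if (!match) return null;
  const email = match[0];
  // Whatever precedes the address is treated as the display name.
  const name = input
    .slice(0, match.index)
    .replace(/[<"]/g, "")   // drop quotes and the opening angle bracket
    .replace(/[\s-]+$/, "") // drop trailing spaces and separator dashes
    .trim();
  return { name: name || null, email };
}

console.log(parseRecipient("jon435@hotmail.com"));
console.log(parseRecipient("james jon435@hotmail.com"));
console.log(parseRecipient('"James Jordan" <jon435@hotmail.com>'));
console.log(parseRecipient("janne - jon44@hotmail.com"));
```

When the string holds only the address, as in the first format, the name comes back as `null`: it simply is not in the input.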
If you only have those strings, it is going to require more work than a simple regular expression. For instance, your first example doesn't include the full name, it is only the e-mail, thus, you would have to use the Microsoft Live ID API to retrieve that information...and that turns out to be really hard.
What exactly are you trying to do? Perhaps there is another way?