False positive results in email validation on Mailgun

We are doing some rather rigorous front-end testing of our opt-in list.
When users subscribe to our email list, we are thinking of running Mailgun's email validation ( https://documentation.mailgun.com/api-email-validation.html#email-validation ) before we even save the address to the database.
Does anyone know whether this built-in email validation can produce false positives? (In other words, is it possible that the function will flag a good address as bad?)
Secondly, if we do have a false-positive address in our database, does Mailgun check it before trying to send, or does it try to send anyway? Will the message reach its destination?

In case anyone else is searching for this....
It is highly unlikely that there will be a false positive. The API is structured to be permissive rather than restrictive. This means that a good address is very unlikely to be flagged as bad, but it is quite likely that an invalid email address will be flagged as valid.

You can use MailboxValidator for a second opinion. The demo page is free and you can also sign up for the free API plan if you like.

Related

email map validation in firestore

So I know that email validation is quite a difficult thing to do. I have already written a regex that checks for a valid email address. The problem is writing the security rule, seeing as I am dealing with a map in Cloud Firestore. The map looks like this:
email{
work: ""
personal: ""
}
The problem is the fact that I cannot guarantee that a specific value will match the regex. Users should be able to have only a personal email, only a work email, or both a personal and a work email. All of these situations should result in validated email addresses in Firestore.
I currently have the following code, but I can't figure out how to deal with maps like this:
match /organisations/{orgID}/people/{userID} {
  allow create: if(request.resource.data.email.matches(^[A-Za-z0-9]{3,}[@]{1}[A-Za-z0-9]{3,}[.]{1}[A-Za-z0-9]{3,}$) == true);
}
Is this doable with just one security rule? If yes, how? If no, how do I manage this another way? I'd rather use security rules over writing a cloud function for this if possible.
I currently have something like this, but I get an error because Firebase doesn't seem to recognize the | (OR) operator. Is there any alternative for doing this? I'm trying to test whether the email is either valid or null.
match /organisations/{orgID}/people/{userID} {
  allow create: if(
    request.resource.data.email.work.matches(^[A-Za-z0-9.]{3,}[@]{1}[A-Za-z0-9.]{3,}[.]{1}[A-Za-z0-9.]{3,}$|"")
  );
}
Thanks in advance for your help!
You have two distinct problems here. They are not directly related to each other. I will try to address them separately.
The problem is writing the security rule, seeing as I am dealing with a map in cloud firestore.
If you want to use the value of a nested field within a map field, you can simply use dot notation to get to it:
request.resource.data.email.work
request.resource.data.email.personal
The problem is the fact that I cannot guarantee that a specific value will match the regex. Users should be able to have only a personal email, only a work email or both a personal and work email.
You will need to write logic to check each map field separately. You can't check all fields of the map at the same time.
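The per-field logic can be sketched outside the rules language first. The following Python sketch only illustrates the "each field is either absent, empty, or matches the regex" check; the function names are hypothetical, and in an actual security rule the same idea would be written as one condition per field (request.resource.data.email.work, request.resource.data.email.personal) joined with the rules language's boolean operators.

```python
import re

# Same shape as the regex in the question, with "@" as the separator.
EMAIL_RE = re.compile(r"^[A-Za-z0-9.]{3,}@[A-Za-z0-9.]{3,}[.][A-Za-z0-9.]{3,}$")


def field_ok(email_map, key):
    """A field passes if it is missing, empty, or matches the regex."""
    value = email_map.get(key)
    return value is None or value == "" or EMAIL_RE.match(value) is not None


def email_map_ok(email_map):
    """Check each map field separately, then combine the results."""
    return field_ok(email_map, "work") and field_ok(email_map, "personal")


print(email_map_ok({"work": "", "personal": "jane.doe@example.com"}))
print(email_map_ok({"work": "not-an-email", "personal": ""}))
```

Note that this sketch accepts a map where both fields are empty; if at least one address is required, that needs an extra condition.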

AWS WAF XSS check blocking form with "ON" keyword in form field value

Posting a form with " on", or any word starting with "on", as the last word in a form field results in an XSS block from AWS WAF.
The request is blocked by this rule:
Body contains a cross-site scripting threat after decoding as URL
For example, "twenty only", " online" or "check on" all result in an XSS block.
These seem to be normal words, so why are they getting blocked for XSS?
With whitespace at the end, however, the request is not blocked:
for example, "twenty only ", " online " or "check on " all work.
Just flagging up we got started with WAF last night, and overnight a few dozen legitimate requests were blocked.
Sure enough, each request blocked by the XSS rule had the string "on" in the request body, followed by other characters.
I wonder if it was trying to detect the hundred or so onerror, onload and other javascript events? Feels like it could have been a lot more specific than matching on followed by "some stuff"...
The only solution here seems to be to disable this rule. Otherwise it's going to be a constant source of false positives for us, which makes it worthless.
This is a known problem with the "CrossSiteScripting_BODY" WAFv2 rule provided by AWS as part of the AWSManagedRulesCommonRuleSet ruleset. The rule will block any input that matches on*=*
In a form with multiple inputs, any text that has " on" in it will likely trigger this rule with a false positive, e.g. a=three two one&b=something else
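To see why such a body trips a pattern of the form on*=*, here is a toy approximation in Python. The managed rule itself is a black box, so the regex below is only an assumption based on the description above, not the actual AWS logic, and it does not reproduce every behaviour reported in this thread.

```python
import re

# Toy stand-in for the "on*=*" description; NOT the real AWS rule.
ON_EVENT_LIKE = re.compile(r"on[^=]*=")


def looks_like_event_handler(body: str) -> bool:
    """Return True if the body contains "on", then any non-"=" text, then "="."""
    return ON_EVENT_LIKE.search(body) is not None


# "one&b=" looks like "on...=" to such a pattern, just as "onerror=" does.
print(looks_like_event_handler("a=three two one&b=something else"))
print(looks_like_event_handler("<img src=x onerror=eval(src)>"))
print(looks_like_event_handler("a=three two&b=something else"))
```

The first two calls match and the third does not, which shows how an ordinary word ending a field can be indistinguishable from an event-handler attribute once the next field's "=" follows it.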
In Sept 2021, I complained to AWS Enterprise Support about this clearly broken rule and they replied "Its better to block the request when in doubt than to allow a malicious one", which I strongly disagree with. The support engineer also suggested that I could attempt to whitelist inputs which have triggered this rule, which is totally impractical for any non-trivial web app.
I believe the rule is attempting to block XSS attacks containing scripts like onerror=eval(src), see https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html#waf-bypass-strings-for-xss
I would recommend excluding all the black box CrossSiteScripting rules from your WAF, as they are not fit for purpose.
You can try upgrading to WAFv2; however, certain combinations of the characters "on" and "&" may still cause a false positive. The rule that is causing the problem is the XSS check on the body with URL decoding, so if your form data is submitted URL-encoded, you could hit the problem. If you submit your form as JSON data or as MIME multipart/form-data, it should work. I have two applications: one submits form data from JavaScript using the Fetch API as multipart/form-data, and the other sends JSON data. Neither was getting blocked.
Otherwise, you have to tune your XSS rules or set that specific rule to count. I will not post how to tune them, lest someone lurking here tries to be funny.
Your suggestion of adding trailing whitespace works as well; the backend can remove the whitespace or leave it as it is. A little annoying, but it works.
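The difference between the encodings is easy to see by serialising the same fields both ways. This is only a sketch using the Python standard library with the example values from this thread; it says nothing about how WAF inspects each content type.

```python
import json
from urllib.parse import urlencode

fields = {"a": "three two one", "b": "something else"}

# application/x-www-form-urlencoded: fields are joined as key=value&key=value,
# so the word "one" ends up directly in front of "&b=".
form_body = urlencode(fields)
print(form_body)  # a=three+two+one&b=something+else

# application/json: no "=" separators appear anywhere in the body.
json_body = json.dumps(fields)
print(json_body)
```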

How can I extract the canonical email address given an address that includes BATV or other tags?

Our webapp has a feature that allows users to import data by sending emails to a specific email address. When the emails are received by our app, they are processed differently depending on who sent them. We look at the "sender" field of the email, and match it to a user in our database. Once the user who sent the email has been determined, we handle that email based on that user's personal settings.
This has generally been working fine for most users. However, certain users were complaining that their emails weren't getting processed. When we looked into it, we found that their email server was adding information to the sender's email address, and this caused the email address not to match what was in our User table in the database. For example, the user's email might be testuser@example.com in the database, but the "sender" field in the email we received would be something like btv1==502867923ab==testuser@example.com. Some research suggested this was caused by Bounce Address Tag Validation (BATV) being used by the sender's server.
We need to be able to extract the canonical email address from the "sender" field provided to us, so we can match it to our user table. One of the other developers here wrote a function to do this, and submitted it to me for code review. This is what he wrote (C#):
private static string SanitizeEmailSender(string sender)
{
    if (sender == null)
        return null;
    return System.Text.RegularExpressions.Regex.Replace(
        sender,
        @"^((btv1==.{11}==)|(prvs=.{9}=))",
        "",
        System.Text.RegularExpressions.RegexOptions.None);
}
The regex pattern here covers the specific cases we've seen in our email logs. My concern is that the regex might be too specific. Are btv1 and prvs the only prefixes used in these tags? Are there always exactly 9 characters after prvs=? Are there other email sender tagging schemes other than BATV that we need to look out for? What I don't want is to put this fix in production just to find out next month that we need to fix it again because there were other cases we didn't consider.
My gut instinct was to just trim the email address to only include the part after the last =. However, research suggests that = is a valid character in email addresses and thus may be part of the user's canonical email address. I personally have never seen = used in an email address outside some kind of tagging or sub-addressing scheme, but you never know. Murphy's law suggests that the minute I assume a user will never have a certain character in their email address, somebody with that sort of address will immediately sign up.
My question is: is there an industry-accepted, reliable way to extract a user's canonical email address given a longer address that may include BATV or other tags? Failing that, is there at least a more reliable way than what we've got so far? Or is what we've got actually sufficient?
As the information added by BATV is always preceded by the BATV tag and delimited by two == strings, this is what I would use:
((btv1|prvs)==([^=]|=[^=])*==)
Of course, you are right that an = sign is allowed as a valid character in an email address, but that's precisely the reason to use that sequence (to form a valid email address).
If you dig a little more into the RFCs relating to email, you'll see that MIME adds some constructs to allow non-ASCII characters in an email address by means of the quoted-printable feature. A little RFC reading is needed to work out how to cope correctly with these things.
Finally, to answer your question: mail servers are allowed to modify or rewrite the envelope addresses (the addresses used in the SMTP control protocol for routing mail messages), and sendmail can even do it in the mail header fields. So the right answer is that there is no reliable way (industry-accepted or not) to extract the sender's canonical email address. Addresses are rewritten as the message progresses to the target recipient, and information is lost along the way. You cannot recover the original address used.
And last, to illustrate a little:
The Sender field is added by the final SMTP recipient to include in the email the address of the envelope sender (the address used as FROM: <sender@address.com> in the original SMTP protocol message).
From field is added by the original mail client to identify the origin of the message. This behaviour can be modified by the existence of Resent-from or Resent-sender fields in case the message is resent. These identify the resend of messages.
Finally, the sender can use a Reply-to header to indicate responses to be sent to that address.
To get an idea of how the SMTP protocol works, read the dense RFC-2821 (SMTP protocol) and RFC-2822 (format of internet mail messages) documents.
Are btv1 and prvs the only prefixes used in these tags?
prvs is a prefix that conforms to the "meta-syntax" defined in the RFC. btv1 is a Barracuda appliance Invalid Spoof Suppression rewrite which doesn't follow the BATV standard (hence the double equals sign).
A regex that just matches all BATV local-parts would be
[0-9A-Za-z\-]+=[0-9A-Za-z\-]+=.+@.+
But this wouldn't catch the Barracuda btv1 rewrites (or other rewrites).
Are there always exactly 9 characters after prvs=?
No, the spec says there are 10, but in the wild it's most often 9.
Are there other email sender tagging schemes other than BATV that we need to look out for?
Yes, see below.
is there an industry-accepted, reliable way to extract a user's canonical email address given a longer address that may include BATV or other tags?
No
By looking at various code bases, it looks like everybody implements their own solution. Some of the complexity comes from the fact that there are:
- the BATV rewrites
- BATV rewrites which try but fail to follow the standard by swapping the loc-core and tag-val positions. Here is an example showing these reversed versions and some code which validates each to see if it's a prvs value and then assumes the other one is the loc-core
- the Barracuda non-standard rewrites
- other non-BATV rewrites like
  - SRS
  - Google Forwards
Here's a unit test containing a list of possible sender rewritten examples and here are some examples of syntaxes found in the wild.
Failing that, is there at least a more reliable way than what we've got so far? Or is what we've got actually sufficient?
It looks like the best approach is to address each of the conditions in the way that ezmlm-idx and rspamd do.
The regex you're using won't cover:
- prvs with loc-core and tag-val reversed
- prvs that follow the spec with 10 characters instead of 9
- SRS
- Google forwards
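Handling each condition separately could look roughly like the sketch below. It is written in Python rather than C# purely for illustration and is not how ezmlm-idx or rspamd implement it. The tag shapes are assumptions drawn from the cases in this thread: a 9 or 10 character hex-like prvs tag, an 11 character btv1 tag, and the usual SRS0=hash=timestamp=domain=local layout. Google forwards are not handled, and the function name is hypothetical.

```python
import re

# Assumed tag shape: 9 or 10 hex-like characters (the spec's tag-val is
# digits plus hex; 9-character variants were reported in the wild).
_TAG = r"[0-9A-Fa-f]{9,10}"
_PRVS = re.compile(rf"^prvs=({_TAG})=(.+)$", re.IGNORECASE)
_PRVS_REVERSED = re.compile(rf"^prvs=(.+)=({_TAG})$", re.IGNORECASE)
_BTV1 = re.compile(r"^btv1==[^=]{11}==(.+)$", re.IGNORECASE)
_SRS0 = re.compile(r"^SRS0[=+-][^=]+=[^=]+=([^=]+)=(.+)$", re.IGNORECASE)


def strip_sender_tags(address: str) -> str:
    """Best-effort removal of known rewrite tags; unknown forms pass through."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address
    m = _BTV1.match(local)  # Barracuda non-standard rewrite
    if m:
        return f"{m.group(1)}@{domain}"
    m = _PRVS.match(local)  # BATV: prvs=tag-val=loc-core
    if m:
        return f"{m.group(2)}@{domain}"
    m = _PRVS_REVERSED.match(local)  # reversed: prvs=loc-core=tag-val
    if m:
        return f"{m.group(1)}@{domain}"
    m = _SRS0.match(local)  # SRS0: original domain and local part are embedded
    if m:
        return f"{m.group(2)}@{m.group(1)}"
    return address


print(strip_sender_tags("btv1==502867923ab==testuser@example.com"))
print(strip_sender_tags("prvs=testuser=1234abcdef@example.com"))
```

A local part that itself looks like a tag (for example ten hex characters) would still be ambiguous between the standard and reversed prvs forms, which is part of why no fully reliable extraction exists.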

Parameter not supported by web service

I want to validate an opinion with you.
I have to design a web service that searches into a database of restaurants affiliated to a discount program in a specific country around a given address.
The REST call to such a webservice will look like http://server/search?country=<countryCode>&language=<languageCode>&address=<address>&zipcode=<zipcode>
The problem is that some countries do not have zipcodes or do not have them in the entire country.
Now, what would you do if the user passes such a parameter for a country that does not have zipcodes, but he/she passes a valid address?
1. Return 400 Bad Request.
2. Simply ignore the zipcode parameter and return results based on the valid address.
3. Return an error message in a specific format (e.g. JSON) stating that zipcodes are not supported for that country.
Some colleagues are also favoring the following option
4. Simply return no results. And state in the documentation that the zipcode parameter is not supported. Also we have to create a webservice method which returns what fields should be displayed in the user interface.
What option do you think is best and why?
Thanks!
Well, the OpenStreetMap Nominatim server returns results even if you don't know the ZIP code, and you can look at the results anyway. What if the user doesn't know the zip code but wants to find his object?
I would try to search for that specific object anyway, especially because you said that some countries have zip codes only partially.
If you simply return nothing, the user doesn't know what went wrong and he won't know what to do.
That would depend on the use case. How easy is it for a user of the API to trigger that case? Is it a severe error which the user really should know how to avoid? Or is it something that is not entirely clear, where a user may know (or think he knows) a zipcode where officially there shouldn't be one? Does it come down to trial and error for the user how to retrieve correct results from your API? Is it a bad enough error that the user needs to be informed about it and that he needs to handle this on his side?
If you place this restriction in your API, consider that it will have to be clearly documented when this case is triggered, every user of the API will have to read and understand that documentation, it needs to be clear how to avoid the problem, it needs to be possible for the user to avoid the problem and every user will have to correctly implement extra code on his side to avoid this problem. Is it possible for the user to easily know which areas have zipcodes and which don't?
I think the mantra of "be flexible in what you accept, strict in what you output" applies...
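As a rough illustration of the lenient option (ignore the zipcode but tell the caller), here is a minimal Python sketch. The function name, the warning text and the "XX" country code are hypothetical placeholders, not part of any real API or country list.

```python
# Hypothetical placeholder set; a real service would load this from its own data.
COUNTRIES_WITHOUT_ZIPCODES = {"XX"}


def normalize_search_params(params):
    """Drop an unsupported zipcode instead of failing, and report that it was ignored."""
    cleaned = dict(params)
    warnings = []
    country = cleaned.get("country", "").upper()
    if country in COUNTRIES_WITHOUT_ZIPCODES and cleaned.pop("zipcode", None):
        warnings.append(f"zipcode is not supported for country {country} and was ignored")
    return cleaned, warnings


cleaned, warnings = normalize_search_params(
    {"country": "xx", "language": "en", "address": "1 Main Street", "zipcode": "12345"}
)
print(cleaned)
print(warnings)
```

The search still runs on the valid address, while the warning list gives the client something to show or log, so the user is not left guessing why the zipcode had no effect.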

List of mock valid/invalid Email Addresses for Unit Tests

Does anyone know of a list of email addresses (which don’t need to be real) I can use for an email validation assembly for a unit test?
I’ve been looking for such a list and can’t seem to find one. I’m not looking for real addresses, just ones that fit, and the more things I can throw at the test the better. I’ve got 10 right now, but if there is a list, it would give me a more thorough test.
I believe you were looking for something like this:
List of Valid and Invalid Email Addresses
Check the tests of the Apache Commons EmailValidator class:
EmailTest,
EmailValidatorTest.
The EmailValidatorTest in the Hibernate Validator also contains some addresses.
I like to use the set in this page on email validating regexes because the addresses contain what they're testing inside the email address.
Here is a set of test emails that Dominic Sayers uses to test his isEmail validator:
http://code.iamcal.com/php/rfc822/tests/
For more on isEmail:
http://isemail.info/about
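Whichever list you pick, a table-driven test keeps it easy to extend. The sketch below uses Python and a deliberately simple, hypothetical validator as a stand-in for the one under test; the handful of addresses are made-up examples, not taken from any of the lists above.

```python
import re


def is_plausible_email(address: str) -> bool:
    """Stand-in validator: one "@", no whitespace, and a dot in the domain."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", address) is not None


# (address, expected result) pairs; append entries from a larger list as needed.
CASES = [
    ("simple@example.com", True),
    ("user.name+tag@example.com", True),
    ("plainaddress", False),
    ("@example.com", False),
    ("user@", False),
    ("user@@example.com", False),
]


def run_cases(validator):
    """Return the addresses for which the validator disagrees with the table."""
    return [address for address, expected in CASES if validator(address) != expected]


print(run_cases(is_plausible_email))
```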