Validating email address without regex - c++

This must have a canonical answer, but I cannot find it. The question "Using a regular expression to validate an email address" has answers showing that regex is really not the best way to validate emails, yet searching online keeps turning up lots of regex-based answers.
That question is about PHP, and one answer references a handy class, MailAddress. C# has something very similar, but what about plain old C++? Is there a Boost/C++11 utility to take all the pain away? Or something in WinAPI/MFC, even?

I had to write my own solution because my installed g++ version doesn't support std::regex (the application crashes), and I didn't want to upgrade the toolchain for a single e-mail validation. Since this application will probably never need regex for anything else, I wrote a function that does the job. You can easily adjust the allowed characters for each part of the e-mail address (before the @, after the @, and after the '.') depending on your needs. It took 20 minutes to write and was far easier than messing with compiler and environment setup just for one function call.
Here you go, have fun:
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

bool emailAddressIsValid(std::string _email)
{
    // Lower-case the input so the tables below only need lower-case letters.
    std::transform(_email.begin(), _email.end(), _email.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Edit these to change which characters are accepted in each section.
    const std::string validCharsName = "abcdefghijklmnopqrstuvwxyz0123456789.%+_-";
    const std::string validCharsDomain = "abcdefghijklmnopqrstuvwxyz0123456789.-";
    const std::string validCharsTld = "abcdefghijklmnopqrstuvwxyz";

    bool atFound = false;
    bool dotAfterAtFound = false;
    std::size_t letterCountBeforeAt = 0;
    std::size_t letterCountAfterAt = 0;
    std::size_t letterCountAfterDot = 0;

    for (char currentLetter : _email) {
        // Found the first '@'? Mark that and continue.
        if (!atFound && currentLetter == '@') {
            atFound = true;
            continue;
        }
        // Found the first '.' after the '@'? Mark that and continue.
        if (atFound && !dotAfterAtFound && currentLetter == '.') {
            dotAfterAtFound = true;
            continue;
        }
        if (!atFound) {
            // Name part, before the '@' (must end up with at least 1 character).
            if (validCharsName.find(currentLetter) == std::string::npos) {
                return false; // invalid character, no need to look further
            }
            letterCountBeforeAt++;
        } else if (!dotAfterAtFound) {
            // Domain part, after the '@' and before the '.' (at least 1 character).
            if (validCharsDomain.find(currentLetter) == std::string::npos) {
                return false;
            }
            letterCountAfterAt++;
        } else {
            // TLD part, after the '.' (must be between 2 and 6 characters).
            if (validCharsTld.find(currentLetter) == std::string::npos) {
                return false;
            }
            letterCountAfterDot++;
        }
    }

    // Compare the collected information and finalize the validation.
    return atFound && dotAfterAtFound
        && letterCountBeforeAt >= 1 && letterCountAfterAt >= 1
        && letterCountAfterDot >= 2 && letterCountAfterDot <= 6;
}
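For comparison, the same three-section rule (name, then '@', then domain, then the first '.', then a 2 to 6 letter TLD) can be sketched with std::string::find and find_first_not_of instead of a character-by-character loop. This is only an illustration, not the answer's code: the names emailLooksValid and onlyContains are made up here, and it assumes the separator is '@'. Like the function above, it rejects multi-label domains such as mail.example.com.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if every character of `part` appears in `allowed`.
static bool onlyContains(const std::string& part, const std::string& allowed)
{
    return part.find_first_not_of(allowed) == std::string::npos;
}

// Sketch: name '@' domain '.' tld, split at the first '@' and the first '.' after it.
bool emailLooksValid(const std::string& email)
{
    const std::string letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const std::string digits = "0123456789";

    const std::size_t at = email.find('@');
    if (at == std::string::npos) return false;
    const std::size_t dot = email.find('.', at + 1);
    if (dot == std::string::npos) return false;

    const std::string name = email.substr(0, at);
    const std::string domain = email.substr(at + 1, dot - at - 1);
    const std::string tld = email.substr(dot + 1);

    return !name.empty() && onlyContains(name, letters + digits + ".%+_-")
        && !domain.empty() && onlyContains(domain, letters + digits + "-")
        && tld.size() >= 2 && tld.size() <= 6 && onlyContains(tld, letters);
}
```

Neither version checks that the address is deliverable; both only test its shape.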

Related

String and character comparison

I am new to programming and I need to search any string to see if it includes only the letters a,b,c,d,e or f. The minute the program finds a letter that is not one of those the program should return false. Here is my function
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if ((word[index] == 'a') || (word[index] == 'b') || (word[index] == 'c')||
(word[index] == 'd')|| (word[index] == 'e')|| (word[index] == 'f')) {
return true;
}
else {
return false;
}
index++;
}
}
Thank you very much for any help! :)

The moment the return statement is encountered, the function is exited. This means that the moment any of the characters 'a', 'b', 'c', 'd', 'e', 'f' is encountered while iterating, due to the return statement the function will be exited immediately.
You can use std::string::find_first_not_of as shown below:
std::string input = "somearbitrarystring";
std::string validChars = "abcdef";
std::size_t found = input.find_first_not_of(validChars);
if(found != std::string::npos)
std::cout << "Found nonfavorite character " <<input[found]<<" at position "<<found<< std::endl;
else
{
std::cout<<"Only favorite characters found"<<std::endl;
}

If you unroll the loop by hand, you will spot the problem immediately:
if ((word[0] == 'a') || (word[0] == 'b') || (word[0] == 'c')||
(word[0] == 'd')|| (word[0] == 'e')|| (word[0] == 'f')) {
return true;
}
else {
return false;
}
if ((word[1] == 'a') || (word[1] == 'b') || (word[1] == 'c')||
(word[1] == 'd')|| (word[1] == 'e')|| (word[1] == 'f')) {
return true;
}
else {
return false;
}
//...
That is, the return value depends only on the first element.
"The minute the program finds a letter that is not one of those the program should return false" means that a character is rejected only when it differs from every allowed letter, so the negated comparisons must be joined with && (joined with ||, the condition would be true for every character):
if ((word[0] != 'a') && (word[0] != 'b') && (word[0] != 'c') &&
    (word[0] != 'd') && (word[0] != 'e') && (word[0] != 'f')) {
    return false;
}
if ((word[1] != 'a') && (word[1] != 'b') && (word[1] != 'c') &&
    (word[1] != 'd') && (word[1] != 'e') && (word[1] != 'f')) {
    return false;
}
// ...
// After checking all the characters, you know that all of them were in
// your desired set, so you can return unconditionally.
return true;
or, with a loop:
while (index < length) {
    if ((word[index] != 'a') && (word[index] != 'b') && (word[index] != 'c') &&
        (word[index] != 'd') && (word[index] != 'e') && (word[index] != 'f')) {
        return false;
    }
    index++;
}
return true;

bool is_favorite(string word){
return ( word.find_first_not_of( "abcdef" ) == std::string::npos );
}
It returns true if, and only if, there are only the characters 'a' through 'f' in the string. Any other character ends the search immediately.
And if you exchange string word with const string & word, your function will not have to create a copy of each word you pass to it, but work on a read-only reference to it, improving efficiency.
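A minimal sketch of that const-reference variant, with the std:: prefix written out so it compiles without a using-directive:

```cpp
#include <string>

// Takes the word by const reference, so no copy is made on each call.
bool is_favorite(const std::string& word)
{
    // find_first_not_of returns npos when no character outside "abcdef" exists.
    return word.find_first_not_of("abcdef") == std::string::npos;
}
```

With this definition, is_favorite("facade") is true and is_favorite("cab1") is false. Note that an empty string also returns true, since it contains no character outside the set.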

bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if (word[index] > 'f' || word[index] < 'a')
return false;
index++;
}
return true;
}

The return true is logically in the wrong place in your code.
Your version returns true as soon as it finds one letter that is a through f. It's premature to conclude that the whole string is valid at that point, because there may yet be an invalid character later in the string.
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if ((word[index] == 'a') || (word[index] == 'b') || (word[index] == 'c')||
(word[index] == 'd')|| (word[index] == 'e')|| (word[index] == 'f')) {
return true; // This is premature.
}
else {
return false;
}
index++;
}
}
Here is a minimal change that illustrates where the return true should be: after the loop. The return true is reached if and only if we did not detect any invalid characters in the loop.
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if ((word[index] == 'a') || (word[index] == 'b') || (word[index] == 'c')||
(word[index] == 'd')|| (word[index] == 'e')|| (word[index] == 'f')) {
// Do nothing here
}
else {
return false;
}
index++;
}
return true;
}
Obviously now that the affirmative block of the if is empty, you could refactor a little and only check for the negative condition. The logic of it should read closely to the way you described the problem in words:
"The minute the program finds a letter that is not one of those the program should return false."
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if (!is_letter_a_through_f(word[index]))
return false;
index++;
}
return true;
}
I replaced your large logical check against many characters with a function in the above code to make it more readable. I trust you can write that function without difficulty. My own preference is to keep statements short so that they are readable, and so that when you read the code, you can hold the control-flow logic in your short-term memory without being overloaded by the mechanics of the letter comparison.
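The helper is_letter_a_through_f is named but not defined in the answer. One possible definition, given purely as an assumption about what was intended, is a range comparison (this relies on 'a' through 'f' being contiguous, as they are in ASCII):

```cpp
#include <string>

// Assumed definition of the helper the answer refers to.
bool is_letter_a_through_f(char c)
{
    return c >= 'a' && c <= 'f';
}

bool is_favorite(const std::string& word)
{
    for (std::string::size_type index = 0; index < word.length(); ++index) {
        if (!is_letter_a_through_f(word[index]))
            return false; // first character outside a..f ends the search
    }
    return true; // reached only if every character passed the check
}
```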

Trying to print invalid email address from a list using c++?

I am trying to print the invalid email addresses (ones which have a space and do not have an @ or a .) from a list of email addresses. The list has a few email addresses which have spaces and no '@' or '.', but still nothing is printed.
//Declaring boolean variables
bool atPresent;
bool periodPresent;
bool spacePresent;
string emailid = someemailfrom a list;
atPresent = false;
periodPresent = false;
spacePresent = false;
//looking for '@'
size_t foundAt = emailid.find('@');
if (foundAt != string::npos) {
atPresent = true;
}
//looking for '.'
size_t foundPeriod = emailid.find('.');
if (foundPeriod != string::npos) {
periodPresent = true;
}
//looking for ' '
size_t foundSpace = emailid.find(' ');
if (foundSpace != string::npos) {
spacePresent = true;
}
//checking to see if all conditions match
if ( (atPresent == false) && (periodPresent == false) && (spacePresent == true)) {
cout << emailid << endl;
}

(atPresent == false) && (periodPresent == false) && (spacePresent == true)
is wrong. It is true only when all three criteria for an invalid address are met, but an address is invalid as soon as at least one criterion is met. That would be
(atPresent == false) || (periodPresent == false) || (spacePresent == true)
And simplified:
!atPresent || !periodPresent || spacePresent
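Wrapped into a function, the simplified condition could look like the sketch below. The function name looksInvalid is made up for illustration, and the separator is assumed to be '@'.

```cpp
#include <string>

// Sketch: an address is flagged as soon as any one criterion is met.
bool looksInvalid(const std::string& emailid)
{
    const bool atPresent = emailid.find('@') != std::string::npos;
    const bool periodPresent = emailid.find('.') != std::string::npos;
    const bool spacePresent = emailid.find(' ') != std::string::npos;

    return !atPresent || !periodPresent || spacePresent;
}
```

The caller would then print emailid whenever looksInvalid(emailid) returns true.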

Replace the && operators with ||: you are only printing the addresses which don't have an @ AND don't have a period AND have a space. You could also use a regex, so you can do it in one line; knowing how to use them is always useful when you validate user data.

Leet Code Regular Expression Matching Problem

https://leetcode.com/problems/regular-expression-matching
I was doing this practice problem (cpp) and while faster solutions are in the comments, I would like to understand why my code isn't working. This fails with s = "mississippi" and p = "mis*is*p*.". Tracing through the code, I figured it would correctly remove the first two letters, then when seeing the s* it would go through the s in the string (two of them), then remove the i in both, remove all the s (again 2) then remove all the p's (which is none, because it's compared against the i in the first string, so it should not modify that string). Finally, the '.' would match with the first p and remove both. So the final string should be "pi" and return false when the length is compared to zero.
class Solution {
public:
bool isMatch(string s, string p) {
while (s.length() > 0){
if (p.length() == 0){
return false;
}else if (p.length() == 1){
return p.compare(s) == 0 || p.at(0) == '.';
}else{
if (p.at(1) == '*'){
char c = p.at(0);
p = p.substr(2);
if (c == '.'){
return true;
}
int spot = 0;
while(spot < s.length() && s.at(spot) == c){
spot++;
}
if (spot != 0){
s = s.substr(spot);
}
}else{
if (s.at(0) != p.at(0) && p.at(0) != '.'){
return false;
}
s = s.substr(1);
p = p.substr(1);
}
}
}
return s.length() == 0;
}
};

Your logic is faulty here
return p.compare(s) == 0 || p.at(0) == '.';
That should be
return p.compare(s) == 0 || (s.length() == 1 && p.at(0) == '.');
That took me five minutes to find, two minutes looking at the code without seeing the problem, and then three minutes using a debugger to track down the logic error. You really should learn to use a debugger, much more efficient than asking on SO.
Some tips here.
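Beyond that one-line fix, note that the asker's loop consumes each x* greedily and never gives characters back. A common alternative, shown here as a sketch and not as the asker's or the answerer's code, is plain recursion that tries both "zero occurrences" and "one more occurrence" for every starred element. The helper name isMatchFrom is made up, and the worst-case running time is exponential without memoization.

```cpp
#include <cstddef>
#include <string>

// Does s[i..] match p[j..], where '.' matches any one character and
// 'x*' matches zero or more copies of x?
static bool isMatchFrom(const std::string& s, std::size_t i,
                        const std::string& p, std::size_t j)
{
    if (j == p.size()) return i == s.size();

    const bool firstMatches = i < s.size() && (p[j] == s[i] || p[j] == '.');

    if (j + 1 < p.size() && p[j + 1] == '*') {
        // Either skip the starred element entirely, or consume one
        // character and stay on the same starred element.
        return isMatchFrom(s, i, p, j + 2)
            || (firstMatches && isMatchFrom(s, i + 1, p, j));
    }
    return firstMatches && isMatchFrom(s, i + 1, p, j + 1);
}

bool isMatch(const std::string& s, const std::string& p)
{
    return isMatchFrom(s, 0, p, 0);
}
```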

Why won't my for loop pick up the correct error?

I have a grid and have a member function named Move(int s) that is supposed to move the mover icon in whatever direction it is currently facing, 's' amount of spaces. If there is a block character ('#') anywhere in front of the mover where it wants to move to, the function is supposed to fail and leave the cursor in the correct spot. It seems that the bool statement is always equating to true but I can't seem to find where in my code.
In my sample output the move function never fails, the mover always seems to get through walls or replace walls.
I won't post all 4 directions, but I will post North and West:
bool Grid::Move(int s) {
bool canMove = true; //initialize the bool variable
if (direction == NORTH) {
if ((mRow - s) >= 0) {
for (int i = mRow; i >= (mRow - s); i--) {
if (matrix[i][mCol] == '#') {
canMove = false;
} else if (matrix[i][mCol] != '#') {
canMove = true;
}
}
if (canMove == true) {
matrix[mRow][mCol] = '.';
mRow = (mRow - s);
matrix[mRow][mCol] = '^';
return true;
}else{
matrix[mRow][mCol] = '^';
}
} else
return false;
} else if (direction == WEST) {
if ((mCol - s) >= 0) {
for (int i = mCol; i >= (mCol - s); i--){
if (matrix[mRow][i] == '#'){
canMove = false;
} else if (matrix[mRow][i] != '#')
canMove = true;
}
if (canMove == true) {
matrix[mRow][mCol] = '.';
mCol = (mCol - s);
matrix[mRow][mCol] = '<';
return true;
}else
matrix[mRow][mCol] = '<';
}else
return false;
}

You're setting canMove on every iteration of your loop. Whatever value it gets on the last time thru is the value it will have.
Since the objective there is to see if the move is valid for the entire duration, you don't need to set canMove to true because once it becomes false it should stay that way. (And you can break out of your loop when that happens.)
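A minimal sketch of that idea for the NORTH case, written as a free function so it is self-contained. The name pathClearNorth and its parameters are hypothetical stand-ins for the Grid members, and starting the scan one cell in front of the mover (rather than on the mover's own cell) is an assumption.

```cpp
#include <string>
#include <vector>

// Returns true only if every cell on the way north is free of '#'.
bool pathClearNorth(const std::vector<std::string>& matrix, int row, int col, int steps)
{
    if (row - steps < 0) return false; // the move would leave the grid

    bool canMove = true; // stays true unless a block is found
    for (int i = row - 1; i >= row - steps; --i) {
        if (matrix[i][col] == '#') {
            canMove = false; // once false, never reset to true
            break;           // no need to inspect the remaining cells
        }
    }
    return canMove;
}
```

Move would then update mRow and the icon only when this check returns true, and return false otherwise.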

wild card matching in text string

My friend gave me this wildcard (*) matching algorithm. Here is the code.
//This function compares text strings, one of which can have wildcards ('*').
//
BOOL GeneralTextCompare(
char * pTameText, // A string without wildcards
char * pWildText, // A (potentially) corresponding string with wildcards
BOOL bCaseSensitive = FALSE, // By default, match on 'X' vs 'x'
char cAltTerminator = '\0' // For function names, for example, you can stop at the first '('
)
{
BOOL bMatch = TRUE;
char * pAfterLastWild = NULL; // The location after the last '*', if we’ve encountered one
char * pAfterLastTame = NULL; // The location in the tame string, from which we started after last wildcard
char t, w;
// Walk the text strings one character at a time.
while (1)
{
t = *pTameText;
w = *pWildText;
// How do you match a unique text string?
if (!t || t == cAltTerminator)
{
// Easy: unique up on it!
if (!w || w == cAltTerminator)
{
break; // "x" matches "x"
}
else if (w == '*')
{
pWildText++;
continue; // "x*" matches "x" or "xy"
}
else if (pAfterLastTame)
{
if (!(*pAfterLastTame) || *pAfterLastTame == cAltTerminator)
{
bMatch = FALSE;
break;
}
pTameText = pAfterLastTame++;
pWildText = pAfterLastWild;
continue;
}
bMatch = FALSE;
break; // "x" doesn't match "xy"
}
else
{
if (!bCaseSensitive)
{
// Lowercase the characters to be compared.
if (t >= 'A' && t <= 'Z')
{
t += ('a' - 'A');
}
if (w >= 'A' && w <= 'Z')
{
w += ('a' - 'A');
}
}
// How do you match a tame text string?
if (t != w)
{
// The tame way: unique up on it!
if (w == '*')
{
pAfterLastWild = ++pWildText;
pAfterLastTame = pTameText;
w = *pWildText;
if (!w || w == cAltTerminator)
{
break; // "*" matches "x"
}
continue; // "*y" matches "xy"
}
else if (pAfterLastWild)
{
if (pAfterLastWild != pWildText)
{
pWildText = pAfterLastWild;
w = *pWildText;
if (!bCaseSensitive && w >= 'A' && w <= 'Z')
{
w += ('a' - 'A');
}
if (t == w)
{
pWildText++;
}
}
pTameText++;
continue; // "*sip*" matches "mississippi"
}
else
{
bMatch = FALSE;
break; // "x" doesn't match "y"
}
}
}
pTameText++;
pWildText++;
}
return bMatch;
}
This algorithm works as follows (as far as I can tell):
mississippi   *sip*
mississippi   sip*
ississippi    sip*
ssissippi     sip*
sissippi      ip*
sissippi      sip*    pAfterLastWild is used to restore the location
issippi       ip*
ssippi        p*
ssippi        sip*    again pAfterLastWild is used here.
sippi         ip*
sippi         sip*    here also.
ippi          ip*
ppi           p*
pi            *
i             *
I am not able to figure out why pAfterLastTame is needed, or what this piece of code is doing, as I cannot find any use for it.
else if (pAfterLastTame)
{
if (!(*pAfterLastTame) || *pAfterLastTame == cAltTerminator)
{
bMatch = FALSE;
break;
}
pTameText = pAfterLastTame++;
pWildText = pAfterLastWild;
continue;
}
This algorithm is pretty fast, as the number of comparisons is equal to the size of the tame string (correct me if I am wrong).
Does anyone know a more efficient algorithm than this?
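For reference, the general star-backtracking idea can be sketched much more compactly. This is not a line-by-line restatement of GeneralTextCompare: it is case-sensitive, has no alternate terminator, and the names wildMatch, starW and starT are made up. It does show why a bookmark into the tame string is kept alongside the one into the wild string: when a partial match after a '*' fails, the scan has to resume one character past where the previous attempt started in the tame string, not where it failed.

```cpp
#include <cstddef>
#include <string>

// Sketch: '*' in `wild` matches any run of characters (including none).
bool wildMatch(const std::string& tame, const std::string& wild)
{
    std::size_t t = 0, w = 0;
    std::size_t starW = std::string::npos; // position just after the last '*'
    std::size_t starT = 0;                 // tame position where that '*' attempt began

    while (t < tame.size()) {
        if (w < wild.size() && wild[w] == '*') {
            starW = ++w;   // remember where to restart in the pattern
            starT = t;     // and where this attempt started in the text
        } else if (w < wild.size() && wild[w] == tame[t]) {
            ++t;
            ++w;
        } else if (starW != std::string::npos) {
            w = starW;     // retry: let the last '*' absorb one more character
            t = ++starT;
        } else {
            return false;  // mismatch with no '*' to fall back on
        }
    }
    while (w < wild.size() && wild[w] == '*') ++w; // trailing stars match nothing
    return w == wild.size();
}
```

A pattern such as "a*bc" against "abcbc" exercises the tame-side bookmark: the first "bc" attempt ends with text left over, so matching restarts from the character after the point where that attempt began.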
