Is using std::regex for simple regexes good practice?

For example, my situation:
I'm getting an input of "0", "1", "true" or "false" (in any letter case).
Which is preferred in terms of performance, readability, and general best practice?
bool func(string param)
{
string lowerCase = param;
to_lower(lowerCase);
if (lowerCase == "0" || lowerCase == "false")
{
return false;
}
if (lowerCase == "1" || lowerCase == "true")
{
return true;
}
throw ....
}
or:
bool func(string param)
{
string lowerCase = param;
to_lower(lowerCase);
regex rxTrue ("1|true");
regex rxFalse ("0|false");
if (regex_match(lowerCase, rxTrue))
{
return true;
}
if (regex_match(lowerCase, rxFalse))
{
return false;
}
throw ....
}

The second is somewhat clearer and easier to extend (e.g. accepting
"yes" and "no", or prefixes, with "1|t(?:rue)?" and
"0|f(?:alse)?"). With regard to performance, the second can (and
should) be made significantly faster by declaring the regexes static
(and const, while you're at it), e.g.:
static regex const rxTrue ( "1|true" , regex_constants::icase );
static regex const rxFalse( "0|false", regex_constants::icase );
Note too that by specifying case insensitivity, you'll not have to
convert the input to lower case.
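Put together, the suggestion above might look like the following minimal sketch. The function name parse_bool and the choice of std::invalid_argument are assumptions (the question elides what is thrown), and the ECMAScript flag is spelled out explicitly alongside icase.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper name; the patterns come from the answer above.
bool parse_bool(const std::string& param)
{
    // Function-local statics are constructed once, on the first call,
    // so the patterns are not recompiled on every invocation.
    static const std::regex rxTrue(
        "1|true", std::regex::ECMAScript | std::regex::icase);
    static const std::regex rxFalse(
        "0|false", std::regex::ECMAScript | std::regex::icase);

    if (std::regex_match(param, rxTrue)) {
        return true;
    }
    if (std::regex_match(param, rxFalse)) {
        return false;
    }
    // Assumption: the original only says "throw ...."
    throw std::invalid_argument("not a boolean: " + param);
}
```

Because of icase, no lower-cased copy of the input is needed before matching.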

It's just a hunch, but probably the first one is going to be faster (no regex-compiling involved). Also, the second version depends on your compiler supporting the C++11 <regex> implementation, so depending on the environments you need to support, the second option is ruled out automatically.

Related

Regex.IsMatch for only letters and numbers [duplicate]

How can I validate a string using Regular Expressions to only allow alphanumeric characters in it?
(I don't want to allow for any spaces either).

In .NET 4.0 you can use LINQ:
if (yourText.All(char.IsLetterOrDigit))
{
//just letters and digits.
}
yourText.All will stop executing and return false the first time char.IsLetterOrDigit reports false, since the contract of All can no longer be fulfilled.
Note! This answer does not strictly check for alphanumerics (which typically means A-Z, a-z and 0-9). It also allows local characters like åäö.
Update 2018-01-29
The syntax above only works when you use a single method that has a single argument of the correct type (in this case char).
To use multiple conditions, you need to write like this:
if (yourText.All(x => char.IsLetterOrDigit(x) || char.IsWhiteSpace(x)))
{
}

Use the following expression:
^[a-zA-Z0-9]*$
ie:
using System.Text.RegularExpressions;
Regex r = new Regex("^[a-zA-Z0-9]*$");
if (r.IsMatch(SomeString)) {
...
}

You could do it easily with an extension function rather than a regex ...
public static bool IsAlphaNum(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
for (int i = 0; i < str.Length; i++)
{
if (!(char.IsLetter(str[i])) && (!(char.IsNumber(str[i]))))
return false;
}
return true;
}
Per comment :) ...
public static bool IsAlphaNum(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
return (str.ToCharArray().All(c => Char.IsLetter(c) || Char.IsNumber(c)));
}

While I think the regex-based solution is probably the way I'd go, I'd be tempted to encapsulate this in a type.
public class AlphaNumericString
{
public AlphaNumericString(string s)
{
Regex r = new Regex("^[a-zA-Z0-9]*$");
if (r.IsMatch(s))
{
value = s;
}
else
{
throw new ArgumentException("Only alphanumeric characters may be used");
}
}
private string value;
static public implicit operator string(AlphaNumericString s)
{
return s.value;
}
}
Now, when you need a validated string, you can have the method signature require an AlphaNumericString, and know that if you get one, it is valid (apart from nulls). If someone attempts to pass in a non-validated string, it will generate a compiler error.
You can get fancier and implement all of the equality operators, or an explicit cast to AlphaNumericString from plain ol' string, if you care.
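For readers following the C++ threads on this page, the same "validated type" idea can be sketched in C++. This is an illustration under assumptions, not a port of the class above: the accessor str() is a made-up name, and the check is a plain ASCII range test equivalent to ^[a-zA-Z0-9]*$ rather than a regex.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <utility>

// A string that is known to be ASCII-alphanumeric once constructed.
class AlphaNumericString {
public:
    // explicit: callers must opt in, so an unvalidated std::string
    // cannot be passed silently where a validated one is required.
    explicit AlphaNumericString(std::string s) : value_(std::move(s))
    {
        const bool ok = std::all_of(
            value_.begin(), value_.end(), [](unsigned char c) {
                return (c >= '0' && c <= '9') ||
                       (c >= 'A' && c <= 'Z') ||
                       (c >= 'a' && c <= 'z');
            });
        if (!ok) {
            throw std::invalid_argument(
                "Only alphanumeric characters may be used");
        }
    }

    // Hypothetical accessor name.
    const std::string& str() const { return value_; }

private:
    std::string value_;
};
```

Like the regex with *, this accepts the empty string; add an emptiness check if that should be rejected.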

I needed to check for A-Z, a-z, 0-9; without a regex (even though the OP asks for regex).
Blending various answers and comments here, and discussion from https://stackoverflow.com/a/9975693/292060, this tests for letter or digit, avoiding other language letters, and avoiding other numbers such as fraction characters.
if (!String.IsNullOrEmpty(testString)
&& testString.All(c => Char.IsLetterOrDigit(c) && (c < 128)))
{
// Alphanumeric.
}

^\w+$ will allow a-zA-Z0-9_
Use ^[a-zA-Z0-9]+$ to disallow underscore.
Note that both of these require the string not to be empty. Using * instead of + allows empty strings.
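The difference between the three patterns can be tried out with a small sketch. It uses C++ std::regex (ECMAScript grammar, default locale assumed) rather than .NET's Regex, so details such as Unicode handling of \w may differ; the helper names are made up for illustration.

```cpp
#include <regex>
#include <string>

// ^\w+$ : letters, digits and underscore, at least one character.
bool is_word_chars(const std::string& s)
{
    static const std::regex rx("^\\w+$");
    return std::regex_match(s, rx);
}

// ^[a-zA-Z0-9]+$ : no underscore, at least one character.
bool is_alnum_nonempty(const std::string& s)
{
    static const std::regex rx("^[a-zA-Z0-9]+$");
    return std::regex_match(s, rx);
}

// ^[a-zA-Z0-9]*$ : no underscore, empty string allowed.
bool is_alnum_or_empty(const std::string& s)
{
    static const std::regex rx("^[a-zA-Z0-9]*$");
    return std::regex_match(s, rx);
}
```

With these, "a_1" passes only the \w version, and "" passes only the * version.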

Same answer as here.
If you want a non-regex ASCII A-z 0-9 check, you cannot use char.IsLetterOrDigit() as that includes other Unicode characters.
What you can do is check the character code ranges.
48 -> 57 are numerics
65 -> 90 are capital letters
97 -> 122 are lower case letters
The following is a bit more verbose, but it's for ease of understanding rather than for code golf.
public static bool IsAsciiAlphaNumeric(this string str)
{
if (string.IsNullOrEmpty(str))
{
return false;
}
for (int i = 0; i < str.Length; i++)
{
if (str[i] < 48) // Numeric are 48 -> 57
{
return false;
}
if (str[i] > 57 && str[i] < 65) // Capitals are 65 -> 90
{
return false;
}
if (str[i] > 90 && str[i] < 97) // Lowers are 97 -> 122
{
return false;
}
if (str[i] > 122)
{
return false;
}
}
return true;
}
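The same range check can be written in C++ with character literals instead of numeric codes, which some readers may find easier to verify. This is a sketch; the function name is an assumption.

```cpp
#include <string>

// True only if str is non-empty and every character is in
// '0'-'9', 'A'-'Z' or 'a'-'z' (ASCII 48-57, 65-90, 97-122).
bool is_ascii_alnum(const std::string& str)
{
    if (str.empty()) {
        return false;
    }
    for (unsigned char c : str) {
        const bool digit = c >= '0' && c <= '9';
        const bool upper = c >= 'A' && c <= 'Z';
        const bool lower = c >= 'a' && c <= 'z';
        if (!digit && !upper && !lower) {
            return false;  // stop at the first offending character
        }
    }
    return true;
}
```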

To check that the string contains both letters and digits, you can rewrite @jgauffin's answer as follows using .NET 4.0 and LINQ:
if(!string.IsNullOrWhiteSpace(yourText) &&
yourText.Any(char.IsLetter) && yourText.Any(char.IsDigit))
{
// do something here
}

Based on cletus's answer you may create new extension.
public static class StringExtensions
{
public static bool IsAlphaNumeric(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
Regex r = new Regex("^[a-zA-Z0-9]*$");
return r.IsMatch(str);
}
}

While there are many ways to skin this cat, I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. When using extension methods, you can also avoid RegEx as it is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package. It makes this check as simple as:
Add the https://www.nuget.org/packages/Extensions.cs package to your project.
Add "using Extensions;" to the top of your code.
"smith23".IsAlphaNumeric() will return True whereas "smith 23".IsAlphaNumeric(false) will return False. By default the .IsAlphaNumeric() method ignores spaces, but this can be overridden as shown above. If you want to allow spaces, so that "smith 23".IsAlphaNumeric() returns True, simply leave the argument at its default.
Every other check in the rest of the code is simply MyString.IsAlphaNumeric().

12 years and 7 months later, if anyone comes across this article nowadays.
Compiled RegEx actually has the best performance in .NET 5 and .NET 6
Please look at the following link where I compare several different answers given on this question. Mainly comparing Compiled RegEx, For-Loops, and Linq Predicates: https://dotnetfiddle.net/WOPQRT
Notes:
As stated, this method is only faster in .NET 5 and .NET 6.
.NET Core 3.1 and below show RegEx being the slowest.
Regardless of the version of .NET, the For-Loop method is consistently faster than the Linq Predicate.

I advise not depending on ready-made, built-in code in the .NET Framework; try to come up with your own solution. This is what I do:
public bool isAlphaNumeric(string N)
{
bool YesNumeric = false;
bool YesAlpha = false;
bool BothStatus = false;
for (int i = 0; i < N.Length; i++)
{
if (char.IsLetter(N[i]) )
YesAlpha=true;
if (char.IsNumber(N[i]))
YesNumeric = true;
}
if (YesAlpha==true && YesNumeric==true)
{
BothStatus = true;
}
else
{
BothStatus = false;
}
return BothStatus;
}

Is there a way of doing a "post switch" like operation with bool?

I have a condition like the following, where I want the second bool to act as a trigger exactly once. Since this condition is evaluated relatively often, I don't like the idea of assigning false to it every time the condition is true, so I tried to take advantage of the short-circuit evaluation of logical AND and OR together with the post-increment operator. But it doesn't do what I expected. So is there a way to make a post state switch for this line?
where firstTitleNotSet is:
bool firstTitleNotSet;
if (titleChangedSinceLastGet() || (p_firstTitleNotSet && p_firstTitleNotSet++))
The idea is that the first part is the primary trigger and the second is the trigger that only has to trigger the first time.
While I easily could do
if (titleChangedSinceLastGet() || p_firstTitleNotSet)
{
p_firstTitleNotSet = false;
//...
}
I don't like this, as it reassigns false whenever the conditional block is entered.
So is there some way to "post change" the value of a bool from true to false? I know that this would work the other way around, but that would negate the advantage of the first check being the true trigger most of the time and therefore skipping the following check.
Note: The reason I'm making such considerations instead of just taking the second case is that this block will be called frequently, so I'm looking to optimize its runtime.

Well, you could do something like:
if (titleChangedSinceLastGet() ||
(p_firstTitleNotSet ? ((p_firstTitleNotSet=false), true):false))
An alternative syntax would be:
if (titleChangedSinceLastGet() ||
(p_firstTitleNotSet && ((p_firstTitleNotSet=false), true)))
Either one looks somewhat ugly. Note, however, that this is NOT the same as your other alternative:
if (titleChangedSinceLastGet() || p_firstTitleNotSet)
{
p_firstTitleNotSet = false;
//...
}
Note that with your proposed alternative, p_firstTitleNotSet gets reset to false no matter what, even if the conditional was entered because titleChangedSinceLastGet() returned true.
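The difference can be made visible with a small self-contained sketch. Everything here is a stand-in: the TitleWatcher struct, its changed member (playing the role of titleChangedSinceLastGet()) and the resets counter are assumptions added only to observe when the flag is written.

```cpp
// Hypothetical stand-in for the asker's state.
struct TitleWatcher {
    bool changed = false;          // what titleChangedSinceLastGet() would report
    bool firstTitleNotSet = true;  // the one-shot trigger
    int resets = 0;                // counts writes to the flag, for observation only

    bool shouldUpdate()
    {
        // Short-circuiting: the right-hand side runs only when changed is
        // false, and the assignment runs only while the flag is still true.
        return changed ||
               (firstTitleNotSet &&
                ((firstTitleNotSet = false), ++resets, true));
    }
};
```

If changed is true on the first call, the flag is left untouched and still fires on a later call where changed is false; the flag is written exactly once over the object's lifetime.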

A more readable way than the assignment inside a ternary operator inside an or inside an if would be just moving the operations to their own statements:
bool needsUpdate = titleChangedSinceLastGet();
if(!needsUpdate && firstTitleSet)
{
needsUpdate = true;
firstTitleSet = false;
}
if(needsUpdate)
{
//...
}
This is likely to produce assembly very similar to that of the less readable alternative, since ternary operators are mostly just syntactic sugar around if statements.
To demonstrate this I gave GCC Explorer the following code:
extern bool first;
bool changed();
int f1()
{
if (changed() ||
(first ? ((first=false), true):false))
return 1;
return 0;
}
int f2()
{
bool b = changed();
if(!b && first)
{
b = true;
first = false;
}
return b;
}
and after optimization the generated assembly showed only small differences. Certainly have a look for yourself.
I maintain, however, that this is highly unlikely to make a noticeable difference in performance and that this is more for interest's sake.
In my opinion:
if(titleChangedSinceLastUpdate() || firstTitleSet)
{
firstTitleSet = false;
//...
}
is an (at least) equally good option.
You can compare the assembly of the above functions with this one to compare further.
bool f3()
{
if(changed() || first)
{
first = false;
return true;
}
return false;
}

In this kind of situation, I usually write:
bool firstTitleNotSet = true;
if (titleChangedSinceLastGet() || firstTitleNotSet)
{
if (firstTitleNotSet) firstTitleNotSet = false;
//...
}
That second comparison will likely be optimized by the compiler.
But if you have a preference for a post-increment operator:
int iterationCount = 0;
if (titleChangedSinceLastGet() || iterationCount++ == 0)
{
//...
}
Note that this will be a problem if iterationCount overflows, but the same is true of the bool firstTitleNotSet that you were post-incrementing.
In terms of code readability and maintainability, I would recommend the former. If the logic of your code is sound, you can probably rely on the compiler to do a very good job optimizing it, even if it looks inelegant to you.

That should work:
int firstTitleSet = 0;
if (titleChangedSinceLastGet() || (!firstTitleSet++))
If you wish to avoid overflow you can do:
int b = 1;
if (titleChangedSinceLastGet() || (b=b*2%4))
at the first iteration b=2 while b=0 at the rest of them.

Is this the right way to use recursion?

Given strings s and t compute recursively, if t is contained in s return true.
Example: bool find("Names Richard", "Richard") == true;
I have written the code below, but I'm not sure if its the right way to use recursion in C++; I just learned recursion today in class.
#include <iostream>
using namespace std;
bool find(string s, string t)
{
if (s.empty() || t.empty())
return false;
int find = static_cast<int>(s.find(t));
if (find > 0)
return true;
}
int main()
{
bool b = find("Mississippi", "sip");
string s;
if (b == 1) s = "true";
else
s = "false";
cout << s;
}
If anyone finds an error in my code, please tell me so I can fix it, or point me to where I can learn or read more about this topic. I need to get ready for a test on recursion this Wednesday.

The question has changed since I wrote my answer.
My comments are on the code that looked like this (and could recurse)...
#include <iostream>
using namespace std;
bool find(string s, string t)
{
if (s.empty() || t.empty())
return false;
string start = s.substr(0, 2);
if (start == t && find(s.substr(3), t));
return true;
}
int main()
{
bool b = find("Mississippi", "sip");
string s;
if (b == 1) s = "true";
else
s = "false";
cout << s;
}
Watch out for this:
if (start == t && find(s.substr(3), t));
return true;
This does not do what you think it does.
The ; at the end of the if-statement leaves an empty body. Your find() function will return true regardless of the outcome of that test.
I recommend you turn up the warning levels on your compiler to catch this kind of issue before you have to debug it.
As an aside, I find using braces around every code-block, even one-line blocks, helps me avoid this kind of mistake.
There are other errors in your code, too. Removing the magic numbers 2 and 3 from find() will encourage you to think about what they represent and point you on the right path.
How would you expect start == t && find(s.substr(3), t) to work? If you can express an algorithm in plain English (or your native tongue), you have a much higher chance of being able to express it in C++.
Additionally, I recommend adding test cases that should return false (such as find("satsuma", "onion")) to ensure that your code works as well as calls that should return true.
The last piece of advice is stylistic, laying your code out like this will make the boolean expression that you are testing more obvious without resorting to a temporary and comparing to 1:
int main()
{
std::string s;
if (find("Mississippi", "sip"))
{
s = "true";
}
else
{
s = "false";
}
std::cout << s << std::endl;
}
Good luck with your class!

Your recursive function needs 2 things:
Definite conditions of failure and success (may be more than 1)
a call of itself to process a simpler version of the problem (getting closer to the answer).
Here's a quick analysis:
bool find(string s, string t)
{
if (s.empty() || t.empty()) //definite condition of failure. Good
return false;
string start = s.substr(0, 2);
if (start == t && find(s.substr(3), t)); //mixed up definition of success and recursive call
return true;
}
Try this instead:
bool find(string s, string t)
{
if (s.empty() || t.empty()) //definite condition of failure. Done!
return false;
string start = s.substr(0, 2);
if (start == t) //definite condition of success. Done!
return true;
else
return find(s.substr(3), t); //simplify the problem and return whatever it finds
}

You're on the right lines - so long as the function calls itself you can say that it's recursive - but even the most simple testing should tell you that your code doesn't work correctly. Change "sip" to "sipx", for example, and it still outputs true. Have you compiled and run this program? Have you tested it with various different inputs?

You are not using recursion. Using std::string::find in your function feels like cheating (this will most likely not earn points).
The only reasonable interpretation of the task is: Check if t is an infix of s without using loops or string functions.
Let's look at the trivial case: Epsilon (the empty word) is an infix of every word, so if t.empty() holds, you must return true.
Otherwise you have two choices to make:
t might be a prefix of s which is simple to check using recursion; simply check if the first character of t equals the first character of s and call isPrefix with the remainder of the strings. If this returns true, you return true.
Otherwise you pop the first character of s (and not of t) and proceed recursively (calling find this time).
If you follow this recipe (which btw. is easier to implement with char const* than with std::string if you ask me) you get a recursive function that only uses conditionals and no library support.
Note: this is not at all the most efficient implementation, but you didn't ask for efficiency but for a recursive function.
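The recipe above can be sketched with char const* as suggested. The outer function is called contains here (an assumed name, chosen to avoid clashing with std::find); isPrefix is the helper named in the answer.

```cpp
// True if t is a prefix of s: compare the first characters,
// then recurse on the remainder of both strings.
bool isPrefix(const char* s, const char* t)
{
    if (*t == '\0') {
        return true;   // the empty word is a prefix of every word
    }
    if (*s == '\0' || *s != *t) {
        return false;  // s ran out, or the first characters differ
    }
    return isPrefix(s + 1, t + 1);
}

// True if t occurs anywhere in s (t is an infix of s).
bool contains(const char* s, const char* t)
{
    if (isPrefix(s, t)) {
        return true;   // t starts right here
    }
    if (*s == '\0') {
        return false;  // nothing left of s to try
    }
    return contains(s + 1, t);  // drop the first character of s only
}
```

For example, contains("Mississippi", "sip") holds while contains("Mississippi", "sipx") does not. No loops or library string functions are involved, only conditionals and the two recursive calls.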

Converting from a std::string to bool

What is the best way to convert a std::string to bool? I am calling a function that returns either "0" or "1", and I need a clean solution for turning this into a boolean value.

I am surprised that no one mentioned this one:
bool b;
istringstream("1") >> b;
or
bool b;
istringstream("true") >> std::boolalpha >> b;

bool to_bool(std::string const& s) {
return s != "0";
}

It'll probably be overkill for you, but I'd use boost::lexical_cast
boost::lexical_cast<bool>("1") // returns true
boost::lexical_cast<bool>("0") // returns false

Either you care about the possibility of an invalid return value or you don't. Most answers so far are in the middle ground, catching some strings besides "0" and "1", perhaps rationalizing about how they should be converted, perhaps throwing an exception. Invalid input cannot produce valid output, and you shouldn't try to accept it.
If you don't care about invalid returns, use s[0] == '1'. It's super simple and obvious. If you must justify its tolerance to someone, say it converts invalid input to false, and the empty string is likely to be a single \0 in your STL implementation so it's reasonably stable. s == "1" is also good, but s != "0" seems obtuse to me and makes invalid => true.
If you do care about errors (and likely should), use
if ( s.size() != 1
|| s[0] < '0' || s[0] > '1' ) throw input_exception();
b = ( s[0] == '1' );
This catches ALL errors, it's also bluntly obvious and simple to anyone who knows a smidgen of C, and nothing will perform any faster.
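Wrapped in a function, the strict check might look like the sketch below. The snippet above throws input_exception, which is not defined there; std::invalid_argument is substituted here as an assumption, and the function name is made up.

```cpp
#include <stdexcept>
#include <string>

// Accepts exactly "0" or "1"; anything else is an error.
bool strict_to_bool(const std::string& s)
{
    if (s.size() != 1 || s[0] < '0' || s[0] > '1') {
        // Assumption: std::invalid_argument stands in for input_exception.
        throw std::invalid_argument("expected \"0\" or \"1\", got \"" + s + "\"");
    }
    return s[0] == '1';
}
```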

There is also std::stoi in c++11:
bool value = std::stoi(someString.c_str());

DavidL's answer is the best, but I find myself wanting to support both forms of boolean input at the same time. So a minor variation on the theme (named after std::stoi):
bool stob(std::string s, bool throw_on_error = true)
{
auto result = false; // failure to assert is false
std::istringstream is(s);
// first try simple integer conversion
is >> result;
if (is.fail())
{
// simple integer failed; try boolean
is.clear();
is >> std::boolalpha >> result;
}
if (is.fail() && throw_on_error)
{
throw std::invalid_argument(s.append(" is not convertible to bool"));
}
return result;
}
This supports "0", "1", "true", and "false" as valid inputs. Unfortunately, I can't figure out a portable way to also support "TRUE" and "FALSE"
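One possible way to also accept "TRUE" and "FALSE" is to lower-case a copy of the input before extracting, as in the sketch below. This assumes ASCII input and the default "C" locale (where boolalpha expects "true" and "false"); the name stob_icase is made up, and unlike the version above it always throws on failure.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>

bool stob_icase(std::string s)
{
    // Lower-case the copy so "TRUE", "True" and "true" are treated alike.
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });

    std::istringstream is(s);
    bool result = false;

    is >> result;                        // first try "0" / "1"
    if (is.fail()) {
        is.clear();
        is >> std::boolalpha >> result;  // then try "true" / "false"
    }
    if (is.fail()) {
        throw std::invalid_argument(s + " is not convertible to bool");
    }
    return result;
}
```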

I'd use this, which does what you want, and catches the error case.
bool to_bool(const std::string& x) {
assert(x == "0" || x == "1");
return x == "1";
}

Write a free function:
bool ToBool( const std::string & s ) {
return s.at(0) == '1';
}
This is about the simplest thing that might work, but you need to ask yourself:
what should an empty string return? the version above throws an exception
what should a character other than '1' or '0' convert to?
is a string of more than one character a valid input for the function?
I'm sure there are others - this is the joy of API design!

I'd change the ugly function that returns this string in the first place. That's what bool is for.

Here's a way similar to Kyle's except it handles the leading zeroes and stuff:
bool to_bool(std::string const& s) {
return atoi(s.c_str());
}

You could always wrap the returned string in a class that handles the concept of boolean strings:
class BoolString : public string
{
public:
BoolString(string const &s)
: string(s)
{
if (s != "0" && s != "1")
{
throw invalid_argument(s);
}
}
operator bool()
{
return *this == "1";
}
};
Call something like this:
BoolString bs(func_that_returns_string());
if (bs) ...;
else ...;
Which will throw invalid_argument if the rule about "0" and "1" is violated.

If you need "true" and "false" string support consider Boost...
BOOST_TEST(convert<bool>( "true", cnv(std::boolalpha)).value_or(false) == true);
BOOST_TEST(convert<bool>("false", cnv(std::boolalpha)).value_or( true) == false);
BOOST_TEST(convert<bool>("1", cnv(std::noboolalpha)).value_or(false) == true);
BOOST_TEST(convert<bool>("0", cnv(std::noboolalpha)).value_or( true) == false);
https://www.boost.org/doc/libs/1_71_0/libs/convert/doc/html/boost_convert/converters_detail/stream_converter.html

Try this:
bool value;
if(string == "1")
value = true;
else if(string == "0")
value = false;

bool to_bool(std::string const &string) {
return string[0] == '1';
}

Evaluating expressions inside C++ strings: "Hi ${user} from ${host}"

I'm looking for a clean C++ way to parse a string containing expressions wrapped in ${} and build a result string from the programmatically evaluated expressions.
Example: "Hi ${user} from ${host}" will be evaluated to "Hi foo from bar" if I implement the program to let "user" evaluate to "foo", etc.
The current approach I'm thinking of consists of a state machine that eats one character at a time from the string and evaluates the expression after reaching '}'. Any hints or other suggestions?
Note: boost:: is most welcome! :-)
Update Thanks for the first three suggestions! Unfortunately I made the example too simple! I need to be able examine the contents within ${} so it's not a simple search and replace. Maybe it will say ${uppercase:foo} and then I have to use "foo" as a key in a hashmap and then convert it to uppercase, but I tried to avoid the inner details of ${} when writing the original question above... :-)

#include <iostream>
#include <conio.h>
#include <string>
#include <map>
using namespace std;
struct Token
{
enum E
{
Replace,
Literal,
Eos
};
};
class ParseExp
{
private:
enum State
{
State_Begin,
State_Literal,
State_StartRep,
State_RepWord,
State_EndRep
};
string m_str;
int m_char;
unsigned int m_length;
string m_lexme;
Token::E m_token;
State m_state;
public:
void Parse(const string& str)
{
m_char = 0;
m_str = str;
m_length = str.size();
}
Token::E NextToken()
{
if (m_char >= m_length)
{
m_token = Token::Eos;
return m_token; // nothing left to scan
}
m_lexme = "";
m_state = State_Begin;
bool stop = false;
while (m_char < m_length && !stop) // never read past the end of the string
{
char ch = m_str[m_char++];
switch (m_state)
{
case State_Begin:
if (ch == '$')
{
m_state = State_StartRep;
m_token = Token::Replace;
continue;
}
else
{
m_state = State_Literal;
m_token = Token::Literal;
}
break;
case State_StartRep:
if (ch == '{')
{
m_state = State_RepWord;
continue;
}
else
continue;
break;
case State_RepWord:
if (ch == '}')
{
stop = true;
continue;
}
break;
case State_Literal:
if (ch == '$')
{
stop = true;
m_char--;
continue;
}
}
m_lexme += ch;
}
return m_token;
}
const string& Lexme() const
{
return m_lexme;
}
Token::E Token() const
{
return m_token;
}
};
string DoReplace(const string& str, const map<string, string>& dict)
{
ParseExp exp;
exp.Parse(str);
string ret = "";
while (exp.NextToken() != Token::Eos)
{
if (exp.Token() == Token::Literal)
ret += exp.Lexme();
else
{
map<string, string>::const_iterator iter = dict.find(exp.Lexme());
if (iter != dict.end())
ret += (*iter).second;
else
ret += "undefined(" + exp.Lexme() + ")";
}
}
return ret;
}
int main()
{
map<string, string> words;
words["hello"] = "hey";
words["test"] = "bla";
cout << DoReplace("${hello} world ${test} ${undef}", words);
_getch();
}
I will be happy to explain anything about this code :)

How many evaluation expressions do you intend to have? If the number is small enough, you might just want to use brute force.
For instance, if you have a std::map<string, string> that goes from your key to its value, for instance user to Matt Cruikshank, you might just want to iterate over your entire map and do a simple replace on your string of every "${" + key + "}" to its value.
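A minimal sketch of that brute-force approach follows. The function name is an assumption; placeholders whose key is not in the map are simply left in the text.

```cpp
#include <map>
#include <string>

// Replaces every "${key}" in text with the mapped value, one key at a time.
std::string replace_placeholders(
    std::string text, const std::map<std::string, std::string>& values)
{
    for (const auto& entry : values) {
        const std::string placeholder = "${" + entry.first + "}";
        std::string::size_type pos = 0;
        while ((pos = text.find(placeholder, pos)) != std::string::npos) {
            text.replace(pos, placeholder.size(), entry.second);
            // Continue after the inserted value so it is not rescanned.
            pos += entry.second.size();
        }
    }
    return text;
}
```

The cost grows with the number of keys times the length of the text, which is why this only suits small maps.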

Boost::Regex would be the route I'd suggest. The regex_replace algorithm should do most of your heavy lifting.

If you don't like my first answer, then dig in to Boost Regex - probably boost::regex_replace.
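When the contents of ${} have to be inspected, as in the ${uppercase:foo} case from the question's update, a regex can locate each expression while ordinary code evaluates it. The sketch below uses std::regex rather than Boost.Regex, which is a substitution on my part; the function name expand, the handling of only the "uppercase:" prefix, and the "undefined(...)" fallback (borrowed from the state-machine answer) are all assumptions.

```cpp
#include <cctype>
#include <map>
#include <regex>
#include <string>

std::string expand(const std::string& text,
                   const std::map<std::string, std::string>& values)
{
    // Group 1 captures everything between "${" and the next "}".
    static const std::regex rx(R"(\$\{([^}]*)\})");
    static const std::string prefix = "uppercase:";

    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), rx), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // literal text before the match

        std::string expr = m[1].str();
        bool upper = false;
        if (expr.compare(0, prefix.size(), prefix) == 0) {
            upper = true;
            expr.erase(0, prefix.size());
        }

        const auto found = values.find(expr);
        std::string value = (found != values.end())
                                ? found->second
                                : "undefined(" + expr + ")";
        if (upper) {
            for (char& c : value) {
                c = static_cast<char>(
                    std::toupper(static_cast<unsigned char>(c)));
            }
        }
        out += value;
        last = m[0].second;
    }
    out.append(last, text.cend());     // literal text after the last match
    return out;
}
```

With user mapped to "foo" and host to "bar", "Hi ${user} from ${uppercase:host}" would expand to "Hi foo from BAR".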

How complex can the expressions get? Are they just identifiers, or can they be actual expressions like "${numBad/(double)total*100.0}%"?

Do you have to use the ${ and } delimiters or can you use other delimiters?
You don't really care about parsing. You just want to generate and format strings with placeholder data in it. Right?
For a platform neutral approach, consider the humble sprintf function. It is the most ubiquitous and does what I am assuming that you need. It works on "char stars" so you are going to have to get into some memory management.
Are you using STL? Then consider the basic_string& replace function. It doesn't do exactly what you want but you could make it work.
If you are using ATL/MFC, then consider the CStringT::Format method.

If you are managing the variables separately, why not go the route of an embeddable interpreter. I have used tcl in the past, but you might try lua which is designed for embedding. Ruby and Python are two other embeddable interpreters that are easy to embed, but aren't quite as lightweight. The strategy is to instantiate an interpreter (a context), add variables to it, then evaluate strings within that context. An interpreter will properly handle malformed input that could lead to security or stability problems for your application.
