Predicate for STL algorithms - c++

I have the following question:
find_if(s.begin(), s.end(), isalpha);
s is a std::string. When I try to use isalpha (from the <cctype> header), the compiler says the type doesn't match. The problem is that isalpha takes an int and returns an int: int isalpha(int)
I solved it by declaring another function:
bool IsAlpha(char c) {
return isalpha(c);
}
However, is there any better way to do this? I would prefer better code clarity & simplicity, without declaring this "wrapper" function.
Thanks!

I suppose the 'proper' C++ way is to use the isalpha defined in locale:
std::find_if(
s.begin(),
s.end(),
[](char c) { return std::isalpha(c, std::locale()); }
);
A little verbose maybe.

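The only way I can imagine this happening is that you have using namespace std in effect, since you wrote find_if and not std::find_if. In that case you get the following error (live example), presumably because the name isalpha then also refers to the function template declared in <locale>, so the compiler cannot pick one.
You shouldn't need to write a wrapper: you can simply use ::isalpha (live example), or you can bind the second parameter of the <locale> version to the default locale (live example).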

Your solution looks fine to me. Except that it has a very common bug
bool IsAlpha(char c) {
return isalpha(c);
}
should be
bool IsAlpha(char c) {
return isalpha((unsigned char)c);
}
isalpha is only defined for unsigned char values (which are normally 0 to 255) and EOF (which is normally -1). Strictly speaking passing any other negative value to isalpha is undefined. What you will find in practice however is that (assuming your char type is signed) if you give your routine the character with the code 255, you will end up passing the value of -1 to isalpha which will always return false.
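To put the cast and the "no named wrapper" wish together, here is a minimal sketch using a lambda. The helper name first_alpha is made up for this example; the only real requirement is that the char goes through unsigned char before it reaches std::isalpha.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns an iterator to the first alphabetic character in s, or s.cend().
// The cast to unsigned char keeps negative char values out of std::isalpha.
std::string::const_iterator first_alpha(const std::string& s) {
    return std::find_if(s.cbegin(), s.cend(), [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    });
}
```

Because the lambda has exactly one call signature, there is no overload for the compiler to be confused about, whatever using-directives are in effect.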

How about using functors?
struct IsAlpha {
bool operator() (char c) const { return /* isAlpha logic here */; }
};
//...
find_if(begin, end, IsAlpha());
Have a look at advantages of using functors with stl algorithms here
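For completeness, this is roughly what the functor could look like with the logic filled in. It is only a sketch: it assumes the classification should follow the current C locale, and count_alpha is a hypothetical helper added just to show the functor being passed to an algorithm.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Function object that reports whether a character is alphabetic
// in the current C locale.
struct IsAlpha {
    bool operator()(char c) const {
        // Cast first: std::isalpha is undefined for negative values other than EOF.
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    }
};

// Hypothetical helper: counts the alphabetic characters in s.
std::size_t count_alpha(const std::string& s) {
    return static_cast<std::size_t>(
        std::count_if(s.begin(), s.end(), IsAlpha()));
}
```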

Related

!strcmp as substitute for ==

I'm working with rapidxml, so I would like to have comparisons like this in the code:
if ( searchNode->first_attribute("name")->value() == "foo" )
This gives the following warning:
comparison with string literal results in unspecified behaviour [-Waddress]
Is it a good idea to substitute it with:
if ( !strcmp(searchNode->first_attribute("name")->value() , "foo") )
Which gives no warning?
The latter looks ugly to me, but is there anything else?
You cannot in general use == to compare strings in C, since that only compares the address of the first character which is not what you want.
You must use strcmp(), but I would endorse this style:
if( strcmp(searchNode->first_attribute("name")->value(), "foo") == 0) { }
rather than using !, since that operator is a boolean operator and strcmp()'s return value is not boolean. I realize it works and is well-defined, I just consider it ugly and confused.
Of course you can wrap it:
#include <stdbool.h>
static bool first_attrib_name_is(const Node *node, const char *string)
{
return strcmp(node->first_attribute("name")->value(), string) == 0;
}
then your code becomes the slightly more palatable:
if( first_attrib_name_is(searchNode, "foo") ) { }
Note: I use the bool return type, which is standard from C99.
If the value() returns char* or const char*, you have little choice - strcmp or one of its length-limiting alternatives is what you need. If value() can be changed to return std::string, you could go back to using ==.
When comparing char* types with "==" you just compare the pointers. Use the C++ string type if you want to do the comparison with "=="
You have a few options:
You can use strcmp, but I would recommend wrapping it. e.g.
bool equals(const char* a, const char* b) {
return strcmp(a, b) == 0;
}
then you could write: if (equals(searchNode->first_attribute("name")->value(), "foo"))
You can convert the return value to a std::string and use the == operator
if (std::string(searchNode->first_attribute("name")->value()) == "foo")
That will introduce a string copy operation which, depending on context, may be undesirable.
You can use a string reference class. The purpose of a string reference class is to provide a string-like object which does not own the actual string contents. I've seen a few of these and it's simple enough to write your own, but since Boost has a string reference class, I'll use that for an example.
#include <boost/utility/string_ref.hpp>
using namespace boost;
if (string_ref(searchNode->first_attribute("name")->value()) == string_ref("foo"))
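If C++17 is available, std::string_view plays the same role as a string reference class without needing Boost. A minimal sketch, assuming both pointers are non-null and null-terminated (which is an assumption about what value() returns, not something rapidxml guarantees in every case):

```cpp
#include <string_view>

// Compares two C strings by content without copying either one.
// Assumes both pointers are non-null and null-terminated.
bool equals(const char* a, const char* b) {
    return std::string_view(a) == std::string_view(b);
}
```

With that in place the condition reads if (equals(searchNode->first_attribute("name")->value(), "foo")), and no std::string is allocated.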

C++ integer template parameter evaluation

This question may be very straightforward but I am rather inexperienced with c++ and got stuck while writing a simple parser.
For some reason one of the string comparison functions would not return the expected value when called.
The function looks like this:
template<int length>
bool Parser::compare(const char *begin, const char *str){
int i = 0;
while(i != length && compareCaseInsensitive(*begin, *str)){
i++;
begin++;
str++;
}
return i == length;
};
The purpose of this function was to compare a runtime character buffer with a compile-time constant string, e.g.:
compare<4>(currentByte, "<!--");
I know there are more efficient ways to compare a fixed length character buffer (and used one later on) but I was rather puzzled when I ran this function and it always returns false, even with two identical strings.
I checked with the debugger and checked the value of i at the end of the loop and it was equal to the value of the template parameter but still the return expression evaluated to false.
Are there any special rules about working with int template parameters ?
I assumed the template parameter would behave like a compile time constant.
I don't know if this is relevant but I'm running gcc's g++ compiler and debugged with gdb.
If anyone could tell me what might cause this problem it would be highly appreciated.
The functions used in this piece of code:
template<typename Character>
Character toLowerCase(Character c){
return c > 64 && c < 91 ? c | 0x10 : c;
};
template<typename Character>
bool equalsCaseInsensitive(Character a, Character b){
return toLowerCase(a) == toLowerCase(b);
};
For locale-aware string comparisons, you could look at the C library function std::strcoll from the header <cstring> (it is not part of the STL, and it does not ignore case by itself), which has the signature
int strcoll( const char* lhs, const char* rhs );
and compares two null-terminated byte strings according to the current locale. Or if you want to roll your own, you could still use std::tolower from the header <cctype> which has signature
int tolower( int ch );
and converts the given character to lowercase according to the character conversion rules defined by the currently installed C locale.
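A "roll your own" version on top of std::tolower might look like the sketch below. The function name and the explicit length parameter are choices made for this example, not anything from the question's Parser class. As a side note, if you do the bit trick by hand, the ASCII case bit is 0x20, not 0x10.

```cpp
#include <cctype>
#include <cstddef>

// Compares the first n characters of a and b, ignoring case as defined
// by the current C locale. Stops early at a null terminator.
bool equals_ignore_case(const char* a, const char* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // Go through unsigned char so negative chars never reach tolower.
        const unsigned char ca = static_cast<unsigned char>(a[i]);
        const unsigned char cb = static_cast<unsigned char>(b[i]);
        if (std::tolower(ca) != std::tolower(cb)) return false;
        if (ca == '\0') return true;  // both strings ended at the same point
    }
    return true;
}
```

A call such as equals_ignore_case(currentByte, "<!--", 4) would then replace compare<4>(currentByte, "<!--").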

Using templates for implementing a generic string parser

I am trying to come up with a generic solution for parsing strings (with a given format). For instance, I would like to be able to parse a string containing a list of numeric values (integers or floats) and return a std::vector. This is what I have so far:
template<typename T, typename U>
T parse_value(const U& u) {
throw std::runtime_error("no parser available");
}
template<typename T>
std::vector<T> parse_value(const std::string& s) {
std::vector<std::string> parts;
boost::split(parts, s, boost::is_any_of(","));
std::vector<T> res;
std::transform(parts.begin(), parts.end(), std::back_inserter(res),
[](const std::string& s) { return boost::lexical_cast<T>(s); });
return res;
}
Additionally, I would like to be able to parse strings containing other type of values. For instance:
struct Foo { /* ... */ };
template<>
Foo parse_value(const std::string& s) {
/* parse string and return a Foo object */
}
The reason to maintain a single "hierarchy" of parse_value functions is that, sometimes, I want to parse an optional value (which may or may not exist), using boost::optional. Ideally, I would like to have just a single parse_optional_value function that would delegate to the corresponding parse_value function:
template<typename T>
boost::optional<T> parse_optional_value(const boost::optional<std::string>& s) {
if (!s) return boost::optional<T>();
return boost::optional<T>(parse_value<T>(*s));
}
So far, my current solution does not work (the compiler cannot deduce the exact function to use). I guess the problem is that my solution relies on deducing the template value based on the return type of parse_value functions. I am not really sure how to fix this (or even whether it is possible to fix it, since the design approach could just be totally flawed). Does anyone know a way to solve what I am trying to do? I would really appreciate if you could just point me to a possible way to address the issues that I am having with my current implementation. BTW, I am definitely open to completely different ideas for solving this problem too.
You cannot overload functions based on return value [1]. This is precisely why the standard IO library uses the construct:
std::cin >> a >> b;
which may not be your cup of tea -- many people don't like it, and it is truly not without its problems -- but it does a nice job of providing a target type to the parser. It also has the advantage over a static parse<X>(const std::string&) prototype that it allows for chaining and streaming, as above. Sometimes that's not needed, but in many parsing contexts it is essential, and the use of operator>> is actually a pretty cool syntax. [2]
The standard library doesn't do what would be far and away the coolest thing, which is to skip string constants scanf style and allow interleaved reading.
vector<int> integers;
std::cin >> "[" >> interleave(integers, ",") >> "]";
However, that could be defined. (Possibly it would be better to use an explicit wrapper around the string literals, but actually I prefer it like that; but if you were passing a variable you'd want to use a wrapper).
[1] With the new auto declaration, the reason for this becomes even clearer.
[2] IO manipulators, on the other hand, are a cruel joke. And error handling is pathetic. But you can't have everything.
Here is an example from the libsass parser:
const char* interpolant(const char* src) {
return recursive_scopes< exactly<hash_lbrace>, exactly<rbrace> >(src);
}
// Match a single character literal.
// Regex equivalent: /(?:x)/
template <char chr>
const char* exactly(const char* src) {
return *src == chr ? src + 1 : 0;
}
where rules could be passed into the lex method.
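One common way around the return-type problem is to move the dispatch into a class template, since class templates can be partially specialized and function templates cannot. The sketch below is only an illustration under some assumptions: Parser is a made-up name, std::optional and std::istringstream stand in for boost::optional, boost::split and boost::lexical_cast, and there is no error handling for malformed input.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Primary template: parses a single value with operator>>.
template <typename T>
struct Parser {
    static T parse(const std::string& s) {
        std::istringstream in(s);
        T value{};
        in >> value;  // no error checking in this sketch
        return value;
    }
};

// Partial specialization: a comma-separated list of T.
template <typename T>
struct Parser<std::vector<T>> {
    static std::vector<T> parse(const std::string& s) {
        std::vector<T> result;
        std::istringstream in(s);
        std::string part;
        while (std::getline(in, part, ',')) {
            result.push_back(Parser<T>::parse(part));
        }
        return result;
    }
};

// The caller names the target type explicitly; nothing has to be
// deduced from the return type.
template <typename T>
T parse_value(const std::string& s) {
    return Parser<T>::parse(s);
}

template <typename T>
std::optional<T> parse_optional_value(const std::optional<std::string>& s) {
    if (!s) return std::nullopt;
    return parse_value<T>(*s);
}
```

A user-defined type such as Foo would get its own full specialization of Parser<Foo>, and parse_optional_value<Foo> would pick it up without any further changes.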

C++ Strip non-ASCII Characters from string

Before you get started; yes I know this is a duplicate question and yes I have looked at the posted solutions. My problem is I could not get them to work.
bool invalidChar (char c)
{
return !isprint((unsigned)c);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
I tested this method on "Prusæus, Ægyptians," and it did nothing
I also attempted to substitute isprint for isalnum
The real problem occurs when, in another section of my program I convert string->wstring->string. the conversion balks if there are unicode chars in the string->wstring conversion.
Ref:
How can you strip non-ASCII characters from a string? (in C#)
How to strip all non alphanumeric characters from a string in c++?
Edit:
I still would like to remove all non-ASCII chars regardless yet if it helps, here is where I am crashing:
// Convert to wstring
wchar_t* UnicodeTextBuffer = new wchar_t[ANSIWord.length()+1];
wmemset(UnicodeTextBuffer, 0, ANSIWord.length()+1);
mbstowcs(UnicodeTextBuffer, ANSIWord.c_str(), ANSIWord.length());
wWord = UnicodeTextBuffer; //CRASH
Error Dialog
MSVC++ Debug Library
Debug Assertion Failed!
Program: //myproject
File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c
Line: //Above
Expression:(unsigned)(c+1)<=256
Edit:
Further compounding the matter: the .txt file I am reading in from is ANSI encoded. Everything within should be valid.
Solution:
bool invalidChar (char c)
{
return !(c>=0 && c <128);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
If someone else would like to copy/paste this, I can check this question off.
EDIT:
For future reference: try using the __isascii, iswascii commands
At least one problem is in your invalidChar function. It should be:
return !isprint( static_cast<unsigned char>( c ) );
Casting a char to unsigned is likely to give some very, very big
values if the char is negative (UINT_MAX+1 + c). Passing such a
value to isprint is undefined behavior.
Another solution that doesn't require defining two functions, using a lambda (anonymous function), available since C++11:
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), [](char c){return !(c>=0 && c <128);}), str.end());
}
I think it looks cleaner
isprint depends on the locale, so the character in question must be printable in the current locale.
If you want strictly ASCII, check the range for [0..127]. If you want printable ASCII, check the range and isprint.
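A sketch of the "strictly ASCII" variant that does not depend on whether plain char is signed. The function names here are made up for the example; the technique is the same erase-remove idiom used in the question.

```cpp
#include <algorithm>
#include <string>

// True for bytes outside the 7-bit ASCII range [0, 127].
// Going through unsigned char makes the test independent of whether
// plain char is signed on the platform.
bool is_non_ascii(char c) {
    return static_cast<unsigned char>(c) > 127;
}

// Removes every non-ASCII byte from str in place (erase-remove idiom).
void strip_non_ascii(std::string& str) {
    str.erase(std::remove_if(str.begin(), str.end(), is_non_ascii),
              str.end());
}
```

Note that this works on bytes, so a multi-byte UTF-8 character is removed as a whole only because every one of its bytes has the high bit set.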

How do I use overloaded functions with default arguments in algorithms?

I know the answer to the frequently-asked How do I specify a pointer to an overloaded function?: Either with assignment or with a cast, and every other C++ tutorial uppercases a string like this (give or take static_cast):
transform(in.begin(), in.end(), back_inserter(out), (int(*)(int)) std::toupper);
Or like this:
int (*fp)(int) = std::toupper;
transform(in.begin(), in.end(), back_inserter(out), fp);
Which neatly selects the <cctype> overload of std::toupper.
But this begs the question: How can I select the <locale> overload in a similar manner?
char (*fp2)(char, const std::locale&) = std::toupper;
transform(in.begin(), in.end(), back_inserter(out), fp2);
// error: too few arguments to function
Or, more practically, consider someone trying to use the C++11 std::stoi in an algorithm to convert a vector of strings to a vector of integers: stoi has two overloads (string/wstring), each taking two additional default arguments.
Assuming I don't want to explicitly bind all those defaults, I believe it is impossible to do this without wrapping such call in an auxiliary function or lambda. Is there a boost wrapper or TMP magic to do it for me in completely generic manner? Can a wrapper like call_as<char(char)>(fp2) or, more likely, call_as<int(const std::string&)>(std::stoi) even be written?
It's funny, I was doing something similar. The best way I found to do it was using lambdas as follows, because otherwise, you have to use a typedef to get the right overload and a std::bind to get rid of the locale, or not use the locale. However, this works much more cleanly:
static const std::locale loc;
// loc has static storage duration, so the lambda can use it without capturing it
transform(in.begin(), in.end(), back_inserter(out), [](char c) {
return std::toupper(c, loc);
});
I use the static to save the effort of reallocating each time.
Or you could get a typedef and do:
typedef char (*LocaleToUpper)(char, const std::locale&);
std::bind((LocaleToUpper)std::toupper, std::placeholders::_1, loc); // UGLY!
You could create a typedef of that function pointer type, and then cast the function.
typedef char (*LocaleToUpper)(char, const std::locale&) ;
char (*fp2)(char, const std::locale&) = (LocaleToUpper)toupper;
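For the std::stoi case from the question, the lambda route might look like this. It is a sketch, not a generic call_as: the helper name to_ints is invented here, and the lambda simply hard-codes which overload is wanted.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Converts each string to int. The lambda fixes the overload
// (std::string, not std::wstring) and leaves stoi's defaulted
// pos and base arguments at their defaults.
std::vector<int> to_ints(const std::vector<std::string>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    std::transform(in.begin(), in.end(), std::back_inserter(out),
                   [](const std::string& s) { return std::stoi(s); });
    return out;
}
```

The default arguments are applied at the call site inside the lambda, which is why this works while a plain function pointer to std::stoi still demands all three arguments.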