I've been updating a program I wrote almost two years ago, and I've come across a call to remove all punctuation and spaces from a string.
The call works alright, but I'm not sure that it's the most efficient way to do this.
The line of code is below:
tempMessage.erase(remove_if(tempMessage.begin(), tempMessage.end(), (int(*)(int))ispunct), tempMessage.end());
I've no recollection of where I came up with this or how it was put together, but I want to be able to understand this call fully.
I get that the std::string.erase gets rid of the first argument up until the second argument. I can also see how the remove_if defines the start and end points, but can anyone tell me where the third argument in the remove_if call is coming from?
I can't remember why the (int(*)(int)) is needed for the life of me.
While you are looking at the code, can anyone improve this, or make it more efficient?
Thanks
First, this doesn't work in general; it just seems to (and it
may work with some compilers). You cannot pass a char to the
one argument version of ispunct without incurring undefined
behavior.
As for the reason for the cast: the standard defines both
a single argument ispunct function and a two argument
ispunct function template. In order to correctly
instantiate the function template remove_if, the compiler needs
to know the exact type of ispunct. To know the exact type of
ispunct, the compiler needs to be able to do type deduction on
the function template. In order to do type deduction, the
compiler needs to know the type expected. There's a cycle in
the dependencies, which the explicit cast (or what looks like
an explicit cast) resolves.
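To see the cycle and its resolution in isolation, here is a minimal sketch. It uses a hypothetical overload set named isBang (not from the original code) with the same shape as ispunct: one plain function and one two-argument function template. It deliberately avoids the library functions, so the undefined behavior mentioned above does not enter into it.

```cpp
#include <algorithm>
#include <string>

// Hypothetical overload set with the same shape as ispunct:
// a plain one-argument function...
int isBang(int c) { return c == '!'; }

// ...and a two-argument function template of the same name.
template <typename CharT>
bool isBang(CharT c, CharT bang) { return c == bang; }

std::string stripBangs(std::string s) {
    // std::remove_if(s.begin(), s.end(), isBang) would not compile:
    // the predicate type cannot be deduced from an overloaded name.
    // The cast states the expected type, which selects the overload.
    s.erase(std::remove_if(s.begin(), s.end(),
                           static_cast<int (*)(int)>(isBang)),
            s.end());
    return s;
}
```

With the cast in place, stripBangs("hi! there!!") yields "hi there".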
Because using the one parameter version of ispunct results in
undefined behavior, and using the two parameter version won't
compile unless you provide the additional parameter (using
std::bind, for example), anyone doing any string processing in
C++ will have functional objects already written in his toolbox
to handle this, and would write something like:
tempMessage.erase(
std::remove_if( tempMessage.begin(), tempMessage.end(), IsPunct() ),
tempMessage.end() );
How you implement IsPunct depends on your needs with regards
to localization. The simplest version is just:
struct IsPunct
{
bool operator()( char ch ) const
{
return ::ispunct( static_cast<unsigned char>( ch ) );
}
};
The version using the ctype facet of locale is somewhat
more complicated (and you probably want it to keep a copy of the
locale, as well as a reference to the facet, just to be sure
that the referenced facet doesn't disappear).
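For completeness, the simple version can be put together into something that compiles on its own. This is only a sketch: the helper name stripPunctuation is made up for the example, and it assumes the default "C" locale is good enough.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Simple predicate: converts to unsigned char before calling the
// one-argument ispunct, which keeps the call well defined.
struct IsPunct {
    bool operator()(char ch) const {
        return std::ispunct(static_cast<unsigned char>(ch)) != 0;
    }
};

// Hypothetical helper (name invented for this example): returns a copy
// of the message with all punctuation characters removed.
std::string stripPunctuation(std::string message) {
    message.erase(
        std::remove_if(message.begin(), message.end(), IsPunct()),
        message.end());
    return message;
}
```

Note that this removes punctuation only; spaces are left alone, so stripPunctuation("Hello, world!") gives "Hello world".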
Can anyone explain the working of this function?
string rightTrim(const string &str)
{
string s(str);
s.erase(find_if(s.rbegin(), s.rend(), not1(ptr_fun<int, int>(isspace))).base(), s.end());
return s;
}
I don't know the working of not1() and ptr_fun(). Can anyone provide me with a good explanation for this code?
PS: I know, this code removes any white spaces from the end of the string.
The question is essentially
What is not1(ptr_fun<int, int>(isspace))?
Short answer
You should use std::not_fn(isspace) instead, which clearly states it is a "thing" that expresses the idea that "something is not a space".(¹)(²)
Wordy answer
It is a predicate that asks whether its input is not a space: if you apply it to 'a' you get true; if you apply it to ' ', you get false.
However, the "not" in the paragraph above explains why the code has not1, but it doesn't say anything about ptr_fun. What is that for? Why couldn't we just write not1(isspace)?
Long story short, not1 is an old-generation helper function which was deprecated in C++17 and removed in C++20. It relies on the argument that you pass to it having a member type named argument_type, but isspace is a free function, not an object of a class providing such a member, so not1(isspace) is ill-formed.
ptr_fun came to the rescue, as it can transform isspace into an object which provides the interface that not1 expects.
However, ptr_fun was deprecated even before not1, in C++11, and removed in C++17.
The bottom line is therefore that you should not use either of those: you don't need ptr_fun anymore, and you can use not_fn as a more usable alternative to not1. You can indeed just change not1(ptr_fun<int, int>(isspace)) to std::not_fn(isspace), which also reads much more like "is not a space".
(¹) By the way, stop using namespace std;. It's just the wrong thing to do.
(²) Yes, even if you have to stick to C++14, don't use std::not1. In C++14 you already have generic lambdas, so you can define a quasi-not_fn yourself (working example):
auto not_fn = [](auto const& pred){
return [pred](auto const& x){ // capture by value so a temporary predicate cannot dangle
return !pred(x);
};
};
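As a rough sketch of what the rewritten function could look like with std::not_fn (this assumes C++17 and <functional>): the small isSpace lambda is an addition for this example. It converts the character to unsigned char before calling isspace and sidesteps any overload ambiguity that passing the bare name might cause.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <string>

std::string rightTrim(std::string s) {
    // Wrapper added for this sketch: takes the char as unsigned char so the
    // call to the C library isspace is well defined.
    auto isSpace = [](unsigned char ch) { return std::isspace(ch) != 0; };

    // Search from the back for the first non-space character, then erase
    // everything after it. base() turns the reverse iterator into the
    // matching forward iterator.
    s.erase(std::find_if(s.rbegin(), s.rend(), std::not_fn(isSpace)).base(),
            s.end());
    return s;
}
```

So rightTrim("abc \t\n") gives "abc", and a string of only whitespace becomes empty.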
I searched the net for a while and unfortunately didn't find an answer or a solution to my problem. Let's say I have 2 functions named like this:
1) function1a(some_args)
2) function2a(some_args)
What I want to do is write a macro that can recognize those functions when fed the correct parameter. The catch is that this parameter should also be a parameter of a C/C++ function. Here is what I did so far.
#define FUNCTION_RECOGNIZER(TOKEN) function##TOKEN()
void function1a()
{
}
void function2a()
{
}
void anotherParentFunction(const char* type)
{
FUNCTION_RECOGNIZER(type);
}
Clearly, the macro is producing "functiontype" and ignoring the argument of anotherParentFunction. I'm asking whether there is a trick or anything else to perform this kind of pasting.
thank you in advance :)
If you insist on using a macro: Skip the anotherParentFunction() function and use the macro directly instead. When called with a literal token (not a quoted string and not a variable), i.e.
FUNCTION_RECOGNIZER(1a);
it should work, since it expands to function1a();.
A more C++ like solution would be, e.g., to use an enum, then implement anotherParentFunction() with the enum as parameter and a switch that calls the corresponding function. Of course you then need to change the enum and the switch statement every time you add a new function, but you would be more flexible in choosing the names of the functions.
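A minimal sketch of the enum approach might look like this. The enum name Handler and its enumerators are made up for the example, and the two functions return a string (instead of void, as in the question) purely so the dispatch is observable.

```cpp
#include <string>

// Hypothetical enum naming the available functions.
enum class Handler { OneA, TwoA };

// Stand-ins for the question's functions; they return a string here
// only so that the result of the dispatch can be checked.
std::string function1a() { return "function1a called"; }
std::string function2a() { return "function2a called"; }

// The run-time parameter selects the function through an ordinary switch.
std::string anotherParentFunction(Handler which) {
    switch (which) {
        case Handler::OneA: return function1a();
        case Handler::TwoA: return function2a();
    }
    return std::string();  // not reached for valid enumerators
}
```

A call such as anotherParentFunction(Handler::TwoA) then ends up in function2a().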
There are many more solutions to achieve something similar; the question really is: What is your use case? What do you want to achieve?
In 16.1.5 the standard says:
The implementation can process and skip sections of source files conditionally, include other source files, and replace macros. These capabilities are called preprocessing, because conceptually they occur before translation of the resulting translation unit.
[emphasis mine]
Originally pre-processing was done by a separate program; it is essentially an independent language.
Today, the pre-processor is often part of the compiler, but - for example - you can't see macros etc in the Clang AST tree.
The significance of this is that the pre-processor knows nothing about types or functions or arguments.
Your function definition
void anotherParentFunction(const char* type)
means nothing to the pre-processor and is completely ignored by it.
FUNCTION_RECOGNIZER(type);
This is recognized as a defined macro, but type is not a recognized pre-processor symbol, so it is treated as a literal. The pre-processor does not consult the C++ parser or interact with its AST.
It consults the macro definition:
#define FUNCTION_RECOGNIZER(TOKEN) function##TOKEN()
The argument, the literal text type, is substituted for the parameter TOKEN. The word function is taken as a literal and copied to the result string; the ## tells the pre-processor to paste the value of TOKEN onto it literally, producing functiontype in the result string. Because TOKEN isn't recognized as a macro, the ()s end the token and the () is appended as a literal to the result string.
Thus, the pre-processor substitutes
FUNCTION_RECOGNIZER(type);
with
functiontype();
So the bad news is, no there is no way to do what you were trying to do, but this may be an XY Problem and perhaps there's a solution to what you were trying to achieve instead.
For instance, it is possible to overload functions based on argument type, or to specialize template functions based on parameters, or you can create a lookup table based on parameter values.
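As one possible shape for the lookup-table idea, here is a sketch that maps the run-time string to a function pointer. The names callByName and HandlerFn are invented for the example, and the functions return a string (rather than void) only to make the result visible.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Stand-ins for the question's functions.
std::string function1a() { return "function1a called"; }
std::string function2a() { return "function2a called"; }

// Hypothetical helper: looks up the suffix at run time and calls the
// matching function, which is something the pre-processor cannot do.
std::string callByName(const std::string& type) {
    typedef std::string (*HandlerFn)();
    static const std::map<std::string, HandlerFn> table = {
        {"1a", &function1a},
        {"2a", &function2a},
    };
    std::map<std::string, HandlerFn>::const_iterator it = table.find(type);
    if (it == table.end()) {
        throw std::invalid_argument("unknown function suffix: " + type);
    }
    return it->second();
}
```

Here callByName("1a") calls function1a(), and an unknown suffix is reported with an exception instead of a compile error.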
I am trying to remove whitespaces from a string
line.erase(remove_if(line.begin(), line.end(), isspace), line.end());
But Visual Studio 2010 (C++ Express) tells me
1 IntelliSense: no instance of function template "std::remove_if" matches the argument list d:\parsertry\parsertry\calc.cpp 18
Why is that? A simple piece of code
int main() {
string line = "hello world 111 222";
line.erase(remove_if(line.begin(), line.end(), isspace), line.end());
cout << line << endl;
getchar();
return 0;
}
Verifies the function works?
Funny thing is despite that, it runs giving correct result.
Don't question Intellisense, sometimes it's better to just ignore it. The parser or the database got screwed up somehow, so it doesn't work correctly anymore. Usually, a restart will fix the problem.
If you really want to know if the code is ill-formed, well, just hit F7 to compile.
Your source code compiles without even a warning with Visual C++ 11.0 (the compiler that ships with Visual Studio 2012).
Intellisense uses its own rules and isn't always reliable.
That said, your use of isspace is Undefined Behavior for all character sets except original 7-bit ASCII. Which means the heavily upvoted answer that you took it from, is just balderdash (which should not surprise). You need to cast the argument to (the C library's) isspace to unsigned char to avoid negative values and UB.
C99 §7.4/1 (from the N869 draft):
The header <ctype.h> declares several functions useful for testing and mapping
characters.
In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro EOF. If the
argument has any other value, the behavior is undefined.
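The requirement quoted above can be written down as a small check. This is just an illustration; both function names are made up here. It shows that converting through unsigned char always lands in the permitted range, whatever the signedness of plain char.

```cpp
#include <climits>  // UCHAR_MAX
#include <cstdio>   // EOF

// Hypothetical helper: true if 'value' may legally be passed to the
// <ctype.h> functions, i.e. it is EOF or representable as unsigned char.
bool isValidCtypeArgument(int value) {
    return value == EOF || (value >= 0 && value <= UCHAR_MAX);
}

// Hypothetical helper: the conversion that makes any char a valid argument.
int asCtypeArgument(char c) {
    return static_cast<unsigned char>(c);
}
```

Without the conversion, a char holding a value above 127 is typically negative where char is signed, and so falls outside the permitted range.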
A simple way to wrap the C function is
bool isSpace( char const c )
{
typedef unsigned char UChar;
return !!::isspace( UChar( c ) );
}
Why the typedef?
It makes the code easier to adapt when you already have such a typedef, which is not uncommon;
it makes the code more clear; and
it avoids a C syntax cast, thereby avoiding a false positive when searching for such via a regular expression or other pattern matching.
But, why the !! (double application of the negation operator)? Considering there’s an automatic implicit conversion from int to bool? And, if one absolutely feels that the conversion should be explicit, shouldn’t it be a static_cast, and not !!?
Well, the !! avoids a silly-warning from the Visual C++ compiler,
“warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning)”
and a static_cast doesn’t stop that warning. It’s good practice to quench that warning, and since Visual C++ is the main C++ compiler on the most used system, namely Windows, better do this in all code meant to be portable.
Oh, OK, but, since the function must be wrapped anyway, then … why use the old C library isspace (single argument) function, when the <locale> header provides a far more flexible C++ (two arguments) isspace function?
Well, first and foremost, the old C isspace function is the one used in the question, so that’s the one discussed in this answer. I have focused on discussing just how to not do this incorrectly, that is, how to avoid Undefined Behavior. Discussing how to do it right brings it to a whole different level.
But in practice, the C++ level function of the same name can be considered to be broken, since with g++ compilers until recently (and perhaps even with g++ 4.7.2, I haven't checked lately) only the C locale mechanism worked, and the C++ level one didn't, in Windows. It may have been fixed since g++ now supports wide streams, I don’t know. Anyway, the C library isspace function, in addition to being in-practice more portable and generally working in Windows, is also simpler and, I believe, more efficient (although for efficiency one should always MEASURE if it is deemed important!).
Thanks to James Kanze for asking (essentially) the questions above, in the comments.
What is isspace? Depending on the included headers and the compiler
you are using, it's likely that your code won't even compile. (I don't
know about IntelliSense, but it's possible that it's looking at all of
the standard headers, and sees the ambiguity.)
There are two isspace functions in the standard, and one is a
template. Passing a function template to a template argument of another
function template does not give the compiler nearly enough information
to be able to do template argument deduction: in order to resolve the
overload of isspace, it has to know the type expected by the
remove_if, which it only knows after template argument deduction has
succeeded. And to do template argument deduction on remove_if, it has
to know the types of the arguments, which means the type of isspace,
which it will only know once it has been able to resolve the overload on
it.
(I'm actually surprised that your little bit of code compiles: you
obviously include <iostream>, and typically, <iostream> will include
<locale>, which will bring in the function template isspace.)
Of course, the function template isspace must be called with two
arguments, so if it were ever chosen, the instantiation of remove_if
wouldn't compile (but the compiler does not try to instantiate
remove_if until it has chosen a function). And the isspace in
<ctype.h> will result in undefined behavior if passed a char, so you
can't use it. The usual solution is to create a set of predicate
objects for your tool box, and use them. Something like the following
should work if you're only concerned with char:
template <std::ctype<char>::mask m>
class Is : public std::unary_function<char, bool>
{
std::locale myLocale; // To ensure lifetime of following...
std::ctype<char> const* myCType;
public:
Is( std::locale const& loc = std::locale() )
: myLocale( loc )
, myCType( &std::use_facet<std::ctype<char> >( myLocale ) )
{
}
bool operator()( char ch ) const
{
return myCType->is( m, ch );
}
};
typedef Is<std::ctype_base::space> IsSpace;
It's trivial to add the additional typedef's so you get the complete
set, and I've found it useful to add an IsNot template as well. It's
simple, and it avoids all of the surrounding issues.
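A sketch of how the IsNot companion and a usage might look is given below. The IsNot template, the IsNotSpace typedef and the removeSpaces helper are guesses at the idea, not code from the answer, and this variant omits the std::unary_function base because that class was removed in C++17.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Same idea as the Is template above, without the unary_function base.
template <std::ctype_base::mask m>
class Is {
    std::locale myLocale;             // keeps the facet alive
    std::ctype<char> const* myCType;
public:
    explicit Is(std::locale const& loc = std::locale())
        : myLocale(loc)
        , myCType(&std::use_facet<std::ctype<char> >(myLocale)) {}
    bool operator()(char ch) const { return myCType->is(m, ch); }
};

// Hypothetical IsNot: the negation of the corresponding Is predicate.
template <std::ctype_base::mask m>
class IsNot {
    Is<m> myIs;
public:
    explicit IsNot(std::locale const& loc = std::locale()) : myIs(loc) {}
    bool operator()(char ch) const { return !myIs(ch); }
};

typedef Is<std::ctype_base::space> IsSpace;
typedef IsNot<std::ctype_base::space> IsNotSpace;

// Hypothetical helper: the question's erase/remove_if call with the
// predicate object in place of the bare isspace.
std::string removeSpaces(std::string line) {
    line.erase(std::remove_if(line.begin(), line.end(), IsSpace()),
               line.end());
    return line;
}
```

With the default (global) locale, removeSpaces("hello world 111 222") gives "helloworld111222", and IsNotSpace can be handed to find_if for trimming.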
Every definition I've seen of function ios::setstate( iostate state ) shows that the function takes ONE and ONLY ONE parameter yet when I compile a program with the following function call, everything compiles and runs just fine:
mystream.setstate( std::ios_base::badbit, true );
What exactly is the second parameter and why is there no documentation about it?
EDIT: I'm using the command line compiler of the latest version of Microsoft Visual Studio 2010.
It's required to accept a single argument, as you've noted, but implementations are allowed to extend member functions via parameters with default values (§17.6.5.5). In other words, as long as this works:
mystream.setstate( std::ios_base::badbit );
your compiler is conforming. Nothing in the standard says that your two-argument call has to fail, though.
(Your library implementation has decided that a boolean parameter would be useful to have. You never notice it because it has a default value, but you can still get into implementation-specific territory and provide the argument yourself. Whether or not this is a good idea is obviously another question, but probably not.)
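To illustrate the mechanism only, here is a toy class with a defaulted extra parameter. ToyStream and the meaning of its second parameter are invented for the example; this says nothing about what the real library's boolean actually does.

```cpp
// Hypothetical stand-in for a stream class; not any real library's code.
class ToyStream {
public:
    enum State { goodbit = 0, badbit = 1, failbit = 2 };

    // "Documented" signature: setstate(state). The defaulted second
    // parameter is an invented extension that one-argument callers
    // never notice.
    void setstate(int state, bool reraise = false) {
        state_ |= state;
        reraiseRequested_ = reraise;
    }

    int rdstate() const { return state_; }
    bool reraiseRequested() const { return reraiseRequested_; }

private:
    int state_ = goodbit;
    bool reraiseRequested_ = false;
};
```

Both s.setstate(ToyStream::badbit) and s.setstate(ToyStream::failbit, true) compile against this class; only the first form corresponds to the documented interface, so only that form can be expected to work elsewhere.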
Pretty simple question I think, but I'm having trouble finding any discussion on it at all anywhere on the web. I've seen the triple dots as function parameters multiple times throughout the years and I've always just thought it meant "and whatever you would stick here." Until last night, when I decided to try to compile a function with them. To my surprise, it compiled without warnings or errors on MSVC2010. Or at least, it appeared to. I'm not really sure, so I figured I'd ask here.
They are va_args, or variable number of arguments. See for example The C book
Triple dots mean the function is variadic (i.e. accepts a variable number of arguments). However, to read those arguments with va_start there should be at least one named parameter, so a declaration with just "..." isn't portably usable for that.
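A small sketch of a variadic function that actually reads its arguments is shown below. The function name sum and the convention of passing the count first are choices made for the example; the named parameter is what va_start hangs on to.

```cpp
#include <cstdarg>

// Adds up 'count' int arguments passed after the named parameter.
// The caller must pass exactly 'count' ints; nothing checks this.
int sum(int count, ...) {
    va_list args;
    va_start(args, count);           // start after the last named parameter
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // the type is taken on trust
    }
    va_end(args);
    return total;
}
```

For example, sum(3, 1, 2, 3) returns 6, while passing fewer ints than count promises is undefined behavior that the compiler will not catch.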
Sometimes variadic function declarations are used in C++ template trickery just because of the resolution precedence of overloads (i.e. those functions are declared just to make a certain template instantiation fail or succeed; the variadic functions themselves are not implemented). This technique relies on Substitution failure is not an error (SFINAE).
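The classic form of this trick looks roughly like the following sketch. The trait name HasValueType is made up for the example; the (...) overload is only declared, never defined, and serves as the fallback that overload resolution picks last.

```cpp
#include <string>

// Detects whether T has a nested type named value_type.
template <typename T>
struct HasValueType {
    typedef char Yes;
    struct No { char dummy[2]; };

    // Viable only if U::value_type exists; otherwise substitution fails
    // silently and this overload is dropped from the candidate set.
    template <typename U> static Yes test(typename U::value_type*);

    // Ellipsis fallback: always viable, but ranked worst of all.
    template <typename U> static No test(...);

    // Neither overload is ever called; only the return type size is used.
    static const bool value = sizeof(test<T>(0)) == sizeof(Yes);
};

static_assert(HasValueType<std::string>::value, "std::string has value_type");
static_assert(!HasValueType<int>::value, "int has no value_type");
```

Because the ellipsis conversion ranks below every other conversion, the first overload wins whenever it survives substitution.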
It's called an ellipsis - basically saying that the function accepts any number of arguments of any non-class type.
It means that the types of arguments, and the number of them, are unspecified. A concrete example with which you are probably familiar would be something like printf(const char *, ...)
If you use printf, you can put whatever you like after the format string, and it is not enforced by the compiler.
e.g. printf("%s:%s",8), gets through the compiler just the same as if the "expected" arguments are provided printf("%s:%s", "stringA", "stringB").
Unless really necessary, it should be avoided, as it creates the potential for a run time error to occur, where it might otherwise have been picked up at compile time. If there is a finite, enumerable variation in the arguments your function can accept, then it is better to enumerate them by overloading.