C++ automatic way to make truncations print an error at runtime - c++

I have just joined a team that has thousands of lines of code like:
int x = 0;
x=something();
short y=x;
doSomethingImportantWith(y);
The compiler gives nice warnings saying: Conversion of XX bit type value to "short" causes truncation. I've been told that there are no cases where truncation really happens, but I seriously doubt it.
Is there a nice way to insert checks into each case having the effect of:
if (x>short.max) printNastyError(__FILE,__LINE);
before each assignment? Doing this manually will take more time and effort than I'd like to use, and writing a script that reads the warnings and adds this stuff to the correct file as well as the required includes seems overkill -- especially since I expect someone has already done this (or something like it).
I don't care about performance (really) or anything other than knowing when these problems occur so I can either fix only the ones that really matter, or so I can convince management that it's a problem.

You could try compiling and running it with the following ugly hack:
#include <limits>
#include <cstdlib>
template<class T>
struct IntWrapper
{
T value;
template<class U>
IntWrapper(U u) {
if(u > std::numeric_limits<T>::max())
std::abort();
if(U(-1) < 0 && u < std::numeric_limits<T>::min()) // for signed U only
std::abort();
value = u;
}
operator T&() { return value; }
operator T const&() const { return value; }
};
#define short IntWrapper<short>
int main() {
int i = 1, j = 0x10000;
short ii = i;
short jj = j; // this aborts
}
Obviously, it may break code that passes short as a template argument and likely in other instances, so undefine it where it breaks the build. And you may need to add operator overloads so that the usual arithmetic works with the wrapper.

You can probably write a plugin for gcc to detect those truncation and emit a call to a function that check the conversion is safe. You can write those plugins in C or Python. If you prefer to use clang, it also support writing plugins.
I think the easiest way to do it, would be to have the plugin convert the unsafe cast from int to short to call to a function _convert_int_to_float_fail_if_data_loss(value). I'll leave it as a exercise for the reader how to write such a plugin.

Related

Short and elegant way to do typecasting in C++?

Say I want to store the size of a std::vector in an int I have the following options, to my knowledge:
int size = vector.size(); // Throws an implicit conversion warning
int size = (int)vector.size(); // C like typecasting is discouraged and forbidden in many code standards
int size = static_cast<int>(vector.size()); // This makes me want to gouge my eyes out (it's ugly)
Is there any other option that avoids all of the above issues?
I'm going to frame challenge this question. You shouldn't want a short and elegant solution to this problem.
Casting in any language, including C++, is basically the programmer's equivalent to swearing: you'll do it sometimes because it's easy and effortless, but you shouldn't. It means that somewhere, somehow, your design got screwed up. Maybe you need to pass the size of an array to an old API, but the old API didn't use size_t. Maybe you designed a piece of code to use float's, but in the actual implementation, you treat them like int's.
Regardless, casting is being used to patch over mistakes made elsewhere in the code. You shouldn't want a short and simple solution to resolve that. You should prefer something explicit and tedious, for two reasons:
It signals to other programmers that the cast isn't a mistake: that it's something intentional and necessary
To make you less likely to do it; and to instead focus on making sure your types are what you intended, rather than what the target API is expecting.
So embrace the static_cast, dynamic_cast, const_cast, and reinterpret_cast style of writing your code. Instead of trying to find ways to make the casts easier, find ways to refactor your code so they're less necessary.
If you're prepared to disregard all of that instead, then write something like this:
template<typename T, typename U>
T as(U && u) {
return static_cast<T>(u);
}
int size = as<int>(values.size());
bool poly_type::operator==(base_type const& o) {
if(this == &o)
return true;
if(typeid(*this) == typeid(o)) {
return as<poly_type const&>(o).value == value;
} else {
return false;
}
}
That'll at least reduce the amount of typing you end up using.
I'm going to answer your question just like you've asked. The other answers say why you shouldn't do it. But if you still want to have this, use this function:
#include <assert.h>
#include <limits.h>
inline int toInt(std::size_t value) {
assert(value<=MAX_INT);
return static_cast<int>(value);
}
Usage:
int size = toInt(vector.size());
toInt asserts if the input value is out of range. Feel free to modify it to your needs.
Storing a vector size , which might exceed the maximum value of int, in an int is an ugly operation in the first place. This is reflected in your compiler warning or the fact that you have to write ugly code to suppress the warning.
The presence of the static_cast informs other people reading the code that there is a potential bug here, the program might malfunction in various ways if the vector size does exceed INT_MAX.
Obviously (I hope?) the best solution is to use the right type for the value being stored, auto size = vector.size();.
If you really are determined to use int for whatever reason then I would recommend adding code to handle the case of the vector begin too large (e.g. throw before the int declaration if it is), or add a code comment explaining why that was never possible.
With no comments, the reader can't tell if your cast was just because you wanted to shut the compiler up and didn't care about the potential bug; or if you knew what you were doing.

How can I elide a call if an edge condition is known at compile time?

I have the following situation: there's a huge set of templates like std::vector that will call memmove() to move parts of array. Sometimes they will want to "move" parts of length zero - for example, if the array tail is removed (like std::vector::erase()), they will want to move the remainder of the array which will happen to have length zero and that zero will be known at compile time (I saw the disassembly - the compiler is aware) yet the compiler will still emit a memmove() call.
So basically I could have a wrapper:
inline void callMemmove( void* dest, const void* source, size_t count )
{
if( count > 0 ) {
memmove( dest, source, count );
}
}
but this would introduce an extra runtime check in cases count is not known in compile time that I don't want.
Is it somehow possible to use __assume hint to indicate to the compiler that if it knows for sure that count is zero it should eliminate the memmove()?
The point of the __assume is to tell the compiler to skip portions of code when optimizing. In the link you provided the example is given with the default clause of the switch construct - there the hint tells the compiler that the clause will never be reached even though theoretically it could. You're telling the optimizer, basically, "Hey, I know better, throw this code away".
For default you can't not write it in (unless you cover the whole range in cases, which is sometimes problematic) because it would cause compilation error. So you need the hint to optimize the code you know that is unneeded out.
In your case - the code can be reached, but not always, so the __assume hint won't help you much. You have to check if the count is really 0. Unless you're sure it can never be anything but 0, then just don't write it in.
This solution uses a trick described in C++ compile-time constant detection - the trick uses the fact compile time integer zero can be converted to a pointer, and this can be used together with overloading to check for the "compile time known" property.
struct chkconst {
struct Small {char a;};
struct Big: Small {char b;};
struct Temp { Temp( int x ) {} };
static Small chk2( void* ) { return Small(); }
static Big chk2( Temp ) { return Big(); }
};
#define is_const_0(X) (sizeof(chkconst::chk2(X))<sizeof(chkconst::Big))
#define is_const(X) is_const_0( int(X)-int(X) )
#define memmove_smart(dst,src,n) do { \
if (is_const(n)) {if (n>0) memmove(dst,src,n);} \
else memmove(dst,src,n); \
} while (false)
Or, in your case, as you want to check for zero only anyway, one could use is_const_0 directly for maximum simplicity and portability:
#define memmove_smart(dst,src,n) if (is_const_0(n)) {} else memmove(dst,src,n)
Note: the code here used a version of is_const simpler than in the linked question. This is because Visual Studio is more standard conformant than GCC in this case. If targeting gcc, you could use following is_const variant (adapted to handle all possible integral values, including negative and INT_MAX):
#define is_const_0(X) (sizeof(chkconst::chk2(X))<sizeof(chkconst::Big))
#define is_const_pos(X) is_const_0( int(X)^(int(X)&INT_MAX) )
#define is_const(X) (is_const_pos(X)|is_const_pos(-int(X))|is_const_pos(-(int(X)+1)))
I think that you misunderstood the meaning of __assume. It does not tell the compiler to change its behavior when it knows what the values are, but rather it tells it what the values will be when it cannot infer it by itself.
In your case, if you told it to __assume that count > 0 it will skip the test, as you already told it that the result will always be true, it will remove the condition and will call memmove always, which is exactly what you want to avoid.
I don't know the intrinsics of VS, but in GCC there is a likely/unlikely intrinsic (__builtin_expect((x),1)) that can be used to hint the compiler as to which is the most probable outcome of the test. that will not remove the test, but will layout code so that the most probable (as in by your definition) branch is more efficient (will not branch).
If its possible to rename the memmove, I think something like this
would do - http://codepad.org/s974Fp9k
struct Temp {
int x;
Temp( int y ) { x=y; }
operator int() { return x; };
};
void memmove1( void* dest, const void* source, void* count ) {
printf( "void\n" );
}
void memmove1( void* dest, const void* source, Temp count ) {
memmove( dest, source, count );
printf( "temp\n" );
}
int main( void ) {
int a,b;
memmove1( &a,&b, sizeof(a) );
memmove1( &a,&b, sizeof(a)-4 );
}
I think the same is probably possible without the class - have to look at conversion rules
to confirm it.
Also it should be possible to overload the original memmove(), eg. by passing an
object (like Temp(sizeof(a)) as 3rd argument.
Not sure which way would be more convenient.

Templates: instantiating from (and refering to) non-typed parameter at runtime?

I developed a generic "Unsigned" class, or really a class template Unsigned<size_t N> that models after the C (C++) built-in unsigneds using the amount of uint8_ts as a parameter. For example Unsigned<4> is identical to a uint32_t and Unsigned<32> would be identical to a uint256_t -- if it existed.
So far I have managed to follow most if not all of the semantics expected from a built-in unsigned -- in particular sizeof(Natural<N>)==N, (Natural<N>(-1) == "max_value_all_bits_1" == ~Natural<N>(0)), compatibility with abs(), sign(), div (using a custom div_t structure), ilogb() (exclusive to GCC it seems) and numeric_limits<>.
However I'm facing the issue that, since 1.- a class template is just a template so templated forms are unrelated, and 2.- the template non-typed parameter requires a "compile-time constant", which is way stricter than "a const", I'm essentially unable to create a Unsigned given an unknown N.
In other words, I can't have code like this:
...
( ... assuming all adequate headers are included ...)
using namespace std;
using lpp::Unsigned;
std::string str;
cout<< "Enter an arbitrarily long integer (end it with <ENTER>) :>";
getline(cin, str, '\n');
const int digits10 = log10(str.length()) + 1;
const int digits256 = (digits10 + 1) * ceil(log(10)/log(256)); // from "10×10^D = 256^T"
// at this point, I "should" be able to, semantically, do this:
Unsigned<digits256> num; // <-- THIS I CAN'T -- num would be guaranteed
// big enough to hold str's binary expression,
// no more space is needed
Unsigned::from_str(num, str); // somehow converts (essentially a base change algo)
// now I could do whatever I wanted with num "as if" a builtin.
std::string str_b3 = change_base(num, 3); // a generic implemented somehow
cout<< "The number above, in base 3, is: "<< str_b3<< endl;
...
(A/N -- This is part of the testsuite for Unsigned, which reads a "slightly large number" (I have tried up to 120 digits -- after setting N accordingly) and does things like expressing it in other bases, which in and of itself tests all arithmethic functions already.)
In looking for possible ways to bypass or otherwise alleviate this limitation, I have been running into some concepts that I'd like to try and explore, but I wouldn't like to spend too much effort into an alternative that is only going to make things more complicated or that would make the behaviour of the class(es) deviate too much.
The first thing I thought was that if I wasn't able to pick up a Unsigned<N> of my choice, I could at least pick up from a set of pre-selected values of N which would lead to the adequate constructor being called at runtime, but depending on a compile-time value:
???? GetMeAnUnsigned (size_t S) {
switch (S) {
case 0: { throw something(); } // we can't have a zero-size number, right?
case 1, 2, 3, 4: { return Unsigned<4>(); break; }
case 5, 6, 7, 8: { return Unsigned<8>(); break; }
case 9, 10, 11, 12, 13, 14, 15, 16: { return Unsigned<16>(); break; }
....
default: { return Unsigned<128>(); break; } // wow, a 1Kib number!
} // end switch
exit(1); // this point *shouldn't* be reachable!
} // end function
I personally like the approach. However I don't know what can I use to specify the return type. It doesn't actually "solve" the problem, it only degrades its severity by a certain degree. I'm sure doing the trick with the switch would work since the instantiations are from compile-time constant, it only changes which of them will take place.
The only viable help to declare the return type seems to be this new C++0(1?)X "decltype" construct which would allow me to obtain the adequate type, something like, if I understood the feature correctly:
decltype (Unsigned<N>) GetMeAnUnsigned (size_t S) {
.. do some choices that originate an N
return Unsigned<N>();
}
... or something like that. I haven't entered into C++?X beyond auto (for iterators) yet, so the first question would be: would features like decltype or auto help me to achieve what I want? (Runtime selection of the instantiation, even if limited)
For an alternative, I was thinking that if the problem was the relation between my classes then I could make them all a "kind-of" Base by deriving the template itself:
template <size_t N>
class Unsigned : private UnsignedCommon { ...
... but I left that approach in the backburner because, well, one doesn't do that (make all a "kind-of") with built-ins, plus for the cases where one does actually treat them as a common class it requires initializing statics, returning pointers and leave the client to destruct if I recall correctly. Second question then: did I do wrong in discarding this alternative too early?
In a nutshell, your problem is no different from that of the built-in integral types. Given a short, you can't store large integers in it. And you can't at runtime decide which type of integer to use, unless you use a switch or similar to choose between several predefined options (short, int, long, long long, for example. Or in your case, Unsigned<4>, Unsigned<8>, Unsigned<256>. The size cannot be computed dynamically at runtime, in any way.
You have to either define a dynamically sized type (similar to std::vector), where the size is not a template parameter, so that a single type can store any type of integer (and then accept the loss of efficiency that implies), or accept that the size must be chosen at compile-time, and the only option you have for handling "arbitrary" integers is to hardcode a set of predefined sizes and choose between them at runtime.
decltype won't solve your problem either. It is fairly similar to auto, it works entirely at compile-time, and just returns the type of an expression. (The type of 2+2 is int and the compiler knows this at compiletime, even though the value 4 is only computed at runtime)
The problem you are facing is quite common. Templates are resolved at compile time, while you need to change your behavior at runtime. As much as you might want to do that with the mythical one extra layer of indirection the problem won't go away: you cannot choose the return type of your function.
Since you need to perform the operations based on runtime information you must fall back to using dynamic polymorphism (instead of the static polymorphism that templates provide). That will imply using dynamic allocation inside the GetMeAnUnsigned method and possibly returning a pointer.
There are some tricks that you can play, like hiding the pointer inside a class that offers the public interface and delegates to an internal allocated object, in the same style as boost::any so that the user sees a single type even if the actual object is chosen at runtime. That will make the design harder, I am not sure how much more complex the code will be, but you will need to really think on what is the minimal interface that you must offer in the internal class hierarchy to fulfill the requirements of the external interface --this seems like a really interesting problem to tacke...
You can't directly do that. Each unsigned with a separate number has a separate type, and the compiler needs to know the return type of your method at compile time.
What you need to do is have an Unsigned_base base class, from which the Unsigned<t> items derive. You can then have your GetMeAnUnsigned method return a pointer to Unsigned_base. That could then be casted using something like dynamic_cast<Unsigned<8> >().
You might be better off having your function return a union of the possible unsigned<n> types, but that's only going to work if your type meets the requirements of being a union member.
EDIT: Here's an example:
struct UnsignedBase
{
virtual ~UnsignedBase() {}
};
template<std::size_t c>
class Unsigned : public UnsignedBase
{
//Implementation goes here.
};
std::auto_ptr<UnsignedBase> GiveMeAnUnsigned(std::size_t i)
{
std::auto_ptr<UnsignedBase> result;
switch(i)
{
case 42:
result.reset(new Unsigned<23>());
default:
result.reset(new Unsigned<2>());
};
return result;
}
It's a very common problem indeed, last time I saw it was with matrices (dimensions as template parameters and how to deal with runtime supplied value).
It's unfortunately an intractable problem.
The issue is not specific to C++ per se, it's specific to strong typing coupled with compile-time checking. For example Haskell could exhibit a similar behavior.
There are 2 ways to deal with this:
You use a switch not to create the type but actually to launch the full computation, ie main is almost empty and only serve to read the input value
You use boxing: you put the actual type in a generic container (either by hand-crafted class or boost::any or boost::variant) and then, when necessary, unbox the value for specific treatment.
I personally prefer the second approach.
The easier way to do this is to use a base class (interface):
struct UnsignedBase: boost::noncopyable
{
virtual ~UnsignedBase() {}
virtual UnsignedBase* clone() const = 0;
virtual size_t bytes() const = 0;
virtual void add(UnsignedBase const& rhs) = 0;
virtual void substract(UnsignedBase const& rhs) = 0;
};
Then you wrap this class in a simple manager to ease memory management for clients (you hide the fact that you rely on heap allocation + unique_ptr):
class UnsignedBox
{
public:
explicit UnsignedBox(std::string const& integer);
template <size_t N>
explicit UnsignedBox(Unsigned<N> const& integer);
size_t bytes() const { return mData->bytes(); }
void add(UnsignedBox const& rhs) { mData->add(rhs.mData); }
void substract(UnsignedBox const& rhs) { mData->substract(rhs.mData); }
private:
std::unique_ptr<UnsignedBase> mData;
};
Here, the virtual dispatch takes care of unboxing (somewhat), you can also unbox manually using a dynamic_cast (or static_cast if you know the number of digits):
void func(UnsignedBase* i)
{
if (Unsigned<2>* ptr = dynamic_cast< Unsigned<2> >(i))
{
}
else if (Unsigned<4>* ptr = dynamic_cast< Unsigned<4> >(i))
{
}
// ...
else
{
throw UnableToProceed(i);
}
}

In which case is if(a=b) a good idea? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Inadvertent use of = instead of ==
C++ compilers let you know via warnings that you wrote,
if( a = b ) { //...
And that it might be a mistake that you certainly wanted to write:
if( a == b ) { //...
But is there a case where the warning should be ignored, because it's a good way to use this "feature"?
I don't see any code clarity reason possible, so is there a case where it’s useful?
Two possible reasons:
Assign & Check
The = operator (when not overriden) normally returns the value that it assigned. This is to allow statements such as a=b=c=3. In the context of your question, it also allows you to do something like this:
bool global;//a global variable
//a function
int foo(bool x){
//assign the value of x to global
//if x is equal to true, return 4
if (global=x)
return 4;
//otherwise return 3
return 3;
}
...which is equivalent to but shorter than:
bool global;//a global variable
//a function
int foo(bool x){
//assign the value of x to global
global=x;
//if x is equal to true, return 4
if (global==true)
return 4;
//otherwise return 3
return 3;
}
Also, it should be noted (as stated by Billy ONeal in a comment below) that this can also work when the left-hand argument of the = operator is actually a class with a conversion operator specified for a type which can be coerced (implicitly converted) to a bool. In other words, (a=b) will evaulate to true or false if a is of a type which can be coerced to a boolean value.
So the following is a similar situation to the above, except the left-hand argument to = is an object and not a bool:
#include <iostream>
using namespace std;
class Foo {
public:
operator bool (){ return true; }
Foo(){}
};
int main(){
Foo a;
Foo b;
if (a=b)
cout<<"true";
else
cout<<"false";
}
//output: true
Note: At the time of this writing, the code formatting above is bugged. My code (check the source) actually features proper indenting, shift operators and line spacing. The <'s are supposed to be <'s, and there aren't supposed to be enourmous gaps between each line.
Overridden = operator
Since C++ allows the overriding of operators, sometimes = will be overriden to do something other than what it does with primitive types. In these cases, the performing the = operation on an object could return a boolean (if that's how the = operator was overridden for that object type).
So the following code would perform the = operation on a with b as an argument. Then it would conditionally execute some code depending on the return value of that operation:
if (a=b){
//execute some code
}
Here, a would have to be an object and b would be of the correct type as defined by the overriding of the = operator for objects of a's type. To learn more about operator overriding, see this wikipedia article which includes C++ examples: Wikipedia article on operator overriding
while ( (line = readNextLine()) != EOF) {
processLine();
}
You could use to test if a function returned any error:
if (error_no = some_function(...)) {
// Handle error
}
Assuming that some_function returns the error code in case of an error. Or zero otherwise.
This is a consequence of basic feature of the C language:
The value of an assignment operation is the assigned value itself.
The fact that you can use that "return value" as the condition of an if() statement is incidental.
By the way, this is the same trick that allows this crazy conciseness:
void strcpy(char *s, char *t)
{
while( *s++ = *t++ );
}
Of course, the while exits when the nullchar in t is reached, but at the same time it is copied to the destination s string.
Whether it is a good idea, usually not, as it reduce code readability and is prone to errors.
Although the construct is perfectly legal syntax and your intent may truly be as shown below, don't leave the "!= 0" part out.
if( (a = b) != 0 ) {
...
}
The person looking at the code 6 months, 1 year, 5 years from now, at first glance, is simply going to believe the code contains a "classic bug" written by a junior programmer and will try to "fix" it. The construct above clearly indicates your intent and will be optimized out by the compiler. This would be especially embarrassing if you are that person.
Your other option is to heavily load it with comments. But the above is self-documenting code, which is better.
Lastly, my preference is to do this:
a = b;
if( a != 0 ) {
...
}
This is about a clear as the code can get. If there is a performance hit, it is virtually zero.
A common example where it is useful might be:
do {
...
} while (current = current->next);
I know that with this syntax you can avoid putting an extra line in your code, but I think it takes away some readability from the code.
This syntax is very useful for things like the one suggested by Steven Schlansker, but using it directly as a condition isn't a good idea.
This isn't actually a deliberate feature of C, but a consequence of two other features:
Assignment returns the assigned value
This is useful for performing multiple assignments, like a = b = 0, or loops like while ((n = getchar()) != EOF).
Numbers and pointers have truth values
C originally didn't have a bool type until the 1999 standard, so it used int to represent Boolean values. Backwards compatibility requires C and C++ to allow non-bool expressions in if, while, and for.
So, if a = b has a value and if is lenient about what values it accepts, then if (a = b) works. But I'd recommend using if ((a = b) != 0) instead to discourage anyone from "fixing" it.
You should explicitly write the checking statement in a better coding manner, avoiding the assign & check approach. Example:
if ((fp = fopen("filename.txt", "wt")) != NULL) {
// Do something with fp
}
void some( int b ) {
int a = 0;
if( a = b ) {
// or do something with a
// knowing that is not 0
}
// b remains the same
}
But is there a case where the warning
should be ignored because it's a good
way to use this "feature"? I don't see
any code clarity reason possible so is
there a case where its useful?
The warning can be suppressed by placing an extra parentheses around the assignment. That sort of clarifies the programmer's intent. Common cases I've seen that would match the (a = b) case directly would be something like:
if ( (a = expression_with_zero_for_failure) )
{
// do something with 'a' to avoid having to reevaluate
// 'expression_with_zero_for_failure' (might be a function call, e.g.)
}
else if ( (a = expression2_with_zero_for_failure) )
{
// do something with 'a' to avoid having to reevaluate
// 'expression2_with_zero_for_failure'
}
// etc.
As to whether writing this kind of code is useful enough to justify the common mistakes that beginners (and sometimes even professionals in their worst moments) encounter when using C++, it's difficult to say. It's a legacy inherited from C and Stroustrup and others contributing to the design of C++ might have gone a completely different, safer route had they not tried to make C++ backwards compatible with C as much as possible.
Personally I think it's not worth it. I work in a team and I've encountered this bug several times before. I would have been in favor of disallowing it (requiring parentheses or some other explicit syntax at least or else it's considered a build error) in exchange for lifting the burden of ever encountering these bugs.
while( (l = getline()) != EOF){
printf("%s\n", l);
}
This is of course the simplest example, and there are lots of times when this is useful. The primary thing to remember is that (a = true) returns true, just as (a = false) returns false.
Preamble
Note that this answer is about C++ (I started writing this answer before the tag "C" was added).
Still, after reading Jens Gustedt's comment, I realized it was not the first time I wrote this kind of answer. Truth is, this question is a duplicate of another, to which I gave the following answer:
Inadvertent use of = instead of ==
So, I'll shamelessly quote myself here to add an important information: if is not about comparison. It's about evaluation.
This difference is very important, because it means anything can be inside the parentheses of a if as long as it can be evaluated to a Boolean. And this is a good thing.
Now, limiting the language by forbidding =, where all other operators are authorized, is a dangerous exception for the language, an exception whose use would be far from certain, and whose drawbacks would be numerous indeed.
For those who are uneasy with the = typo, then there are solutions (see Alternatives below...).
About the valid uses of if(i = 0) [Quoted from myself]
The problem is that you're taking the problem upside down. The "if" notation is not about comparing two values like in some other languages.
The C/C++ if instruction waits for any expression that will evaluate to either a Boolean, or a null/non-null value. This expression can include two values comparison, and/or can be much more complex.
For example, you can have:
if(i >> 3)
{
std::cout << "i is less than 8" << std::endl
}
Which proves that, in C/C++, the if expression is not limited to == and =. Anything will do, as long as it can be evaluated as true or false (C++), or zero non-zero (C/C++).
About valid uses
Back to the non-quoted answer.
The following notation:
if(MyObject * p = findMyObject())
{
// uses p
}
enables the user to declare and then use p inside the if. It is a syntactic sugar... But an interesting one. For example, imagine the case of an XML DOM-like object whose type is unknown well until runtime, and you need to use RTTI:
void foo(Node * p_p)
{
if(BodyNode * p = dynamic_cast<BodyNode *>(p_p))
{
// this is a <body> node
}
else if(SpanNode * p = dynamic_cast<SpanNode *>(p_p))
{
// this is a <span> node
}
else if(DivNode * p = dynamic_cast<DivNode *>(p_p))
{
// this is a <div> node
}
// etc.
}
RTTI should not be abused, of course, but this is but one example of this syntactic sugar.
Another use would be to use what is called C++ variable injection. In Java, there is this cool keyword:
synchronized(p)
{
// Now, the Java code is synchronized using p as a mutex
}
In C++, you can do it, too. I don't have the exact code in mind (nor the exact Dr. Dobb's Journal's article where I discovered it), but this simple define should be enough for demonstration purposes:
#define synchronized(lock) \
if (auto_lock lock_##__LINE__(lock))
synchronized(p)
{
// Now, the C++ code is synchronized using p as a mutex
}
(Note that this macro is quite primitive, and should not be used as is in production code. The real macro uses a if and a for. See sources below for a more correct implementation).
This is the same way, mixing injection with if and for declaration, you can declare a primitive foreach macro (if you want an industrial-strength foreach, use Boost's).
About your typo problem
Your problem is a typo, and there are multiple ways to limit its frequency in your code. The most important one is to make sure the left-hand-side operand is constant.
For example, this code won't compile for multiple reasons:
if( NULL = b ) // won't compile because it is illegal
// to assign a value to r-values.
Or even better:
const T a ;
// etc.
if( a = b ) // Won't compile because it is illegal
// to modify a constant object
This is why in my code, const is one of the most used keyword you'll find. Unless I really want to modify a variable, it is declared const and thus, the compiler protects me from most errors, including the typo error that motivated you to write this question.
But is there a case where the warning should be ignored because it's a good way to use this "feature"? I don't see any code clarity reason possible so is there a case where its useful?
Conclusion
As shown in the examples above, there are multiple valid uses for the feature you used in your question.
My own code is a magnitude cleaner and clearer since I use the code injection enabled by this feature:
void foo()
{
// some code
LOCK(mutex)
{
// some code protected by a mutex
}
FOREACH(char c, MyVectorOfChar)
{
// using 'c'
}
}
... which makes the rare times I was confronted to this typo a negligible price to pay (and I can't remember the last time I wrote this type without being caught by the compiler).
Interesting sources
I finally found the articles I've had read on variable injection. Here we go!!!
FOR_EACH and LOCK (2003-11-01)
Exception Safety Analysis (2003-12-01)
Concurrent Access Control & C++ (2004-01-01)
Alternatives
If one fears being victim of the =/== typo, then perhaps using a macro could help:
#define EQUALS ==
#define ARE_EQUALS(lhs,rhs) (lhs == rhs)
int main(int argc, char* argv[])
{
int a = 25 ;
double b = 25 ;
if(a EQUALS b)
std::cout << "equals" << std::endl ;
else
std::cout << "NOT equals" << std::endl ;
if(ARE_EQUALS(a, b))
std::cout << "equals" << std::endl ;
else
std::cout << "NOT equals" << std::endl ;
return 0 ;
}
This way, one can protect oneself from the typo error, without needing a language limitation (that would cripple language), for a bug that happens rarely (i.e., almost never, as far as I remember it in my code).
There's an aspect of this that hasn't been mentioned: C doesn't prevent you from doing anything it doesn't have to. It doesn't prevent you from doing it because C's job is to give you enough rope to hang yourself by. To not think that it's smarter than you. And it's good at it.
Never!
The exceptions cited don't generate the compiler warning. In cases where the compiler generates the warning, it is never a good idea.
RegEx sample
RegEx r;
if(((r = new RegEx("\w*)).IsMatch()) {
// ... do something here
}
else if((r = new RegEx("\d*")).IsMatch()) {
// ... do something here
}
Assign a value test
int i = 0;
if((i = 1) == 1) {
// 1 is equal to i that was assigned to a int value 1
}
else {
// ?
}
My favourite is:
if (CComQIPtr<DerivedClassA> a = BaseClassPtr)
{
...
}
else if (CComQIPtr<DerivedClassB> b = BaseClassPtr)
{
...
}

Very poor boost::lexical_cast performance

Windows XP SP3. Core 2 Duo 2.0 GHz.
I'm finding the boost::lexical_cast performance to be extremely slow. Wanted to find out ways to speed up the code. Using /O2 optimizations on visual c++ 2008 and comparing with java 1.6 and python 2.6.2 I see the following results.
Integer casting:
c++:
std::string s ;
for(int i = 0; i < 10000000; ++i)
{
s = boost::lexical_cast<string>(i);
}
java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
s = new Integer(i).toString();
}
python:
for i in xrange(1,10000000):
s = str(i)
The times I'm seeing are
c++: 6700 milliseconds
java: 1178 milliseconds
python: 6702 milliseconds
c++ is as slow as python and 6 times slower than java.
Double casting:
c++:
std::string s ;
for(int i = 0; i < 10000000; ++i)
{
s = boost::lexical_cast<string>(d);
}
java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
double d = i*1.0;
s = new Double(d).toString();
}
python:
for i in xrange(1,10000000):
d = i*1.0
s = str(d)
The times I'm seeing are
c++: 56129 milliseconds
java: 2852 milliseconds
python: 30780 milliseconds
So for doubles c++ is actually half the speed of python and 20 times slower than the java solution!!. Any ideas on improving the boost::lexical_cast performance? Does this stem from the poor stringstream implementation or can we expect a general 10x decrease in performance from using the boost libraries.
Edit 2012-04-11
rve quite rightly commented about lexical_cast's performance, providing a link:
http://www.boost.org/doc/libs/1_49_0/doc/html/boost_lexical_cast/performance.html
I don't have access right now to boost 1.49, but I do remember making my code faster on an older version. So I guess:
the following answer is still valid (if only for learning purposes)
there was probably an optimization introduced somewhere between the two versions (I'll search that)
which means that boost is still getting better and better
Original answer
Just to add info on Barry's and Motti's excellent answers:
Some background
Please remember Boost is written by the best C++ developers on this planet, and reviewed by the same best developers. If lexical_cast was so wrong, someone would have hacked the library either with criticism or with code.
I guess you missed the point of lexical_cast's real value...
Comparing apples and oranges.
In Java, you are casting an integer into a Java String. You'll note I'm not talking about an array of characters, or a user defined string. You'll note, too, I'm not talking about your user-defined integer. I'm talking about strict Java Integer and strict Java String.
In Python, you are more or less doing the same.
As said by other posts, you are, in essence, using the Java and Python equivalents of sprintf (or the less standard itoa).
In C++, you are using a very powerful cast. Not powerful in the sense of raw speed performance (if you want speed, perhaps sprintf would be better suited), but powerful in the sense of extensibility.
Comparing apples.
If you want to compare a Java Integer.toString method, then you should compare it with either C sprintf or C++ ostream facilities.
The C++ stream solution would be 6 times faster (on my g++) than lexical_cast, and quite less extensible:
inline void toString(const int value, std::string & output)
{
// The largest 32-bit integer is 4294967295, that is 10 chars
// On the safe side, add 1 for sign, and 1 for trailing zero
char buffer[12] ;
sprintf(buffer, "%i", value) ;
output = buffer ;
}
The C sprintf solution would be 8 times faster (on my g++) than lexical_cast but a lot less safe:
inline void toString(const int value, char * output)
{
sprintf(output, "%i", value) ;
}
Both solutions are either as fast or faster than your Java solution (according to your data).
Comparing oranges.
If you want to compare a C++ lexical_cast, then you should compare it with this Java pseudo code:
Source s ;
Target t = Target.fromString(Source(s).toString()) ;
Source and Target being of whatever type you want, including built-in types like boolean or int, which is possible in C++ because of templates.
Extensibility? Is that a dirty word?
No, but it has a well known cost: When written by the same coder, general solutions to specific problems are usually slower than specific solutions written for their specific problems.
In the current case, in a naive viewpoint, lexical_cast will use the stream facilities to convert from a type A into a string stream, and then from this string stream into a type B.
This means that as long as your object can be output into a stream, and input from a stream, you'll be able to use lexical_cast on it, without touching any single line of code.
So, what are the uses of lexical_cast?
The main uses of lexical casting are:
Ease of use (hey, a C++ cast that works for everything being a value!)
Combining it with template heavy code, where your types are parametrized, and as such you don't want to deal with specifics, and you don't want to know the types.
Still potentially relatively efficient, if you have basic template knowledge, as I will demonstrate below
The point 2 is very very important here, because it means we have one and only one interface/function to cast a value of a type into an equal or similar value of another type.
This is the real point you missed, and this is the point that costs in performance terms.
But it's so slooooooowwww!
If you want raw speed performance, remember you're dealing with C++, and that you have a lot of facilities to handle conversion efficiently, and still, keep the lexical_cast ease-of-use feature.
It took me some minutes to look at the lexical_cast source, and come with a viable solution. Add to your C++ code the following code:
#ifdef SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT
namespace boost
{
template<>
std::string lexical_cast<std::string, int>(const int &arg)
{
// The largest 32-bit integer is 4294967295, that is 10 chars
// On the safe side, add 1 for sign, and 1 for trailing zero
char buffer[12] ;
sprintf(buffer, "%i", arg) ;
return buffer ;
}
}
#endif
By enabling this specialization of lexical_cast for strings and ints (by defining the macro SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT), my code went 5 time faster on my g++ compiler, which means, according to your data, its performance should be similar to Java's.
And it took me 10 minutes of looking at boost code, and write a remotely efficient and correct 32-bit version. And with some work, it could probably go faster and safer (if we had direct write access to the std::string internal buffer, we could avoid a temporary external buffer, for example).
You could specialize lexical_cast for int and double types. Use strtod and strtol in your's specializations.
namespace boost {
template<>
inline int lexical_cast(const std::string& arg)
{
char* stop;
int res = strtol( arg.c_str(), &stop, 10 );
if ( *stop != 0 ) throw_exception(bad_lexical_cast(typeid(int), typeid(std::string)));
return res;
}
template<>
inline std::string lexical_cast(const int& arg)
{
char buffer[65]; // large enough for arg < 2^200
ltoa( arg, buffer, 10 );
return std::string( buffer ); // RVO will take place here
}
}//namespace boost
int main(int argc, char* argv[])
{
std::string str = "22"; // SOME STRING
int int_str = boost::lexical_cast<int>( str );
std::string str2 = boost::lexical_cast<std::string>( str_int );
return 0;
}
This variant will be faster than using default implementation, because in default implementation there is construction of heavy stream objects. And it is should be little faster than printf, because printf should parse format string.
lexical_cast is more general than the specific code you're using in Java and Python. It's not surprising that a general approach that works in many scenarios (lexical cast is little more than streaming out then back in to and from a temporary stream) ends up being slower than specific routines.
(BTW, you may get better performance out of Java using the static version, Integer.toString(int). [1])
Finally, string parsing and deparsing is usually not that performance-sensitive, unless one is writing a compiler, in which case lexical_cast is probably too general-purpose, and integers etc. will be calculated as each digit is scanned.
[1] Commenter "stepancheg" doubted my hint that the static version may give better performance. Here's the source I used:
public class Test
{
static int instanceCall(int i)
{
String s = new Integer(i).toString();
return s == null ? 0 : 1;
}
static int staticCall(int i)
{
String s = Integer.toString(i);
return s == null ? 0 : 1;
}
public static void main(String[] args)
{
// count used to avoid dead code elimination
int count = 0;
// *** instance
// Warmup calls
for (int i = 0; i < 100; ++i)
count += instanceCall(i);
long start = System.currentTimeMillis();
for (int i = 0; i < 10000000; ++i)
count += instanceCall(i);
long finish = System.currentTimeMillis();
System.out.printf("10MM Time taken: %d ms\n", finish - start);
// *** static
// Warmup calls
for (int i = 0; i < 100; ++i)
count += staticCall(i);
start = System.currentTimeMillis();
for (int i = 0; i < 10000000; ++i)
count += staticCall(i);
finish = System.currentTimeMillis();
System.out.printf("10MM Time taken: %d ms\n", finish - start);
if (count == 42)
System.out.println("bad result"); // prevent elimination of count
}
}
The runtimes, using JDK 1.6.0-14, server VM:
10MM Time taken: 688 ms
10MM Time taken: 547 ms
And in client VM:
10MM Time taken: 687 ms
10MM Time taken: 610 ms
Even though theoretically, escape analysis may permit allocation on the stack, and inlining may introduce all code (including copying) into the local method, permitting elimination of redundant copying, such analysis may take quite a lot of time and result in quite a bit of code space, which has other costs in code cache that don't justify themselves in real code, as opposed to microbenchmarks like seen here.
What lexical cast is doing in your code can be simplified to this:
string Cast( int i ) {
ostringstream os;
os << i;
return os.str();
}
There is unfortunately a lot going on every time you call Cast():
a string stream is created possibly allocating memory
operator << for integer i is called
the result is stored in the stream, possibly allocating memory
a string copy is taken from the stream
a copy of the string is (possibly) created to be returned.
memory is deallocated
Thn in your own code:
s = Cast( i );
the assignment involves further allocations and deallocations are performed. You may be able to reduce this slightly by using:
string s = Cast( i );
instead.
However, if performance is really importanrt to you, you should considerv using a different mechanism. You could write your own version of Cast() which (for example) creates a static stringstream. Such a version would not be thread safe, but that might not matter for your specific needs.
To summarise, lexical_cast is a convenient and useful feature, but such convenience comes (as it always must) with trade-offs in other areas.
Unfortunately I don't have enough rep yet to comment...
lexical_cast is not primarily slow because it's generic (template lookups happen at compile-time, so virtual function calls or other lookups/dereferences aren't necessary). lexical_cast is, in my opinion, slow, because it builds on C++ iostreams, which are primarily intended for streaming operations and not single conversions, and because lexical_cast must check for and convert iostream error signals. Thus:
a stream object has to be created and destroyed
in the string output case above, note that C++ compilers have a hard time avoiding buffer copies (an alternative is to format directly to the output buffer, like sprintf does, though sprintf won't safely handle buffer overruns)
lexical_cast has to check for stringstream errors (ss.fail()) in order to throw exceptions on conversion failures
lexical_cast is nice because (IMO) exceptions allow trapping all errors without extra effort and because it has a uniform prototype. I don't personally see why either of these properties necessitate slow operation (when no conversion errors occur), though I don't know of such C++ functions which are fast (possibly Spirit or boost::xpressive?).
Edit: I just found a message mentioning the use of BOOST_LEXICAL_CAST_ASSUME_C_LOCALE to enable an "itoa" optimisation: http://old.nabble.com/lexical_cast-optimization-td20817583.html. There's also a linked article with a bit more detail.
lexical_cast may or may not be as slow in relation to Java and Python as your bencharks indicate because your benchmark measurements may have a subtle problem. Any workspace allocations/deallocations done by lexical cast or the iostream methods it uses are measured by your benchmarks because C++ doesn't defer these operations. However, in the case of Java and Python, the associated deallocations may in fact have simply been deferred to a future garbage collection cycle and missed by the benchmark measurements. (Unless a GC cycle by chance occurs while the benchmark is in progress and in that case you'd be measuring too much). So it's hard to know for sure without examining specifics of the Java and Python implementations how much "cost" should be attributed to the deferred GC burden that may (or may not) be eventually imposed.
This kind of issue obviously may apply to many other C++ vs garbage collected language benchmarks.
As Barry said, lexical_cast is very general, you should use a more specific alternative, for example check out itoa (int->string) and atoi (string -> int).
if speed is a concern, or you are just interested in how fast such casts can be with C++, there's an interested thread regarding it.
Boost.Spirit 2.1(which is to be released with Boost 1.40) seems to be very fast, even faster than the C equivalents(strtol(), atoi() etc. ).
I use this very fast solution for POD types...
namespace DATATYPES {
typedef std::string TString;
typedef char* TCString;
typedef double TDouble;
typedef long THuge;
typedef unsigned long TUHuge;
};
namespace boost {
template<typename TYPE>
inline const DATATYPES::TString lexical_castNumericToString(
const TYPE& arg,
const DATATYPES::TCString fmt) {
enum { MAX_SIZE = ( std::numeric_limits<TYPE>::digits10 + 1 ) // sign
+ 1 }; // null
char buffer[MAX_SIZE] = { 0 };
if (sprintf(buffer, fmt, arg) < 0) {
throw_exception(bad_lexical_cast(typeid(TYPE),
typeid(DATATYPES::TString)));
}
return ( DATATYPES::TString(buffer) );
}
template<typename TYPE>
inline const TYPE lexical_castStringToNumeric(const DATATYPES::TString& arg) {
DATATYPES::TCString end = 0;
DATATYPES::TDouble result = std::strtod(arg.c_str(), &end);
if (not end or *end not_eq 0) {
throw_exception(bad_lexical_cast(typeid(DATATYPES::TString),
typeid(TYPE)));
}
return TYPE(result);
}
template<>
inline DATATYPES::THuge lexical_cast(const DATATYPES::TString& arg) {
return (lexical_castStringToNumeric<DATATYPES::THuge>(arg));
}
template<>
inline DATATYPES::TString lexical_cast(const DATATYPES::THuge& arg) {
return (lexical_castNumericToString<DATATYPES::THuge>(arg,"%li"));
}
template<>
inline DATATYPES::TUHuge lexical_cast(const DATATYPES::TString& arg) {
return (lexical_castStringToNumeric<DATATYPES::TUHuge>(arg));
}
template<>
inline DATATYPES::TString lexical_cast(const DATATYPES::TUHuge& arg) {
return (lexical_castNumericToString<DATATYPES::TUHuge>(arg,"%lu"));
}
template<>
inline DATATYPES::TDouble lexical_cast(const DATATYPES::TString& arg) {
return (lexical_castStringToNumeric<DATATYPES::TDouble>(arg));
}
template<>
inline DATATYPES::TString lexical_cast(const DATATYPES::TDouble& arg) {
return (lexical_castNumericToString<DATATYPES::TDouble>(arg,"%f"));
}
} // end namespace boost