boost::lexical_cast wrong output - c++

Output of boost::lexical_cast with a bool variable as input is expected to be a 0 or 1 value. But I get different value instead.
It's not something which happen normally. Let's take a look at example code I've wrote:
#include <string>
#include <boost/lexical_cast.hpp>
int main()
{
bool alfa = true; // Doesn't metter whether alfa is initialized at definition or not
char beta = 71; // An integer value. Different values don't make any differences.
memcpy(&alfa, &beta, sizeof(alfa));
printf("%s\n", boost::lexical_cast<std::string>(alfa).c_str());
}
From this code, I've got "w" (ASCII code of w is 71) as output! But I expected it to be a 0 or 1 value.
The problem is just the value that bool variable will cast into. The bool variable in the given example is already considered true.
What causes problem is imagine I want to convert back the converted value. That's where it throws exception because for example "w" character can't be converted to bool. But if the output was 0 or 1, re-conversion would be possible.
std::string converted_value = boost::lexical_cast<std::string>(alfa);
bool reconverted_value = boost::lexical_cast<bool>(converted_value ); // In this line, boost::bad_lexical_cast will be thrown
I was wondering whether the output is right or this is a bug in boost::lexical_cast?
Also when I was trying to do the same thing and cast the variable to int, I faced boost::bad_lexical_cast exception.
My boost version: 1.58
Live sample

The C++ standard does not specify how a Boolean is stored in memory, only that there are two possible values: true and false. Now, on your machine, I presume these are stored, respectively, as 1 and 0. The compiler is allowed to make assumptions, and in particular it's allowed to assume that these are going to be the only two values stored in a Boolean.
Thus, when boost::lexical_cast sees a Boolean, it runs code that probably looks something like this (after optimization)
// Gross oversimplification
std::string lexical_cast(bool value) {
char res = '0' + (int)value;
return std::string(1, res);
}
If value is 0 or 1, this works fine and does what you want. However, you put a 71 into it. So we add the ASCII code of '0' (48) to 71 and get 119, the ASCII code of 'w'.
Now, I'm not a C++ standard expert, but I would guess that storing a non-standard value into a Boolean with memcpy is undefined behavior. At the very least, your code is non-portable. Perhaps someone more versed in the standard can fill in the holes in my knowledge as far as that's concerned.

You broke the rules. What did you expect, that C++ would add useless code to your program to catch rule-breakers?
Boolean variables can only be 0 or 1, false or true. And they are, if you assign to them properly, according to the rules.
If you memcpy them or reinterpret_cast then the underlying implementation shows through. A bool is a memory byte. If it somehow gets set to something other than 0 or 1 then there it is.
I'd have to double-check but I'm not even sure you are guaranteed that a bool is one byte. I think you can take a pointer to it. But if the hardware had some way to create a pointer to one bit, you could be quite surprised at what C++ did with that.

Related

Why does bool exist when we can use int?

This may sound as a really dumb question. But it has been bothering me for the past few days. And It's not only concerning the C++ Programming Language as I've added it's tag. My Question is that. In Computer Science Boolean (bool) datatype has only two possible values. 'true' or 'false'. And also, in Computer Science, 1 is true and 0 is false. So why does boolean exists at all? Why not we use an integer that can return only two possible values, Such as 1 or 0.
For example :
bool mindExplosion = true; // true!
int mindExplosion = 1; // true!!
// or we can '#define true 1' and it's the same right?
What am I missing?
Why does bool exist when we can use int?
Well, you don't need something as large as an int to represent two states so it makes sense to allow for a smaller type to save space
Why not we use an integer that can return only two possible values, Such as 1 or 0.
That is exactly what bool is. It is an unsigned integer type that represents true (1) or false (0).
Another nice thing about having a specific type for this is that it express intent without any need for documentation. If we had a function like (warning, very contrived example)
void output(T const & val, bool log)
It is easy to see that log is an option and if we pass false it wont log. If it were instead
void output(T const & val, int log)
Then we aren't sure what it does. Is it asking for a log level? A flag on whether to log or not? Something else?
What am I missing?
Expressiveness.
When a variable is declared int it might be used only for 0 and 1, or it might hold anything from INT_MIN..INT_MAX.
When a variable is declared bool, it is made explicit that it is to hold a true / false value.
Among other things, this allows the compiler to toss warnings when an int is used in places where you really want a bool, or attempt to store a 2 in a bool. The compiler is your friend; give it all the hints possible so it can tell you when your code starts looking funky.

write extern C and C++ both compiler

How I can fix this problem. Look to program en the extern C and C++ code.
The program is good. How you can write a good C and C++ code.
#include <stdio.h>
#include “myccode.h”
void dchar(char c)
{
printf ("%d\n",(int)c);
}
void main()
{
dchar(128);
}
On the screen has to be 128 but, you get -128. They told me write a extern C code. Me and my friend wrote for both C en C++ code compiler, but nothing happened. We still get -128.
char is typically a signed type holding one byte, even though that's up to the compiler (thanks commenters for pointing that out). It can therefore hold numbers from -128 to +127.
Thus, 128 is first converted to a char (causing an overflow, being converted to -128), and then back to int.
To print 128, you have to use a type which can hold this value. This could be for example an unsigned char:
#include <cstdio>
void dchar(unsigned char c)
{
printf ("%u\n",c);
}
int main()
{
dchar(128);
return 0;
}
Output:
128
The range of values a char can support, for your compiler, is -128 to 127. It is implementation defined whether a char is equivalent to a unsigned char or signed char, and your compiler vendor has chosen the latter.
Converting a value of 128 to a signed char that cannot represent the value 128 gives undefined behaviour. So any result is possible from your code.
Your only options would be to change your function so it accepts an integral argument of a type that can represent the value of 128. Options might include unsigned char (the standard guarantees it can represent values between 0 and 255) or int (which is guaranteed able to represent values between -32767 and 32767 - and, depending on compiler, may support a larger range).
The "right" solution depends on the range of values for which you require your function to produce correct output. You have not given any information about what values you need your code to work with other than 128, so that is an answer you will need to work out for yourself.
And, BTW, main() returns int, not void in standard C++.
First I want to thank you for your time and solution. Both program works.
I ask you again to watch the program but, this time with the complete description.
This is the complete program including description.
include
void dchar(char c)
{printf ("%d\n",(int)c);}
dchar the function is simple and press the numeric value of the passed to the function sign off.
What do you think can go wrong in such a simple function, especially when there type security is provided.
You can check this by compiling this short source file separately. Then enter the following source and compile it into position.
extern void dchar (unsigned char); I forgot last time, Sorry
void main()
{dchar(128);}
Everything seems to be normal until you try to run the linked program. Instead of weathered deed value 128
The value -128 is displayed. There has been an incomparable view type conversion so that the numerical value is wrong
was interpreted. The process of mangling function did not even have a chance to prevent this.
To resolve this problem by using well-written header files. Write the external declaration in a .h header files and use #include
To add the header file. Naturally, even now that there are errors do occur, but the probability of this method is dramatically reduced.

Why does this if executes while I have not given any input?

Here is my code snippet:
int a;
if(a=8)
cout<<"Its right";
else
cout<<"Not at all";
getch();
return(0);
I'm getting the output Its right while I've not give any input there its just a assignment to the a=8.
**Borland Compiler** gives the following 2 warnings and executes the code.
1: Possible incorrect assignment (a=8)
2: 'a' is assigned a value that is never used.
When you write:
if(a=8)
You're assigning 8 to the variable a, and returning a, so you're effectively writing:
a = 8;
if(a)
This is non-zero, and hence treated as "true".
I (and the compiler) suspect you intended to write (note == instead of =):
if(a==8)
This is why you get the warnings. (Note that, in this case, a is still uninitialized, so the behavior is undefined. You will still get a warning, and may still return true...)
a = 8 assign 8 to a and returns the value of the assignment, that is 8. 8 is different from 0 and thus is considered true. the test apsses, and the program outputs "Its right".
you may have written if ( a == 8 ) which correctly tests the value of a without changing its value.
if(a=8)
implies assign a value of 8 to a and then use it to evaluate for the if statement. Since 8 is nonzero it essentially translates to TRUE (any value that is non-zero is true).
The first warning is because it's a frequent typo to write = in a condition test where you meant ==. The philosophy behind the warning is that it's a lesser burden for a programmer to rewrite if(a=b) to, say if((a=b)!=0) in the few cases he wants that, than to spend frustrating hours debugging the results if he mistakenly got an assignment instead of a comparison.
The second warning is just what it says: It looks like a mistake to have a local variable that you assign a value to but never use.
Since they are just warnings, you're free to ignore them if you think you know what you're doing. The compiler may even provide a way for you to turn them off.
Everyone else's answer is correct :)
if (a = 8) // Sets a to be 8 (and returns true so the if statement passes)
if (a == 8) // Check to see if a is equal to 8
I just want to add that I always (force of habit!) write if statements this way :
if (8 == a) {
Now, if I miss out one of the =, the compiler doesn't give me a warning, it gives me an error :)
You need a == (double equals). Single = is an assignment; when treated as an expression as you have done here, it evaluates to the value assigned, in this case 8. 0 is treated as False and anything else is True, so (a=8) == 8 == True.

Non-Integer numbers in an String and using atoi

If there are non-number characters in a string and you call atoi [I'm assuming wtoi will do the same]. How will atoi treat the string?
Lets say for an example I have the following strings:
"20234543"
"232B"
"B"
I'm sure that 1 will return the integer 20234543. What I'm curious is if 2 will return "232." [Thats what I need to solve my problem]. Also 3 should not return a value. Are these beliefs false? Also... if 2 does act as I believe, how does it handle the e character at the end of the string? [Thats typically used in exponential notation]
You can test this sort of thing yourself. I copied the code from the Cplusplus reference site. It looks like your intuition about the first two examples are correct, but the third example returns '0'. 'E' and 'e' are treated just like 'B' is in the second example also.
So the rules are
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.
According to the standard, "The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined." (7.20.1, Numeric conversion functions in C99).
So, technically, anything could happen. Even for the first case, since INT_MAX is guaranteed to be at least 32767, and since 20234543 is greater than that, it could fail as well.
For better error checking, use strtol:
const char *s = "232B";
char *eptr;
long value = strtol(s, &eptr, 10); /* 10 is the base */
/* now, value is 232, eptr points to "B" */
s = "20234543";
value = strtol(s, &eptr, 10);
s = "123456789012345";
value = strtol(s, &eptr, 10);
/* If there was no overflow, value will contain 123456789012345,
otherwise, value will contain LONG_MAX and errno will be ERANGE */
If you need to parse numbers with "e" in them (exponential notation), then you should use strtod. Of course, such numbers are floating-point, and strtod returns double. If you want to make an integer out of it, you can do a conversion after checking for the correct range.
atoi reads digits from the buffer until it can't any more. It stops when it encounters any character that isn't a digit, except whitespace (which it skips) or a '+' or a '-' before it has seen any digits (which it uses to select the appropriate sign for the result). It returns 0 if it saw no digits.
So to answer your specific questions: 1 returns 20234543. 2 returns 232. 3 returns 0. The character 'e' is not whitespace, a digit, '+' or '-' so atoi stops and returns if it encounters that character.
See also here.
If atoi encounters a non-number character, it returns the number formed up until that point.
I tried using atoi() in a project, but it wouldn't work if there were any non-digit characters in the mix and they came before the digit characters - it'll return zero. It seems to not mind if they come after the digits, for whatever reason.
Here's a pretty bare bones string to int converter I wrote up that doesn't seem to have that problem (bare bones in that it doesn't work with negative numbers and it doesn't incorporate any error handling, but it might be helpful in specific instances). Hopefully it might be helpful.
int stringToInt(std::string newIntString)
{
unsigned int dataElement = 0;
unsigned int i = 0;
while ( i < newIntString.length())
{
if (newIntString[i]>=48 && newIntString[i]<=57)
{
dataElement += static_cast<unsigned int>(newIntString[i]-'0')*(pow(10,newIntString.length()-(i+1)));
}
i++;
}
return dataElement;
}
I blamed myself up to this atoi-function behaviour when I was learning-approached coding program with function calculating integer factorial result given input parameter by launching command line parameter.
atoi-function returns 0 if value is something else than numeral value and "3asdf" returns 3. C -language handles command line input parameters in char -array pointer variable as we all already know.
I was told that down at the book "Linux Hater's Handbook" there's some discussion appealing for computer geeks doesn't really like atoi-function, it's kind of foolish in reason that there's no way to check validity of given input type.
Some guy asked me why I don't brother to use strtol -function located on stdlib.h -library and he gave me an example attached to my factorial-calculating recursive method but I don't care about factorial result is bigger than integer primary type value -range, out of ranged (too large base number). It will result in negative values in my program.
I solved my problem with atoi-function first checking if given user's input parameter is truly numerical value and if that matches, after then I calculate the factorial value.
Using isdigit() -function located on chtype.h -library is following:
int checkInput(char *str[]) {
for (int x = 0; x < strlen(*str); ++x)
{
if (!isdigit(*str[x])) return 1;
}
return 0;
}
My forum-pal down in other Linux programming forum told me that if I would use strtol I could handle the situations with out of ranged values or even parse signed int to unsigned long -type meaning -0 and other negative values are not accepted.
It's important upper on my code check if charachter is not numerical value. Negotation way to check this one the function returns failed results when first numerical value comes next to check in string. (or char array in C)
Writing simple code and looking to see what it does is magical and illuminating.
On point #3, it won't return "nothing." It can't. It'll return something, but that something won't be useful to you.
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.

Strange C++ boolean casting behaviour (true!=true)

Just read on an internal university thread:
#include <iostream>
using namespace std;
union zt
{
bool b;
int i;
};
int main()
{
zt w;
bool a,b;
a=1;
b=2;
cerr<<(bool)2<<static_cast<bool>(2)<<endl; //11
cerr<<a<<b<<(a==b)<<endl; //111
w.i=2;
int q=w.b;
cerr<<(bool)q<<q<<w.b<<((bool)((int)w.b))<<w.i<<(w.b==a)<<endl; //122220
cerr<<((w.b==a)?'T':'F')<<endl; //F
}
So a,b and w.b are all declared as bool. a is assigned 1, b is assigned 2, and the internal representation of w.b is changed to 2 (using a union).
This way all of a,b and w.b will be true, but a and w.b won't be equal, so this might mean that the universe is broken (true!=true)
I know this problem is more theoretical than practical (a sake programmer doesn't want to change the internal representation of a bool), but here are the questions:
Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?
Do you know any case where this corner case might become a real issue? (For example while loading binary data from a stream)
EDIT:
Three things:
bool and int have different sizes, that's okay. But what if I use char instead of int. Or when sizeof(bool)==sizeof(int)?
Please give answer to the two questions I asked if possible. I'm actually interested in answers to the second questions too, because in my honest opinion, in embedded systems (which might be 8bit systems) this might be a real problem (or not).
New question: Is this really undefined behavior? If yes, why? If not, why? Aren't there any assumptions on the boolean comparison operators in the specs?
If you read a member of a union that is a different member than the last member which was written then you get undefined behaviour. Writing an int member and then reading the union's bool member could cause anything to happen at any subsequent point in the program.
The only exception is where the unions is a union of structs and all the structs contain a common initial sequence, in which case the common sequence may be read.
Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?
Any integer value that is non zero (or pointer that is non NULL) represents true.
But when comparing integers and bool the bool is converted to int before comparison.
Do you know any case where this corner case might become a real issue? (For example while binary loading of data from a stream)
It is always a real issue.
Is this okay?
I don't know whether the specs specify anything about this. A compiler might always create a code like this: ((a!=0) && (b!=0)) || ((a==0) && (b==0)) when comparing two booleans, although this might decrease performance.
In my opinion this is not a bug, but an undefined behaviour. Although I think that every implementor should tell the users how boolean comparisons are made in their implementation.
If we go by your last code sample both a and b are bool and set to true by assigning 1 and 2 respectfully (Noe the 1 and 2 disappear they are now just true).
So breaking down your expression:
a!=0 // true (a converted to 1 because of auto-type conversion)
b!=0 // true (b converted to 1 because of auto-type conversion)
((a!=0) && (b!=0)) => (true && true) // true ( no conversion done)
a==0 // false (a converted to 1 because of auto-type conversion)
b==0 // false (b converted to 1 because of auto-type conversion)
((a==0) && (b==0)) => (false && false) // false ( no conversion done)
((a!=0) && (b!=0)) || ((a==0) && (b==0)) => (true || false) => true
So I would always expect the above expression to be well defined and always true.
But I am not sure how this applies to your original question. When assigning an integer to a bool the integer is converted to bool (as described several times). The actual representation of true is not defined by the standard and could be any bit pattern that fits in an bool (You may not assume any particular bit pattern).
When comparing the bool to int the bool is converted into an int first then compared.
Any real-world case
The only thing that pops in my mind, if someone reads binary data from a file into a struct, that have bool members. The problem might rise, if the file was made with an other program that has written 2 instead of 1 into the place of the bool (maybe because it was written in another programming language).
But this might mean bad programming practice.
Writing data in a binary format is non portable without knowledge.
There are problems with the size of each object.
There are problems with representation:
Integers (have endianess)
Float (Representation undefined ((usually depends on the underlying hardware))
Bool (Binary representation is undefined by the standard)
Struct (Padding between members may differ)
With all these you need to know the underlying hardware and the compiler. Different compilers or different versions of the compiler or even a compiler with different optimization flags may have different behaviors for all the above.
The problem with Union
struct X
{
int a;
bool b;
};
As people mention writing to 'a' and then reading from 'b' is undefined.
Why: because we do not know how 'a' or 'b' is represented on this hardware. Writing to 'a' will fill out the bits in 'a' but how does that reflect on the bits in 'b'. If your system used 1 byte bool and 4 byte int with lowest byte in low memory highest byte in the high memory then writing 1 to 'a' will put 1 in 'b'. But then how does your implementation represent a bool? Is true represented by 1 or 255? What happens if you put a 1 in 'b' and for all other uses of true it is using 255?
So unless you understand both your hardware and your compiler the behavior will be unexpected.
Thus these uses are undefined but not disallowed by the standard. The reason they are allowed is that you may have done the research and found that on your system with this particular compiler you can do some freeky optimization by making these assumptions. But be warned any changes in the assumptions will break your code.
Also when comparing two types the compiler will do some auto-conversions before comparison, remember the two types are converted into the same type before comparison. For comparison between integers and bool the bool is converted into an integer and then compared against the other integer (the conversion converts false to 0 and true to 1). If the objects being converted are both bool then no conversion is required and the comparison is done using boolean logic.
Normally, when assigning an arbitrary value to a bool the compiler will convert it for you:
int x = 5;
bool z = x; // automatic conversion here
The equivalent code generated by the compiler will look more like:
bool z = (x != 0) ? true : false;
However, the compiler will only do this conversion once. It would be unreasonable for it to assume that any nonzero bit pattern in a bool variable is equivalent to true, especially for doing logical operations like and. The resulting assembly code would be unwieldy.
Suffice to say that if you're using union data structures, you know what you're doing and you have the ability to confuse the compiler.
The boolean is one byte, and the integer is four bytes. When you assign 2 to the integer, the fourth byte has a value of 2, but the first byte has a value of 0. If you read the boolean out of the union, it's going to grab the first byte.
Edit: D'oh. As Oleg Zhylin points out, this only applies to a big-endian CPU. Thanks for the correction.
I believe what you're doing is called type punning:
http://en.wikipedia.org/wiki/Type_punning
Hmm strange, I am getting different output from codepad:
11
111
122222
T
The code also seems right to me, maybe it's a compiler bug?
See here
Just to write down my points of view:
Is this okay?
I don't know whether the specs specify anything about this. A compiler might always create a code like this: ((a!=0) && (b!=0)) || ((a==0) && (b==0)) when comparing two booleans, although this might decrease performance.
In my opinion this is not a bug, but an undefined behaviour. Although I think that every implementor should tell the users how boolean comparisons are made in their implementation.
Any real-world case
The only thing that pops in my mind, if someone reads binary data from a file into a struct, that have bool members. The problem might rise, if the file was made with an other program that has written 2 instead of 1 into the place of the bool (maybe because it was written in another programming language).
But this might mean bad programming practice.
One more: in embedded systems this bug might be a bigger problem, than on a "normal" system, because the programmers usually do more "bit-magic" to get the job done.
Addressing the questions posed, I think the behavior is ok and shouldn't be a problem in real world. As we don't have ^^ in C++ I would suggest !bool == !bool as a safe bool comparison technique.
This way every non-zero value in bool variable will be converted to zero and every zero is converted to some non-zero value, but most probably one and the same for any negation operation.