In C++, is it okay to compare an int to a char because of implicit type conversion, or am I misunderstanding the concept?
For example, can I do
int x = 68;
char y;
std::cin >> y;
// Assuming the user inputs 'Z'
if(x < y)
{
std::cout << "Your input is larger than x";
}
Or do we need to first convert it to an int?
so
if(x < static_cast<int>(y))
{
std::cout << "Your input is larger than x";
}
The problem with both versions is that you cannot be sure what value results from characters with the high bit set (the ones that are negative if char is in fact signed). Whether plain char behaves like signed char or unsigned char is implementation defined.
The only way to fix this problem is to cast to the appropriate signed/unsigned char type first:
if(x < (signed char)y)
or
if(x < (unsigned char)y)
Omitting this cast leaves the result implementation defined.
Personally, I generally prefer use of uint8_t and int8_t when using chars as numbers, precisely because of this issue.
This still assumes that the value of the (un)signed char is within the range of possible int values on your platform. This may not be the case if sizeof(char) == sizeof(int) == 1 (possible only if char has at least 16 bits!) and you are comparing signed and unsigned values.
To avoid this problem, ensure that you use either
signed x = ...;
if(x < (signed char)y)
or
unsigned x = ...;
if(x < (unsigned char)y)
Your compiler will hopefully complain with a warning about mixed signed/unsigned comparison if you fail to do so.
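A minimal sketch of both behaviors (the first comparison depends on whether your platform's char is signed; the exact value stored in y is likewise implementation defined in the signed case):

#include <iostream>

int main()
{
    int x = 68;
    char y = '\xE9';   // a byte with the high bit set; typically -23 if
                       // char is signed, 233 if it is unsigned

    std::cout << (x < y) << '\n';                 // implementation defined: 0 or 1
    std::cout << (x < (unsigned char)y) << '\n';  // portable: always 1 (68 < 233)
    return 0;
}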
Your code will compile and work, for some definition of work.
Still, you might get unexpected results, because y is a char, which means its signedness is implementation defined. That, combined with the unknown size of int, will lead to much joy.
Also, please write the char literals you mean; don't look up codes in the ASCII table yourself. Any reader (you in 5 minutes) will be thankful.
Last point: avoid gratuitous casts, they don't make anything better and may hide problems your compiler would normally warn about.
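For example, for the 'Z' comparison from the question:

if (x < 'Z')   // self-documenting, instead of if (x < 90)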
Yes, you can compare an int to some char, like you can compare an int to some short, but it might be considered bad style. I would code
if (x < (int)y)
or like you did
if (x < static_cast<int>(y))
which I find a bit too verbose for that case....
BTW, if you intend to use bytes as numbers rather than as characters, consider also the int8_t type (etc.) from <cstdint>.
Don't forget that on some systems, char is signed by default, while on others it is unsigned (and you can be explicit with unsigned char vs signed char).
The code you suggest will compile, but I strongly recommend the static_cast version. Using static_cast, you help the reader understand what you are comparing to an integer.
I am going through the book "Accelerated C++" by Andrew Koenig and Barbara E. Moo, and I have some questions about the main example in chapter 2. The code can be summarized as below, and it compiles without warning/error with g++:
#include <string>
using std::string;
int main()
{
const string greeting = "Hello, world!";
// OK
const int pad = 1;
// KO
// int pad = 1;
// OK
// unsigned int pad = 1;
const string::size_type cols = greeting.size() + 2 + pad * 2;
string::size_type c = 0;
if (c == 1 + pad)
{;}
return 0;
}
However, if I replace const int pad = 1; by int pad = 1;, the g++ compiler will return a warning:
warning: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (c == 1 + pad)
If I replace const int pad = 1; by unsigned int pad = 1;, the g++ compiler will not return a warning.
I understand why g++ returns the warning, but I am not sure about the three points below:
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Why is the compiler not giving a warning with the original code const int pad = 1;? Is the compiler automatically converting the variable pad to an unsigned int?
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
From the compiler point of view:
It is unsafe to compare signed and unsigned variables (non-constants).
It is safe to compare two unsigned variables of different sizes.
It is safe to compare an unsigned variable with a signed constant if the compiler can verify that the constant is within the range valid for both types (e.g., for a 16-bit signed constant, anything in the range [0..32767] is safe).
So the answers to your questions:
Yes, it is safe to compare unsigned int and std::string::size_type.
There is no warning because the compiler can perform the safety check (while compiling :)).
There is no problem using different unsigned types in a comparison. Use unsigned int.
Comparing signed and unsigned values is "dangerous" in the sense that you may not get what you expect when the signed value is negative: it may well behave as a very large unsigned value, so a > b gives true when a = -1 and b = 100. (The use of const int works because the compiler knows the value isn't changing and thus can say "well, this value is always 1, so it works fine here".)
As long as the value you want to compare fits in an unsigned int (on typical machines, a little over 4 billion), you are fine.
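A minimal sketch of that surprise (g++ flags this comparison with -Wsign-compare):

#include <iostream>

int main()
{
    int a = -1;
    unsigned int b = 100;
    // a is converted to unsigned before the comparison; the conversion is
    // defined modulo 2^N, so -1 becomes UINT_MAX and this prints 1.
    std::cout << (a > b) << '\n';
    return 0;
}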
If you are using std::string with the default allocator (which is likely), then size_type is actually size_t.
[support.types]/6 defines that size_t is
an implementation-defined unsigned integer type that is large enough to contain the size
in bytes of any object.
So it's not technically guaranteed to be an unsigned int, but I believe it is defined that way in most implementations.
Now regarding your second question: if you use const int something = 2, the compiler sees that this integer is a) never negative and b) never changes, so it's always safe to compare this variable with size_t. In some cases the compiler may optimize the variable out completely and simply replace all its occurrences with 2.
I would say that it is better to use size_type everywhere you refer to the size of something, since it is more descriptive.
What the compiler warns about is the comparison of signed and unsigned integer types. A signed integer can be negative, and the result of such a comparison is counterintuitive: the signed operand is converted to unsigned before the comparison, so a negative number compares greater than a positive one.
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Yes: both are unsigned, so the semantics are what's expected. If their ranges differ, the narrower type is converted to the wider one.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
This comes down to how the compiler is constructed. The compiler parses, and to some extent optimizes, the code before warnings are issued. The important point is that by the time this warning is considered, the compiler knows that the signed integer is 1, which is safe to compare with an unsigned integer.
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
If you don't want it to be constant, the best solution would probably be to make it at least an unsigned integer type. However, you should be aware that there is no guaranteed relation between the normal integer types and the size types: for example, unsigned int may be narrower than, wider than, or equal to size_t, and size_type may differ again.
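If your code nevertheless relies on a particular relation, a minimal sketch of how you might document that assumption (the assertion encodes a platform assumption, not a portable guarantee):

#include <string>

// Assumption, not a guarantee: unsigned int fits in string::size_type.
static_assert(sizeof(unsigned int) <= sizeof(std::string::size_type),
              "this code assumes unsigned int is no wider than size_type");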
I would like to find a maximally efficient way to compute a char that contains the least significant bits of an int in C++11. The solution must work with any possible standards-compliant compiler. (I'm using the N3290 C++ draft spec, which is essentially C++11.)
The reason for this is that I'm writing something like a fuzz tester, and want to check libraries that require a std::string as input. So I need to generate random characters for the strings. The pseudo-random generator I'm using provides ints whose low bits are pretty uniformly random, but I'm not sure of the exact range. (Basically the exact range depends on a "size of test case" runtime parameter.)
If I didn't care about working on any compiler, this would be as simple as:
inline char int2char(int i) { return i; }
Before you dismiss this as a trivial question, consider that:
You don't know whether char is a signed or unsigned type.
If char is signed, then a conversion from an unrepresentable int to a char is "implementation-defined" (§4.7/3). This is far better than undefined, but for this solution I'd need to see some evidence that the standard prohibits things like converting all ints not between CHAR_MIN and CHAR_MAX to '\0'.
reinterpret_cast is not permitted between a signed and unsigned char (§5.2.10). static_cast performs the same conversion as in the previous point.
char c = i & 0xff;--though it silences some compiler warnings--is almost certainly not correct for all implementation-defined conversions. In particular, i & 0xff is always a positive number, so in the case that c is signed, it could quite plausibly not convert negative values of i to negative values of c.
Here are some solutions that do work, but in most of these cases I'm worried they won't be as efficient as a simple conversion. These also seem too complicated for something so simple:
Using reinterpret_cast on a pointer or reference, since you can convert from unsigned char * or unsigned char & to char * or char & (but at the possible cost of runtime overhead).
Using a union of char and unsigned char, where you first assign the int to the unsigned char, then extract the char (which again could be slower).
Shifting left and right to sign-extend the int. E.g., if i is the int, running c = (i << 8 * (sizeof(i) - sizeof(c))) >> 8 * (sizeof(i) - sizeof(c)) (but that's inelegant, and if the compiler doesn't optimize away the shifts, quite slow).
Here's a minimal working example. The goal is to argue that the assertions can never fail on any compiler, or to define an alternate int2char in which the assertions can never fail.
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
using namespace std;
constexpr char int2char(int i) { return i; }
int
main(int argc, char **argv)
{
for (int n = 1; n < min(argc, 127); n++) {
char c = -n;
int i = (atoi(argv[n]) << 8) ^ -n;
assert(c == int2char(i));
}
return 0;
}
I've phrased this question in terms of C++ because the standards are easier to find on the web, but I am equally interested in a solution in C. Here's the MWE in C:
#include <assert.h>
#include <stdlib.h>
static char int2char(int i) { return i; }
int
main(int argc, char **argv)
{
for (int n = 1; n < argc && n < 127; n++) {
char c = -n;
int i = (atoi(argv[n]) << 8) ^ -n;
assert(c == int2char(i));
}
return 0;
}
A far better way is to have an array of chars and generate a random number to pick a char from that array. This way you get 'well behaved' characters, or at least characters with well-defined badness. If you really want all 256 chars (note the 8-bit assumption), then create an array with 256 entries in it ('a','b',...,'\t','\n',...).
This will be portable, too.
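A minimal sketch of that approach (the alphabet shown is an arbitrary choice, and rand() stands in for whatever generator the fuzzer actually uses):

#include <cstdlib>

// A fixed alphabet of well-defined test characters.
const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789 \t\n";

char random_char()
{
    // sizeof(alphabet) - 1 excludes the terminating '\0'.
    return alphabet[std::rand() % (sizeof(alphabet) - 1)];
}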
Given that you appear to be interested in bit value (rather than numeric value), and have also asked for C solutions, I'm going to post what I believe to be something that's compliant and optimal:
inline char int2char(int i) {
char ret;
memcpy(&ret, (char *)&i + OFFSET, 1);
return ret;
}
where OFFSET is a macro that expands to either 0 or sizeof(int)-1, based on an endianness check.
AFAICS, this works regardless of whether char is signed or unsigned, of what representation is used for negative values, or of the width of char or int. It doesn't rely on any weird type-punning tricks, and has no branching or complex operations (such as division).
I say "optimal" because I'm assuming that any sane compiler treats memcpy as an intrinsic, and thus will do something smart here.
I need to:
1) Find what the maximum unsigned int value is on my current system. I didn't find it in limits.h. Is it safe to write unsigned int maxUnsInt = 0 - 1;? I also tried unsigned int maxUnsInt = INT_MAX * 2 + 1, which returns the correct value, but the compiler shows a warning about int overflow.
2) Once found, check if a C++ string (that I know it is composed only by digits) exceeded the maximum unsigned int value on my system.
My final objective is to convert the string to an unsigned int using atoi if and only if it is a valid unsigned int. I would prefer to use only the standard library.
There should be a #define UINT_MAX in <limits.h>; I'd be very surprised if there wasn't. Otherwise, it's guaranteed that:
unsigned int u = -1;
will result in the maximum value. In C++, you can also use std::numeric_limits<unsigned int>::max(), but until C++11, that wasn't an integral constant expression (which may or may not be a problem).
unsigned int u = 2 * INT_MAX + 1;
is not guaranteed to be anything (on at least one system, INT_MAX == UINT_MAX).
With regards to checking a string, the simplest solution would be to use strtoul, then verify errno and the return value:
bool
isLegalUInt( std::string const& input )
{
char* end;   // strtoul requires char**, so end cannot be char const*
errno = 0;
unsigned long v = strtoul( input.c_str(), &end, 10 );
return errno == 0 && *end == '\0' && end != input.c_str() && v <= UINT_MAX;
}
If you're using C++11, you could also use std::stoul, which throws an std::out_of_range exception in case of overflow.
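A sketch of the std::stoul variant (the function name isLegalUInt2 is just for illustration; note that, like strtoul, stoul accepts leading whitespace and a minus sign, so pure-digit input needs its own check):

#include <climits>
#include <cstddef>
#include <stdexcept>
#include <string>

bool isLegalUInt2(std::string const& input)
{
    try {
        std::size_t pos = 0;
        unsigned long v = std::stoul(input, &pos, 10);
        // Require that the whole string was consumed and the value fits.
        return pos == input.size() && v <= UINT_MAX;
    } catch (std::invalid_argument const&) {
        return false;   // no digits at all
    } catch (std::out_of_range const&) {
        return false;   // larger than unsigned long
    }
}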
numeric_limits has limits for various numeric types:
unsigned int maxUnsInt = std::numeric_limits<unsigned int>::max();
stringstream can read a string into any type that supports operator>> and tell you whether it failed:
std::stringstream ss("1234567890123456789012345678901234567890");
unsigned int value;
ss >> value;
bool successful = !ss.fail();
According to this, you do not need to calculate it; just use the appropriate constant, which in this case is UINT_MAX.
A few notes.
This is more the C way than the C++ way, but since you say you want to use atoi, I'll stick with it. The C++ way would be numeric_limits, as Joachim suggested. However, the C++ standard also defines the C-like macros, so UINT_MAX should be safe to use.
Also, if you want to do it the C++ way, it would probably be preferable to use a stringstream (part of the standard C++ library) for the conversion.
Lastly, I deliberately don't post an explicit code solution, because it looks like homework, and you should be good to go from here.
I am currently working through Accelerated C++ and have come across an issue in exercise 2-3.
A quick overview of the program - it takes a name, then displays a greeting within a frame of asterisks - i.e., Hello ! framed by *'s.
The exercise - In the example program, the authors use const int to determine the padding (blank spaces) between the greeting and the asterisks. They then ask the reader, as part of the exercise, to ask the user for input as to how big they want the padding to be.
All this seems easy enough. I go ahead and ask the user for two integers (int), store them, and change the program to use them, removing the ones used by the author. When compiling, though, I get the following warning:
Exercise2-3.cpp:46: warning: comparison between signed and unsigned integer expressions
After some research it appears to be because the code attempts to compare one of the above integers (int) to a string::size_type, which is fine. But I was wondering - does this mean I should change one of the integers to unsigned int? Is it important to explicitly state whether my integers are signed or unsigned?
cout << "Please enter the size of the frame between top and bottom you would like ";
int padtopbottom;
cin >> padtopbottom;
cout << "Please enter size of the frame from each side you would like: ";
unsigned int padsides;
cin >> padsides;
string::size_type c = 0; // definition of c in the program
if (r == padtopbottom + 1 && c == padsides + 1) { // where the error occurs
Above are the relevant bits of code. c is of type string::size_type because we do not know how long the greeting might be - but why do I get this problem now, when the author's code didn't have the problem using const int? Also - to anyone who may have completed Accelerated C++ - will this be explained later in the book?
I am on Linux Mint using g++ via Geany, if that helps or makes a difference (as I read that it could when determining what string::size_type is).
It is usually a good idea to declare variables as unsigned or size_t if they will be compared to sizes, to avoid this issue. Whenever possible, use the exact type you will be comparing against (for example, use std::string::size_type when comparing with a std::string's length).
Compilers give warnings about comparing signed and unsigned types because the ranges of signed and unsigned ints are different, and when they are compared to one another, the results can be surprising. If you have to make such a comparison, you should explicitly convert one of the values to a type compatible with the other, perhaps after checking to ensure that the conversion is valid. For example:
unsigned u = GetSomeUnsignedValue();
int i = GetSomeSignedValue();
if (i >= 0)
{
// i is nonnegative, so it is safe to cast to unsigned value
if ((unsigned)i >= u)
iIsGreaterThanOrEqualToU();
else
iIsLessThanU();
}
else
{
iIsNegative();
}
I had the exact same problem yesterday working through problem 2-3 in Accelerated C++. The key is to change all variables you will be comparing (using Boolean operators) to compatible types. In this case, that means string::size_type (or unsigned int, but since this example is using the former, I will just stick with that even though the two are technically compatible).
Notice that in their original code they did exactly this for the c counter (page 30 in Section 2.5 of the book), as you rightly pointed out.
What makes this example more complicated is that the different padding variables (padsides and padtopbottom), as well as all counters, must also be changed to string::size_type.
Getting to your example, the code that you posted would end up looking like this:
cout << "Please enter the size of the frame between top and bottom";
string::size_type padtopbottom;
cin >> padtopbottom;
cout << "Please enter size of the frame from each side you would like: ";
string::size_type padsides;
cin >> padsides;
string::size_type c = 0; // definition of c in the program
if (r == padtopbottom + 1 && c == padsides + 1) { // where the error no longer occurs
Notice that in the previous conditional, you would get the error if you didn't declare r as a string::size_type in the for loop. So you need to write the for loop using something like:
for (string::size_type r=0; r!=rows; ++r) //If r and rows are string::size_type, no error!
So, basically, once you introduce a string::size_type variable into the mix, any time you want to compare it with something, all operands must have a compatible type for the code to compile without warnings.
The important difference between signed and unsigned ints is the interpretation of the highest bit. In signed types, the most significant bit represents the sign of the number, meaning, e.g.:
0001 is 1, both signed and unsigned
1001 is -1 signed and 9 unsigned
(I avoided the whole two's-complement issue for clarity of explanation! This is not exactly how ints are represented in memory!)
You can imagine that it makes a difference to know whether you are comparing with -1 or with +9. In many cases, programmers are just too lazy to declare counting ints as unsigned (it bloats the for loop head, for instance). It is usually not an issue, because with ints you have to count up to 2^31 before your sign bit bites you. That's why it is only a warning: because we are too lazy to write 'unsigned' instead of 'int'.
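You can watch the reinterpretation happen directly (a minimal sketch):

#include <iostream>

int main()
{
    int i = -1;
    // Conversion to unsigned is defined modulo 2^N, so this prints
    // 4294967295 on a platform with 32-bit unsigned int.
    std::cout << static_cast<unsigned int>(i) << '\n';
    return 0;
}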
At the extreme ranges, an unsigned int can hold values larger than any int can represent.
Therefore, the compiler generates a warning. If you are sure that this is not a problem, feel free to cast the types to the same type so the warning disappears (use C++ cast so that they are easy to spot).
Alternatively, make the variables the same type to stop the compiler from complaining.
I mean, is it possible to have a negative padding? If so then keep it as an int. Otherwise you should probably use unsigned int and let the stream catch the situations where the user types in a negative number.
The primary issue is that the underlying hardware, the CPU, only has instructions to compare two signed values or to compare two unsigned values. If you pass the unsigned comparison instruction a signed, negative value, it will treat it as a large positive number. So -1, the bit pattern with all bits on (two's complement), becomes the maximum unsigned value for the same number of bits.
8-bits: -1 signed is the same bits as 255 unsigned
16-bits: -1 signed is the same bits as 65535 unsigned
etc.
So, if you have the following code:
int fd;
fd = open( .... );
int cnt;
SomeType buf;
cnt = read( fd, &buf, sizeof(buf) );
if( cnt < sizeof(buf) ) {
perror("read error");
}
you will find that if the read(2) call fails due to the file descriptor becoming invalid (or some other error), cnt will be set to -1. When comparing to sizeof(buf), an unsigned value, the if() statement will be false, because 0xffffffff is not less than sizeof() some (reasonable, not concocted to be max size) data structure.
Thus, to remove the signed/unsigned warning, you have to write the above if as:
if( cnt < 0 || (size_t)cnt < sizeof(buf) ) {
perror("read error");
}
This just speaks loudly to the problems:
1. The introduction of size_t and other such data types was crafted to mostly work, rather than engineered, with language changes, to be explicitly robust and foolproof.
2. Overall, C/C++ data types should just be signed, as Java correctly implemented.
If you have values so large that you can't find a signed value type that works, you are using too small a processor or too large a magnitude of values in your language of choice. If, as with money, every digit counts, there are libraries in most languages that provide arbitrary precision. C/C++ just doesn't do this well, and you have to be very explicit about everything around types, as mentioned in many of the other answers here.
Or use this header library and write:
// |notEqual|less|lessEqual|greater|greaterEqual
if(sweet::equal(valueA,valueB))
and not worry about signed/unsigned or different sizes.
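If you'd rather not pull in a dependency, the core of such a helper is small enough to sketch yourself (C++20 later standardized this idea as std::cmp_equal and friends in <utility>):

// Safe mixed-sign equality: a negative int can never equal an unsigned
// value, so check the sign first, then compare in the unsigned domain.
inline bool safe_equal(int a, unsigned int b)
{
    return a >= 0 && static_cast<unsigned int>(a) == b;
}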
Given the following piece of (pseudo-C++) code:
float x=100, a=0.1;
unsigned int height = 63, width = 63;
unsigned int hw=31;
for (int row=0; row < height; ++row)
{
for (int col=0; col < width; ++col)
{
float foo = x + col - hw + a * (col - hw);
cout << foo << " ";
}
cout << endl;
}
The values of foo are screwed up for half of the array, in places where (col - hw) is negative. I figured that because col is an int and comes first, this part of the expression would be converted to int and become negative. Unfortunately, it apparently isn't: I get wrap-around of an unsigned value, and I've no idea why.
How should I resolve this problem? Use casts for the whole or part of the expression? What type of casts (C-style or static_cast<...>)? Is there any overhead to using casts (I need this to work fast!)?
EDIT: I changed all my unsigned ints to regular ones, but I'm still wondering why I got that overflow in this situation.
Unsigned integers implement unsigned arithmetic. Unsigned arithmetic is modulo arithmetics. All values are adjusted modulo 2^N, where N is the number of bits in the value representation of unsigned type.
In simple words, unsigned arithmetic always produces non-negative values. Every time the expression should result in negative value, the value actually "wraps around" 2^N and becomes positive.
When you mix a signed and an unsigned integer in a [sub-]expression, the unsigned arithmetic "wins", i.e. the calculations are performed in the unsigned domain. For example, when you do col - hw, it is interpreted as (unsigned) col - hw. This means that for col == 0 and hw == 31 you will not get -31 as the result. Instead you will get UINT_MAX - 31 + 1, which is normally a huge positive value.
Having said that, I have to note that in my opinion it is always a good idea to use unsigned types to represent inherently non-negative values. In fact, in practice most (or at least half) of the integer variables in a C/C++ program should have unsigned types. Your attempt to use unsigned types in your example is well justified (if I understand the intent correctly). Moreover, I'd use unsigned for col and row as well. However, you have to keep in mind the way unsigned arithmetic works (as described above) and write your expressions accordingly. Most of the time, an expression can be rewritten so that it doesn't cross the bounds of the unsigned range, i.e. most of the time there's no need to explicitly cast anything to a signed type. Otherwise, if you do need to work with negative values eventually, a well-placed cast to a signed type should solve the problem.
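A minimal sketch of both the trap and the well-placed cast:

#include <iostream>

int main()
{
    unsigned int hw = 31;
    int col = 0;
    // Mixed expression: col is converted to unsigned, so 0 - 31 wraps
    // around to a huge positive value instead of -31.
    std::cout << col - hw << '\n';                    // 4294967265 with 32-bit unsigned
    // Rewritten so the subtraction happens in the signed domain:
    std::cout << col - static_cast<int>(hw) << '\n';  // -31
    return 0;
}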
How about making height, width, and hw signed ints? What are you really gaining by making them unsigned? Mixing signed and unsigned integers is always asking for trouble. At first glance, at least, it doesn't look like you gain anything by using unsigned values here. So, you might as well make them all signed and save yourself the trouble.
You have the conversion rules backwards — when you mix signed and unsigned versions of the same type, the signed operand is converted to unsigned.
If you want this to be fast, you should static_cast all the unsigned values to int before you start looping and use the int versions rather than unsigned int. You can still require the inputs to be unsigned, and then just cast them on the way into your algorithm, to retain the required domain for your function.
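A sketch of that approach applied to the loop from the question (the render function and the _u parameter names are placeholders):

#include <iostream>

void render(unsigned int height_u, unsigned int width_u, unsigned int hw_u)
{
    // Convert once at the boundary; everything inside the loops is signed.
    const int height = static_cast<int>(height_u);
    const int width = static_cast<int>(width_u);
    const int hw = static_cast<int>(hw_u);
    const float x = 100, a = 0.1f;
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            std::cout << x + col - hw + a * (col - hw) << " ";
        }
        std::cout << std::endl;
    }
}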
Casts won't happen automatically - uncast arithmetic still has its uses. The usual example is int / int = int, even if data is lost by not converting to float. I'd use signed int unless it's impossible to do so because INT_MAX is too small.