I would like to find a maximally efficient way to compute a char that contains the least significant bits of an int in C++11. The solution must work with any possible standards-compliant compiler. (I'm using the N3290 C++ draft spec, which is essentially C++11.)
The reason for this is that I'm writing something like a fuzz tester, and want to check libraries that require a std::string as input. So I need to generate random characters for the strings. The pseudo-random generator I'm using provides ints whose low bits are pretty uniformly random, but I'm not sure of the exact range. (Basically the exact range depends on a "size of test case" runtime parameter.)
If I didn't care about working on any compiler, this would be as simple as:
inline char int2char(int i) { return i; }
Before you dismiss this as a trivial question, consider that:
You don't know whether char is a signed or unsigned type.
If char is signed, then a conversion from an unrepresentable int to a char is "implementation-defined" (§4.7/3). This is far better than undefined, but for this solution I'd need to see some evidence that the standard prohibits things like converting all ints not between CHAR_MIN and CHAR_MAX to '\0'.
reinterpret_cast is not permitted between a signed and unsigned char (§5.2.10). static_cast performs the same conversion as in the previous point.
char c = i & 0xff; silences some compiler warnings, but is almost certainly not correct for all implementation-defined conversions. In particular, i & 0xff is always a non-negative number, so in the case that c is signed, the conversion could quite plausibly fail to map negative values of i to negative values of c.
Here are some solutions that do work, but in most of these cases I'm worried they won't be as efficient as a simple conversion. These also seem too complicated for something so simple (the first two are sketched in code after the list):
Using reinterpret_cast on a pointer or reference, since you can convert from unsigned char * or unsigned char & to char * or char & (but at the possible cost of runtime overhead).
Using a union of char and unsigned char, where you first assign the int to the unsigned char, then extract the char (which again could be slower).
Shifting left and right to sign-extend the int. E.g., if i is the int, running c = (i << 8 * (sizeof(i) - sizeof(c))) >> 8 * (sizeof(i) - sizeof(c)); (but that's inelegant, and if the compiler doesn't optimize away the shifts, quite slow).
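For concreteness, a minimal sketch of the first two workarounds (the function names are mine; the conversion to unsigned char is well-defined as reduction modulo 2^CHAR_BIT):

inline char int2char_ptr(int i) {
    unsigned char uc = static_cast<unsigned char>(i);  // well-defined: value mod 2^CHAR_BIT
    return *reinterpret_cast<char *>(&uc);             // a char glvalue may access any object
}

inline char int2char_union(int i) {
    union { unsigned char uc; char c; } u;
    u.uc = static_cast<unsigned char>(i);
    return u.c;  // reading the other member: fine in C, a widely supported extension in C++
}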
Here's a minimal working example. The goal is to argue that the assertions can never fail on any compiler, or to define an alternate int2char in which the assertions can never fail.
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
using namespace std;
constexpr char int2char(int i) { return i; }
int
main(int argc, char **argv)
{
    for (int n = 1; n < min(argc, 127); n++) {
        char c = -n;
        int i = (atoi(argv[n]) << 8) ^ -n;
        assert(c == int2char(i));
    }
    return 0;
}
I've phrased this question in terms of C++ because the standards are easier to find on the web, but I am equally interested in a solution in C. Here's the MWE in C:
#include <assert.h>
#include <stdlib.h>
static char int2char(int i) { return i; }
int
main(int argc, char **argv)
{
    for (int n = 1; n < argc && n < 127; n++) {
        char c = -n;
        int i = (atoi(argv[n]) << 8) ^ -n;
        assert(c == int2char(i));
    }
    return 0;
}
A far better way is to have an array of chars and generate a random number to pick a char from that array. This way you get 'well behaved' characters, or at least characters with well-defined badness. If you really want all 256 chars (note the 8-bit assumption), then create an array with 256 entries in it ('a', 'b', ..., '\t', '\n', ...).
This will be portable too
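A minimal sketch of that idea (the table contents here are just an example, assuming you want printable ASCII plus a couple of controls):

#include <cstdlib>  // std::rand

inline char random_char()
{
    static const char table[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789 \t\n";
    return table[std::rand() % (sizeof table - 1)];  // - 1 skips the trailing NUL
}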
Given that you appear to be interested in bit value (rather than numeric value), and have also asked for C solutions, I'm going to post what I believe to be something that's compliant and optimal:
inline char int2char(int i) {
    char ret;
    memcpy(&ret, (char *)&i + OFFSET, 1);  /* memcpy needs <string.h> (or <cstring> in C++) */
    return ret;
}
where OFFSET is a macro that expands to either 0 or sizeof(int)-1, based on an endianness check.
AFAICS, this works regardless of whether char is signed or unsigned, of what representation is used for negative values, and of the widths of char and int. It doesn't rely on any weird type-punning tricks, and has no branching or complex operations (such as division).
I say "optimal" because I'm assuming that any sane compiler treats memcpy as an intrinsic, and thus will do something smart here.
Related
I want to write a function
int char_to_int(char c);
that converts given char to int by zero extending the value. So if the char has N bits and int has M bits, M >= N, then the M-N most significant bits of the int value should be zero and the N least significant bits of the int value should match the bits of the char value.
This seems like a simple task, but I'm not sure how to write it relying only on standard behavior. No UB, no implementation-defined behavior. Without relying on char being 8 bit, int being 32 bit, char being unsigned and any other common assumptions I make that are not guaranteed by standard.
The reason I want to know this is that I have done this conversion several times in the past, but recently I became aware of the limited guarantees C++ gives about its data types. So now I'm curious what the correct, standard-compliant approach is.
I don't suppose
return (int) c;
is good enough, is it?
There's no harm in being extra clear:
return int((unsigned char)c);
That way you tell the compiler exactly what you want: the int that contains the char value, read as unsigned. So char 255 will become int 255.
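Put together as a self-contained sketch (the caveat in the comment is mine, not from the answer):

#include <iostream>

// Zero-extends a char into an int by reading its bits as unsigned char.
// Caveat: if UCHAR_MAX > INT_MAX (only possible where sizeof(int) == 1),
// the final conversion back to int is implementation-defined.
int char_to_int(char c)
{
    return int(static_cast<unsigned char>(c));
}

int main()
{
    char c = static_cast<char>(0xFF);    // wraps to -1 where char is signed
    std::cout << char_to_int(c) << "\n"; // prints 255 either way on 8-bit chars
}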
I am going through the book "Accelerated C++" by Andrew Koenig and Barbara E. Moo and I have some questions about the main example in chapter 2. The code can be summarized as below, and is compiling without warning/error with g++:
#include <string>
using std::string;
int main()
{
    const string greeting = "Hello, world!";
    // OK
    const int pad = 1;
    // KO
    // int pad = 1;
    // OK
    // unsigned int pad = 1;
    const string::size_type cols = greeting.size() + 2 + pad * 2;
    string::size_type c = 0;
    if (c == 1 + pad)
    {;}
    return 0;
}
However, if I replace const int pad = 1; by int pad = 1;, the g++ compiler will return a warning:
warning: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (c == 1 + pad)
If I replace const int pad = 1; by unsigned int pad = 1;, the g++ compiler will not return a warning.
I understand why g++ returns the warning, but I am not sure about the three points below:
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
From the compiler point of view:
It is unsafe to compare signed and unsigned variables (non-constants).
It is safe to compare two unsigned variables of different sizes.
It is safe to compare an unsigned variable with a signed constant if the compiler can check that the constant is in the allowed range for the type of the signed variable (e.g., for a 16-bit signed integer it is safe to use a constant in the range [0..32767]).
So the answers to your questions:
Yes, it is safe to compare unsigned int and std::string::size_type.
There is no warning because the compiler can perform the safety check (while compiling :)).
There is no problem in using different unsigned types in a comparison. Use unsigned int.
Comparing signed and unsigned values is "dangerous" in the sense that you may not get what you expect when the signed value is negative: it may well behave as a very large unsigned value, so a > b gives true when a = -1 and b = 100. (The use of const int works because the compiler knows the value isn't changing and can thus say "well, this value is always 1, so it works fine here".)
As long as the value you want to compare fits in unsigned int (on typical machines, a little over 4 billion), it's fine.
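A short demonstration of that surprise:

#include <iostream>

int main()
{
    int a = -1;
    unsigned int b = 100;
    // a is converted to unsigned int before the comparison, becoming
    // UINT_MAX, so the "obviously false" comparison holds.
    std::cout << (a > b) << "\n";  // prints 1
}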
If you are using std::string with the default allocator (which is likely), then size_type is actually size_t.
[support.types]/6 defines that size_t is
an implementation-defined unsigned integer type that is large enough to contain the size
in bytes of any object.
So it's not technically guaranteed to be an unsigned int, but I believe it is defined this way in most implementations.
Now regarding your second question: if you use const int something = 2, the compiler sees that this integer is a) never negative and b) never changes, so it's always safe to compare this variable with size_t. In some cases the compiler may optimize the variable out completely and simply replace all its occurrences with 2.
I would say that it is better to use size_type everywhere you are referring to the size of something, since it is more descriptive.
What the compiler warns about is the comparison of unsigned and signed integer types. This is dangerous because the signed value can be negative, and the result is then counterintuitive: the signed value is converted to unsigned before the comparison, which means a negative number will compare greater than a positive one.
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Yes, they are both unsigned, and then the semantics are what's expected. If their ranges differ, the narrower type is converted to the wider one.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
This is down to how the compiler is constructed. The compiler parses and to some extent optimizes the code before warnings are issued. The important point is that, at the point where this warning is considered, the compiler knows that the signed integer is 1, and then it's safe to compare it with an unsigned integer.
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
If you don't want it to be constant, the best solution would probably be to make it at least an unsigned integer type. However, you should be aware that there is no guaranteed relation between the normal integer types and the size types; for example, unsigned int may be narrower than, wider than, or the same width as size_t and size_type (and the latter two may also differ from each other).
In C++, is it okay to compare an int to a char because of implicit type conversion? Or am I misunderstanding the concept?
For example, can I do
int x = 68;
char y;
std::cin >> y;
//Assuming that the user inputs 'Z';
if(x < y)
{
    std::cout << "Your input is larger than x";
}
Or do we need to first convert it to an int, like so:
if(x < static_cast<int>(y))
{
    std::cout << "Your input is larger than x";
}
The problem with both versions is that you cannot be sure about the values that result from negative/large characters (the values that are negative if char is in fact signed char). This is implementation-defined, because the implementation defines whether char means signed char or unsigned char.
The only way to fix this problem is to cast to the appropriate signed/unsigned char type first:
if(x < (signed char)y)
or
if(x < (unsigned char)y)
Omitting this cast will result in implementation defined behavior.
Personally, I generally prefer use of uint8_t and int8_t when using chars as numbers, precisely because of this issue.
This still assumes that the value of the (un)signed char is within the range of possible int values on your platform. This may not be the case if sizeof(char) == sizeof(int) == 1 (possible only if char is at least 16 bits wide!), and you are comparing signed and unsigned values.
To avoid this problem, ensure that you use either
signed x = ...;
if(x < (signed char)y)
or
unsigned x = ...;
if(x < (unsigned char)y)
Your compiler will hopefully complain with a warning about the mixed-signedness comparison if you fail to do so.
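A small demonstration of how the choice of cast changes the outcome (assuming 8-bit chars and the usual two's-complement behavior):

#include <iostream>

int main()
{
    char y = static_cast<char>(0x80);  // -128 if char is signed, 128 if unsigned
    int x = 100;
    std::cout << (x < (signed char)y) << "\n";    // 0: compares 100 < -128
    std::cout << (x < (unsigned char)y) << "\n";  // 1: compares 100 < 128
}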
Your code will compile and work, for some definition of work.
Still you might get unexpected results, because y is a char, which means its signedness is implementation defined. That combined with unknown size of int will lead to much joy.
Also, please write the char literals you want, don't look at the ASCII table yourself. Any reader (you in 5 minutes) will be thankful.
Last point: avoid gratuitous casts; they don't make anything better and may hide problems your compiler would normally warn about.
Yes, you can compare an int to some char, just as you can compare an int to some short, but it might be considered bad style. I would code
if (x < (int)y)
or like you did
if (x < static_cast<int>(y))
which I find a bit too verbose for that case....
BTW, if you intend to use bytes as numbers rather than as characters, consider also the int8_t type (etc.) from <cstdint>.
Don't forget that on some systems char is signed by default, and on others it is unsigned (you can also be explicit with unsigned char vs. signed char).
The code you suggest will compile, but I strongly recommend the static_cast version. Using static_cast, you help the reader understand what you are comparing to an integer.
I need to:
1) Find the maximum unsigned int value on my current system. I didn't find it in limits.h. Is it safe to write unsigned int maxUnsInt = 0 - 1;? I also tried unsigned int maxUnsInt = INT_MAX * 2 + 1, which returns the correct value, but the compiler shows a warning about an int overflow operation.
2) Once found, check whether a C++ string (which I know is composed only of digits) exceeds the maximum unsigned int value on my system.
My final objective is to convert the string to an unsigned int using atoi if and only if it is a valid unsigned int. I would prefer to use only the standard library.
There should be a #define UINT_MAX in <limits.h>; I'd be very surprised if there wasn't. Otherwise, it's guaranteed that:
unsigned int u = -1;
will result in the maximum value. In C++, you can also use std::numeric_limits<unsigned int>::max(), but until C++11, that wasn't an integral constant expression (which may or may not be a problem).
unsigned int u = 2 * INT_MAX + 1;
is not guaranteed to be anything (on at least one system, INT_MAX == UINT_MAX).
With regards to checking a string, the simplest solution would be to use strtoul, then verify errno and the return value:
#include <cerrno>   // errno
#include <climits>  // UINT_MAX
#include <cstdlib>  // strtoul
#include <string>

bool
isLegalUInt( std::string const& input )
{
    char* end;  // strtoul wants a char**, so this cannot be char const*
    errno = 0;
    unsigned long v = strtoul( input.c_str(), &end, 10 );
    return errno == 0 && *end == '\0' && end != input.c_str() && v <= UINT_MAX;
}
If you're using C++11, you could also use std::stoul, which throws an std::out_of_range exception in case of overflow.
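A sketch of that variant, mirroring isLegalUInt above (note that std::stoul also accepts a leading minus sign, which doesn't matter here since the question's strings contain only digits):

#include <climits>  // UINT_MAX
#include <cstddef>  // std::size_t
#include <string>   // std::stoul

bool
isLegalUInt( std::string const& input )
{
    try {
        std::size_t pos = 0;
        unsigned long v = std::stoul( input, &pos, 10 );
        return pos == input.size() && v <= UINT_MAX;
    } catch ( ... ) {  // std::invalid_argument or std::out_of_range
        return false;
    }
}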
numeric_limits (from <limits>) has limits for various numeric types:
unsigned int maxUnsInt = std::numeric_limits<unsigned int>::max();
stringstream can read a string into any type that supports operator>> and tell you whether it failed:
std::stringstream ss("1234567890123456789012345678901234567890");
unsigned int value;
ss >> value;
bool successful = !ss.fail();
According to this, you do not need to calculate it; just use the appropriate constant, which in this case should be UINT_MAX.
A few notes.
This is more of a C approach than a C++ one, but since you say you want to use atoi, I'll stick with it. The C++ way would be to use numeric_limits as Joachim suggested. However, the C++ standard also defines the C-like macros/definitions, so they should be safe to use.
Also, if you want it the C++ way, it would probably be preferable to use stringstream (which is part of the standard C++ library) for the conversion.
Lastly, I deliberately don't post an explicit code solution, 'cause it looks like homework, and you should be good to go from here now.
I have an 8-character string representing a hexadecimal number and I need to convert it to an int. This conversion has to preserve the bit pattern for strings "80000000" and higher, i.e., those numbers should come out negative. Unfortunately, the naive solution:
int hex_str_to_int(const string hexStr)
{
    stringstream strm;
    strm << hex << hexStr;
    unsigned int val = 0;
    strm >> val;
    return static_cast<int>(val);
}
doesn't work for my compiler if val > INT_MAX (the returned value is 0). Changing the type of val to int also results in a 0 for the larger numbers. I've tried several different solutions from various answers here on SO and haven't been successful yet.
Here's what I do know:
I'm using HP's C++ compiler on OpenVMS (using, I believe, an Itanium processor).
sizeof(int) will be at least 4 on every architecture my code will run on.
Casting from a number > INT_MAX to int is implementation-defined. On my machine, it usually results in a 0 but interestingly casting from long to int results in INT_MAX when the value is too big.
This is surprisingly difficult to do correctly, or at least it has been for me. Does anyone know of a portable solution to this?
Update:
Changing static_cast to reinterpret_cast results in a compiler error. A comment prompted me to try a C-style cast: return (int)val in the code above, and it worked. On this machine. Will that still be safe on other architectures?
Quoting the C++03 standard, §4.7/3 (Integral Conversions):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
Because the result is implementation-defined, by definition it is impossible for there to be a truly portable solution.
While there are ways to do this using casts and conversions, most rely on undefined behavior that happens to be well-defined on some machines / with some compilers. Instead of relying on undefined behavior, copy the data:
int signed_val;
std::memcpy (&signed_val, &val, sizeof(int));
return signed_val;
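Applied to the question's function, a possible rewrite along these lines (a sketch, not tested on OpenVMS):

#include <cstring>
#include <sstream>
#include <string>

int hex_str_to_int(const std::string& hexStr)
{
    std::stringstream strm;
    strm << std::hex << hexStr;
    unsigned int val = 0;
    strm >> val;
    int signed_val;
    std::memcpy(&signed_val, &val, sizeof(int));  // copy the bit pattern, don't convert
    return signed_val;
}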
You can negate an unsigned two's-complement number by taking the complement and adding one. So let's do that for negatives:
if (val < 0x80000000)   // positive values need no conversion
    return val;
if (val == 0x80000000)  // complement-and-addition would overflow, so special-case this
    return INT_MIN;     // writing -0x80000000 would itself be an unsigned expression
else
    return -(int)(~val + 1);
This assumes that your ints are represented with a 32-bit two's-complement representation (or have a similar range). It does not rely on any undefined behavior related to signed integer overflow (note that the behavior of unsigned integer overflow is well-defined, although that should not happen here either!).
Note that if your ints are not 32-bit, things get more complex. You may need to use something like ~(~0U >> 1) instead of 0x80000000. Further, if your ints are not two's complement, you may have overflow issues with certain values (for example, on a ones'-complement machine, -0x80000000 cannot be represented in a 32-bit signed integer). However, non-two's-complement machines are very rare today, so this is unlikely to be a problem.
Here's another solution that worked for me:
if (val <= INT_MAX) {
    return static_cast<int>(val);
}
else {
    int ret = static_cast<int>(val & ~INT_MIN);
    return ret | INT_MIN;
}
If I mask off the high bit, I avoid overflow when casting. I can then OR it back safely.
C++20 will have std::bit_cast that copies bits verbatim:
#include <bit>
#include <cassert>
#include <iostream>
int main()
{
    int i = -42;
    auto u = std::bit_cast<unsigned>(i);
    // Prints 4294967254 on two's complement platforms where int is 32 bits
    std::cout << u << "\n";
    auto roundtripped = std::bit_cast<int>(u);
    assert(roundtripped == i);
    std::cout << roundtripped << "\n"; // Prints -42
    return 0;
}
cppreference shows an example of how one can implement their own bit_cast in terms of memcpy (under Notes).
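For pre-C++20 code such as the question's, a stand-in along those lines might look like this (a sketch in the spirit of that note, not the exact cppreference code):

#include <cstring>
#include <type_traits>

// memcpy-based substitute for std::bit_cast; the static_asserts mimic a
// subset of the real constraints. Additionally assumes To is
// default-constructible.
template <class To, class From>
To bit_cast_compat(const From& src)
{
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must be the same size");
    static_assert(std::is_trivially_copyable<From>::value &&
                  std::is_trivially_copyable<To>::value,
                  "both types must be trivially copyable");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}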
While OpenVMS is not likely to gain C++20 support anytime soon, I hope this answer helps someone arriving at the same question via internet search.
unsigned int u = ~0U;
int s = *reinterpret_cast<int*>(&u); // -1
Contrariwise:
int s = -1;
unsigned int u = *reinterpret_cast<unsigned int*>(&s); // all ones