Testing for a maximum unsigned value - c++

Is this the correct way to test for a maximum unsigned value in C and C++ code:
if(foo == -1)
{
// at max possible value
}
where foo is an unsigned int, an unsigned short, and so on.

For C++, I believe you should preferably use the numeric_limits template from the <limits> header :
if (foo == std::numeric_limits<unsigned int>::max())
/* ... */
For C, others have already pointed out the <limits.h> header and UINT_MAX.
Apparently, "solutions which are allowed to name the type are easy", so you can have :
template<class T>
inline bool is_max_value(const T t)
{
return t == std::numeric_limits<T>::max();
}
[...]
if (is_max_value(foo))
/* ... */

I suppose that you ask this question since at a certain point you don't know the concrete type of your variable foo, otherwise you naturally would use UINT_MAX etc.
For C your approach is the right one only for types with a conversion rank of int or higher. This is because, before being compared, an unsigned short value (for example) is first promoted to int if int can represent all of its values, or to unsigned int otherwise. Your value foo would then be compared either to -1 or to UINT_MAX, which is not what you expect.
I don't see an easy way of implementing the test that you want in C, since basically using foo in any type of expression would promote it to int.
With gcc's typeof extension this is easily possible. You'd just have to do something like
if (foo == (typeof(foo))-1)

As already noted, you should probably use if (foo == std::numeric_limits<unsigned int>::max()) to get the value.
However for completeness, in C++ -1 is "probably" guaranteed to be the max unsigned value when converted to unsigned (this wouldn't be the case if there were unused bit patterns at the upper end of the unsigned value range).
See 4.7/2:
If the destination type is unsigned, the resulting value is the
least unsigned integer congruent to
the source integer (modulo 2^n where n
is the number of bits used to
represent the unsigned type). [Note:
In a two’s complement representation,
this conversion is conceptual and
there is no change in the bit pattern
(if there is no truncation). ]
Note that specifically for the unsigned int case, due to the rules in 5/9 it appears that if either operand is unsigned, the other will be converted to unsigned automatically so you don't even need to cast the -1 (if I'm reading the standard correctly). In the case of unsigned short you'll need a direct check or explicit cast because of the automatic integral promotion induced by the ==.
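To make the promotion point concrete, here is a minimal sketch that spells out the implicit conversions by hand. The function names are made up for illustration, and the unsigned short case assumes that int can represent every unsigned short value (true on common platforms, but not guaranteed).

```cpp
#include <cassert>
#include <climits>

// What `foo == -1` effectively evaluates for an unsigned int foo:
// -1 is converted to unsigned int, which gives UINT_MAX.
bool is_max_uint(unsigned int foo) {
    return foo == static_cast<unsigned int>(-1);
}

// What `foo == -1` effectively evaluates for an unsigned short foo
// (assumption: int can hold every unsigned short value): foo is promoted
// to int, so two ints are compared and the result is never true.
bool naive_is_max_ushort(unsigned short foo) {
    return static_cast<int>(foo) == -1;
}

// Casting -1 to the operand's own type restores the intended test.
bool is_max_ushort(unsigned short foo) {
    return foo == static_cast<unsigned short>(-1);
}

// Hypothetical usage.
void demo_promotion() {
    assert(is_max_uint(UINT_MAX));
    assert(!naive_is_max_ushort(USHRT_MAX));
    assert(is_max_ushort(USHRT_MAX));
}
```

The explicit casts are only there to show what the compiler does; in real code the last form (casting -1 to the type of foo) is the one that works for every unsigned type.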

Using #include <limits.h>, you could just do
if(foo == UINT_MAX)
If foo is an unsigned int, its values lie in the range [0, 4,294,967,295] (if it is 32 bits wide).
More : http://en.wikipedia.org/wiki/Limits.h
edit: in C
if you do
#include <limits.h>
#include <stdio.h>
int main() {
unsigned int x = -1;
printf("%u",x);
return 0;
}
you will get the result 4294967295 (on a system with a 32-bit int), because internally -1 is represented by 11111111111111111111111111111111 in two's complement. Because x is unsigned, there is no sign bit, so the same bit pattern is read as a value in the range [0, 2^n - 1].
Also see : http://en.wikipedia.org/wiki/Two%27s_complement
See other's answers for the C++ part std::numeric_limits<unsigned int>::max()

I would define a constant that holds the maximum value as needed by the design of your code. Using "-1" is confusing. Imagine that someone later changes the type from unsigned int to int: it will break your code.

Here's an attempt at doing this in C. It depends on the implementation not having padding bits:
#define IS_MAX_UNSIGNED(x) ( (sizeof(x)>=sizeof(int)) ? ((x)==-1) : \
((x)==(1<<CHAR_BIT*sizeof(x))-1) )
Or, if you can modify the variable, just do something like:
if (!(x++,x--)) { /* x is at max possible value */ }
Edit: And if you don't care about possible implementation-defined extended integer types:
#define IS_MAX_UNSIGNED(x) ( (sizeof(x)>=sizeof(int)) ? ((x)==-1) : \
(sizeof(x)==sizeof(short)) ? ((x)==USHRT_MAX) : \
(sizeof(x)==1) ? ((x)==UCHAR_MAX) : 42 )
You could use sizeof(char) in the last line, of course, but I consider it a code smell and would typically catch it grepping for code smells, so I just wrote 1. Of course you could also just remove the last conditional entirely.
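The increment trick can also be written without modifying the variable. The following is a rough C++ sketch of that idea, not the macro from this answer: is_max_by_wraparound is a hypothetical helper, and it relies on unsigned arithmetic wrapping around to 0 after the maximum.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Hypothetical helper: true if x holds the largest value of its unsigned type.
// Adding 1 and converting back to T gives 0 exactly when x was the maximum.
// The cast matters for types narrower than int, where x + 1 is computed as int.
template <typename T>
constexpr bool is_max_by_wraparound(T x) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    return static_cast<T>(x + 1) == 0;
}

// Hypothetical usage.
void demo_wraparound() {
    assert(is_max_by_wraparound(std::numeric_limits<unsigned int>::max()));
    assert(!is_max_by_wraparound(static_cast<unsigned short>(1)));
}
```

Because T is deduced from the argument, the same call works for unsigned char, unsigned short, unsigned int and wider types without naming the type.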

Related

Vector size comparison with integer for non-zero vector size [duplicate]

See this code snippet
int main()
{
unsigned int a = 1000;
int b = -1;
if (a>b) printf("A is BIG! %d\n", a-b);
else printf("a is SMALL! %d\n", a-b);
return 0;
}
This gives the output: a is SMALL! 1001
I don't understand what's happening here. How does the > operator work here? Why is "a" smaller than "b"? If it is indeed smaller, why do I get a positive number (1001) as the difference?
Binary operations between different integral types are performed within a "common" type defined by so called usual arithmetic conversions (see the language specification, 6.3.1.8). In your case the "common" type is unsigned int. This means that int operand (your b) will get converted to unsigned int before the comparison, as well as for the purpose of performing subtraction.
When -1 is converted to unsigned int the result is the maximal possible unsigned int value (same as UINT_MAX). Needless to say, it is going to be greater than your unsigned 1000 value, meaning that a > b is indeed false and a is indeed small compared to (unsigned) b. The if in your code should resolve to else branch, which is what you observed in your experiment.
The same conversion rules apply to subtraction. Your a-b is really interpreted as a - (unsigned) b and the result has type unsigned int. Such value cannot be printed with %d format specifier, since %d only works with signed values. Your attempt to print it with %d results in undefined behavior, so the value that you see printed (even though it has a logical deterministic explanation in practice) is completely meaningless from the point of view of C language.
Edit: Actually, I could be wrong about the undefined behavior part. According to C language specification, the common part of the range of the corresponding signed and unsigned integer type shall have identical representation (implying, according to the footnote 31, "interchangeability as arguments to functions"). So, the result of a - b expression is unsigned 1001 as described above, and unless I'm missing something, it is legal to print this specific unsigned value with %d specifier, since it falls within the positive range of int. Printing (unsigned) INT_MAX + 1 with %d would be undefined, but 1001u is fine.
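The conversions described above can be spelled out explicitly. This is only an illustrative sketch; the function names are invented, and the casts simply make visible what the usual arithmetic conversions already do implicitly.

```cpp
#include <cassert>
#include <climits>

// Spells out `a > b` for unsigned int a and int b:
// b is converted to unsigned int before the comparison.
bool mixed_greater(unsigned int a, int b) {
    return a > static_cast<unsigned int>(b);
}

// Spells out `a - b`: b is converted first and the result is unsigned int.
unsigned int mixed_difference(unsigned int a, int b) {
    return a - static_cast<unsigned int>(b);
}

// Hypothetical usage with the values from the question.
void demo_mixed() {
    assert(static_cast<unsigned int>(-1) == UINT_MAX);
    assert(!mixed_greater(1000u, -1));            // 1000 > UINT_MAX is false
    assert(mixed_difference(1000u, -1) == 1001u); // wraps around to 1001
}
```

The difference is 1001 for any width of unsigned int, since 1000 - (2^n - 1) is congruent to 1001 modulo 2^n.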
On a typical implementation where int is 32-bit, -1 when converted to an unsigned int is 4,294,967,295 which is indeed ≥ 1000.
Even if you treat the subtraction in an unsigned world, 1000 - 4,294,967,295 = -4,294,966,295, which wraps modulo 2^32 to 1,001, which is what you get.
That's why gcc will spit a warning when you compare unsigned with signed. (If you don't see a warning, pass the -Wsign-compare flag.)
You are doing unsigned comparison, i.e. comparing 1000 to 2^32 - 1.
The output is signed because of %d in printf.
N.B. sometimes the behavior when you mix signed and unsigned operands is compiler-specific. I think it's best to avoid them and do casts when in doubt.
#include<stdio.h>
int main()
{
int a = 1000;
signed int b = -1, c = -2;
printf("%d",(unsigned int)b);
printf("%d\n",(unsigned int)c);
printf("%d\n",(unsigned int)a);
if(1000>-1){
printf("\ntrue");
}
else
printf("\nfalse");
return 0;
}
Operator precedence is not what matters here; the operand types are.
In
if(1000>-1)
both operands are plain int, so the comparison is signed and the branch prints true. The conversion only happens when one operand is unsigned: the int operand is then converted to unsigned int under the usual arithmetic conversions.
In that case -1 is converted to an unsigned number, and it becomes a very big number (UINT_MAX).
Here is an easy way to compare, which may be useful when you cannot get rid of the unsigned declaration (for example, [NSArray count]): just cast the "unsigned int" to an "int". This only works while the unsigned value fits in an int.
Please correct me if I am wrong.
if (((int)a)>b) {
....
}
The hardware is designed to compare signed to signed and unsigned to unsigned.
If you want the arithmetic result, convert the unsigned value to a larger signed type first. Otherwise the compiler will assume that the comparison is really between unsigned values.
And -1 is represented as 1111..1111, so it is a very big quantity (the biggest) when interpreted as unsigned.
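A minimal sketch of the "larger signed type" suggestion, assuming long long is wider than unsigned int (checked with a static_assert below). The name greater_by_value is hypothetical.

```cpp
#include <cassert>

static_assert(sizeof(long long) > sizeof(unsigned int),
              "this sketch assumes long long is wider than unsigned int");

// Compares the mathematical values of a and b by widening both operands to
// long long, which can represent every unsigned int and every int value.
bool greater_by_value(unsigned int a, int b) {
    return static_cast<long long>(a) > static_cast<long long>(b);
}

// Hypothetical usage.
void demo_widening() {
    assert(greater_by_value(1000u, -1));   // 1000 really is greater than -1
    assert(!greater_by_value(1000u, 2000));
}
```

Unlike casting the unsigned operand to int, this stays correct for unsigned values above INT_MAX.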
When comparing a>b where a is of type unsigned int and b is of type int, b is converted to unsigned int, so the signed int value -1 becomes the maximum value of unsigned int (range: 0 to 2^32 - 1, assuming a 32-bit int).
Thus a>b, i.e. (1000>4294967295), is false. Hence the else branch, printf("a is SMALL! %d\n", a-b);, is executed.

Finding SHRT_MAX on systems without <limits.h> or <values.h>

I am reading The C++ Answer Book by Tony L Hansen. It says somewhere that the value of SHRT_MAX (the largest value of a short) can be derived as follows:
const CHAR_BIT= 8;
#define BITS(type) (CHAR_BIT*(int)sizeof(type))
#define HIBIT(type) ((type)(1<< (BITS(type)-1)))
#define TYPE_MAX(type) ((type)~HIBIT(type));
const SHRT_MAX= TYPE_MAX(short);
Could someone explain in simple words what is happening in the above 5 lines?
const CHAR_BIT= 8;
Assuming int is added here (and below): CHAR_BIT is the number of bits in a char. Its value is assumed here without checking.
#define BITS(type) (CHAR_BIT*(int)sizeof(type))
BITS(type) is the number of bits in type. If sizeof(short) == 2, then BITS(short) is 8*2.
Note that C++ does not guarantee that all bits in integer types other than char contribute to the value, but the below will assume that nonetheless.
#define HIBIT(type) ((type)(1<< (BITS(type)-1)))
If BITS(short) == 16, then HIBIT(short) is ((short)(1<<15)). This is implementation-dependent, but assumed to have the sign bit set, and all value bits zero.
#define TYPE_MAX(type) ((type)~HIBIT(type));
If HIBIT(short) is (short)32768, then TYPE_MAX(short) is (short)~(short)32768. This is assumed to have the sign bit cleared, and all value bits set.
const SHRT_MAX= TYPE_MAX(short);
If all assumptions are met, if this indeed has all value bits set, but not the sign bit, then this is the highest value representable in short.
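For comparison, here is a rough sketch of the same "all value bits set, sign bit clear" idea in C++ that avoids shifting a 1 into the sign bit. It is a variant, not the book's macros: max_by_bits is a made-up name, and the result is only the true maximum under the same assumptions as above (two's complement, no padding bits).

```cpp
#include <cassert>
#include <climits>
#include <limits>
#include <type_traits>

// Hypothetical helper: builds the all-ones value of the corresponding
// unsigned type and shifts it right by one, leaving every bit set except
// the top one. Assumes no padding bits.
template <typename T>
constexpr T max_by_bits() {
    static_assert(std::is_signed<T>::value, "T must be a signed integer type");
    using U = typename std::make_unsigned<T>::type;
    return static_cast<T>(static_cast<U>(~U(0)) >> 1);
}

// Hypothetical usage: compare against the library-provided limits.
void demo_max_by_bits() {
    assert(max_by_bits<short>() == SHRT_MAX);
    assert(max_by_bits<int>() == std::numeric_limits<int>::max());
}
```

In practice the library limits are the thing to use; the sketch is only meant to show what the macros are computing.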
It's possible to get the maximum value more reliably in modern C++ when you know that:
the maximum value for an unsigned type is trivially obtainable
the maximum value for a signed type is assuredly either equal to the maximum value of the corresponding unsigned type, or that value right-shifted until it's in the signed type's range
a conversion of an out-of-range value to a signed type does not have undefined behaviour, but instead gives an implementation-defined value in the signed type's range:
template <typename S, typename U>
constexpr S get_max_value(U u) {
S s = u;
while (s < 0 || s != u)
s = u >>= 1;
return u;
}
constexpr unsigned short USHRT_MAX = -1;
constexpr short SHRT_MAX = get_max_value<short>(USHRT_MAX);
Reformatting a bit:
const CHAR_BIT = 8;
Invalid code in C++, it looks like old C code. Let's assume that const int was meant.
#define BITS(type) (CHAR_BIT * (int)sizeof(type))
Returns the number of bits that a type takes assuming 8-bit bytes, because sizeof returns the number of bytes of the object representation of type.
#define HIBIT(type) ((type) (1 << (BITS(type) - 1)))
Assuming type is a signed integer in two's complement, this would return an integer of that type with the highest bit set. For instance, for an 8-bit integer, you would get 1 << (8 - 1) == 1 << 7 == 0b10000000 == -128.
#define TYPE_MAX(type) ((type) ~HIBIT(type));
The bitwise not of the previous thing, i.e. flips each bit. Following the same example as before, you would get ~0b10000000 == 0b01111111 == 127.
const SHRT_MAX = TYPE_MAX(short);
Again invalid, both in C and C++. In C++ due to the missing int, in C due to the fact that CHAR_BIT is not a constant expression. Let's assume const int. Uses the previous code to get the maximum of the short type.
Taking it one line at a time:
const CHAR_BIT= 8;
Declares and initializes CHAR_BIT as a variable of type const int with
value 8. This relies on the old C rule that int is the implicit default type;
that rule does not exist in C++ (or in C99 and later), so the type should be spelled out.
#define BITS(type) (CHAR_BIT* (int)sizeof(type))
Preprocessor macro, converting a type to the number of bits in that
type. (The asterisk isn’t making anything a pointer, it’s for
multiplication. Would be clearer if the author had put a space before
it.)
#define HIBIT(type) ((type)(1<< (BITS(type)-1)))
Macro, converting a type to a number of that type with the highest bit
set to one and all other bits zero.
#define TYPE_MAX(type) ((type)~HIBIT(type));
Macro, inverting HIBIT so the highest bit is zero and all others are
one. This will be the maximum value of type if it’s a signed type and
the machine uses two’s complement. The semicolon shouldn’t be there, but
it will work in this code.
const SHRT_MAX= TYPE_MAX(short);
Uses the above macros to compute the maximum value of a short.

Is `-1` correct for using as maximum value of an unsigned integer?

Is there any c++ standard paragraph which says that using -1 for this is portable and correct way or the only way of doing this correctly is using predefined values?
I have had a conversation with my colleague, what is better: using -1 for a maximum unsigned integer number or using a value from limits.h or std::numeric_limits ?
I told my colleague that using the predefined maximum values from limits.h or std::numeric_limits is the portable and clean way of doing this. However, my colleague countered that -1 is just as portable as the numeric limits and, moreover, has one more advantage:
unsigned short i = -1; // unsigned short max
can easily be changed to any other type, like
unsigned long i = -1; // unsigned long max
whereas using a predefined value from the limits.h header file or std::numeric_limits requires rewriting the value as well as the type on the left.
Regarding conversions of integers, C 2011 [draft N1570] 6.3.1.3 2 says
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Thus, converting -1 to an unsigned integer type necessarily produces the maximum value of that type.
There may be issues with using -1 in various contexts where it is not immediately converted to the desired type. If it is immediately converted to the desired unsigned integer type, as by assignment or explicit conversion, then the result is clear. However, if it is a part of an expression, its type is int, and it behaves like an int until converted. In contrast, UINT_MAX has the type unsigned int, so it behaves like an unsigned int.
As chux points out in a comment, USHRT_MAX effectively has a type of int, so even the named limits are not fully safe from type issues.
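The two points above can be checked at compile time. This is a sketch under a stated assumption: the USHRT_MAX check only holds where int can represent every unsigned short value, which is the common case but not a guarantee.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Converting -1 yields the maximum of each unsigned type.
static_assert(static_cast<unsigned char>(-1) == UCHAR_MAX, "unsigned char");
static_assert(static_cast<unsigned short>(-1) == USHRT_MAX, "unsigned short");
static_assert(static_cast<unsigned int>(-1) == UINT_MAX, "unsigned int");
static_assert(static_cast<unsigned long>(-1) == ULONG_MAX, "unsigned long");

// UINT_MAX has type unsigned int ...
static_assert(std::is_same<decltype(UINT_MAX), unsigned int>::value,
              "UINT_MAX is an unsigned int");
// ... while USHRT_MAX has the promoted type (assumption: that is int here).
static_assert(std::is_same<decltype(USHRT_MAX), int>::value,
              "assumes int can hold every unsigned short value");

// Hypothetical usage: a bare -1 in an expression still behaves like an int.
void demo_minus_one() {
    assert(-1 < 0);                              // int comparison
    assert(static_cast<unsigned int>(-1) > 0u);  // after conversion
}
```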
Not using the standard way, or not clearly showing the intent, is often a bad idea that we pay for later.
I would suggest:
auto i = std::numeric_limits<unsigned int>::max();
or, as @jamesdin suggested, a certainly better one, closer to the C
habits:
unsigned int i = std::numeric_limits<decltype(i)>::max();
Your colleague's argument does not hold. Changing int -> long int, as below:
auto i = std::numeric_limits<unsigned long int>::max();
does not require extra work compared to the -1 solution (thanks to the use of auto).
the '-1' solution does not directly reflect our intent, hence it possibly has harmful consequences. Consider this code snippet:
using index_t = unsigned int;
... now in another file (or far away from the previous line) ...
const index_t max_index = -1;
First, we do not understand why max_index is -1.
Worse, if someone wants to improve the code and define
using index_t = ptrdiff_t;
=> then max_index = -1 is no longer the maximum and the code is buggy. Again, this cannot happen with something like:
const index_t max_index = std::numeric_limits<index_t>::max();
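A small sketch of that failure mode, with hypothetical helper names: the same two initializers are instantiated once with an unsigned index type and once with ptrdiff_t.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helpers mirroring the two ways of writing "the maximum".
template <typename Index>
constexpr Index max_via_minus_one() { return static_cast<Index>(-1); }

template <typename Index>
constexpr Index max_via_limits() { return std::numeric_limits<Index>::max(); }

// Hypothetical usage.
void demo_index_types() {
    // Unsigned index type: both spellings agree.
    assert(max_via_minus_one<unsigned int>() == max_via_limits<unsigned int>());
    // Signed index type: -1 stays -1, while numeric_limits still gives the max.
    assert(max_via_minus_one<std::ptrdiff_t>() == -1);
    assert(max_via_limits<std::ptrdiff_t>() > 0);
}
```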
CAVEAT: nevertheless there is a caveat when using std::numeric_limits. It has nothing to do with integers, but is related to floating point numbers.
std::cout << "\ndouble lowest: "
<< std::numeric_limits<double>::lowest()
<< "\ndouble min : "
<< std::numeric_limits<double>::min() << '\n';
prints:
double lowest: -1.79769e+308
double min : 2.22507e-308 <-- maybe you expected -1.79769e+308 here!
min returns the smallest finite value for integer types, but the smallest positive normalized value for floating-point types
lowest returns the lowest finite value of the given type
This is worth remembering, as using min instead of lowest can be a source of bugs if we do not pay attention.
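The difference between min and lowest can be pinned down with a few checks. This is a minimal sketch; the equality between lowest() and -max() is what common IEEE 754 implementations give, and is stated here as an assumption rather than a guarantee.

```cpp
#include <cassert>
#include <limits>

// For double, min() is the smallest positive normalized value.
constexpr double smallest_positive_normal = std::numeric_limits<double>::min();
// lowest() is the most negative finite value.
constexpr double most_negative = std::numeric_limits<double>::lowest();

// Hypothetical usage.
void demo_min_vs_lowest() {
    assert(smallest_positive_normal > 0.0);
    assert(most_negative < 0.0);
    // Assumption: symmetric range, as with IEEE 754 doubles.
    assert(most_negative == -std::numeric_limits<double>::max());
    // For integer types the two functions agree.
    assert(std::numeric_limits<int>::min() == std::numeric_limits<int>::lowest());
}
```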
Is -1 correct for using as maximum value of an unsigned integer?
Yes, it is functionally correct when used as a direct assignment/initialization. Yet it often looks questionable (@Ron).
Constants from limits.h or std::numeric_limits convey more code understanding, yet need maintenance should the type of i change.
[Note] OP later dropped the C tag.
To add an alternative to assigning a maximum value (available in C11) that helps reduce code maintenance:
Use the loved/hated _Generic
#define info_max(X) _Generic((X), \
long double: LDBL_MAX, \
double: DBL_MAX, \
float: FLT_MAX, \
unsigned long long: ULLONG_MAX, \
long long: LLONG_MAX, \
unsigned long: ULONG_MAX, \
long: LONG_MAX, \
unsigned: UINT_MAX, \
int: INT_MAX, \
unsigned short: USHRT_MAX, \
short: SHRT_MAX, \
unsigned char: UCHAR_MAX, \
signed char: SCHAR_MAX, \
char: CHAR_MAX, \
_Bool: 1, \
default: 1/0 \
)
int main() {
...
some_basic_type i = info_max(i);
...
}
The above macro info_max() has limitations concerning types like size_t, intmax_t, etc. that may not be enumerated in the above list. There are more complex macros that can cope with that. The idea here is illustrative.
The technical side has been covered by other answers; and while you focus on technical correctness in your question, pointing out the cleanness aspect again is important, because imo that’s the much more important point.
The major reason why it is a bad idea to use that particular trickery is: The code is ambiguous. It is unclear whether someone used the unsigned trickery intentionally or made a mistake and actually wanted to initialize a signed variable to -1. Should your colleague mention a comment after you present this argument, tell him to stop being silly. :)
I’m actually slightly baffled that someone would even consider this trick in earnest. There’s an unambiguous, intuitive and idiomatic way to set a value to its max in C: the _MAX macros. And there’s an additional, equally unambiguous, intuitive and idiomatic way in C++ that provides some more type safety: numeric_limits. That -1 trick is a classic case of being clever.
The C++ standard says this about signed to unsigned conversions ([conv.integral]/2):
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
So yes, converting -1 to an n-bit unsigned integer will always give you 2^n - 1, regardless of which signed integer type the -1 started as.
Whether or not unsigned x = -1; is more or less readable than unsigned x = UINT_MAX; though is another discussion (there's definitely the chance that it'll raise some eyebrows, maybe even your own when you look at your own code later;).

Efficient unsigned-to-signed cast avoiding implementation-defined behavior

I want to define a function that takes an unsigned int as argument and returns an int congruent modulo UINT_MAX+1 to the argument.
A first attempt might look like this:
int unsigned_to_signed(unsigned n)
{
return static_cast<int>(n);
}
But as any language lawyer knows, casting from unsigned to signed for values larger than INT_MAX is implementation-defined.
I want to implement this such that (a) it only relies on behavior mandated by the spec; and (b) it compiles into a no-op on any modern machine and optimizing compiler.
As for bizarre machines... If there is no signed int congruent modulo UINT_MAX+1 to the unsigned int, let's say I want to throw an exception. If there is more than one (I am not sure this is possible), let's say I want the largest one.
OK, second attempt:
int unsigned_to_signed(unsigned n)
{
int int_n = static_cast<int>(n);
if (n == static_cast<unsigned>(int_n))
return int_n;
// else do something long and complicated
}
I do not much care about the efficiency when I am not on a typical twos-complement system, since in my humble opinion that is unlikely. And if my code becomes a bottleneck on the omnipresent sign-magnitude systems of 2050, well, I bet someone can figure that out and optimize it then.
Now, this second attempt is pretty close to what I want. Although the cast to int is implementation-defined for some inputs, the cast back to unsigned is guaranteed by the standard to preserve the value modulo UINT_MAX+1. So the conditional does check exactly what I want, and it will compile into nothing on any system I am likely to encounter.
However... I am still casting to int without first checking whether it will invoke implementation-defined behavior. On some hypothetical system in 2050 it could do who-knows-what. So let's say I want to avoid that.
Question: What should my "third attempt" look like?
To recap, I want to:
Cast from unsigned int to signed int
Preserve the value mod UINT_MAX+1
Invoke only standard-mandated behavior
Compile into a no-op on a typical twos-complement machine with optimizing compiler
[Update]
Let me give an example to show why this is not a trivial question.
Consider a hypothetical C++ implementation with the following properties:
sizeof(int) equals 4
sizeof(unsigned) equals 4
INT_MAX equals 32767
INT_MIN equals -2^32 + 32768
UINT_MAX equals 2^32 - 1
Arithmetic on int is modulo 2^32 (into the range INT_MIN through INT_MAX)
std::numeric_limits<int>::is_modulo is true
Casting unsigned n to int preserves the value for 0 <= n <= 32767 and yields zero otherwise
On this hypothetical implementation, there is exactly one int value congruent (mod UINT_MAX+1) to each unsigned value. So my question would be well-defined.
I claim that this hypothetical C++ implementation fully conforms to the C++98, C++03, and C++11 specifications. I admit I have not memorized every word of all of them... But I believe I have read the relevant sections carefully. So if you want me to accept your answer, you either must (a) cite a spec that rules out this hypothetical implementation or (b) handle it correctly.
Indeed, a correct answer must handle every hypothetical implementation permitted by the standard. That is what "invoke only standard-mandated behavior" means, by definition.
Incidentally, note that std::numeric_limits<int>::is_modulo is utterly useless here for multiple reasons. For one thing, it can be true even if unsigned-to-signed casts do not work for large unsigned values. For another, it can be true even on one's-complement or sign-magnitude systems, if arithmetic is simply modulo the entire integer range. And so on. If your answer depends on is_modulo, it's wrong.
[Update 2]
hvd's answer taught me something: My hypothetical C++ implementation for integers is not permitted by modern C. The C99 and C11 standards are very specific about the representation of signed integers; indeed, they only permit twos-complement, ones-complement, and sign-magnitude (section 6.2.6.2 paragraph (2)).
But C++ is not C. As it turns out, this fact lies at the very heart of my question.
The original C++98 standard was based on the much older C89, which says (section 3.1.2.5):
For each of the signed integer types, there is a corresponding (but
different) unsigned integer type (designated with the keyword
unsigned) that uses the same amount of storage (including sign
information) and has the same alignment requirements. The range of
nonnegative values of a signed integer type is a subrange of the
corresponding unsigned integer type, and the representation of the
same value in each type is the same.
C89 says nothing about only having one sign bit or only allowing twos-complement/ones-complement/sign-magnitude.
The C++98 standard adopted this language nearly verbatim (section 3.9.1 paragraph (3)):
For each of the signed integer types, there exists a corresponding
(but different) unsigned integer type: "unsigned char", "unsigned
short int", "unsigned int", and "unsigned long int", each of
which occupies the same amount of storage and has the same alignment
requirements (3.9) as the corresponding signed integer type ; that
is, each signed integer type has the same object representation as
its corresponding unsigned integer type. The range of nonnegative
values of a signed integer type is a subrange of the corresponding
unsigned integer type, and the value representation of each
corresponding signed/unsigned type shall be the same.
The C++03 standard uses essentially identical language, as does C++11.
No standard C++ spec constrains its signed integer representations to any C spec, as far as I can tell. And there is nothing mandating a single sign bit or anything of the kind. All it says is that non-negative signed integers must be a subrange of the corresponding unsigned.
So, again I claim that INT_MAX=32767 with INT_MIN=-2^32+32768 is permitted. If your answer assumes otherwise, it is incorrect unless you cite a C++ standard proving me wrong.
Expanding on user71404's answer:
int f(unsigned x)
{
if (x <= INT_MAX)
return static_cast<int>(x);
if (x >= INT_MIN)
return static_cast<int>(x - INT_MIN) + INT_MIN;
throw x; // Or whatever else you like
}
If x >= INT_MIN (keep the promotion rules in mind, INT_MIN gets converted to unsigned), then x - INT_MIN <= INT_MAX, so this won't have any overflow.
If that is not obvious, take a look at the claim "If x >= -4u, then x + 4 <= 3.", and keep in mind that INT_MAX will be equal to at least the mathematical value of -INT_MIN - 1.
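A quick way to convince yourself is to check that the result converts back to the original unsigned value. The sketch below restates the function with the conversions of INT_MIN written out explicitly (to_signed and round_trips are names chosen here, not from the answer); the expected values in the usage function assume a two's complement int.

```cpp
#include <cassert>
#include <climits>

// Same logic as f above, with the implicit conversions of INT_MIN to
// unsigned spelled out.
int to_signed(unsigned x) {
    if (x <= static_cast<unsigned>(INT_MAX))
        return static_cast<int>(x);
    if (x >= static_cast<unsigned>(INT_MIN))
        return static_cast<int>(x - static_cast<unsigned>(INT_MIN)) + INT_MIN;
    throw x; // no int is congruent to x
}

// The result must be congruent to x modulo UINT_MAX+1, i.e. converting it
// back to unsigned (which is fully defined) must give x again.
bool round_trips(unsigned x) {
    return static_cast<unsigned>(to_signed(x)) == x;
}

// Hypothetical usage; the two value checks assume two's complement.
void demo_round_trip() {
    assert(round_trips(0u));
    assert(round_trips(UINT_MAX));
    assert(to_signed(UINT_MAX) == -1);
    assert(to_signed(static_cast<unsigned>(INT_MAX) + 1u) == INT_MIN);
}
```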
On the most common systems, where !(x <= INT_MAX) implies x >= INT_MIN, the optimizer should be able (and on my system, is able) to remove the second check, determine that the two return statements can be compiled to the same code, and remove the first check too. Generated assembly listing:
__Z1fj:
LFB6:
.cfi_startproc
movl 4(%esp), %eax
ret
.cfi_endproc
The hypothetical implementation in your question:
INT_MAX equals 32767
INT_MIN equals -2^32 + 32768
is not possible, so does not need special consideration. INT_MIN will be equal to either -INT_MAX, or to -INT_MAX - 1. This follows from C's representation of integer types (6.2.6.2), which requires n bits to be value bits, one bit to be a sign bit, and only allows one single trap representation (not including representations that are invalid because of padding bits), namely the one that would otherwise represent negative zero / -INT_MAX - 1. C++ doesn't allow any integer representations beyond what C allows.
Update: Microsoft's compiler apparently does not notice that x > 10 and x >= 11 test the same thing. It only generates the desired code if x >= INT_MIN is replaced with x > INT_MIN - 1u, which it can detect as the negation of x <= INT_MAX (on this platform).
[Update from questioner (Nemo), elaborating on our discussion below]
I now believe this answer works in all cases, but for complicated reasons. I am likely to award the bounty to this solution, but I want to capture all the gory details in case anybody cares.
Let's start with C++11, section 18.3.3:
Table 31 describes the header <climits>.
...
The contents are the same as the Standard C library header <limits.h>.
Here, "Standard C" means C99, whose specification severely constrains the representation of signed integers. They are just like unsigned integers, but with one bit dedicated to "sign" and zero or more bits dedicated to "padding". The padding bits do not contribute to the value of the integer, and the sign bit contributes only as twos-complement, ones-complement, or sign-magnitude.
Since C++11 inherits the <climits> macros from C99, INT_MIN is either -INT_MAX or -INT_MAX-1, and hvd's code is guaranteed to work. (Note that, due to the padding, INT_MAX could be much less than UINT_MAX/2... But thanks to the way signed->unsigned casts work, this answer handles that fine.)
C++03/C++98 is trickier. It uses the same wording to inherit <climits> from "Standard C", but now "Standard C" means C89/C90.
All of these -- C++98, C++03, C89/C90 -- have the wording I give in my question, but also include this (C++03 section 3.9.1 paragraph 7):
The representations of integral types shall define values by use of a
pure binary numeration system.(44) [Example: this International
Standard permits 2’s complement, 1’s complement and signed magnitude
representations for integral types.]
Footnote (44) defines "pure binary numeration system":
A positional representation for integers that uses the binary digits 0
and 1, in which the values represented by successive bits are
additive, begin with 1, and are multiplied by successive integral
power of 2, except perhaps for the bit with the highest position.
What is interesting about this wording is that it contradicts itself, because the definition of "pure binary numeration system" does not permit a sign/magnitude representation! It does allow the high bit to have, say, the value -2^(n-1) (twos complement) or -(2^(n-1) - 1) (ones complement). But there is no value for the high bit that results in sign/magnitude.
Anyway, my "hypothetical implementation" does not qualify as "pure binary" under this definition, so it is ruled out.
However, the fact that the high bit is special means we can imagine it contributing any value at all: A small positive value, huge positive value, small negative value, or huge negative value. (If the sign bit can contribute -(2^(n-1) - 1), why not -(2^(n-1) - 2)? etc.)
So, let's imagine a signed integer representation that assigns a wacky value to the "sign" bit.
A small positive value for the sign bit would result in a positive range for int (possibly as large as unsigned), and hvd's code handles that just fine.
A huge positive value for the sign bit would result in int having a maximum larger than unsigned, which is forbidden.
A huge negative value for the sign bit would result in int representing a non-contiguous range of values, and other wording in the spec rules that out.
Finally, how about a sign bit that contributes a small negative quantity? Could we have a 1 in the "sign bit" contribute, say, -37 to the value of the int? So then INT_MAX would be (say) 2^31 - 1 and INT_MIN would be -37?
This would result in some numbers having two representations... But ones-complement gives two representations to zero, and that is allowed according to the "Example". Nowhere does the spec say that zero is the only integer that might have two representations. So I think this new hypothetical is allowed by the spec.
Indeed, any negative value from -1 down to -INT_MAX-1 appears to be permissible as a value for the "sign bit", but nothing smaller (lest the range be non-contiguous). In other words, INT_MIN might be anything from -INT_MAX-1 to -1.
Now, guess what? For the second cast in hvd's code to avoid implementation-defined behavior, we just need x - (unsigned)INT_MIN less than or equal to INT_MAX. We just showed INT_MIN is at least -INT_MAX-1. Obviously, x is at most UINT_MAX. Casting a negative number to unsigned is the same as adding UINT_MAX+1. Put it all together:
x - (unsigned)INT_MIN <= INT_MAX
if and only if
UINT_MAX - (INT_MIN + UINT_MAX + 1) <= INT_MAX
-INT_MIN-1 <= INT_MAX
-INT_MIN <= INT_MAX+1
INT_MIN >= -INT_MAX-1
That last is what we just showed, so even in this perverse case, the code actually works.
That exhausts all of the possibilities, thus ending this extremely academic exercise.
Bottom line: There is some seriously under-specified behavior for signed integers in C89/C90 that got inherited by C++98/C++03. It is fixed in C99, and C++11 indirectly inherits the fix by incorporating <limits.h> from C99. But even C++11 retains the self-contradictory "pure binary representation" wording...
This code relies only on behavior mandated by the spec, so requirement (a) is easily satisfied:
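#include <climits>
#include <stdexcept>

int unsigned_to_signed(unsigned n)
{
    int result = INT_MAX;
    // INT_MIN converts to unsigned modulo UINT_MAX+1; on two's complement
    // no n can satisfy both tests, so the throw is unreachable there.
    if (n > INT_MAX && n < static_cast<unsigned>(INT_MIN))
        throw std::runtime_error("no signed int for this number");
    // Count down in lockstep: i wraps from 0 to UINT_MAX as result goes from 0 to -1.
    for (unsigned i = INT_MAX; i != n; --i)
        --result;
    return result;
}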
It's not so easy with requirement (b). This compiles into a no-op with gcc 4.6.3 (-Os, -O2, -O3) and with clang 3.0 (-Os, -O, -O2, -O3). Intel 12.1.0 refuses to optimize this. And I have no info about Visual C.
The original answer solved the problem only for unsigned => int. What if we want to solve the general problem of "some unsigned type" to its corresponding signed type? Furthermore, the original answer was excellent at citing sections of the standard and analyzing some corner cases, but it did not really help me get a feel for why it worked. This answer tries to give a strong conceptual basis that explains "why", and uses modern C++ features to simplify the code.
C++20 answer
The problem has simplified dramatically with P0907: Signed Integers are Two’s Complement and the final wording P1236 that was voted into the C++20 standard. Now, the answer is as simple as possible:
template<std::unsigned_integral T>
constexpr auto cast_to_signed_integer(T const value) {
return static_cast<std::make_signed_t<T>>(value);
}
That's it. A static_cast (or C-style cast) is finally guaranteed to do the thing you need for this question, and the thing many programmers thought it always did.
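As a rough illustration of what that guarantee means in practice, here is a sketch of the same cast spelled with a static_assert instead of a concept, so that it also compiles before C++20. The name to_signed is hypothetical. The out-of-range results mentioned below are guaranteed only from C++20 on; earlier standards leave them implementation-defined, although mainstream compilers wrap in the same way.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Hypothetical pre-C++20 spelling of the cast shown above.
template <typename T>
constexpr typename std::make_signed<T>::type to_signed(T value) {
    static_assert(std::is_unsigned<T>::value, "expects an unsigned integer type");
    // From C++20 on, the result is the value congruent to `value` modulo 2^N.
    return static_cast<typename std::make_signed<T>::type>(value);
}
```

With this, to_signed(UINT_MAX) is expected to be -1 and to_signed(static_cast<unsigned>(INT_MAX) + 1u) is expected to be INT_MIN.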
C++17 answer
In C++17, things are much more complicated. We have to deal with three possible integer representations (two's complement, ones' complement, and sign-magnitude). Even in the case where we know it must be two's complement because we checked the range of possible values, the conversion of a value outside the range of the signed integer to that signed integer still gives us an implementation-defined result. We have to use tricks like we have seen in other answers.
First, here is the code for how to solve the problem generically:
template<typename T, typename = std::enable_if_t<std::is_unsigned_v<T>>>
constexpr auto cast_to_signed_integer(T const value) {
using result = std::make_signed_t<T>;
using result_limits = std::numeric_limits<result>;
if constexpr (result_limits::min() + 1 != -result_limits::max()) {
if (value == static_cast<T>(result_limits::max()) + 1) {
throw std::runtime_error("Cannot convert the maximum possible unsigned to a signed value on this system");
}
}
if (value <= result_limits::max()) {
return static_cast<result>(value);
} else {
using promoted_unsigned = std::conditional_t<sizeof(T) <= sizeof(unsigned), unsigned, T>;
using promoted_signed = std::make_signed_t<promoted_unsigned>;
constexpr auto shift_by_window = [](auto x) {
// static_cast to avoid conversion warning
return x - static_cast<decltype(x)>(result_limits::max()) - 1;
};
return static_cast<result>(
shift_by_window( // shift values from common range to negative range
static_cast<promoted_signed>(
shift_by_window( // shift large values into common range
static_cast<promoted_unsigned>(value) // cast to avoid promotion to int
)
)
)
);
}
}
This has a few more casts than the accepted answer, and that is to ensure there are no signed / unsigned mismatch warnings from your compiler and to properly handle integer promotion rules.
We first have a special case for systems that are not two's complement (and thus we must handle the maximum possible value specially because it doesn't have anything to map to). After that, we get to the real algorithm.
The second top-level condition is straightforward: we know the value is less than or equal to the maximum value, so it fits in the result type. The third condition is a little more complicated even with the comments, so some examples would probably help understand why each statement is necessary.
Conceptual basis: the number line
First, what is this window concept? Consider the following number line:
   |   signed   |
<.........................>
          |  unsigned  |
It turns out that for two's complement integers, you can divide the subset of the number line that can be reached by either type into three equally sized categories:
- => signed only
= => both
+ => unsigned only
<..-------=======+++++++..>
This can be easily proven by considering the representation. An unsigned integer starts at 0 and uses all of the bits to increase the value in powers of 2. A signed integer is exactly the same for all of the bits except the sign bit, which is worth -(2^position) instead of 2^position. This means that for all n - 1 bits, they represent the same values. Then, unsigned integers have one more normal bit, which doubles the total number of values (in other words, there are just as many values with that bit set as without it set). The same logic holds for signed integers, except that all the values with that bit set are negative.
The other two legal integer representations, ones' complement and sign-magnitude, have all of the same values as two's complement integers except for one: the most negative value. C++ defines everything about integer types, except for reinterpret_cast (and the C++20 std::bit_cast), in terms of the range of representable values, not in terms of the bit representation. This means that our analysis will hold for each of these three representations as long as we do not ever try to create the trap representation. The unsigned value that would map to this missing value is a rather unfortunate one: the one right in the middle of the unsigned values. Fortunately, our first condition checks (at compile time) whether such a representation exists, and then handles it specially with a runtime check.
The first condition handles the case where we are in the = section, which means that we are in the overlapping region where the values in one can be represented in the other without change. The shift_by_window function in the code moves all values down by the size of each of these segments (we have to subtract the max value then subtract 1 to avoid arithmetic overflow issues). If we are outside of that region (we are in the + region), we need to jump down by one window size. This puts us in the overlapping range, which means we can safely convert from unsigned to signed because there is no change in value. However, we are not done yet because we have mapped two unsigned values to each signed value. Therefore, we need to shift down to the next window (the - region) so that we have a unique mapping again.
Now, does this give us a result congruent mod UINT_MAX + 1, as requested in the question? UINT_MAX + 1 is equivalent to 2^n, where n is the number of bits in the value representation. The value we use for our window size is equal to 2^(n - 1) (the final index in a sequence of values is one less than the size). We subtract that value twice, which means we subtract 2 * 2^(n - 1) which is equal to 2^n. Adding and subtracting x is a no-op in arithmetic mod x, so we have not affected the original value mod 2^n.
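To make the window arithmetic concrete, here is a small model with n = 8. It is a sketch under simplifying assumptions: everything is computed in int, so none of the promotion or overflow concerns of the real code arise, and the names kWindow8, shift_by_window8 and to_signed8 are made up for this illustration.

```cpp
#include <cassert>

// Illustrative 8-bit model: inputs are 0..255, results are -128..127.
constexpr int kWindow8 = 128;  // 2^(n-1) for n = 8

constexpr int shift_by_window8(int x) { return x - kWindow8; }

constexpr int to_signed8(int value) {
    return value < kWindow8
        ? value                                        // '=' region: unchanged
        : shift_by_window8(shift_by_window8(value));   // '+' region: down two windows
}
```

Every input either stays put or moves down by exactly 2 * 128 = 256 = 2^8, so the result is always congruent to the input mod 2^8. For example, 255 maps to -1 and 128 maps to -128.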
Properly handling integer promotions
Because this is a generic function and not just int and unsigned, we also have to concern ourselves with integral promotion rules. There are two possibly interesting cases: one in which short is smaller than int and one in which short is the same size as int.
Example: short smaller than int
If short is smaller than int (common on modern platforms) then we also know that unsigned short can fit in an int, which means that any operations on it will actually happen in int, so we explicitly cast to the promoted type to avoid this. Our final statement is pretty abstract and becomes easier to understand if we substitute in real values. For our first interesting case, with no loss of generality let us consider a 16-bit short and a 17-bit int (which is still allowed under the new rules, and would just mean that at least one of those two integer types has some padding bits):
constexpr auto shift_by_window = [](auto x) {
return x - static_cast<decltype(x)>(32767) - 1;
};
return static_cast<int16_t>(
shift_by_window(
static_cast<int17_t>(
shift_by_window(
static_cast<uint17_t>(value)
)
)
)
);
Solving for the greatest possible 16-bit unsigned value
constexpr auto shift_by_window = [](auto x) {
return x - static_cast<decltype(x)>(32767) - 1;
};
return int16_t(
shift_by_window(
int17_t(
shift_by_window(
uint17_t(65535)
)
)
)
);
Simplifies to
return int16_t(
int17_t(
uint17_t(65535) - uint17_t(32767) - 1
) -
int17_t(32767) -
1
);
Simplifies to
return int16_t(
int17_t(uint17_t(32767)) -
int17_t(32767) -
1
);
Simplifies to
return int16_t(
int17_t(32767) -
int17_t(32767) -
1
);
Simplifies to
return int16_t(-1);
We put in the largest possible unsigned and get back -1, success!
Example: short same size as int
If short is the same size as int (uncommon on modern platforms), the integral promotion rules are slightly different. In this case, short promotes to int and unsigned short promotes to unsigned. Fortunately, we explicitly cast each result to the type we want to do the calculation in, so we end up with no problematic promotions. With no loss of generality let us consider a 16-bit short and a 16-bit int:
constexpr auto shift_by_window = [](auto x) {
return x - static_cast<decltype(x)>(32767) - 1;
};
return static_cast<int16_t>(
shift_by_window(
static_cast<int16_t>(
shift_by_window(
static_cast<uint16_t>(value)
)
)
)
);
Solving for the greatest possible 16-bit unsigned value
auto x = int16_t(
uint16_t(65535) - uint16_t(32767) - 1
);
return int16_t(
x - int16_t(32767) - 1
);
Simplifies to
return int16_t(
int16_t(32767) - int16_t(32767) - 1
);
Simplifies to
return int16_t(-1);
We put in the largest possible unsigned and get back -1, success!
What if I just care about int and unsigned and don't care about warnings, like the original question?
constexpr int cast_to_signed_integer(unsigned const value) {
using result_limits = std::numeric_limits<int>;
if constexpr (result_limits::min() + 1 != -result_limits::max()) {
if (value == static_cast<unsigned>(result_limits::max()) + 1) {
throw std::runtime_error("Cannot convert the maximum possible unsigned to a signed value on this system");
}
}
if (value <= result_limits::max()) {
return static_cast<int>(value);
} else {
constexpr int window = result_limits::min();
return static_cast<int>(value + window) + window;
}
}
See it live
https://godbolt.org/z/74hY81
Here we see that clang, gcc, and icc generate no code for cast and cast_to_signed_integer_basic at -O2 and -O3, and MSVC generates no code at /O2, so the solution is optimal.
You can explicitly tell the compiler what you want to do:
int unsigned_to_signed(unsigned n) {
if (n > INT_MAX) {
if (n <= UINT_MAX + INT_MIN) {
throw "no result";
}
return static_cast<int>(n + INT_MIN) - (UINT_MAX + INT_MIN + 1);
} else {
return static_cast<int>(n);
}
}
Compiles with gcc 4.7.2 for x86_64-linux (g++ -O -S test.cpp) to
_Z18unsigned_to_signedj:
movl %edi, %eax
ret
If x is our input...
If x > INT_MAX, we want to find a constant k such that 0 < x - k*INT_MAX < INT_MAX.
This is easy -- unsigned int k = x / INT_MAX;. Then, let unsigned int x2 = x - k*INT_MAX;
We can now cast x2 to int safely. Let int x3 = static_cast<int>(x2);
We now want to subtract something like UINT_MAX - k * INT_MAX + 1 from x3, if k > 0.
Now, on a 2s complement system, so long as x > INT_MAX, this works out to:
unsigned int k = x / INT_MAX;
x -= k*INT_MAX;
int r = int(x);
r += k*INT_MAX;
r -= UINT_MAX+1;
Note that UINT_MAX+1 is guaranteed to be zero in C++, the conversion to int was a no-op, and we subtracted k*INT_MAX and then added "the same value" back on. So an acceptable optimizer should be able to erase all that tomfoolery!
That leaves the problem of whether x > INT_MAX or not. Well, we create 2 branches, one with x > INT_MAX, and one without. The one without does a straight cast, which the compiler optimizes to a no-op. The one with ... does a no-op after the optimizer is done. A smart optimizer realizes both branches do the same thing, and drops the branch.
Issues: if UINT_MAX is really large relative to INT_MAX, the above might not work. I am assuming that k*INT_MAX <= UINT_MAX+1 implicitly.
We could probably attack this with some enums like:
enum { divisor = UINT_MAX/INT_MAX, remainder = UINT_MAX-divisor*INT_MAX };
which work out to 2 and 1 on a 2s complement system I believe (are we guaranteed for that math to work? That's tricky...), and do logic based on these that easily optimize away on non-2s complement systems...
This also opens up the exception case. It is only possible if UINT_MAX is much larger than (INT_MIN-INT_MAX), so you can put your exception code in an if block asking exactly that question somehow, and it won't slow you down on a traditional system.
I'm not exactly sure how to construct those compile-time constants to deal correctly with that.
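One possible way to spell such constants, offered only as a sketch: the names whole_steps, left_over and usual_layout are invented here, and the block merely computes the two quantities from the enum above without attempting the exception logic.

```cpp
#include <cassert>
#include <climits>

// How many whole INT_MAX-sized steps fit into UINT_MAX, and what is left over.
// INT_MAX is converted to unsigned first, so the arithmetic cannot overflow.
constexpr unsigned whole_steps = UINT_MAX / static_cast<unsigned>(INT_MAX);
constexpr unsigned left_over   = UINT_MAX % static_cast<unsigned>(INT_MAX);

// True when UINT_MAX == 2*INT_MAX + 1, i.e. unsigned has exactly one more
// value bit than int, which is the case on mainstream platforms.
constexpr bool usual_layout = (whole_steps == 2u && left_over == 1u);
```

Because all three are constant expressions, a branch on usual_layout can be resolved at compile time and costs nothing at run time on a traditional system.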
std::numeric_limits<int>::is_modulo is a compile-time constant, so you can use it for template specialization. Problem solved, at least if the compiler plays along with inlining.
#include <limits>
#include <stdexcept>
#include <string>
#ifdef TESTING_SF
bool const testing_sf = true;
#else
bool const testing_sf = false;
#endif
// C++ "extensions"
namespace cppx {
using std::runtime_error;
using std::string;
inline bool hopefully( bool const c ) { return c; }
inline bool throw_x( string const& s ) { throw runtime_error( s ); }
} // namespace cppx
// C++ "portability perversions"
namespace cppp {
using cppx::hopefully;
using cppx::throw_x;
using std::numeric_limits;
namespace detail {
template< bool isTwosComplement >
int signed_from( unsigned const n )
{
if( n <= unsigned( numeric_limits<int>::max() ) )
{
return static_cast<int>( n );
}
unsigned const u_max = unsigned( -1 );
unsigned const u_half = u_max/2 + 1;
if( n == u_half )
{
throw_x( "signed_from: unsupported value (negative max)" );
}
int const i_quarter = static_cast<int>( u_half/2 );
int const int_n1 = static_cast<int>( n - u_half );
int const int_n2 = int_n1 - i_quarter;
int const int_n3 = int_n2 - i_quarter;
hopefully( n == static_cast<unsigned>( int_n3 ) )
|| throw_x( "signed_from: range error" );
return int_n3;
}
template<>
inline int signed_from<true>( unsigned const n )
{
return static_cast<int>( n );
}
} // namespace detail
inline int signed_from( unsigned const n )
{
bool const is_modulo = numeric_limits< int >::is_modulo;
return detail::signed_from< is_modulo && !testing_sf >( n );
}
} // namespace cppp
#include <iostream>
using namespace std;
int main()
{
int const x = cppp::signed_from( -42u );
wcout << x << endl;
}
EDIT: Fixed up code to avoid a possible trap on non-modular-int machines (only one is known to exist, namely the archaically configured versions of the Unisys Clearpath). For simplicity this is done by not supporting the value -2^(n-1), where n is the number of int value bits, on such machines (i.e., on the Clearpath). In practice this value will not be supported by the machine either (i.e., with sign-and-magnitude or 1’s complement representation).
I think the int type is at least two bytes, so INT_MIN and INT_MAX may differ between platforms.
Fundamental types
<climits> header
My money is on using memcpy. Any decent compiler knows to optimise it away:
#include <stdio.h>
#include <string.h> /* memcpy is declared here; <memory.h> is non-standard */
#include <limits.h>
static inline int unsigned_to_signed(unsigned n)
{
int result;
memcpy( &result, &n, sizeof(result));
return result;
}
int main(int argc, const char * argv[])
{
unsigned int x = UINT_MAX - 1;
int xx = unsigned_to_signed(x);
return xx;
}
For me (Xcode 8.3.2, Apple LLVM 8.1, -O3), that produces:
_main: ## #main
Lfunc_begin0:
.loc 1 21 0 ## /Users/Someone/main.c:21:0
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
##DEBUG_VALUE: main:argc <- %EDI
##DEBUG_VALUE: main:argv <- %RSI
Ltmp3:
##DEBUG_VALUE: main:x <- 2147483646
##DEBUG_VALUE: main:xx <- 2147483646
.loc 1 24 5 prologue_end ## /Users/Someone/main.c:24:5
movl $-2, %eax
popq %rbp
retq
Ltmp4:
Lfunc_end0:
.cfi_endproc
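The same idea can be written generically in C++. This is a sketch, not the answer's code: the name bits_to_signed is hypothetical, and because it copies the object representation rather than converting a value, the result depends on the platform's signed representation. The expected values mentioned below assume two's complement and no padding bits.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Copies the bytes of an unsigned value into the corresponding signed type.
// No value conversion takes place; the bits are simply reinterpreted.
template <typename U>
typename std::make_signed<U>::type bits_to_signed(U n) {
    static_assert(std::is_unsigned<U>::value, "expects an unsigned integer type");
    typedef typename std::make_signed<U>::type S;
    static_assert(sizeof(S) == sizeof(U), "signed and unsigned sizes must match");
    S result;
    std::memcpy(&result, &n, sizeof result);
    return result;
}
```

On such a platform bits_to_signed(static_cast<unsigned>(-2)) yields -2, matching the movl $-2, %eax in the listing above.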

Safe way to negate a number of type std::size_t

When I want to negate a number of type std::size_t, I usually do -static_cast<int>(number). However, I understand that the number might not fit into an int. So, my question is what is a safe portable way to do this?
There is no safe portable way to do this.
size_t is an unsigned type. There is no guarantee that there is any signed integer type big enough to hold the maximum value of size_t.
If you're able to assume that the value you're negating isn't too big, you can convert it to long long (if your compiler supports it) or long (if it doesn't):
size_t s = some_value;
long long negative_s = -(long long)s;
If you're worried about overflow, you can compare the value of s to LLONG_MAX before doing the conversion.
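A minimal sketch of that check, assuming long long is available. The helper names fits_in_long_long and negated_as_long_long are made up for illustration, and throwing is just one possible way to report the failure.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>

// True when s can be represented as a long long without changing its value.
constexpr bool fits_in_long_long(std::size_t s) {
    return s <= static_cast<unsigned long long>(LLONG_MAX);
}

// Returns -s, or throws when s is too large to be converted first.
inline long long negated_as_long_long(std::size_t s) {
    if (!fits_in_long_long(s)) {
        throw std::overflow_error("size_t value does not fit in long long");
    }
    // s is in [0, LLONG_MAX], so the negation cannot overflow.
    return -static_cast<long long>(s);
}
```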
-static_cast<int>(number) is safe; the result of the static_cast is implementation-defined if it would not fit in an int.
To detect if the result would not fit:
(number <= std::numeric_limits<int>::max()) ? -static_cast<int>(number) : ...
The safe way checks whether the variable fits into the corresponding signed type:
typedef std::size_t my_uint;
typedef typename std::make_signed<my_uint>::type my_int;
my_uint n = /* ... */;
if (n > std::numeric_limits<my_int>::max()) { /* Error! */ }
my_int m = -static_cast<my_int>(n);
You need to #include <limits> and <type_traits>.
(Or wrap everything into one line:)
if (n > std::numeric_limits<typename std::make_signed<decltype(n)>::type>::max()) { /* Error! */ }
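The check and the negation can be folded into one generic function. The following is a sketch under the assumption that throwing is an acceptable way to signal the error; the name checked_negate is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Negates n in the signed counterpart of U, refusing values that do not fit.
template <typename U>
typename std::make_signed<U>::type checked_negate(U n) {
    static_assert(std::is_unsigned<U>::value, "expects an unsigned integer type");
    typedef typename std::make_signed<U>::type S;
    if (n > static_cast<U>(std::numeric_limits<S>::max())) {
        throw std::range_error("value does not fit in the signed type");
    }
    // n is in [0, max], so both the conversion and the negation are exact.
    return static_cast<S>(-static_cast<S>(n));
}
```

The outer static_cast matters only for types narrower than int, where the negation is carried out in int after promotion.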
I think you have an inherent problem in that you can't possibly negate the value in the upper-half range of a std::size_t using std::ssize_t, since a std::ssize_t can only describe half the values in the range of std::size_t. For instance, if you had an unsigned char value of 255, you could never get a signed char value of -255 ... you'd need a larger type, like a signed short. If std::size_t is the largest integral container of your platform, then you simply aren't going to be able to describe those values in a "negative" format without designating some custom data-type such as a struct with an extra flag variable for designating the sign of the value. That of course is no longer "portable"...
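For illustration, such a custom type might look like the sketch below. The struct SignedSize and the function negated are hypothetical names, and this covers only negation, not arithmetic or comparison.

```cpp
#include <cassert>
#include <cstddef>

// Sign-and-magnitude wrapper: the magnitude keeps the full std::size_t
// range and a separate flag records the sign.
struct SignedSize {
    std::size_t magnitude;  // absolute value
    bool negative;          // true when the value is below zero
};

// Negation only flips the flag; zero stays non-negative so it has one form.
inline SignedSize negated(SignedSize v) {
    SignedSize r = { v.magnitude, v.magnitude != 0 && !v.negative };
    return r;
}

// Convenience overload for a plain std::size_t input.
inline SignedSize negated(std::size_t n) {
    SignedSize r = { n, false };
    return negated(r);
}
```

Even the largest std::size_t value can be negated this way, at the cost of every consumer having to understand the two-field format.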