I am looking at some C++ code and I see:
byte b = someByteValue;
// take twos complement
byte TwosComplement = -b;
Is this code taking the twos complement of b? If not, what is it doing?
This code definitely does compute the twos-complement of an 8-bit binary number, on any implementation where stdint.h defines uint8_t:
#include <stdint.h>
uint8_t twos_complement(uint8_t val)
{
return -(unsigned int)val;
}
That is because, if uint8_t is available, it must be an unsigned type that is exactly 8 bits wide. The conversion to unsigned int is there because uint8_t is definitely narrower than int, so without it the value would be promoted to int and negated as a signed quantity. The conversion back to uint8_t on return still reduces the result modulo 2^8, so the final value is the same either way, but casting first keeps the whole computation in unsigned arithmetic, where the modulo behavior is guaranteed at every step.
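As a quick sanity check, here is a minimal sketch assuming the function above is in scope:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    printf("%02X\n", (unsigned) twos_complement(0x01)); // prints FF (256 - 1)
    printf("%02X\n", (unsigned) twos_complement(0x80)); // prints 80 (0x80 is its own complement)
    return 0;
}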
More generally, this code computes the twos-complement of a value with any unsigned type (using C++ constructs for illustration - the behavior of unary minus is the same in both languages, assuming no user-defined overloads):
#include <cstdint>
#include <type_traits>
template <typename T>
T twos_complement(T val,
// "allow this template to be instantiated only for unsigned types"
typename std::enable_if<std::is_unsigned<T>::value>::type* = 0)
{
return -std::uintmax_t(val);
}
because unary minus is defined to take the twos-complement when applied to unsigned types. We still need a cast to an unsigned type that is no narrower than int, but now we need it to be at least as wide as any possible T, hence uintmax_t.
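For instance, a couple of runtime checks, assuming the template above is in scope:
#include <cassert>
#include <cstdint>

int main()
{
    assert(twos_complement(std::uint16_t(1)) == 0xFFFF);
    assert(twos_complement(std::uint32_t(0x80000000u)) == 0x80000000u);
    // twos_complement(1); // would not compile: int is signed, so enable_if rejects it
    return 0;
}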
However, unary minus does not necessarily compute the twos-complement of a value whose type is signed, because C (and C++) still explicitly allow implementations based on CPUs that don't use twos-complement for signed quantities. As far as I know, no such CPU has been manufactured in at least 20 years, so the continued provision for them is kind of silly, but there it is. If you want to compute the twos-complement of a value even if its type happens to be signed, you have to do this: (C++ again)
#include <type_traits>
template <typename T>
T twos_complement(T val)
{
typedef typename std::make_unsigned<T>::type U;
return T(-uintmax_t(U(val)));
}
i.e. convert to the corresponding unsigned type, then to uintmax_t, then apply unary minus, then back-convert to the possibly-signed type. (The cast to U is required to make sure the value is zero- rather than sign-extended from its natural width.)
(If you find yourself doing this, though, stop and change the types in question to unsigned instead. Your future self will thank you.)
The correct expression will look like this:
byte TwosComplement = ~b + 1;
Note: provided that byte is defined as unsigned char
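A minimal check of the equivalence, assuming byte is a typedef for unsigned char:
#include <cassert>
typedef unsigned char byte;

int main()
{
    byte b = 0x3A;
    byte viaMinus  = -b;      // b promoted to int, negated, converted back modulo 256
    byte viaInvert = ~b + 1;  // bitwise inversion plus one
    assert(viaMinus == viaInvert); // both are 0xC6
    return 0;
}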
On a two's complement machine negation computes the two's complement, yes.
On the Unisys something-something, hopefully now dead and buried (but was still extant a few years ago), no for a signed type.
C and C++ support two's complement, ones' complement and sign-and-magnitude representations of signed integers, and only with two's complement does negation compute the two's complement.
With byte as an unsigned type, negation plus conversion to byte produces the two's complement bit pattern regardless of integer representation, because conversion to unsigned, as well as unsigned arithmetic, is modulo 2^n, where n is the number of value representation bits.
That is, the resulting value after assigning or initializing with -x is 2^n - x, which is the two's complement of x.
This does not mean that the negation itself necessarily computes the two's complement bit pattern. To understand this, note that with byte defined as unsigned char, and with sizeof(int) > 1, the byte value is promoted to int before the negation, i.e. the negation operation is done with a signed type. But converting the resulting negative value to the unsigned byte type creates the two's complement bit pattern by definition, per the C++ guarantee of modulo arithmetic for conversion to unsigned types.
The usefulness of two's complement form follows from 2^n - x = 1 + ((2^n - 1) - x), where the last parenthesis is an all-ones bit pattern minus x, i.e. a simple bitwise inversion of x.
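As a worked example, take n = 8 and x = 5: 2^8 - 5 = 251, and 1 + ((2^8 - 1) - 5) = 1 + (255 - 5) = 251. The inner term 255 - 5 = 250 is exactly ~5, since 5 is 00000101 and 250 is 11111010.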
twos_complement code for a byte binary number, operating on the bit pattern directly (invert all bits, then add one):
#include <iostream>

int main()
{
    // bit pattern of the input byte, most significant bit first
    int byte[] = {1, 0, 1, 1, 1, 1, 1, 1};

    // step 1: invert every bit
    for (int i = 0; i < 8; i++)
        byte[i] = (byte[i] == 1) ? 0 : 1;

    // step 2: add one, propagating the carry from the least significant bit
    for (int j = 7; j >= 0; j--) {
        if (byte[j] == 0) {
            byte[j] = 1;
            break;
        }
        byte[j] = 0;
    }

    for (int i = 0; i < 8; i++)
        std::cout << byte[i];
    std::cout << std::endl;
}
In C++, why is long l = 0x80000000; positive?
C++:
long l = 0x80000000; // l is positive. Why??
int i = 0x80000000;
long l = i; // l is negative
According to this site: https://en.cppreference.com/w/cpp/language/integer_literal, 0x80000000 should be a signed int, but that doesn't appear to be the case, because when it gets assigned to l no sign extension occurs.
Java:
long l = 0x80000000; // l is negative
int i = 0x80000000;
long l = i; // l is negative
On the other hand, Java has a more consistent behavior.
C++ Test code:
#include <stdio.h>
#include <string.h>
void print_sign(long l) {
if (l < 0) {
printf("Negative\n");
} else if (l > 0) {
printf("Positive\n");
} else {
printf("Zero\n");
}
}
int main() {
long l = -0x80000000;
print_sign(l); // Positive
long l2 = 0x80000000;
print_sign(l2); // Positive
int i = 0x80000000;
long l3 = i;
print_sign(l3); // Negative
int i2 = -0x80000000;
long l4 = i2;
print_sign(l4); // Negative
}
From your link: "The type of the integer literal is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used." and for hexadecimal values lists int, unsigned int...
Your compiler uses 32 bit ints, so the largest (signed) int is 0x7FFFFFFF. The reason a signed int cannot represent 0x80000000...0xFFFFFFFF is that it needs some of the 2^32 possible values of its 32 bits to represent negative numbers. However, 0x80000000 fits in a 32 bit unsigned int. Your compiler uses 64 bit longs, which can hold up to 0x7FFF FFFF FFFF FFFF, so 0x80000000 also fits in a signed long, and so the long l is the positive value 0x80000000.
On the other hand, int i is a signed int and simply can't represent 0x80000000, so the conversion is implementation-defined. What often happens in that case is that two's complement arithmetic is used and the number wraps round to a large negative number. (Do not rely on this behaviour; optimisations have been known to break such assumptions.) In any case it appears the two's complement behaviour has indeed happened here, resulting in i being negative.
In your example code you use both 0x80000000 and -0x80000000 and in each case they have the same result. In fact, they are the same. Recall that 0x80000000 is an unsigned int. The 2003 C++ standard says in 5.3.1c7: "The negative of an unsigned quantity is computed by subtracting its value from 2^n, where n is the number of bits in the promoted operand." 0x80000000 is precisely 2^31, and so -0x80000000 is 2^32 - 2^31 = 2^31. To get the expected behaviour we would have to use -(long)0x80000000 instead.
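This can be seen directly in a small sketch, assuming 32-bit int and 64-bit long:
#include <stdio.h>

int main()
{
    printf("%X\n", -0x80000000);          // 80000000: 2^32 - 2^31 = 2^31, still unsigned
    printf("%ld\n", -(long)0x80000000);   // -2147483648: negating after the cast works
    return 0;
}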
With the help of the awesome people on SO, I think I can answer my own question now:
Just to correct the notion that 0x80000000 can't fit in an int:
It is possible to store, without loss or undefined behavior, the value 0x80000000 to an int (assuming sizeof(int) == 4). The following code can demonstrate this behavior:
#include <limits.h>
#include <stdio.h>
int main() {
int i = INT_MIN;
printf("%X\n", i);
return 0;
}
Assigning the literal 0x80000000 to a variable is a little more nuanced, though.
What the other answers failed to mention (except @Daniel Langr) is the fact that C++ doesn't have a concept of negative literals.
There are no negative integer literals. Expressions such as -1 apply the unary minus operator to the value represented by the literal, which may involve implicit type conversions.
With this in mind, the literal 0x80000000 is always treated as a positive number. Negations come after the size and sign have been determined. This is important: negations don't affect the unsigned/signedness of the literal, only the base and the value do. 0x80000000 is too big to fit in a signed integer, so C++ tries to use the next applicable type: unsigned int, which then succeeds. The order of types C++ tries depends on the base of the literal plus any suffixes it may or may not have.
The table is listed here: https://en.cppreference.com/w/cpp/language/integer_literal
So with this rule in mind let's work out some examples:
-2147483648: Treated as a long int, because 2147483648 can't fit in an int and the negation is applied afterwards.
2147483648: Treated as a long int because C++ doesn't consider unsigned int as a candidate for decimal literals.
0x80000000: Treated as an unsigned int because C++ considers unsigned int as a candidate for non-decimal literals.
(-2147483647 - 1): Treated as an int. This is typically how INT_MIN is defined to preserve the type of the literal as an int. This is the type safe way of saying -2147483648 as an int.
-0x80000000: Treated as an unsigned int even though there's a negation. Negating an unsigned value is well defined modulo 2^32, so the result is 0x80000000 again.
-0x80000000l: Treated as a long int and the sign is properly negated.
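These deductions can be checked with static_assert; a sketch, assuming a platform with 32-bit int and 64-bit long:
#include <type_traits>

static_assert(std::is_same<decltype(2147483648), long>::value, "decimal literal skips unsigned int");
static_assert(std::is_same<decltype(0x80000000), unsigned int>::value, "hex literal may become unsigned");
static_assert(std::is_same<decltype(-0x80000000), unsigned int>::value, "negation does not change the type");
static_assert(std::is_same<decltype(-0x80000000l), long>::value, "the l suffix keeps it signed");

int main() { return 0; }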
According to the rules on implicit conversions between signed and unsigned integer types, discussed here and here, when summing an unsigned int with an int, the signed int is first converted to an unsigned int.
Consider, e.g., the following minimal program
#include <iostream>
int main()
{
unsigned int n = 2;
int x = -1;
std::cout << n + x << std::endl;
return 0;
}
The output of the program is, nevertheless, 1 as expected: x is converted first to an unsigned int, and the sum with n wraps around modulo 2^32, giving the "right" answer.
In a code like the previous one, if I know for sure that n + x is positive, can I assume that the sum of unsigned int n and int x gives the expected value?
In a code like the previous one, if I know for sure that n + x is positive, can I assume that the sum of unsigned int n and int x gives the expected value?
Yes.
First, the signed value is converted to unsigned, using modulo arithmetic:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
Then the two unsigned values are added using modulo arithmetic:
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
This means that you'll get the expected answer.
Even if the result would be negative in the mathematical sense, the result in C++ is a number that is congruent to that negative number modulo 2^n.
Note that I've supposed here that you add two same-sized integers.
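A couple of checks making the two steps visible; a minimal sketch, assuming 32-bit unsigned int:
#include <cassert>
#include <climits>

int main()
{
    unsigned int n = 2;
    int x = -1;
    assert((unsigned int)x == UINT_MAX); // step 1: -1 converts to 2^32 - 1
    assert(n + x == 1u);                 // step 2: the sum wraps modulo 2^32 back to 1
    return 0;
}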
I think you can be sure, and it is not implementation defined, although this statement requires some interpretation of the standard when it comes to systems that do not use two's complement for representing negative values.
First, let's state the things that are clear: unsigned integrals do not overflow but wrap around modulo 2^n, where n is the number of bits (cf. this online C++ standard draft):
6.7.1 Fundamental types
(7) Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
So it's just a matter of whether a negative value nv is converted correctly into an unsigned integral bit pattern nv(conv) such that x + nv(conv) will always be the same as x - nv. For the case of a system using two's complement, things are clear, since the two's complement is actually designed such that this arithmetic works immediately.
For systems using other representations of negative values, we'll have to read the standard carefully:
7.8 Integral conversions
(2) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
As the footnote explicitly says that in a two's complement representation there is no change in the bit pattern, we may assume that on systems other than two's complement a real conversion will take place, such that x + nv(conv) == x - nv.
So due to 7.8 (2), I'd say that your assumption is valid.
In C or C++ it is said that the maximum number a size_t (an unsigned integer type) can hold is the same as casting -1 to that data type. For example, see Invalid Value for size_t.
Why?
I mean, (talking about 32 bit ints) AFAIK the most significant bit holds the sign in a signed data type (that is, bit 0x80000000 to form a negative number). Then, 1 is 0x00000001, and 0x7FFFFFFF is the greatest positive number an int data type can hold.
Then, AFAIK the binary representation of -1 int should be 0x80000001 (perhaps I'm wrong). Why/how is this binary value converted to something completely different (0xFFFFFFFF) when casting ints to unsigned? Or, how is it possible to form a binary -1 out of 0xFFFFFFFF?
I have no doubt that in C ((unsigned int)-1) == 0xFFFFFFFF or ((int)0xFFFFFFFF) == -1 is as true as 1 + 1 == 2; I'm just wondering why.
C and C++ can run on many different architectures, and machine types. Consequently, they can have different representations of numbers: Two's complement, and Ones' complement being the most common. In general you should not rely on a particular representation in your program.
For unsigned integer types (size_t being one of those), the C standard (and the C++ standard too, I think) specifies precise overflow rules. In short, if SIZE_MAX is the maximum value of the type size_t, then the expression
(size_t) (SIZE_MAX + 1)
is guaranteed to be 0, and therefore, you can be sure that (size_t) -1 is equal to SIZE_MAX. The same holds true for other unsigned types.
Note that the above holds true:
for all unsigned types,
even if the underlying machine doesn't represent numbers in Two's complement. In this case, the compiler has to make sure the identity holds true.
Also, the above means that you can't rely on specific representations for signed types.
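Both points can be spelled out in code; a small sketch:
#include <cstddef>
#include <cstdint>
#include <climits>

static_assert((std::size_t)-1 == SIZE_MAX, "holds for size_t");
static_assert((unsigned char)-1 == UCHAR_MAX, "and for every other unsigned type");

int main() { return 0; }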
Edit: In order to answer some of the comments:
Let's say we have a code snippet like:
int i = -1;
long j = i;
There is a type conversion in the assignment to j. Assuming that int and long have different sizes (most [all?] 64-bit systems), the bit-patterns at memory locations for i and j are going to be different, because they have different sizes. The compiler makes sure that the values of i and j are -1.
Similarly, when we do:
size_t s = (size_t) -1;
There is a type conversion going on. The -1 is of type int. It has a bit-pattern, but that is irrelevant for this example because when the conversion to size_t takes place due to the cast, the compiler will translate the value according to the rules for the type (size_t in this case). Thus, even if int and size_t have different sizes, the standard guarantees that the value stored in s above will be the maximum value that size_t can take.
If we do:
long j = LONG_MAX;
int i = j;
If LONG_MAX is greater than INT_MAX, then the value in i is implementation-defined (C89, section 3.2.1.2).
It's called two's complement. To make a negative number, invert all the bits then add 1. So to convert 1 to -1, invert it to 0xFFFFFFFE, then add 1 to make 0xFFFFFFFF.
As to why it's done this way, Wikipedia says:
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic.
Your first question, about why (unsigned)-1 gives the largest possible unsigned value, is only accidentally related to two's complement. The reason -1 cast to an unsigned type gives the largest value possible for that type is because the standard says the unsigned types "follow the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer."
Now, for 2's complement, the representation of the largest possible unsigned value and -1 happen to be the same -- but even if the hardware uses another representation (e.g. 1's complement or sign/magnitude), converting -1 to an unsigned type must still produce the largest possible value for that type.
Two's complement is very nice for doing subtraction just like addition :)
11111110 (254 or -2)
+00000001 ( 1)
---------
11111111 (255 or -1)
11111111 (255 or -1)
+00000001 ( 1)
---------
100000000 ( 0 + 256)
That is two's complement encoding.
The main bonus is that you get the same encoding whether you are using an unsigned or signed int. If you subtract 1 from 0 the integer simply wraps around. Therefore 1 less than 0 is 0xFFFFFFFF.
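For example, a minimal sketch assuming 32-bit unsigned int:
#include <stdio.h>

int main()
{
    unsigned int u = 0;
    --u;                 // wraps modulo 2^32
    printf("%X\n", u);   // prints FFFFFFFF
    return 0;
}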
Because the bit pattern for an int -1 is FFFFFFFF in hexadecimal, or 11111111111111111111111111111111 in binary.
In an int the first bit signifies whether the value is negative.
In an unsigned int the first bit is just an extra value bit, because an unsigned int cannot be negative; that extra bit is what lets an unsigned int store bigger numbers.
So for an unsigned int, 11111111111111111111111111111111 (binary) or FFFFFFFF (hexadecimal) is the biggest number it can store.
Unsigned ints are not recommended for general arithmetic, because if a computation goes below zero the value wraps around to the biggest number.
Consider a typical absolute value function (where for the sake of argument the integral type of maximum size is long):
unsigned long abs(long input);
A naive implementation of this might look something like:
unsigned long abs(long input)
{
if (input >= 0)
{
// input is positive
// We know this is safe, because the maximum positive signed
// integer is always less than the maximum positive unsigned one
return static_cast<unsigned long>(input);
}
else
{
return static_cast<unsigned long>(-input); // uh oh...
}
}
This code triggers undefined behavior, because the negation of input may overflow, and triggering signed integer overflow is undefined behavior. For instance, on 2s complement machines, the absolute value of std::numeric_limits<long>::min() will be 1 greater than std::numeric_limits<long>::max().
What can a library author do to work around this problem?
One can cast to the unsigned variant first to avoid any undefined behavior:
unsigned long uabs(long input)
{
if (input >= 0)
{
// input is positive
return static_cast<unsigned long>(input);
}
else
{
return -static_cast<unsigned long>(input); // read on...
}
}
In the above code, we invoke two well defined operations. Converting the signed integer to the unsigned one is well defined by N3485 4.7 [conv.integral]/2:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
This basically says that when making the specific conversion of going from signed to unsigned, one can assume unsigned-style wraparound.
The negation of the unsigned integer is well defined by 5.3.1 [expr.unary.op]/8:
The negative of an unsigned quantity is computed by subtracting its value from 2^n , where n is the number of bits in the promoted operand.
These two requirements effectively force implementations to operate like a 2s complement machine would, even if the underlying machine is a 1s complement or signed magnitude machine.
A generalized C++11 version that returns the unsigned version of an integral type:
#include <type_traits>
template <typename T>
constexpr
typename std::make_unsigned<T>::type uabs(T x)
{
typename std::make_unsigned<T>::type ux = x;
return (x<0) ? -ux : ux; // compare signed x, negate unsigned x
}
This compiles on the Godbolt compiler explorer, with a test case showing that gcc -O3 -fsanitize=undefined finds no UB in uabs(std::numeric_limits<long>::min()); after constant-propagation, but does in std::abs().
Further template stuff should be possible to make a version that would return the unsigned version of integral types, but return T for floating-point types, if you want a general-purpose replacement for std::abs.
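One possible shape for that; a sketch, not part of the original answer, where the hypothetical generic_abs maps integral types to their unsigned counterpart and floating-point types to themselves:
#include <cmath>
#include <type_traits>

// For integral T: return the unsigned counterpart, negated safely.
template <typename T>
typename std::enable_if<std::is_integral<T>::value,
                        typename std::make_unsigned<T>::type>::type
generic_abs(T x)
{
    typename std::make_unsigned<T>::type ux = x; // modulo conversion, well defined
    return (x < 0) ? -ux : ux;                   // unsigned negation, well defined
}

// For floating-point T: just defer to std::fabs.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, T>::type
generic_abs(T x)
{
    return std::fabs(x);
}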
Just add one if negative.
unsigned long absolute_value(long x) {
    if (x >= 0) return (unsigned long)x;
    // For negative x, x + 1 cannot overflow (it moves toward zero),
    // so -(x + 1) is a representable non-negative long.
    x = -(x + 1);
    return (unsigned long)x + 1;
}