Calculate two's complement for raw bit sequence in C/C++


I want to decode a GPS navigation message where some parameters are marked such that:
Parameters so indicated shall be two's complement, with the sign bit
(+ or -) occupying the MSB
For example, I want to store a parameter af0 which has 22 bits, with bit 22 as the MSB.
I have already decoded af0 and now I need to perform the two's complement operation. I stored af0 in a uint32_t.
There are also other parameters, like IDOT, which has 14 bits and which I stored in a uint16_t.
I'm not sure, but if I understand it correctly, I have to check whether the MSB is 1 or 0. If it is 1, I can simply calculate the two's complement by negating (and casting) the value, i.e. int32_t af0_i = -(int32_t)af0. If the MSB is 0, I just cast the value accordingly: int32_t af0_i = (int32_t)af0.
Is this correct for uintX_t integer types? I also tried https://stackoverflow.com/a/34076866/6518689, but it didn't fix my problem; the value remains the same.

af0_i = -(int32_t)af0 will not work as expected: negation computes the arithmetic negative of the whole 32-bit value (flip all the bits, then add 1), whereas you need to sign-extend the MSB of the 22-bit field and keep the rest unchanged.
Let's assume you extracted the raw 22 bits into a 32-bit variable:
int32_t af0 = ... /* some 22-bit value, top 10 bits are 0 */;
So now bit 21 is the sign bit. But with int32_t the sign bit is bit 31 (technically two's complement isn't guaranteed until C++20).
So we can shift left by 10 bits and immediately back right, which will sign-extend it.
af0 <<= 10; af0 >>= 10;
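The code above is guaranteed to sign-extend since C++20; before that, its result is implementation-defined. It works as expected on x86, and you can add a static_assert such as static_assert((-1 >> 1) == -1, "arithmetic right shift required"); to check that right shifts of negative values are arithmetic.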
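If you would rather avoid the implementation-defined shifts altogether, the same sign extension can be done with plain integer arithmetic: a field whose sign bit is set has the value raw - 2^bits. This is a minimal sketch, assuming the raw field has already been extracted into the low `bits` bits of a `uint32_t`; the helper name `sign_extend` is hypothetical and not part of any GPS library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: interpret the low `bits` bits of `raw` as a two's
// complement number whose sign bit is bit (bits - 1). Only well-defined
// unsigned operations and 64-bit signed subtraction are used.
std::int32_t sign_extend(std::uint32_t raw, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);                  // keeps 1 << bits well-defined
    const std::uint32_t mask = (std::uint32_t{1} << bits) - 1;
    raw &= mask;                                      // drop anything above the field
    const std::uint32_t sign_bit = std::uint32_t{1} << (bits - 1);
    if (raw & sign_bit) {
        // Negative field: value = raw - 2^bits, computed in 64 bits (no overflow).
        return static_cast<std::int32_t>(static_cast<std::int64_t>(raw) -
                                         (std::int64_t{1} << bits));
    }
    return static_cast<std::int32_t>(raw);            // fits: raw < 2^(bits - 1)
}

// Usage (af0 and idot are the raw fields from the question):
//   std::int32_t af0_i  = sign_extend(af0, 22);
//   std::int32_t idot_i = sign_extend(idot, 14);    // uint16_t widens to uint32_t
//   assert(sign_extend(0x3FFFFFu, 22) == -1);
```

The 64-bit subtraction is a deliberate choice: for a 31-bit field, 2^31 does not fit in `int32_t`, but the difference always does.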

Related

How typecast works on initialization "unsigned int i = -100"? [duplicate]

I was curious to know what would happen if I assign a negative value to an unsigned variable.
The code will look somewhat like this.
unsigned int nVal = 0;
nVal = -5;
It didn't give me any compiler error. When I ran the program the nVal was assigned a strange value! Could it be that some 2's complement value gets assigned to nVal?

For the official answer, see section 4.7 [conv.integral]:
"If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]"
This means that even if the underlying architecture stores signed integers in a representation other than two's complement (such as sign-magnitude or ones' complement), the conversion to unsigned must produce the same value it would produce under two's complement.
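The rule quoted above can be checked directly: the converted value is the source value plus a multiple of 2^n, so a negative value v becomes UINT_MAX + 1 + v whatever the width of unsigned int is. A small sketch; the function name `to_unsigned` is just for illustration:

```cpp
#include <cassert>
#include <climits>

// Converting an int to unsigned int reduces it modulo 2^n, where n is the
// width of unsigned int. For negative v this gives UINT_MAX + 1 + v.
unsigned int to_unsigned(int value)
{
    unsigned int result = value;   // implicit integral conversion [conv.integral]
    return result;
}

// Usage:
//   assert(to_unsigned(-5) == UINT_MAX - 4u);   // 4294967291 for a 32-bit unsigned int
//   assert(to_unsigned(-1) == UINT_MAX);
```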

It will assign the bit pattern representing -5 (in 2's complement) to the unsigned int, which is a large unsigned value. For 32-bit ints this will be 2^32 - 5, or 4294967291.

You're right: the signed integer is stored in 2's complement form (on virtually all hardware), and the unsigned integer is stored in plain unsigned binary. On such machines the conversion leaves the bit pattern unchanged, so the value you end up with is simply the unsigned binary value of the 2's complement representation.

It will show as a positive integer equal to the maximum unsigned int value minus 4, i.e. UINT_MAX - 4 (the exact number depends on the width of unsigned int for your architecture and compiler).
BTW, you can check this by writing a simple C++ "hello world"-style program and seeing for yourself.

Yes, you're correct. The actual value assigned has all bits set except the third (bit 2, counting from 0). -1 is all bits set (hex: 0xFFFFFFFF), -2 is all bits set except the first (0xFFFFFFFE), and so on. What you would see is probably the hex value 0xFFFFFFFB, which in decimal corresponds to 4294967291.

When you assign a negative value to an unsigned variable, you get its 2's complement representation: flip all 0s to 1s and all 1s to 0s, then add 1. In your case you are dealing with an int of 4 bytes (32 bits), so the 2's complement is taken over 32 bits, which sets all the high bits. For example:
┌─[student#pc]─[~]
└──╼ $pcalc 0y00000000000000000000000000000101 # 5 in binary
5 0x5 0y101
┌─[student#pc]─[~]
└──╼ $pcalc 0y11111111111111111111111111111010 # flip all bits
4294967290 0xfffffffa 0y11111111111111111111111111111010
┌─[student#pc]─[~]
└──╼ $pcalc 0y11111111111111111111111111111010 + 1 # add 1 to the flipped binary
4294967291 0xfffffffb 0y11111111111111111111111111111011
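The same flip-and-add-one steps from the pcalc session can be written in C++ on a fixed-width unsigned type. Because unsigned arithmetic wraps modulo 2^32, the result matches converting -x to `uint32_t`. The function name `twos_complement` is only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Two's complement negation by hand: flip every bit, then add 1.
// The result is taken modulo 2^32, so it equals static_cast<std::uint32_t>(-x).
std::uint32_t twos_complement(std::uint32_t x)
{
    return static_cast<std::uint32_t>(~x + 1u);
}

// Usage:
//   assert(twos_complement(5u) == 0xFFFFFFFBu);   // 4294967291, as in the pcalc output
//   assert(twos_complement(5u) == static_cast<std::uint32_t>(-5));
```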

On Windows and Ubuntu Linux, as on any conforming platform, assigning -1 to an unsigned integer in C and C++ results in the value UINT_MAX. Other negative numbers do not give UINT_MAX: assigning n < 0 gives UINT_MAX + 1 + n (for example, -5 gives UINT_MAX - 4).

C++ and unsigned types

I'm reading the C++ Primer 5th Edition, and I don't understand the following part:
In an unsigned type, all the bits represent the value. For example, an 8-bit
unsigned char can hold the values from 0 through 255 inclusive.
What does it mean with "all the bits represent the value"?

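You should compare this to a signed type. In a signed value, one bit (the top bit) indicates whether the value is positive or negative, while the rest of the bits hold the value. (In two's complement, the usual encoding, the top bit carries the weight -2^(n-1) rather than acting as a pure sign flag.)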

The value of an object of trivially copyable type is determined by some bits in it, while other bits do not affect its value. In the C++ standard, the bits that do not affect the value are called padding bits.
For example, consider a type with 8 bits where the last 4 bits are padding bits, then the objects represented by 00000000 and 00001111 have the same value, and compare equal.
In reality, padding bits are often used for alignment and/or error detection.
With this in mind, you can understand what the book is saying: there are no padding bits in an unsigned type. However, that statement is stronger than what the standard guarantees. In fact, the standard only guarantees that unsigned char (and signed char and char) has no padding bits. The relevant part of the standard, [basic.fundamental]/1, says:
For narrow character types, all bits of the object representation participate in the value representation.
Also, the C11 standard 6.2.6.2/1 says
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter).
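Whether a given unsigned type has padding bits can be checked at compile time: `std::numeric_limits<T>::digits` counts the value bits of an unsigned type, so the type has no padding exactly when that count equals the total number of bits in the object. A sketch; the helper name `has_no_padding_bits` is hypothetical:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <limits>

// For an unsigned type T, numeric_limits<T>::digits is the number of value bits.
// If it equals the total number of bits in the object, T has no padding bits.
template <typename T>
constexpr bool has_no_padding_bits()
{
    static_assert(std::numeric_limits<T>::is_integer && !std::numeric_limits<T>::is_signed,
                  "unsigned integer types only");
    return static_cast<std::size_t>(std::numeric_limits<T>::digits) == sizeof(T) * CHAR_BIT;
}

// Guaranteed by the standard for the narrow character types:
static_assert(has_no_padding_bits<unsigned char>(), "unsigned char has no padding bits");

// Usage (true on mainstream platforms, but not guaranteed for every type):
//   assert(has_no_padding_bits<unsigned int>());
```

Exact-width types such as `std::uint32_t` are also defined to have no padding bits, which is one reason to prefer them for bit manipulation.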

It means that all 8 bits represent the actual value, while in a signed char only 7 bits represent the value and the 8th bit (the most significant) represents the sign of that value, positive or negative (+/-).

For example, one byte contains 8 bits, and in an unsigned byte all 8 bits are used for counting up from 0.
For unsigned, all bits zero = 00000000 means 0, 00000001 = 1, 00000010 = 2, 00000011 = 3, ... up to 11111111 = 255.
For a signed byte (or signed char) in the simple sign-magnitude scheme, the leftmost bit means the sign and therefore cannot be used for counting. (I am visually separating the leftmost bit!) 0 0000001 = 1, but 1 0000001 = -1, 0 0000010 = 2, and 1 0000010 = -2, etc., up to 0 1111111 = 127 and 1 1111111 = -127. In this scheme, 1 0000000 would mean -0, which is useless/wasted. Real hardware uses two's complement instead, which has no -0: there, 1 0000000 means -128, which is why a signed char ranges from -128 to 127.
There are other ways to encode numbers in bits, and some computers order bits or bytes starting from the left instead of from the right. These details are hardware-specific and not relevant to understanding 'unsigned'; you only need to care about them when you manipulate individual bits in your code (not recommended).
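To see how the sign-magnitude picture differs from two's complement (which real hardware uses and C++20 requires for signed integers), both interpretations of the same 8 bits can be computed with plain arithmetic. A sketch; both function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Sign-magnitude: the top bit is the sign, the low 7 bits are the magnitude.
int sign_magnitude_value(std::uint8_t bits)
{
    int magnitude = bits & 0x7F;
    return (bits & 0x80) ? -magnitude : magnitude;
}

// Two's complement: the top bit has weight -128 instead of +128.
int twos_complement_value(std::uint8_t bits)
{
    return (bits & 0x80) ? static_cast<int>(bits) - 256 : static_cast<int>(bits);
}

// Usage:
//   assert(sign_magnitude_value(0x81) == -1);      // 1 0000001
//   assert(twos_complement_value(0x81) == -127);   // same bits, different encoding
//   assert(twos_complement_value(0x80) == -128);   // no "-0" in two's complement
```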

This is mostly a theoretical thing. On real hardware, the same holds for signed integers as well. Obviously, with signed integers, some of those values are negative.
Back to unsigned: what the text says is basically that the value of an unsigned number is simply b0*2^0 + b1*2^1 + b2*2^2 + ..., where b_i is bit i, up to the total number of bits. Importantly, not only do all bits contribute, but every combination of bits forms a valid number. This is NOT guaranteed for signed integers. Therefore, if you need a bitmask, it should be an unsigned type of sufficient width, or you could run into invalid bit patterns.
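The weighted-sum description above can be written out directly: summing b_i * 2^i over every value bit of an unsigned number gives back the number itself, for any bit pattern. A sketch; the function name `value_from_bits` is hypothetical:

```cpp
#include <cassert>
#include <limits>

// Rebuild an unsigned value from its bits as b0*2^0 + b1*2^1 + b2*2^2 + ...
// numeric_limits<unsigned int>::digits is the number of value bits.
unsigned int value_from_bits(unsigned int x)
{
    unsigned int sum = 0;
    for (int i = 0; i < std::numeric_limits<unsigned int>::digits; ++i) {
        unsigned int bit = (x >> i) & 1u;   // b_i: 0 or 1
        sum += bit << i;                    // add b_i * 2^i
    }
    return sum;
}

// Usage: every bit pattern is a valid value, so the round trip is exact.
//   assert(value_from_bits(0xA5u) == 165u);   // 10100101 = 128 + 32 + 4 + 1
```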

Lower 25 bits of uint64_t

I am trying to extract the lower 25 bits of a uint64_t into a uint32_t. This solution shows how to extract the lower 16 bits from a uint32_t, but I can't figure out how to do it for uint64_t. Any help would be appreciated.

See How do you set, clear, and toggle a single bit? for bit operations.
To answer your question:
uint64_t lower25Bits = inputValue & (uint64_t)0x1FFFFFF;

Just mask with a mask that leaves just the bits you care about.
uint32_t out = input & ((1UL<<25)-1);
The idea here is: 1UL<<25 provides an integer (of type unsigned long, which is guaranteed to be at least 32 bits wide) with just the 26th bit (bit 25, counting from 0) set, i.e.
00000010000000000000000000000000
the -1 turns it into a value with all the bits below it set, i.e.:
00000001111111111111111111111111
The AND "lets through" only the bits that correspond to ones in the mask.
Another way is to throw away those bits with a double shift:
uint32_t out = (((uint32_t)input)<<7)>>7;
The cast to uint32_t makes sure we are dealing with a 32-bit-wide unsigned integer. The unsigned part is important to get well-defined results with shifts (and bitwise operations in general), and the 32-bit-wide part because we need a type with a known size for this trick to work.
Let's say that (uint32_t)input is
11111111111111111111111111111111
we left shift it by 32-25=7; this throws away the top 7 bits
11111111111111111111111110000000
and we right-shift it back in place:
00000001111111111111111111111111
and there we go, we got just the bottom 25 bits.
Notice that the first uint32_t cast isn't strictly necessary, because you already have a known-size unsigned value; you could just do (input<<39)>>39. But (1) I prefer to be sure: what if tomorrow input becomes a type with another size or signedness? And (2) on some CPUs, 32-bit integer operations can be somewhat cheaper than 64-bit ones.
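The approaches from both answers can be put side by side to confirm they agree. A sketch; the three function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// 1. Mask with a literal that has the low 25 bits set (0x1FFFFFF).
std::uint32_t low25_literal(std::uint64_t input)
{
    return static_cast<std::uint32_t>(input & UINT64_C(0x1FFFFFF));
}

// 2. Build the same mask as (1 << 25) - 1.
std::uint32_t low25_mask(std::uint64_t input)
{
    return static_cast<std::uint32_t>(input & ((std::uint64_t{1} << 25) - 1));
}

// 3. Truncate to 32 bits, then shift the top 32 - 25 = 7 bits out and back.
std::uint32_t low25_shift(std::uint64_t input)
{
    const std::uint32_t low32 = static_cast<std::uint32_t>(input);         // keep low 32 bits
    const std::uint32_t shifted = static_cast<std::uint32_t>(low32 << 7);  // drop the top 7
    return shifted >> 7;                                                   // back into place
}

// Usage:
//   assert(low25_shift(0x123456789ABCDEFull) == 0x1ABCDEFu);
```

The extra `static_cast` after `low32 << 7` only matters on a hypothetical platform where `int` is wider than 32 bits and `uint32_t` would be promoted; it keeps the truncation explicit.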

Why does (unsigned int = -1) show the largest value that it can store? [duplicate]

In C or C++ it is said that the maximum number a size_t (an unsigned integer type) can hold is the same as casting -1 to that data type; for example, see Invalid Value for size_t.
Why?
I mean (talking about 32-bit ints), AFAIK the most significant bit holds the sign in a signed data type (that is, bit 0x80000000 forms a negative number). Then, 1 is 0x00000001, and 0x7FFFFFFF is the greatest positive number an int data type can hold.
Then, AFAIK the binary representation of -1 int should be 0x80000001 (perhaps I'm wrong). why/how this binary value is converted to anything completely different (0xFFFFFFFF) when casting ints to unsigned?? or.. how is it possible to form a binary -1 out of 0xFFFFFFFF?
I have no doubt that in C, ((unsigned int)-1) == 0xFFFFFFFF or ((int)0xFFFFFFFF) == -1 is as true as 1 + 1 == 2; I'm just wondering why.

C and C++ can run on many different architectures and machine types. Consequently, they can have different representations of numbers, two's complement and ones' complement being the most common (although C++20 and C23 now require two's complement for signed integers). In general you should not rely on a particular representation in your program.
For unsigned integer types (size_t being one of those), the C standard (and the C++ standard as well) specifies precise wraparound rules. In short, if SIZE_MAX is the maximum value of the type size_t, then the expression
(size_t) (SIZE_MAX + 1)
is guaranteed to be 0, and therefore, you can be sure that (size_t) -1 is equal to SIZE_MAX. The same holds true for other unsigned types.
Note that the above holds true:
for all unsigned types,
even if the underlying machine doesn't represent numbers in Two's complement. In this case, the compiler has to make sure the identity holds true.
Also, the above means that you can't rely on specific representations for signed types.
Edit: In order to answer some of the comments:
Let's say we have a code snippet like:
int i = -1;
long j = i;
There is a type conversion in the assignment to j. Assuming that int and long have different sizes (as on most 64-bit Unix-like systems; on 64-bit Windows, long is still 32 bits), the bit patterns at the memory locations of i and j are going to be different, because they have different sizes. The compiler makes sure that the values of i and j are both -1.
Similarly, when we do:
size_t s = (size_t) -1;
There is a type conversion going on. The -1 is of type int. It has a bit-pattern, but that is irrelevant for this example because when the conversion to size_t takes place due to the cast, the compiler will translate the value according to the rules for the type (size_t in this case). Thus, even if int and size_t have different sizes, the standard guarantees that the value stored in s above will be the maximum value that size_t can take.
If we do:
long j = LONG_MAX;
int i = j;
If LONG_MAX is greater than INT_MAX, then the value in i is implementation-defined (C89, section 3.2.1.2).
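The identities this answer relies on can be written as small checks. They hold on every conforming implementation because unsigned arithmetic wraps modulo 2^n, and widening a signed value preserves the value rather than the bit pattern. A sketch; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^n: SIZE_MAX + 1 is always 0.
std::size_t wrap_max_plus_one()
{
    return static_cast<std::size_t>(SIZE_MAX + 1);
}

// Converting -1 to size_t always gives SIZE_MAX.
std::size_t minus_one_as_size()
{
    return static_cast<std::size_t>(-1);
}

// Widening a signed value preserves the value, not the bit pattern.
long widen(int i)
{
    long j = i;
    return j;
}

// Usage:
//   assert(minus_one_as_size() == SIZE_MAX);
//   assert(widen(-1) == -1L);
```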

It's called two's complement. To make a negative number, invert all the bits then add 1. So to convert 1 to -1, invert it to 0xFFFFFFFE, then add 1 to make 0xFFFFFFFF.
As to why it's done this way, Wikipedia says:
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic.

Your first question, about why (unsigned)-1 gives the largest possible unsigned value, is only accidentally related to two's complement. The reason -1 cast to an unsigned type gives the largest value possible for that type is that the standard says the unsigned types "follow the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer."
Now, for 2's complement, the representation of the largest possible unsigned value and -1 happen to be the same -- but even if the hardware uses another representation (e.g. 1's complement or sign/magnitude), converting -1 to an unsigned type must still produce the largest possible value for that type.

Two's complement is very nice for doing subtraction just like addition :)
11111110 (254 or -2)
+00000001 ( 1)
---------
11111111 (255 or -1)
11111111 (255 or -1)
+00000001 ( 1)
---------
100000000 ( 0 + 256)
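The carry out of the top bit in the last sum is simply discarded, which is the same as taking the sum modulo 256. An 8-bit adder can be mimicked with `uint8_t`; the function name `add8` is only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Add two 8-bit patterns the way an 8-bit adder does: the carry out of
// bit 7 is dropped, i.e. the sum is reduced modulo 256.
std::uint8_t add8(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a + b);
}

// Usage (read the patterns as unsigned or as two's complement):
//   assert(add8(254, 1) == 255);   // 254 + 1 == 255, or -2 + 1 == -1
//   assert(add8(255, 1) == 0);     // carry discarded, or -1 + 1 == 0
```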

That is two's complement encoding.
The main bonus is that addition and subtraction work on the same bit patterns whether you are using an unsigned or a signed int. If you subtract 1 from 0, the integer simply wraps around; therefore, for a 32-bit int, 1 less than 0 is 0xFFFFFFFF.

Because the bit pattern for the int -1 is FFFFFFFF in hexadecimal, or 11111111111111111111111111111111 in binary.
In an int, the first bit signifies whether the number is negative.
But in an unsigned int, the first bit is just an extra value bit, because an unsigned int cannot be negative. So the extra bit lets an unsigned int store bigger numbers.
For a 32-bit unsigned int, 11111111111111111111111111111111 (binary) or FFFFFFFF (hexadecimal) is the biggest number it can store.
Unsigned ints are sometimes discouraged because if a calculation goes below zero, it wraps around to the biggest number.

What does ~0 mean in this code?

What's the meaning of ~0 in this code?
Can somebody analyze this code for me?
unsigned int Order(unsigned int maxPeriod = ~0) const
{
    Point r = *this;
    unsigned int n = 0;
    while( r.x_ != 0 && r.y_ != 0 )
    {
        ++n;
        r += *this;
        if ( n > maxPeriod ) break;
    }
    return n;
}

~0 is the bitwise complement of 0, which is a number with all bits set. For an unsigned 32-bit int, that's 0xffffffff. The exact number of fs depends on the size of the type you assign ~0 to.

It's the ones' complement (bitwise NOT) operator, which inverts all bits.
~ 0101 => 1010
~ 0000 => 1111
~ 1111 => 0000

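As others have mentioned, the ~ operator performs a bitwise complement. However, when it is applied to a signed value, the resulting value depends on how the implementation represents signed integers (two's complement is only mandated from C++20 on).
In particular, the value of ~0 need not be -1, which is probably the value intended (on a ones' complement machine, for instance, it is a negative zero). Setting the default argument to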
unsigned int maxPeriod = -1
would make maxPeriod contain the highest possible value (signed to unsigned conversion is defined as an assignment modulo 2**n, where n is a characteristic number of the given unsigned type (the number of bits of representation)).
Also note that default arguments are not valid in C.
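The spellings discussed here can be compared side by side. The `~0u` variant is an extra suggestion, not from the answer: it complements an unsigned zero, so its value does not depend on how signed integers are represented. The function names are made up for the example:

```cpp
#include <cassert>
#include <climits>

// -1 converted to unsigned int modulo 2^n: always UINT_MAX.
unsigned int max_from_minus_one(unsigned int maxPeriod = -1)
{
    return maxPeriod;
}

// Complement of an unsigned zero: all value bits set, i.e. UINT_MAX.
unsigned int max_from_complement(unsigned int maxPeriod = ~0u)
{
    return maxPeriod;
}

// The named constant from <climits>.
unsigned int max_from_macro(unsigned int maxPeriod = UINT_MAX)
{
    return maxPeriod;
}

// Usage:
//   assert(max_from_minus_one() == max_from_macro());
```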

It's the bitwise complement operator.
Basically, it flips each bit.

It is the bitwise complement of 0 which would be, in this example, an int with all the bits set to 1. If sizeof(int) is 4, then the number is 0xffffffff.

Basically, it's saying that maxPeriod has a default value of UINT_MAX. Rather than writing it as UINT_MAX, the author used his knowledge of complements to calculate the value.
If you want to make the code a bit more readable in the future, include
#include <climits>
and change the declaration to read
unsigned int Order(unsigned int maxPeriod = UINT_MAX) const
Now to explain why ~0 is UINT_MAX. Since we are dealing with an int, 0 is represented with all zero bits (00000000). Adding one gives (00000001), adding one more gives (00000010), and one more gives (00000011). One more addition gives (00000100) because the 1s carry.
For unsigned ints, if you repeat the process ad infinitum, eventually you have all one bits (11111111), and adding another one wraps around, setting all the bits back to zero. This means that all one bits is the maximum value that data type (unsigned int in your case) can hold.
The "~" operation flips all bits from 0 to 1 or from 1 to 0, so flipping a zero integer (which has all zero bits) effectively gives you UINT_MAX. So the original author basically opted to compute UINT_MAX instead of using the system-defined constant from #include <limits.h>.

In the example it is probably an attempt to generate the UINT_MAX value. The technique is possibly flawed for the reasons already stated.
The expression does, however, have a legitimate use: generating a bit mask with all bits set from a literal constant, independently of the type's width. But that is not how it is being used in your example.
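For the legitimate width-independent use, one common pattern is to complement a zero of the target type and cast back. A sketch; the template name `all_ones` is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// All-ones mask for an unsigned type T. For types narrower than int,
// T(0) is promoted to int before ~ is applied; the cast converts the
// result back to T, which yields all ones on two's complement
// implementations (guaranteed from C++20).
template <typename T>
constexpr T all_ones()
{
    return static_cast<T>(~T(0));
}

// Usage:
//   assert(all_ones<std::uint8_t>() == 0xFFu);
//   assert(all_ones<std::uint64_t>() == UINT64_MAX);
```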

As others have said, ~ is the bitwise complement operator (sometimes also referred to as bitwise not). It's a unary operator which means that it takes a single input.
Bitwise operators treat the input as a bit pattern and perform their respective operations on each individual bit then return the resulting pattern. Applying the ~ operator to the bit pattern will negate each bit (each zero becomes a one, each one becomes a zero).
In the example you gave, the bit representation of the integer 0 is all zeros. Thus, ~0 produces a bit pattern of all ones, which as an int is the value -1 on two's complement machines. Assigning it to maxPeriod, an unsigned int, converts that value modulo 2^n, so maxPeriod receives UINT_MAX: a pattern of all ones, which is in fact the highest value an unsigned int can store before wrapping around to 0.

