C++ and unsigned types

I'm reading the C++ Primer 5th Edition, and I don't understand the following part:
In an unsigned type, all the bits represent the value. For example, an 8-bit
unsigned char can hold the values from 0 through 255 inclusive.
What does it mean with "all the bits represent the value"?

You should compare this to a signed type. In a signed value, one bit (the top bit) is used to indicate whether the value is positive or negative, while the rest of the bits are used to hold the value.

The value of an object of trivially copyable type is determined by some bits in it, while other bits do not affect its value. In the C++ standard, the bits that do not affect the value are called padding bits.
For example, consider a type with 8 bits where the last 4 bits are padding bits; then the objects represented by 00000000 and 00001111 have the same value, and compare equal.
In reality, padding bits are often used for alignment and/or error detection.
With the above in mind, you can understand what the book is saying: it claims there are no padding bits in an unsigned type. Strictly speaking, that statement is too strong. The standard only guarantees that unsigned char (and signed char, and char) have no padding bits. The following is a quote of the relevant part of the standard, [basic.fundamental]/1:
For narrow character types, all bits of the object representation participate in the value representation.
Also, the C11 standard 6.2.6.2/1 says
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter).
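If you want to see this on your own machine, std::numeric_limits reports the number of value bits (digits) in each type. A minimal sketch; the values in the comments assume a typical platform with 8-bit bytes and no padding bits:
#include <climits>
#include <iostream>
#include <limits>

int main()
{
    std::cout << "bits per byte (CHAR_BIT):    " << CHAR_BIT << '\n';                                      // 8
    std::cout << "value bits in unsigned char: " << std::numeric_limits<unsigned char>::digits << '\n';    // 8: all bits participate
    std::cout << "value bits in signed char:   " << std::numeric_limits<signed char>::digits << '\n';      // 7: one bit is the sign bit
    std::cout << "value bits in unsigned int:  " << std::numeric_limits<unsigned int>::digits << '\n';     // 32 on most platforms; could be less if padding bits existed
}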

It means that all 8 bits represent the actual value, while in a signed char only 7 bits represent the value and the 8th bit (the most significant) represents the sign of that value, positive or negative (+/-).

For example, one byte contains 8 bits, and all 8 bits are used to count up from 0.
For unsigned, all bits zero = 00000000 means 0, 00000001 = 1, 00000010 = 2, 00000011 = 3, ... up to 11111111 = 255.
For a signed byte (or signed char), the leftmost bit means the sign, and therefore cannot be used to count. (I am visually separating the leftmost bit!) 0 0000001 = 1, but 1 0000001 = -1, 0 0000010 = 2, and 1 0000010 = -2, etc., up to 0 1111111 = 127, and 1 1111111 = -127. In this scheme, 1 0000000 would mean -0, which is useless/wasted, so it is usually given another meaning (on two's complement machines, for example, that pattern represents -128).
There are other ways to code the bits into numbers, and some computers start from the left instead of from the right. These details are hardware specific and not needed to understand 'unsigned'; you only need to care about them when you manipulate individual bits in code.
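If you want to see those patterns, here is a small sketch (assuming the usual 8-bit byte) that prints the unsigned counting from above with std::bitset:
#include <bitset>
#include <iostream>

int main()
{
    // Every 8-bit pattern is a valid unsigned char value between 0 and 255.
    for (unsigned int v : { 0u, 1u, 2u, 3u, 254u, 255u })
        std::cout << std::bitset<8>(v) << " = " << v << '\n';
}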

This is mostly a theoretical distinction. On real hardware, every bit participates in the value of signed integers as well; obviously, with signed integers, some of those values are negative.
Back to unsigned: what the text says is basically that the value of an unsigned number is the plain binary sum of its bits, i.e. bit_0*2^0 + bit_1*2^1 + bit_2*2^2 + ... up to the total number of bits. Importantly, not only do all bits contribute, but every combination of bits forms a valid number. Historically, that was not guaranteed for signed integers (ones' complement and sign-magnitude machines could have patterns such as negative zero). Therefore, if you need a bitmask, it should be an unsigned type of sufficient width, or you could run into invalid bit patterns.
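A minimal sketch of that weighted sum, plus an unsigned bitmask; the value 0xF0 and the 4-bit mask are just illustrative choices:
#include <cstdint>
#include <iostream>

int main()
{
    std::uint32_t x = 0xF0u;   // arbitrary example value

    // Reconstruct x as the sum of bit_i * 2^i over all bit positions.
    std::uint32_t sum = 0;
    for (int i = 0; i < 32; ++i)
        sum += ((x >> i) & 1u) << i;
    std::cout << (sum == x) << '\n';      // prints 1: every bit contributes its weight 2^i

    // A bitmask should be an unsigned type: every combination of bits is a valid value.
    std::uint32_t mask = (1u << 4) - 1u;  // the low four bits
    std::cout << (x & mask) << '\n';      // prints 0, since 0xF0 has no low bits set
}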

Related

How typecast works on initialization "unsigned int i = -100"? [duplicate]

I was curious to know what would happen if I assign a negative value to an unsigned variable.
The code will look somewhat like this.
unsigned int nVal = 0;
nVal = -5;
It didn't give me any compiler error. When I ran the program, nVal was assigned a strange value! Could it be that some 2's complement value gets assigned to nVal?
For the official answer, see Section 4.7 [conv.integral]:
"If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
This essentially means that even if the underlying architecture stores values in a form that is not two's complement (such as sign-magnitude or ones' complement), the conversion to unsigned must behave as if it were two's complement.
It will assign the bit pattern representing -5 (in 2's complement) to the unsigned int, which will be a large unsigned value. For 32-bit ints this will be 2^32 - 5, or 4294967291.
You're right, the signed integer is stored in 2's complement form, and the unsigned integer is stored in the unsigned binary representation. C (and C++) doesn't distinguish between the two, so the value you end up with is simply the unsigned binary value of the 2's complement binary representation.
It will show up as a positive integer equal to the maximum unsigned integer minus 4 (the exact value depends on the width of unsigned int on your architecture and compiler).
BTW
You can check this by writing a simple C++ "hello world" type program and see for yourself
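For example, a minimal program along these lines; the printed value in the comment assumes a 32-bit unsigned int:
#include <iostream>

int main()
{
    unsigned int nVal = 0;
    nVal = -5;                      // converted modulo 2^N, where N is the number of bits in unsigned int
    std::cout << nVal << '\n';      // prints 4294967291 (i.e. 2^32 - 5) when unsigned int is 32 bits
}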
Yes, you're correct. The value assigned has all bits set except bit 2: -1 is all bits set (hex: 0xFFFFFFFF), -2 is all bits set except the lowest, and so on. What you would see for -5 is the hex value 0xFFFFFFFB, which in decimal corresponds to 4294967291.
When you assign a negative value to an unsigned variable, the value is converted using the two's complement method: flip all 0s to 1s and all 1s to 0s, then add 1. In your case you are dealing with an int, which is 4 bytes (32 bits), so the two's complement is taken over a 32-bit number, which sets the high bits. For example:
┌─[student#pc]─[~]
└──╼ $pcalc 0y00000000000000000000000000000101 # 5 in binary
5 0x5 0y101
┌─[student#pc]─[~]
└──╼ $pcalc 0y11111111111111111111111111111010 # flip all bits
4294967290 0xfffffffa 0y11111111111111111111111111111010
┌─[student#pc]─[~]
└──╼ $pcalc 0y11111111111111111111111111111010 + 1 # add 1 to that flipped binary
4294967291 0xfffffffb 0y11111111111111111111111111111011
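The same flip-then-add-one steps can be checked directly in C++; the printed values in the comments assume a 32-bit unsigned int:
#include <iostream>

int main()
{
    unsigned int flipped = ~5u;               // flip all bits of 5
    unsigned int result  = flipped + 1u;      // then add 1

    std::cout << flipped << '\n';             // 4294967290 (0xfffffffa)
    std::cout << result  << '\n';             // 4294967291 (0xfffffffb)
    std::cout << (result == static_cast<unsigned int>(-5)) << '\n';   // prints 1
}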
On Windows and on Ubuntu Linux (which I have checked), assigning -1 to an unsigned integer in C and C++ results in the value UINT_MAX being stored; other negative values wrap around the same way, so -n becomes UINT_MAX + 1 - n.
Compiled example link.

Why does (unsigned int = -1) show the largest value that it can store? [duplicate]

In C or C++ it is said that the maximum number a size_t (an unsigned integer type) can hold is the same as casting -1 to that data type. For example, see Invalid Value for size_t.
Why?
I mean (talking about 32-bit ints), AFAIK the most significant bit holds the sign in a signed data type (that is, bit 0x80000000 is set to form a negative number). Then, 1 is 0x00000001, and 0x7FFFFFFF is the greatest positive number an int data type can hold.
Then, AFAIK the binary representation of -1 as an int should be 0x80000001 (perhaps I'm wrong). Why/how is this binary value converted to something completely different (0xFFFFFFFF) when casting an int to unsigned? Or, how is it possible to form a binary -1 out of 0xFFFFFFFF?
I have no doubt that in C, ((unsigned int)-1) == 0xFFFFFFFF or ((int)0xFFFFFFFF) == -1 is as true as 1 + 1 == 2; I'm just wondering why.
C and C++ can run on many different architectures, and machine types. Consequently, they can have different representations of numbers: Two's complement, and Ones' complement being the most common. In general you should not rely on a particular representation in your program.
For unsigned integer types (size_t being one of those), the C standard (and the C++ standard too, I think) specifies precise overflow rules. In short, if SIZE_MAX is the maximum value of the type size_t, then the expression
(size_t) (SIZE_MAX + 1)
is guaranteed to be 0, and therefore, you can be sure that (size_t) -1 is equal to SIZE_MAX. The same holds true for other unsigned types.
Note that the above holds true:
for all unsigned types,
even if the underlying machine doesn't represent numbers in Two's complement. In this case, the compiler has to make sure the identity holds true.
Also, the above means that you can't rely on specific representations for signed types.
Edit: In order to answer some of the comments:
Let's say we have a code snippet like:
int i = -1;
long j = i;
There is a type conversion in the assignment to j. Assuming that int and long have different sizes (as on most, though not all, 64-bit systems), the bit patterns at the memory locations for i and j are going to be different, because they have different sizes. The compiler makes sure that the values of i and j are both -1.
Similarly, when we do:
size_t s = (size_t) -1;
There is a type conversion going on. The -1 is of type int. It has a bit-pattern, but that is irrelevant for this example because when the conversion to size_t takes place due to the cast, the compiler will translate the value according to the rules for the type (size_t in this case). Thus, even if int and size_t have different sizes, the standard guarantees that the value stored in s above will be the maximum value that size_t can take.
If we do:
long j = LONG_MAX;
int i = j;
If LONG_MAX is greater than INT_MAX, then the value in i is implementation-defined (C89, section 3.2.1.2).
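A small sketch of the unsigned guarantee described earlier in this answer, using size_t and SIZE_MAX from <cstdint>:
#include <cstddef>
#include <cstdint>
#include <iostream>

int main()
{
    // (size_t)-1 is guaranteed to be the largest value a size_t can hold.
    std::size_t s = static_cast<std::size_t>(-1);
    std::cout << (s == SIZE_MAX) << '\n';     // prints 1

    // Unsigned arithmetic is modular, so one past the maximum wraps back to 0.
    std::size_t wrapped = SIZE_MAX;
    ++wrapped;
    std::cout << wrapped << '\n';             // prints 0
}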
It's called two's complement. To make a negative number, invert all the bits then add 1. So to convert 1 to -1, invert it to 0xFFFFFFFE, then add 1 to make 0xFFFFFFFF.
As to why it's done this way, Wikipedia says:
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic.
Your first question, about why (unsigned)-1 gives the largest possible unsigned value is only accidentally related to two's complement. The reason -1 cast to an unsigned type gives the largest value possible for that type is because the standard says the unsigned types "follow the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer."
Now, for 2's complement, the representation of the largest possible unsigned value and -1 happen to be the same -- but even if the hardware uses another representation (e.g. 1's complement or sign/magnitude), converting -1 to an unsigned type must still produce the largest possible value for that type.
Two's complement is very nice for doing subtraction just like addition :)
11111110 (254 or -2)
+00000001 ( 1)
---------
11111111 (255 or -1)
11111111 (255 or -1)
+00000001 ( 1)
---------
100000000 ( 0 + 256)
That is two's complement encoding.
The main bonus is that you get the same encoding whether you are using an unsigned or signed int. If you subtract 1 from 0 the integer simply wraps around. Therefore 1 less than 0 is 0xFFFFFFFF.
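A small sketch of that wraparound; the hex value in the comment assumes a 32-bit unsigned int:
#include <iostream>

int main()
{
    unsigned int zero = 0;
    unsigned int wrapped = zero - 1u;            // 1 less than 0 wraps around
    std::cout << std::hex << wrapped << '\n';    // prints ffffffff when unsigned int is 32 bits

    // The 8-bit additions shown above, done with an unsigned char.
    unsigned char c = 254;
    c = c + 1;                                   // 255
    c = c + 1;                                   // the carry out of bit 7 is discarded, so c wraps to 0
    std::cout << std::dec << static_cast<int>(c) << '\n';   // prints 0
}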
Because of the bit pattern: for a 32-bit int,
-1 is FFFFFFFF in hexadecimal,
or 11111111111111111111111111111111 in binary.
In an int, the top (most significant) bit signifies whether the value is negative.
In an unsigned int, that top bit is just another value bit, because an unsigned int cannot be negative. The extra bit lets an unsigned int store bigger numbers.
So for an unsigned int, 11111111111111111111111111111111 (binary) or FFFFFFFF (hexadecimal) is the biggest number it can store.
Be careful with unsigned ints in general arithmetic: if a calculation would go below zero, the value wraps around to a very large number instead.

how 256 stored in char variable and unsigned char

Up to 255, I can understand how the integers are stored in char and unsigned char ;
#include <stdio.h>
int main()
{
    unsigned char a = 256;
    printf("%d\n", a);
    return 0;
}
In the code above I have an output of 0 for unsigned char as well as char.
For 256, I think this is the way the integer is stored (this is just a guess):
First, 256 is converted to its binary representation, which is 100000000 (9 bits in total).
Then the leftmost bit (the bit which is set) is removed, because the char datatype only has 8 bits of memory.
So it is stored in memory as 00000000, which is why it prints 0 as output.
Is this guess correct, or is there another explanation?
Your guess is correct. Conversion to an unsigned type uses modular arithmetic: if the value is out of range (either too large, or negative) then it is reduced modulo 2N, where N is the number of bits in the target type. So, if (as is often the case) char has 8 bits, the value is reduced modulo 256, so that 256 becomes zero.
Note that there is no such rule for conversion to a signed type - out-of-range values give implementation-defined results. Also note that char is not specified to have exactly 8 bits, and can be larger on less mainstream platforms.
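A minimal sketch of that modular reduction, assuming the common case of an 8-bit char:
#include <iostream>

int main()
{
    unsigned char a = 256;    // 256 % 256 == 0
    unsigned char b = 257;    // 257 % 256 == 1
    unsigned char c = -1;     // negative values reduce modulo 256 as well, giving 255
    std::cout << static_cast<int>(a) << ' '
              << static_cast<int>(b) << ' '
              << static_cast<int>(c) << '\n';   // prints "0 1 255"
}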
On your platform (as well as on any other "normal" platform) unsigned char is 8 bit wide, so it can hold numbers from 0 to 255.
Trying to assign 256 (which is an int literal) to it triggers a conversion that the standard defines to "wrap around". The result of u = n, where u is an unsigned integral type and n is an integer outside its range, is u = n % (max_value_of_u + 1).
This is just a convoluted way to say what you already said: the standard guarantees that in these cases the assignment is performed keeping only the bits that fit in the target variable. This rule exists because most platforms already implement it at the assembly language level (unsigned integer overflow typically results in this behavior plus some kind of overflow flag being set to 1).
Notice that none of this holds for signed integers (which plain char often is): signed integer overflow is undefined behavior, and an out-of-range conversion to a signed type is implementation-defined.
Yes, that's correct. 8 bits can hold 0 to 255 unsigned, or -128 to 127 signed. Above that you've hit an overflow situation, and bits will be lost.
Does the compiler give you a warning on the above code? You might be able to increase the warning level and see something. It won't warn you when the assigned value can't be determined statically (before execution), but in this case it's pretty clear you're assigning something too large for the size of the variable.

Size of byte (clarification)

I'm writing a game server, and this might be an easy question, but I just want some clarification.
Why is it that a byte (char or unsigned char) can hold up to a value of 255 (0xFF, which I believe is 2 bytes)? When I use sizeof(unsigned char) the compiler tells me it is 1 byte.
Is it because (in ASCII) it is getting "converted" to a character?
Sorry for this poor explanation, I'm not really good at describing a question.
This touches on a bunch of subjects, including the historical meaning of a byte, the C definition of a char, and mathematics.
For starters, a byte has historically been a lot of things, but nowadays we nearly always mean an octet, which is 8 bits. As a play on words, there's also the nybble (or often nibble) which is half a byte (not called bite).
Mathematics tells us that with an ordered combination of 8 1-or-0 values, we get 2^8 = 256 combinations. Sometimes we use this unsigned, sometimes signed, but either way we want to have 0 in the range; so the unsigned range is 0..255. For the signed range, we have more options, of which two's complement is the most popular; in that case, we get one more negative value than positive, for a range of -128..+127.
C++ inherits char from C, where it is defined to have a sizeof of 1, to be the smallest addressable size (i.e. having distinct address values with &), and a minimal range of -128..127 or 0..255 depending on if it's signed or not. That boils down to requiring at least 8 bits, or one byte; exactly one byte if the machine supports it.
0xff is another way of writing 255. 0x is the C way of marking a hexadecimal constant, so each digit in it is 4 bits (for 16 possible digits), ergo the nibble. This translates to an unsigned octet with all bits set to 1.
If specific size matters to your code, there is a header stdint.h that defines types of minimal and exact sizes, for speed or size optimization.
Incidentally, ASCII is a 7-bit character set. Machines with 7-bit bytes are unusual nowadays, and wider character sets like ISO 8859-1 and UTF-8 are popular.
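A small sketch that prints these quantities; the values in the comments assume a mainstream platform with 8-bit bytes:
#include <climits>
#include <cstdint>
#include <iostream>

int main()
{
    std::cout << "CHAR_BIT              = " << CHAR_BIT << '\n';                // 8
    std::cout << "sizeof(unsigned char) = " << sizeof(unsigned char) << '\n';   // always 1 by definition
    std::cout << "UCHAR_MAX             = " << UCHAR_MAX << '\n';               // 255 when CHAR_BIT is 8
    std::cout << "sizeof(std::uint8_t)  = " << sizeof(std::uint8_t) << '\n';    // exact-width type from <cstdint>, present when the platform has an 8-bit type
}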
0xFF can be stored in 8 bits, which is one byte.
sizeof(char) is defined to always return 1, regardless of the actual size in bits of the underlying datatype (see 5.3.3.1 of the current standard). The sizes of all other datatypes are calculated relative to the size of a char.
When I use sizeof(unsigned char) the compiler tells me it is 1 byte.
The size of char [whether it is signed or unsigned ] is always 1 as mandated by the C++ Standard.
The size of char is always 1, but the number of bits can differ; C defines the macro CHAR_BIT, which holds the number of bits in a char.
This means the maximum value an unsigned char can hold is pow(2, CHAR_BIT) - 1.
More info here: What is CHAR_BIT?
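A tiny sketch of that formula, using a shift instead of pow so everything stays integral:
#include <climits>
#include <iostream>

int main()
{
    // Maximum unsigned char value computed from CHAR_BIT, compared against UCHAR_MAX.
    unsigned long maxFromBits = (1ul << CHAR_BIT) - 1ul;
    std::cout << maxFromBits << '\n';                  // 255 when CHAR_BIT is 8
    std::cout << (maxFromBits == UCHAR_MAX) << '\n';   // prints 1
}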
The sizeof char or unsigned char is 1 byte as per the standard.
Why the different ranges if they are the same size?
1 byte = 8 bits, and 2^8 = 256 possible values.
Hence,
signed char range is from -128 to 127
unsigned char range is from 0 to 255
This is because in the case of signed char one of the bits is used to store the sign, while an unsigned char cannot be negative, so that bit is utilized to extend the range.
255, i.e. 0xFF, fits in one byte when represented as an unsigned char. You cannot represent 255 as a signed char.
1 byte is 8 bits, so in the case of
signed : (1 bit is used for the sign, so 2^7 = 128) it holds from -128 to 127
unsigned : (2^8 = 256 values) it holds from 0 to 255
