confusing sizeof operator result - c++

I'm a bit confused about a sizeof result.
I have this :
unsigned long part1 = 0x0200000001;
cout << sizeof(part1); // this gives 8 bytes
If I count correctly, part 1 is 9 bytes long, right?
Can anybody clarify this for me?
thanks
Regards

If I count correctly, part 1 is 9 bytes long, right?
No, you are counting incorrectly. 0x0200000001 can fit into five bytes. One byte is represented by two hex digits. Hence, the bytes are 02 00 00 00 01.

I suppose you are misinterpreting the meaning of sizeof. sizeof(type) returns the number of bytes that the system reserves to hold any value of the respective type. So sizeof(int) will typically give you 4, sizeof(long) is often 4 on a 32-bit system and 8 on a 64-bit Linux system, sizeof(char[20]) gives 20, and so on.
Note that one can also apply sizeof to (typed) variables, e.g. int x; sizeof(x); the type is then deduced from the variable's declaration/definition, so sizeof(x) is the same as sizeof(int) in this case.
But: sizeof never interprets or analyses the contents/value of a variable at runtime, even if the name somehow sounds as if it might. So with char *x = "Hello, world"; sizeof(x) is not the length of the string literal "Hello, world" but the size of the type char*.
So your sizeof(part1) is the same as sizeof(unsigned long), which is 8 on your system regardless of what the actual content of part1 is at runtime.
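A minimal sketch of that point (the printed numbers assume a typical 64-bit platform where unsigned long and pointers are 8 bytes; they are not guaranteed):

#include <iostream>

int main() {
    unsigned long part1 = 0x0200000001;   // the stored value is irrelevant to sizeof
    const char *msg = "Hello, world";     // sizeof sees the pointer, not the characters

    std::cout << sizeof(part1) << '\n';          // same as sizeof(unsigned long), e.g. 8
    std::cout << sizeof(msg) << '\n';            // same as sizeof(const char*), e.g. 8
    std::cout << sizeof("Hello, world") << '\n'; // array of 13 chars (12 + terminating '\0')
}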

An unsigned long is only guaranteed a minimum range of 0 to 4294967295, i.e. at least 4 bytes.
Assigning 0x0200000001 (8589934593) to an unsigned long that is not big enough to hold it triggers a conversion so that the value fits. For unsigned target types this conversion is well defined by the standard: the value is reduced modulo 2^N, where N is the number of bits of the target, which in practice discards the higher bits. (On your machine unsigned long is evidently 8 bytes, so the value fits without any truncation.)
sizeof tells you the number of bytes a type occupies. It won't tell you how many bytes are needed to represent your particular value.
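For illustration, a minimal sketch of that modular reduction using the fixed-width types from <cstdint>, so the result does not depend on how wide unsigned long happens to be on your machine (the names big and small are just for the example):

#include <cstdint>
#include <iostream>

int main() {
    // 0x0200000001 is 8589934593, which needs more than 32 bits.
    std::uint64_t big = 0x0200000001;

    // Converting to a 32-bit unsigned type reduces the value modulo 2^32,
    // which discards the high byte 0x02.
    std::uint32_t small = static_cast<std::uint32_t>(big);

    std::cout << small << '\n';   // prints 1
}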

sizeof(part1) (I'm assuming "part 1" is a typo for part1) gives the size of an unsigned long, i.e. sizeof(unsigned long). The size of part1 is therefore the same regardless of what value is stored in it.
On your compiler, sizeof(unsigned long) has a value of 8. The sizes of types are implementation-defined (other than the char types, which are defined to have a size of 1), so they may vary between compilers.
The value of 9 you are expecting is the number of characters you would obtain by writing the value of part1 to a file or string as human-readable hex, with no leading zeros or prefix. That has no relationship to the sizeof operator whatsoever. And, when outputting a value, it is possible to format it in different ways (e.g. hex versus decimal versus octal, with or without leading zeros), which affects the size of the output.
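To make that concrete, here is a small sketch (assuming a platform where unsigned long is 64 bits, so the literal fits): the printed width changes with the formatting, while sizeof stays the same.

#include <iostream>

int main() {
    unsigned long part1 = 0x0200000001;

    std::cout << sizeof(part1) << '\n';      // fixed by the type, e.g. 8
    std::cout << std::hex << part1 << '\n';  // "200000001"  -> 9 characters
    std::cout << std::dec << part1 << '\n';  // "8589934593" -> 10 characters
}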

sizeof(part1) returns the size of the data type of variable part1, which you have defined as unsigned long.
On your compiler an unsigned long is always 64 bits, or 8 bytes, i.e. 8 groups of 8 bits. Hexadecimal is a human-readable form of the binary format, where each digit represents 4 bits. We humans often omit leading zeroes for brevity; computers never do.
Let's consider a single byte of data - a char - and the value zero.
- decimal: 0
- hexadecimal : 0x0 (often written as 0x00)
- binary: 0000 0000
For a list of C++ data types and their corresponding bit sizes, check out your compiler's documentation (the MSVC documentation is easier to read):
for msvc: https://msdn.microsoft.com/en-us/library/s3f49ktz(v=vs.71).aspx
for gcc: https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Data-Types
All compilers have documentation for their data sizes, since they depend on the hardware and the compiler itself. If you use a different compiler, a google search on "'your compiler name' data sizes" will help you find the correct sizes for your compiler.
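Rather than memorising tables, you can also ask your own compiler. A minimal sketch (the results are whatever your implementation chooses):

#include <iostream>

int main() {
    std::cout << "char:          " << sizeof(char) << '\n';      // always 1
    std::cout << "short:         " << sizeof(short) << '\n';
    std::cout << "int:           " << sizeof(int) << '\n';
    std::cout << "long:          " << sizeof(long) << '\n';
    std::cout << "long long:     " << sizeof(long long) << '\n';
    std::cout << "unsigned long: " << sizeof(unsigned long) << '\n';
    std::cout << "float:         " << sizeof(float) << '\n';
    std::cout << "double:        " << sizeof(double) << '\n';
}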

Related

Why subtract from 256 when assigning signed char to unsigned char in C++?

In Bjarne Stroustrup's "The C++ Programming Language", the following piece of code on chars is given:
signed char sc = -140;
unsigned char uc = sc;
cout << uc; // prints 't'
Q1) chars are 1 byte (8 bits) on my hardware. What is the binary representation of -140? Is it even possible to represent -140 using 8 bits? I would think the range is guaranteed to be at least [-127...127] for signed chars. How is it possible to represent -140 in 8 bits?
Q2) Assume it's possible. Why do we subtract 140 from 256 when sc is assigned to uc? What is the logic behind this?
EDIT: I wrote cout << sizeof(signed char) and it produced 1 (1 byte). I added this to be exact about the byte size of signed char.
EDIT 2: cout << int{sc} gives the output 116. I don't understand what happens here.
First of all: Unless you're writing something very low-level that requires bit-representation manipulation - avoid writing this kind of code like the plague. It's hard to read, easy to get wrong, confusing, and often exhibits implementation-defined/undefined behavior.
To answer your question though:
The code assumes you're on a platform where the types signed char and unsigned char have 8 bits (although theoretically they could have more), and that the hardware uses "two's complement" behaviour: the bit representation of the result of an arithmetic operation on an integer type with N bits is taken modulo 2^N. That also determines how the same bit pattern is interpreted as signed or unsigned. Now, -140 modulo 2^8 is 116 (0111 0100), so that's the bit pattern sc will hold. Interpreted as a signed char (range -128 through 127), this is still 116.
An unsigned char can represent 116 as well, so the second assignment also results in 116.
116 is the ASCII code of the character 't', and std::cout prints char and unsigned char values as characters (values under 128 map to ASCII). So that's what gets printed.
The result of assigning -140 to a signed char is implementation-defined, just like its range is (i.e. see your implementation's manual). A very common choice is wrap-around arithmetic: if the value doesn't fit, add or subtract 256 (or, in general, 2^N for an N-bit type) until it does.
Since sc will have the value 116, and uc can also hold that value, that conversion is trivial. The unusual thing already happened when we assigned -140 to sc.
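A minimal sketch of the two conversions, assuming 8-bit chars and the common wrap-around behaviour described above (the compiler will typically warn about the out-of-range initialiser):

#include <iostream>
using std::cout;

int main() {
    signed char sc = -140;   // out of range: with 8-bit wrap-around this becomes -140 + 256 = 116
    unsigned char uc = sc;   // 116 is representable in unsigned char, so it stays 116

    cout << int{sc} << '\n'; // 116
    cout << int{uc} << '\n'; // 116
    cout << uc << '\n';      // t (116 is the ASCII code for 't')
}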

Confused by unsigned short behavior in C++

I am currently attempting to learn C++. I usually learn by playing around with things, and since I was reading up on data types and the different ways you can write integer literals (decimal, binary, hex, etc.), I decided to test how unsigned shorts work. I am now confused.
Here is my code:
#include <cstdio>

int main() {
    unsigned short a = 0b0101010101010101;
    unsigned short b = 0b01010101010101011;
    unsigned short c = 0b010101010101010101;
    printf("%hu\n%hu\n%hu\n", a, b, c);
}
Integers of type "unsigned short" should have a size of 2 bytes across all operating systems.
I have used binary to declare the values of these integers, because this is the easiest way to make the source of my confusion obvious.
Integer "a" has 16 digits in binary. 16 digits in a data type with a size of 16 bits (2 bytes). When I print it, I get the number 21845. Seems okay. Checks out.
Then it gets weird.
Integer "b" has 17 digits. When we print it, we get the decimal version of the whole 17 digit number, 43691. How does a binary number that takes up 17 digits fit into a variable that should only have 16 bits of memory allocated to it? Is someone lying about the size? Is this some sort of compiler magic?
And then it gets even weirder. Integer "c" has 18 digits, but here we hit the upper limit. When we build, we get the following error:
/home/dimitrije/workarea/c++/helloworld.cpp: In function ‘int main()’:
/home/dimitrije/workarea/c++/helloworld.cpp:6:22: warning: large integer implicitly truncated to unsigned type [-Woverflow]
unsigned short c = 0b010101010101010101;
Okay, so we can put 17 digits in 16 bits, but we can't put in 18. Makes some kind of sense, I guess? Like we can magic away 1 digit but two won't work. But the supposed "truncation", rather than truncating to the actual maximum value of 17 digits (43691 in this example), truncates to what the limit logically should be, 21845.
This is frying my brain and I'm too far down the rabbit hole to stop now. Does anyone understand why C++ behaves this way?
---EDIT---
So after someone pointed out to me that my binary numbers started with a 0, I realized I was stupid.
However, when I took the 0 from the left hand side and carried it right (meaning that a,b c were actually 16,17,18 bits respectively), I realized that the truncating behaviour still doesn't make sense. Here is the output:
43690
21846
43690
43690 is the maximum value for 16 bits. I could've checked this before asking the original question and saved myself some time.
Why does 17 bits truncate to 15, and 18 (and also 19,20,21) truncate to 16?
--EDIT 2---
I've changed all the digits in my integers to 1, and my mistake makes sense now. I get back 65535. I took the time to type 2^16 into a calculator this time. The entirety of my question was a result of the fact that I didn't properly look at the binary value I was assigning.
Thanks to the guy who linked implicit conversions, I will read up on that.
On most systems an unsigned short is 16 bits. No matter what you assign to an unsigned short, it will be truncated to 16 bits. In your example the first digit is a 0, which contributes nothing to the value, in the same way that int x = 05; just equals 5.
If you change the first bit from a 0 to a 1, you will see the expected behaviour of the assignment truncating the value to 16 bits.
The range for an unsigned short int (16 bits) is 0 to 65535
65535 = 1111 1111 1111 1111 in binary
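A small sketch of the corrected experiment, assuming a 16-bit unsigned short (binary literals require C++14 or later); the two out-of-range initialisers are the ones that trigger the -Woverflow warning:

#include <cstdio>

int main() {
    unsigned short a = 0b1010101010101010;    // 16 bits: 43690, fits exactly
    unsigned short b = 0b10101010101010110;   // 17 bits: 87382, keeps only the low 16 bits
    unsigned short c = 0b101010101010101010;  // 18 bits: 174762, keeps only the low 16 bits

    // 87382 % 65536 == 21846 and 174762 % 65536 == 43690
    std::printf("%hu\n%hu\n%hu\n", a, b, c);  // prints 43690, 21846, 43690
}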

Tricky interview question for mid-level C++ developer

I was asked this question in an interview, and I can't really understand what is going on here. The question is "What would be displayed in the console?"
#include <iostream>

int main()
{
    unsigned long long n = 0;
    ((char*)&n)[sizeof(unsigned long long)-1] = 0xFF;
    n >>= 7*8;
    std::cout << n;
}
What is happening here, step by step?
Let's take this one step at a time:
((char*)&n)
This casts the address of the variable n from unsigned long long* to char*. This is legal: accessing an object's bytes through a pointer to char is one of the very few "type punning" cases accepted by the language. In effect it allows you to access the memory of the object n as an array of bytes (i.e. chars in C++).
((char*)&n)[sizeof(unsigned long long)-1]
This accesses the last byte of the object n. Remember that sizeof returns the size of a data type in bytes (in C++, char plays the role of the byte type).
((char*)&n)[sizeof(unsigned long long)-1] = 0xFF;
You set the last byte of n to the value 0xFF.
Since n was 0 initially, the memory layout of n is now:
00 .. 00 FF
Now notice the dots I put in the middle. That's not because I'm too lazy to write out exactly as many bytes as n has; it's because the standard does not fix the size of unsigned long long. There are some restrictions, but it can vary from implementation to implementation. So this is the first "unknown". However, on most modern architectures sizeof(unsigned long long) is 8, so we are going to go with that, but in a serious interview you are expected to mention this.
The other "unknown" is how these bytes are interpreted. Unsigned integers are simply encoded in binary, but the byte order can be little endian or big endian. x86 is little endian, so we will go with that for the example. And again, in a serious interview you are expected to mention this.
n >>= 7*8;
This right-shifts the value of n by 7*8 = 56 bits. Pay attention: now we are talking about the value of n, not the bytes in memory. With our assumptions (size 8, little endian) the value encoded in memory is 0xFF00000000000000, so shifting it right by 56 bits results in the value 0xFF, which is 255.
So, assuming sizeof(unsigned long long) is 8 and a little endian encoding, the program prints 255 to the console.
If we are talking about a big endian system, the memory layout after setting the last byte to 0xff is still the same: 00 ... 00 FF, but now the value encoded is 0xFF. So the result of n >>= 7*8; would be 0. In a big endian system the program would print 0 to the console.
As pointed out in the comments, there are other assumptions:
char being 8 bits. Although sizeof(char) is guaranteed to be 1, it doesn't have to have 8 bits. All modern systems I know of have bits grouped in 8-bit bytes.
integers don't have to be little or big endian. There can be other arrangement patterns like middle endian. Being something other than little or big endian is considered esoteric nowadays.
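For reference, a sketch of the same steps with the intermediate value printed, so the assumptions (8-bit char, 8-byte unsigned long long, little-endian hardware) can be checked directly on your machine; it uses unsigned char for the byte view to keep the example sign-free:

#include <iostream>

int main() {
    unsigned long long n = 0;

    // View n's storage as raw bytes and set the last byte (index 7) to 0xFF.
    unsigned char *bytes = reinterpret_cast<unsigned char*>(&n);
    bytes[sizeof(unsigned long long) - 1] = 0xFF;

    // Little endian: the last byte is the most significant, so n is 0xFF00000000000000.
    // Big endian: the last byte is the least significant, so n would be 0xFF.
    std::cout << std::hex << n << std::dec << '\n';

    n >>= 7 * 8;             // shift right by 56 bits
    std::cout << n << '\n';  // 255 on little endian, 0 on big endian
}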
Cast the address of n to a pointer to char, set the last char element (index 7, assuming sizeof(unsigned long long) == 8) to 0xff, then right-shift the result (as an unsigned long long) by 56 bits.

how 256 stored in char variable and unsigned char

Up to 255, I can understand how integers are stored in char and unsigned char:
#include <stdio.h>

int main()
{
    unsigned char a = 256;
    printf("%d\n", a);
    return 0;
}
In the code above I have an output of 0 for unsigned char as well as char.
For 256, I think this is the way the integer is stored (this is just a guess):
First, 256 is converted to its binary representation, which is 100000000 (9 bits in total).
Then the leftmost bit (the only set bit) is removed, because the char data type only has 8 bits of memory.
So it is stored in memory as 00000000, and that's why it prints 0 as output.
Is this guess correct, or is there another explanation?
Your guess is correct. Conversion to an unsigned type uses modular arithmetic: if the value is out of range (either too large, or negative) then it is reduced modulo 2^N, where N is the number of bits in the target type. So, if (as is often the case) char has 8 bits, the value is reduced modulo 256, so that 256 becomes zero.
Note that there is no such rule for conversion to a signed type - out-of-range values give implementation-defined results. Also note that char is not specified to have exactly 8 bits, and can be larger on less mainstream platforms.
On your platform (as well as on any other "normal" platform) unsigned char is 8 bits wide, so it can hold numbers from 0 to 255.
Trying to assign 256 (which is an int literal) to it results in a conversion that the standard defines to "wrap around": the result of u = n, where u is an unsigned integral type and n is an integer outside its range, is u = n % (max_value_of_u + 1).
This is just a convoluted way of saying what you already said: the standard guarantees that in these cases the assignment keeps only the bits that fit in the target variable. This rule exists because most platforms already implement it at the assembly level (unsigned integer overflow typically produces exactly this behaviour, plus some kind of carry/overflow flag being set).
Notice that none of this holds for signed integers (which plain char often is): signed integer arithmetic overflow is undefined behaviour, and out-of-range conversions to signed types are implementation-defined.
Yes, that's correct. 8 bits can hold 0 to 255 unsigned, or -128 to 127 signed. Above that you've hit an overflow situation and bits will be lost.
Does the compiler give you a warning on the above code? You might be able to increase the warning level and see something. It won't warn you if you assign a value that can't be determined statically (before execution), but in this case it's pretty clear you're assigning something too large for the size of the variable.
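A minimal sketch extending the question's program to a few more out-of-range values, assuming 8-bit chars (a typical compiler warns about each of these conversions, e.g. gcc's -Woverflow):

#include <stdio.h>

int main()
{
    unsigned char a = 256;   // 256 % 256 == 0
    unsigned char b = 257;   // 257 % 256 == 1
    unsigned char c = 300;   // 300 % 256 == 44

    printf("%d %d %d\n", a, b, c);   // prints: 0 1 44
    return 0;
}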

Size of byte (clarification)

I'm writing a game server, and this might be an easy question, but I just want some clarification.
Why is it that a byte (char or unsigned char) can hold up to a value of 255 (0xFF, which I believe is 2 bytes)? When I use sizeof(unsigned char) the compiler tells me it is 1 byte.
Is it because (in ASCII) it is getting "converted" to a character?
Sorry for this poor explanation, I'm not really good at describing a question.
This touches on a bunch of subjects, including the historical meaning of a byte, the C definition of a char, and mathematics.
For starters, a byte has historically been a lot of things, but nowadays we nearly always mean an octet, which is 8 bits. As a play on words, there's also the nybble (or often nibble) which is half a byte (not called bite).
Mathematics tells us that with an ordered combination of 8 1-or-0 values, we get 2^8 = 256 combinations. Sometimes we use this unsigned, sometimes signed, but either way we want to have 0 in the range; so the unsigned range is 0..255. For the signed range, we have more options, of which two's complement is the most popular; in that case, we get one more negative value than positive, for a range of -128..+127.
C++ inherits char from C, where it is defined to have a sizeof of 1, to be the smallest addressable unit (i.e. having distinct address values with &), and a minimal range of -127..127 (in practice -128..127 with two's complement) or 0..255, depending on whether it's signed. That boils down to requiring at least 8 bits, or one byte; exactly one byte if the machine supports it.
0xff is another way of writing 255. 0x is the C way of marking a hexadecimal constant, so each digit in it is 4 bits (for 16 possible digits), ergo the nibble. This translates to an unsigned octet with all bits set to 1.
If specific size matters to your code, there is a header stdint.h that defines types of minimal and exact sizes, for speed or size optimization.
Incidentally, ASCII is a 7-bit character set. Machines with 7-bit bytes are unusual nowadays, and wider character sets like ISO 8859-1 and UTF-8 are popular.
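As a follow-up to the stdint.h remark above, a minimal sketch using the C++ spelling <cstdint> (the exact-width aliases such as uint8_t only exist on platforms that actually provide those widths):

#include <cstdint>
#include <iostream>

int main() {
    std::uint8_t  byte  = 0xFF;        // exactly 8 bits, range 0..255
    std::int32_t  value = -42;         // exactly 32 bits
    std::uint_least16_t small = 1000;  // at least 16 bits, the smallest such type

    std::cout << sizeof(byte) << ' ' << sizeof(value) << ' ' << sizeof(small) << '\n';
    std::cout << +byte << '\n';        // unary + promotes to int, so 255 prints as a number
}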
0xFF can be stored in 8 bits, which is one byte.
sizeof(char) is defined to always return 1, regardless of the actual size in bits of the underlying datatype (see 5.3.3.1 of the current standard). The sizes of all other datatypes are calculated relative to the size of a char.
When I use sizeof(unsigned char) the compiler tells me it is 1 byte.
The size of char [whether it is signed or unsigned] is always 1, as mandated by the C++ Standard.
char size is always 1, but the number of bits can differ; C defines the macro CHAR_BIT, which holds the number of bits in a char.
This means the maximum value that unsigned char can hold is pow(2, CHAR_BIT) - 1.
More info here: What is CHAR_BIT?
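A minimal sketch that prints those limits directly; CHAR_BIT, UCHAR_MAX, SCHAR_MIN and SCHAR_MAX all come from <climits>:

#include <climits>
#include <iostream>

int main() {
    std::cout << "bits in a char:      " << CHAR_BIT << '\n';   // 8 on mainstream platforms
    std::cout << "unsigned char max:   " << UCHAR_MAX << '\n';  // 2^CHAR_BIT - 1, usually 255
    std::cout << "signed char min/max: " << SCHAR_MIN << " / " << SCHAR_MAX << '\n';
}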
sizeof of char or unsigned char is 1 byte, as per the standard.
Why different ranges if the size is the same?
1 byte = 8 bits, which gives 2^8 possible values.
2^8 = 256
Hence,
signed char range is from -128 to 127
unsigned char range is from 0 to 255
This is because in the case of signed char one of the bits is used to store the sign, while unsigned char cannot be negative, so that bit is used to extend the range.
255 (0xFF) is one byte when represented as an unsigned char. You cannot represent 255 in a signed char.
1 byte is 8 bits, so in the case of:
signed: (1 bit is used for the sign, so 2^7 = 128) it holds from -128 to 127
unsigned: (2^8 = 256 values) it holds from 0 to 255