I was asked this question in an interview, and I can't really understand what is going on here. The question is "What would be displayed in the console?"
#include <iostream>
int main()
{
    unsigned long long n = 0;
    ((char*)&n)[sizeof(unsigned long long)-1] = 0xFF;
    n >>= 7*8;
    std::cout << n;
}
What is happening here, step by step?
Let's take this one step at a time:
((char*)&n)
This casts the address of the variable n from unsigned long long* to char*. This is legal; in fact, accessing an object of a different type through a pointer to char is one of the very few "type punning" cases accepted by the language. In effect, it lets you access the memory of the object n as an array of bytes (char being C++'s byte type).
((char*)&n)[sizeof(unsigned long long)-1]
You access the last byte of the object n. Remember that sizeof returns the size of a data type in bytes (in C++, char serves as the byte type).
((char*)&n)[sizeof(unsigned long long)-1] = 0xFF;
You set the last byte of n to the value 0xFF.
Since n was 0 initially, the memory layout of n is now:
00 .. 00 FF
Now notice the .. I put in the middle. That's not because I am too lazy to write out as many bytes as n has; it's because the standard does not fix the size of unsigned long long. There are some restrictions, but it can vary from implementation to implementation, so this is the first "unknown". However, on most modern architectures sizeof(unsigned long long) is 8, so we are going to go with that, but in a serious interview you would be expected to mention this.
The other "unknown" is how these bytes are interpreted. Unsigned integers are simply encoded in binary. But it can be little endian or big endian. x86 is little endian so we are going with it for the exemplification. And again, in a serious interview you are expected to mention this.
n >>= 7*8;
This right shifts the value of n by 7*8 = 56 bits. Pay attention: now we are talking about the value of n, not the bytes in memory. With our assumptions (size 8, little endian) the value encoded in memory is 0xFF00000000000000, so shifting it right by 56 bits results in the value 0xFF, which is 255.
So, assuming sizeof(unsigned long long) is 8 and a little endian encoding, the program prints 255 to the console.
On a big endian system, the memory layout after setting the last byte to 0xFF is still the same: 00 .. 00 FF, but now the value encoded is 0xFF. So the result of n >>= 7*8; would be 0, and the program would print 0 to the console.
As pointed out in the comments, there are other assumptions:
char being 8 bits. Although sizeof(char) is guaranteed to be 1, a char doesn't have to be 8 bits wide. All modern systems I know of group bits into 8-bit bytes.
integers don't have to be little or big endian. There are other arrangement patterns, like middle endian, but anything other than little or big endian is considered esoteric nowadays.
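To check the size and byte-order assumptions on the machine in front of you, here is a minimal sketch; it uses the same kind of char-pointer type punning as the interview snippet, and a middle endian layout would of course need a finer test than this:
#include <iostream>

int main()
{
    unsigned long long n = 1;   // the value 1: only the least significant byte is non-zero

    std::cout << "sizeof(unsigned long long): " << sizeof n << '\n';

    // On a little endian machine the least significant byte sits at the lowest
    // address, so the byte at index 0 is 0x01; on a big endian machine it is 0x00.
    if (((const unsigned char*)&n)[0] == 1)
        std::cout << "little endian\n";
    else
        std::cout << "big endian\n";
}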
Cast the address of n to a pointer to char, set the char element at index 7 (assuming sizeof(unsigned long long) == 8) to 0xFF, then right-shift the result (as an unsigned long long) by 56 bits.
Related
In Bjarne's "The C++ Programming Language" book, the following piece of code on chars is given:
signed char sc = -140;
unsigned char uc = sc;
cout << uc; // prints 't'
1Q) chars are 1 byte (8 bits) on my hardware. What is the binary representation of -140? Is it possible to represent -140 using 8 bits? I would think the range is guaranteed to be at least [-127...127] for signed chars. How is it even possible to represent -140 in 8 bits?
2Q) Assume it's possible. Why do we subtract 140 from uc when sc is assigned to uc? What is the logic behind this?
EDIT: I wrote cout << sizeof(signed char) and it produced 1 (1 byte). I added this to be exact about the byte-wise size of signed char.
EDIT 2: cout << int{sc} gives the output 116. I don't understand what happens here.
First of all: Unless you're writing something very low-level that requires bit-representation manipulation - avoid writing this kind of code like the plague. It's hard to read, easy to get wrong, confusing, and often exhibits implementation-defined/undefined behavior.
To answer your question though:
The code assumes you're on a platform where the types signed char and unsigned char have 8 bits (although theoretically they could have more), and where the hardware has "two's complement" behavior: the bit representation of the result of an arithmetic operation on an integer type with N bits is taken modulo 2^N. That also determines how the same bit pattern is interpreted as signed or unsigned. Now, -140 modulo 2^8 is 116 (01110100), so that's the bit pattern sc will hold. Interpreted as a signed char (range -128 through 127), this is still 116.
An unsigned char can represent 116 as well, so the second assignment results in 116 as well.
116 is the ASCII code of the character t; and std::cout interprets unsigned char values (under 128) as ASCII codes. So, that's what gets printed.
The result of assigning -140 to a signed char is implementation-defined, just as its range is (i.e. see your implementation's documentation). A very common choice is wrap-around arithmetic: if the value doesn't fit, add or subtract 256 (or the relevant range size) until it does.
Since sc will have the value 116, and uc can also hold that value, that conversion is trivial. The unusual thing already happened when we assigned -140 to sc.
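A small sketch of that conversion chain (assuming an 8-bit, 2's complement char, as above); int{} and unary + force numeric rather than character output:
#include <iostream>

int main()
{
    signed char sc = -140;        // doesn't fit; wraps to -140 + 256 = 116 on common platforms
    unsigned char uc = sc;        // 116 fits in unsigned char, so the value is unchanged

    std::cout << int{sc} << '\n'; // 116, printed as a number
    std::cout << +uc << '\n';     // 116 again
    std::cout << uc << '\n';      // 't', the character with ASCII code 116
}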
I'm a bit confused about a sizeof result.
I have this :
unsigned long part1 = 0x0200000001;
cout << sizeof(part1); // this gives 8 bytes
If I count correctly, part1 is 9 bytes long, right?
Can anybody clarify this for me? Thanks.
If I count correctly, part1 is 9 bytes long, right?
No, you are counting incorrectly. 0x0200000001 can fit into five bytes. One byte is represented by two hex digits. Hence, the bytes are 02 00 00 00 01.
I suppose you misinterpret the meaning of sizeof. sizeof(type) returns the number of bytes that the system reserves to hold any value of the respective type. So sizeof(int) on a 32-bit system will probably give you 4 bytes (and typically still 4 on a 64-bit system), sizeof(char[20]) gives 20, and so on.
Note that one can also use identifiers of (typed) variables, e.g. int x; sizeof(x);. The type is then deduced from the variable declaration/definition, so that sizeof(x) is the same as sizeof(int) in this case.
But: sizeof never ever interprets or analyses the content / the value of a variable at runtime, even though the name may somehow sound as if it did. So with const char *x = "Hello, world";, sizeof(x) is not the length of the string literal "Hello, world" but the size of the type char*.
So your sizeof(part1) is the same as sizeof(unsigned long), which is 8 on your system regardless of what the actual content of part1 is at runtime.
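As a short sketch of that point (assuming a platform where unsigned long is 64 bits wide), sizeof depends only on the type, never on the stored value:
#include <cstring>
#include <iostream>

int main()
{
    unsigned long part1 = 0x0200000001;   // needs at least 34 value bits
    const char* s = "Hello, world";

    std::cout << sizeof(part1) << '\n';   // e.g. 8: size of unsigned long
    std::cout << sizeof(s) << '\n';       // e.g. 8: size of a char* pointer, not the string
    std::cout << std::strlen(s) << '\n';  // 12: the runtime string length
}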
An unsigned long has a minimum range of 0 to 4294967295, i.e. it is at least 4 bytes (32 bits) wide.
Assigning 0x0200000001 (8589934593) to an unsigned long that's not big enough triggers a conversion so that it fits. For unsigned types this conversion is well defined: the value is reduced modulo 2^N, so the higher bits are simply discarded.
sizeof will tell you the amount of bytes a type uses. It won't tell you how many bytes are occupied by your value.
sizeof(part1) gives the size of an unsigned long (i.e. sizeof(unsigned long)). The size of part1 is therefore the same regardless of what value is stored in it.
On your compiler, sizeof(unsigned long) has a value of 8. The size of a type is implementation-defined (other than the char types, which are defined to have a size of 1), so it may vary between compilers.
The value of 9 you are expecting is the length of the output you would obtain by writing the value of part1 to a file or string as human-readable hex, with no leading zeros or prefix. That has no relationship to the sizeof operator whatsoever. And when outputting a value, it is possible to format it in different ways (e.g. hex versus decimal versus octal, leading zeros or not), which affect the size of the output.
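A sketch of that distinction (again assuming a 64-bit unsigned long): the hex text has 9 digits, while sizeof stays at 8:
#include <iostream>
#include <sstream>

int main()
{
    unsigned long part1 = 0x0200000001;
    std::ostringstream oss;
    oss << std::hex << part1;                                   // "200000001", no leading zeros

    std::cout << "sizeof(part1): " << sizeof(part1) << '\n';    // 8: storage size of the type
    std::cout << "hex digits:    " << oss.str().size() << '\n'; // 9: length of the textual form
}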
sizeof(part1) returns the size of the data type of variable part1, which you have defined as unsigned long.
For your compiler, unsigned long is always 64 bits, or 8 bytes, i.e. 8 groups of 8 bits. The hexadecimal representation is a human-readable form of the binary format, where each digit is 4 bits long. We humans often omit leading zeroes for clarity; computers never do.
Let's consider a single byte of data - a char - and the value zero.
- decimal: 0
- hexadecimal: 0x0 (often written as 0x00)
- binary: 0000 0000
For a list of C++ data types and their corresponding bit sizes, check out your compiler's documentation (the MSVC documentation is easier to read):
for msvc: https://msdn.microsoft.com/en-us/library/s3f49ktz(v=vs.71).aspx
for gcc: https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Data-Types
All compilers document their data sizes, since these depend on the hardware and the compiler itself. If you use a different compiler, a Google search for "'your compiler name' data sizes" will help you find the correct sizes for your compiler.
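Rather than relying on the documentation alone, you can also just ask your compiler; a minimal sketch:
#include <iostream>

int main()
{
    std::cout << "char:               " << sizeof(char) << '\n'   // always 1 by definition
              << "short:              " << sizeof(short) << '\n'
              << "int:                " << sizeof(int) << '\n'
              << "long:               " << sizeof(long) << '\n'
              << "long long:          " << sizeof(long long) << '\n'
              << "unsigned long long: " << sizeof(unsigned long long) << '\n';
}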
Say I have a variable, a
char a = 0x01;
and I want to cast this to a long, as in
long b;
b = (long)a;
Will the upper 3 bytes in b be guaranteed to be 0? With my setup they are 0, but I'm not sure if this is compiler-dependent.
Yes, b is guaranteed to have the value 0x1 after this assignment, even without the cast. The assignment operator in C++ is generally semantic, or value driven: it will copy the value or state rather than perform a bitwise copy (even if the two are sometimes equivalent, such as for trivial types).
In some cases, especially because of operator overloading, this may not be the case. Developers are strongly encouraged to keep to this concept when they design new types, but a careless programmer could overload the assignment operator for non-fundamental types to do anything they want.
As a long can represent all values of a char (be it signed or unsigned), the conversion is guaranteed not to change the value.
If you initially have a positive value, either because char is unsigned on your architecture or because the char value is between 0 and 127 (assuming 8-bit chars), the resulting long is guaranteed to be positive and less than 256. So on an architecture where long is 4 bytes large, the 3 high-order bytes are guaranteed to be 0.
If char is signed and the initial value is negative, things will be different! The value will be unchanged and will still be negative, and on a common 2's complement architecture the 3 high-order bytes will all be 0xFF.
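A sketch of both cases (assuming an 8-bit, 2's complement char and a long of at least 4 bytes); printing in hex makes the sign extension visible:
#include <iostream>

int main()
{
    signed char pos = 0x01;   // positive: high-order bytes of the long stay 0
    signed char neg = -1;     // negative: bit pattern 0xFF on a 2's complement machine

    long bp = pos;            // value-preserving conversion: 1
    long bn = neg;            // value-preserving conversion: -1, high-order bytes all 0xFF

    std::cout << std::hex
              << static_cast<unsigned long>(bp) << '\n'   // 1
              << static_cast<unsigned long>(bn) << '\n';  // ffffffff... (all bits set)
}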
The answer already given is right, but I thought I'd add that for C++, it is recommended to use one of the C++-specific casting notations, to make it abundantly clear what you are doing. Here, you would use:
long b;
b = static_cast<long>(a);
This makes it very clear what you are doing (a conversion to long that the compiler checks and resolves at compile time), and you know that the "right" sort of cast will be performed.
char a = 0x01;
long b;
b = (long)a;
C and C++ are two different (but closely related) languages. Their rules happen to be the same in this case.
The cast (not "typecast") is not necessary. The assignment could, and probably should, be written as:
b = a;
which causes an implicit conversion from char to long. Since the value being converted is within the representable range of type long, the result of the conversion is 1. The result of the conversion is specified in terms of values, not representations.
The representation of the value 1 in type long probably has a 1 in the low-order bit, and 0s in all the other bits. (And the position of the low-order bit can vary; some systems are big-endian, some are little-endian, and there are other possibilities.)
There is no guarantee that type long even has three high-order bytes. Type long is at least 32 bits wide, but a byte can be wider than 8 bits. It's even possible that there are values of type char that exceed LONG_MAX (if plain char is unsigned and long is 1 byte, which implies CHAR_BIT >= 32).
It's also possible that the representation of type long includes padding bits, bits that do not contribute to the value. It's guaranteed that the sign bit is 0, the low-order value bit is 1, and all other value bits are 0, but if there are padding bits their values are not guaranteed. (Some combinations of padding bits can result in a trap representation that does not represent any value, but that can't happen in this particular case.)
Most of these exotic possibilities are very unlikely to occur in real life. C implementations for some DSPs do have bytes wider than 8 bits, but any system you're using almost certainly has 8-bit bytes.
The point is that the result of the conversion is defined in terms of values, not representations, and 99% of the time that's all you need to care about. If you write:
char a = 1; /* same as 0x01 */
long b = a;
printf("b = %ld\n", b);
it will print b = 1, even if you're using some exotic system where the value 1 is represented strangely.
b will be 1; this is always true, independent of compiler and endianness. Additionally, the following expressions will all be true:
b == 1
b == 01
b == 0x1
b == 0x00000001
b == 0x00000000000000000000000000000000000000000000000000001
The right hand side in all cases is an int constant with the value 1; not more, not less. Note that the zeroes do not represent bytes in memory (an int most likely does not have as many bytes as the last expression appears to suggest). The hexadecimal notation is just another way to write down a 1, exactly like 1.
In particular, we don't know where in memory the byte with the value 1 is located, because that is architecture dependent. It may be the one at the address of the int, or it may be the other end, or even in between.
Now comes the sweet thing: C does not care how the memory in an int is laid out. None of the ways to write an integer constant is architecture dependent. That seems self-evident with decimal constants: nobody expects the meaning of int i = 1; to be architecture dependent, and neither is int i = 0x00000001;. The same is true for the bit shift operators: << shifts towards more significant bits, >> towards less significant bits. The digits in (decimal or hexadecimal) integer constants are ordered so that the most significant digits are on the left, matching the "direction" indicated by the arrow-like shift operators. That may or may not reflect your machine's int representation; on a PC it does not.
Bottom line: If you use the standard C (or C++) means to test the "upper 3 bytes", you are home free, and the following is always true, independent of the implementation or architecture:
char a = 0x01;
long b = a;
(b & 0xFF) == 1 // least significant byte is 1
(b & 0x000000FF) == 1 // exactly the same as above
(b & 0xFFFFFF00) == 0 // the three more significant bytes are all 0
It's possible that your long has more bits, but that is implementation dependent. However many more there are, they are all zero, save for the least significant bit.
Say I have a binary protocol where the first 4 bits represent a numeric value which can be less than or equal to 10 (ten in decimal).
In C++, the smallest data type available to me is char, which is 8 bits long. So, within my application, I can hold the value represented by 4 bits in a char variable. My question is: if I have to pack the char's value back into 4 bits for network transmission, how do I do that?
You do bitwise operations on the char, so:
unsigned char packedvalue = 0;
packedvalue |= 0xF0 & (7 << 4);
packedvalue |= 0x0F & (10);
This sets the upper 4 bits to 7 and the lower 4 bits to 10.
Unpacking works the same way:
int upper, lower;
upper = (packedvalue & 0xF0) >> 4;
lower = packedvalue & 0x0F;
As an extra answer to the question -- you may also want to look at protocol buffers for a way of encoding and decoding data for binary transfers.
Sure, just use one char for your value:
std::ofstream outfile("thefile.bin", std::ios::binary);
unsigned int n = 10; // at most 10!
char c = n << 4; // fits
outfile.write(&c, 1); // we wrote the value "10"
The lower 4 bits will be left at zero. If they're also used for something, you'll have to populate c fully before writing it. To read:
infile.read(&c, 1);
unsigned int n = static_cast<unsigned char>(c) >> 4; // cast first so a signed char doesn't sign-extend
Well, there's the popular but non-portable "Bit Fields". They're standard-compliant, but may create a different packing order on different platforms. So don't use them.
Then, there are the highly portable bit shifting and bitwise AND and OR operators, which you should prefer. Essentially, you work on a larger field (usually 32 bits, for TCP/IP protocols) and extract or replace subsequences of bits. See Martin's link and Soren's answer for those.
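As a sketch of that shift-and-mask technique on a 32-bit field (the field positions and the helper names here are made up for illustration):
#include <cstdint>
#include <iostream>

// Extract 'width' bits starting at bit position 'pos' (0 = least significant).
std::uint32_t get_field(std::uint32_t word, unsigned pos, unsigned width)
{
    std::uint32_t mask = (1u << width) - 1u;
    return (word >> pos) & mask;
}

// Replace the same bit range with 'value'.
std::uint32_t set_field(std::uint32_t word, unsigned pos, unsigned width, std::uint32_t value)
{
    std::uint32_t mask = (1u << width) - 1u;
    return (word & ~(mask << pos)) | ((value & mask) << pos);
}

int main()
{
    std::uint32_t word = 0;
    word = set_field(word, 28, 4, 10);           // top 4 bits hold 10
    word = set_field(word, 0, 4, 7);             // bottom 4 bits hold 7
    std::cout << get_field(word, 28, 4) << ' '   // 10
              << get_field(word, 0, 4) << '\n';  // 7
}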
Are you familiar with C's bitfields? You simply write
struct my_bits {
    unsigned v1 : 4;
    ...
};
Be warned, various operations are slower on bitfields because the compiler must unpack them for things like addition. I'd imagine unpacking a bitfield will still be faster than the addition operation itself, even though it requires multiple instructions, but it's still overhead. Bitwise operations should remain quite fast. Equality too.
You must also take care with endianness and threads (see the Wikipedia article I linked for details, but the issues are fairly obvious). You should learn about endianness anyway, since you said "binary protocol" (see this previous question).
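A minimal sketch of a complete bit-field struct; the struct and field names are made up, and since the packing order is implementation-defined this works for in-memory storage, not as a wire format on its own:
#include <iostream>

struct packet_header {      // hypothetical example layout
    unsigned version : 4;   // holds 0..15
    unsigned type    : 4;   // holds 0..15
};

int main()
{
    packet_header h{};
    h.version = 10;
    h.type = 7;

    std::cout << h.version << ' ' << h.type << '\n';  // 10 7
    std::cout << sizeof(packet_header) << '\n';       // typically 4: padded to an unsigned int
}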
I'm writing a game server, and this might be an easy question, but I just want some clarification.
Why is it that a byte (char or unsigned char) can hold up to a value of 255 (0xFF, which I believe is 2 bytes)? When I use sizeof(unsigned char) the compiler tells me it is 1 byte.
Is it because (in ASCII) it is getting "converted" to a character?
Sorry for the poor explanation, I'm not really good at describing a question.
This touches on a bunch of subjects, including the historical meaning of a byte, the C definition of a char, and mathematics.
For starters, a byte has historically been a lot of things, but nowadays we nearly always mean an octet, which is 8 bits. As a play on words, there's also the nybble (or often nibble) which is half a byte (not called bite).
Mathematics tells us that with an ordered combination of 8 1-or-0 values, we get 2^8 = 256 combinations. Sometimes we use this unsigned, sometimes signed, but either way we want to have 0 in the range; so the unsigned range is 0..255. For the signed range, we have more options, of which two's complement is the most popular; in that case, we get one more negative value than positive, for a range of -128..+127.
C++ inherits char from C, where it is defined to have a sizeof of 1, to be the smallest addressable size (i.e. having distinct address values with &), and a minimal range of -128..127 or 0..255 depending on if it's signed or not. That boils down to requiring at least 8 bits, or one byte; exactly one byte if the machine supports it.
0xff is another way of writing 255. 0x is the C way of marking a hexadecimal constant, so each digit in it is 4 bits (for 16 possible digits), ergo the nibble. This translates to an unsigned octet with all bits set to 1.
If a specific size matters to your code, there is a header, stdint.h (<cstdint> in C++), that defines types of minimal and exact sizes, for speed or size optimization.
Incidentally, ASCII is a 7-bit character set. Machines with 7-bit bytes are unusual nowadays, and wider character sets like ISO 8859-1 and UTF-8 are popular.
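If exact widths matter, here is a small sketch using the fixed-width types from the stdint.h header mentioned above (the exact-width types are technically optional, but present on mainstream platforms):
#include <cstdint>
#include <iostream>

int main()
{
    std::uint8_t  byte_val = 0xFF;        // exactly 8 bits where available
    std::uint32_t word_val = 0xFFFFFFFF;  // exactly 32 bits

    // unary + forces numeric output; uint8_t is usually an alias for unsigned char
    std::cout << +byte_val << '\n'        // 255
              << word_val << '\n';        // 4294967295
}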
0xFF can be stored in 8 bits, which is one byte.
sizeof(char) is defined to always return 1, regardless of the actual size in bits of the underlying datatype (see 5.3.3.1 of the current standard). The sizes of all other datatypes are calculated relative to the size of a char.
When I use sizeof(unsigned char) the compiler tells me it is 1 byte.
The size of char [whether it is signed or unsigned] is always 1, as mandated by the C++ Standard.
char's size is always 1, but the number of bits can differ; C defines the macro CHAR_BIT, which holds the number of bits in a char.
This means the maximum value an unsigned char can hold is pow(2, CHAR_BIT) - 1.
More info here: What is CHAR_BIT?
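A quick sketch of that, using the macros from <climits>:
#include <climits>
#include <iostream>

int main()
{
    std::cout << "bits in a char:    " << CHAR_BIT << '\n'   // 8 on virtually every modern system
              << "max unsigned char: " << UCHAR_MAX << '\n'; // 2^CHAR_BIT - 1, i.e. 255
}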
sizeof of char or unsigned char is 1 byte as per the standard.
Why different ranges if same size?
1 byte = 8 bits, which gives 2^8 = 256 possible values.
Hence:
the signed char range is from -128 to 127
the unsigned char range is from 0 to 255
This is because in the case of signed char one of the bits is used to store the sign, while an unsigned char cannot be negative, so that bit is utilized to increase the range.
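A sketch that prints those ranges for your own implementation (unary + forces numeric rather than character output):
#include <iostream>
#include <limits>

int main()
{
    std::cout << +std::numeric_limits<signed char>::min() << " to "
              << +std::numeric_limits<signed char>::max() << '\n'    // -128 to 127 on common platforms
              << +std::numeric_limits<unsigned char>::min() << " to "
              << +std::numeric_limits<unsigned char>::max() << '\n'; // 0 to 255
}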
255 (0xFF) is one byte when represented as an unsigned char. You cannot represent 255 as a signed char.
1 byte is 8 bits, so in the case of
signed: 1 bit is used for the sign, so 2^7 = 128 magnitudes; it holds from -128 to 127
unsigned: 2^8 = 256 values; it holds from 0 to 255