What is wrong with this bit-manipulation code from an interview question? - c++

I was having a look over this page: http://www.devbistro.com/tech-interview-questions/Cplusplus.jsp, and didn't understand this question:
What’s potentially wrong with the following code?
long value;
//some stuff
value &= 0xFFFF;
Note: Hint to the candidate about the base platform they’re developing for. If the person still doesn’t find anything wrong with the code, they are not experienced with C++.
Can someone elaborate on it?
Thanks!

Several answers here state that if an int has a width of 16 bits, 0xFFFF is negative. This is not true. 0xFFFF is never negative.
A hexadecimal literal is represented by the first of the following types that is large enough to contain it: int, unsigned int, long, and unsigned long.
If int has a width of 16 bits, then 0xFFFF is larger than the maximum value representable by an int. Thus, 0xFFFF is of type unsigned int, which is guaranteed to be large enough to represent 0xFFFF.
When the usual arithmetic conversions are performed for evaluation of the &, the unsigned int is converted to a long. The conversion of a 16-bit unsigned int to long is well-defined because every value representable by a 16-bit unsigned int is also representable by a 32-bit long.
There's no sign extension needed because the initial type is not signed, and the result of using 0xFFFF is the same as the result of using 0xFFFFL.
Alternatively, if int is wider than 16 bits, then 0xFFFF is of type int. It is a signed, but positive, number. In this case both operands are signed, and long has the greater conversion rank, so the int is again promoted to long by the usual arithmetic conversions.
As others have said, you should avoid performing bitwise operations on signed operands because the numeric result is dependent upon how signedness is represented.
Aside from that, there's nothing particularly wrong with this code. I would argue that it's a style concern that value is not initialized when it is declared, but that's probably a nit-pick level comment and depends upon the contents of the //some stuff section that was omitted.
It's probably also preferable to use a fixed-width integer type (like uint32_t) instead of long for greater portability, but really that too depends on the code you are writing and what your basic assumptions are.

I think depending on the size of a long the 0xffff literal (-1) could be promoted to a larger size and being a signed value it will be sign extended, potentially becoming 0xffffffff (still -1).

I'll assume it's because there's no predefined size for a long, other than it must be at least as big as the preceding size (int). Thus, depending on the size, you might either truncate value to a subset of bits (if long is more than 32 bits) or overflow (if it's less than 32 bits).
Yeah, longs (per the spec, and thanks for the reminder in the comments) must be able to hold at least -2147483647 to 2147483647 (LONG_MIN and LONG_MAX).

For one value isn't initialized before doing the and so I think the behaviour is undefined, value could be anything.

long type size is platform/compiler specific.
What you can here say is:
It is signed.
We can't know the result of value &= 0xFFFF; since it could be for example value &= 0x0000FFFF; and will not do what expected.

While one could argue that since it's not a buffer-overflow or some other error that's likely to be exploitable, it's a style thing and not a bug, I'm 99% confident that the answer that the question-writer is looking for is that value is operated on before it's assigned to. The value is going to be arbitrary garbage, and that's unlikely to be what was meant, so it's "potentially wrong".

Using MSVC I think that the statement would perform what was most likely intended - that is: clear all but the least significant 16 bits of value, but I have encountered other platforms which would interpret the literal 0xffff as equivalent to (short)-1, then sign extend to convert to long, in which case the statement "value &= 0xFFFF" would have no effect.
"value &= 0x0FFFF" is more explicit and robust.

Related

Bit shifting with leading 1

When I use the >> bitwise operator on 1000 in c++ it gives this result: 1100. I want the result to be 0100. When the 1 is in any other position this is exactly what happens, but with a leading 1 it goes wrong. Why is that and how can it be avoided?
The behavior you describe is coherent with what happens on some platforms when right-shifting a signed integer with the high bit set (so, negative values).
In this case, on many platforms compilers will emit code to perform an arithmetic shift, which propagates the sign bit; this, on platforms with 2's complement representation for negative integers (= virtually every current platform) has the effect of giving the "x >> i = floor(x/2i)" behavior even on negative values. Notice that this is not contractual - as far as the C++ standard is concerned, shifting negative integers in implementation-defined behavior, so any compiler is free to implement different semantics for it1.
To come to your question, to obtain the "regular" shift behavior (generally called "logical shift") you have to make sure to work on unsigned integers. This can be obtained either making sure that the variable you are shifting is of unsigned type (e.g. unsigned int) or, if it's a literal, by putting an U suffix to it (e.g. 1 is an int, 1U is an unsigned int).
If the data you have is of a signed type (e.g. int) you may cast it to the corresponding unsigned type before shifting without risks (conversion from a signed int to an unsigned one is well-defined by the standard, and doesn't change the bit values on 2's complement machines).
Historically, this comes from the fact that C strove to support even machines that didn't have "cheap" arithmetic shift functionality at hardware level and/or didn't use 2's complement representation.
As mentioned by others, when right shifting on a signed int, it is implementation defined whether you will get 1s or 0s. In your case, because the left most bit in 1000 is a 1, the "replacement bits" are also 1. Assuming you must work with signed ints, in order to get rid of it, you can apply a bitmask.

Is it safe to take the difference of two size_t objects?

I'm investigating a standard for my team around using size_t vs int (or long, etc). The biggest drawback I've seen pointed out is that taking the difference of two size_t objects can cause problems (I'm unsure of specific problems -- maybe something wasn't 2s complemented and the signed/unsigned angers the compiler). I wrote a quick program in C++ using the V120 VS2013 compiler that allowed me to do the following:
#include <iostream>
main()
{
size_t a = 10;
size_t b = 100;
int result = a - b;
}
The program resulted in -90, which although correct, makes me nervous about type mismatches, signed/unsigned problems, or just plain undefined behavior if the size_t happens to get used in complex math.
My question is if it's safe to do math with size_t objects, specifically, taking the difference? I'm considering using size_t as a standard for things like indexes. I've seen some interesting posts on the topic here, but they don't address the math issue (or I missed it).
What type for subtracting 2 size_t's?
typedef for a signed type that can contain a size_t?
This is not guaranteed to work portably, but is not UB either. The code must run without error, but the resulting int value is implementation defined. So as long as you are working on platforms that guarantee the desired behavior, this is fine (as long as the difference can be represented by an int of course), otherwise, just use signed types everywhere (see last paragraph).
Subtracting two std::size_ts will yield a new std::size_t† and its value will be determined by wrapping. In your example, assuming 64 bit size_t, a - b will equal 18446744073709551526. This does not fit into an (commonly used 32 bit) int, so an implementation defined value is assigned to result.
To be honest, I would recommend to not use unsigned integers for anything but bit magic. Several members of the standard committee agree with me: https://channel9.msdn.com/Events/GoingNative/2013/Interactive-Panel-Ask-Us-Anything 9:50, 42:40, 1:02:50
Rule of thumb (paraphrasing Chandler Carruth from the above video): If you could count it yourself, use int, otherwise use std::int64_t.
†Unless its conversion rank is less than int, e.g. if std::size_t is unsigned short. In that case, the result is an int and everything will work fine (unless int is not wider than short). However
I do not know of any platform that does this.
This would still be platform specific, see first paragraph.
The size_t type is unsigned. The subtraction of any two size_t values is defined-behavior
However, firstly, the result is implementation-defined if a larger value is subtracted from a smaller one. The result is the mathematical value, reduced to the smallest positive residue modulo SIZE_T_MAX + 1. For instance if the largest value of size_t is 65535, and the result of subtracting two size_t values is -3, then the result will be 65536 - 3 = 65533. On a different compiler or machine with a different size_t, the numeric value will be different.
Secondly, a size_t value might be out of range of the type int. If that is the case, we get a second implementation-defined result arising from the forced conversion. In this situation, any behavior can apply; it just has to be documented by the implementation, and the conversion must not fail. For instance, the result could be clamped into the int range, producing INT_MAX. A common behavior seen on two's complement machines (virtually all) in the conversion of wider (or equal width) unsigned types to narrower signed types is simple bit truncation: enough bits are taken from the unsigned value to fill the signed value, including its sign bit.
Because of the way two's complement works, if the original arithmetically correct abstract result itself fits into int, then the conversion will produce that result.
For instance, suppose that the subtraction of a pair of 64 bit size_t values on a two's complement machine yields the abstract arithmetic value -3, which is becomes the positive value 0xFFFFFFFFFFFFFFFD. When this is coerced into a 32 bit int, then the common behavior seen in many compilers for two's complement machines is that the lower 32 bits are taken as the image of the resulting int: 0xFFFFFFFD. And, of course, that is just the value -3 in 32 bits.
So the upshot is, that your code is de facto quite portable because virtually all mainstream machines are two's complement with conversion rules based on sign extension and bit truncation, including between signed and unsigned.
Except that sign extension doesn't occur when a value is widened while converting from unsigned to signed. Thus he one problem is the rare situation in which int is wider than size_t. If a 16 bit size_t result is 65533, due to 4 being subtracted from 1, this will not produce a -3 when converted to a 32 bit int; it will produce 65533!
If you don't use size_t, you are screwed: size_t is the one type that exists to be used for memory sizes, and which is consequently guaranteed to always be big enough for that purpose. (uintptr_t is quite similar, but it's neither the first such type, nor is it used by the standard libraries, nor is it available without including stdint.h.) If you use an int, you can get undefined behavior when your allocations exceed 2GiB of address space (or 32kiB if you are on a platform where int is only 16 bits!), even though the machine has more memory and you are executing in 64 bit mode.
If you need a difference of size_t that may become negative, use the signed variant ssize_t.

C++: Strange math results

The following calculation results in "1" for me.
unsigned long iEndFrame = ((4294966336 + 1920 - 1) / 1920) + 1;
Does anybody see why? I thought that unsigned long could handle this.
Thank you very much for the help.
The values on the right of your calculation have type unsigned int or int.
4294966336 + 1920 = 4294968256
assuming sizeof(unsigned int) == 4, this overflows, leaving you with 960
(960-1)/1920 = 0
(due to rounding in integer arithmetic). This leaves you with
0 + 1 = 1
If you want to perform the calculation using a larger type (and assuming that sizeof(unsigned long) > sizeof(unsigned int)), you need a cast inside the calculation
unsigned long iEndFrame = (((unsigned long)4294966336 + 1920 - 1) / 1920) + 1;
or, as noted by Slava, set one of the literal values to have unsigned long type using the UL suffix
unsigned long iEndFrame = ((4294966336UL + 1920 - 1) / 1920) + 1;
"I thought that unsigned long could handle this." - I'm not sure what you mean by this. There's nothing in the expression that you evaluate that would suggest that it should use unsigned long. The fact that you eventually initialize an unsigned long variable with it has no effect on the process of evaluation. The initializing expression is handled completely independently from that.
Now, in C++ language an unsuffixed decimal integral literal always has signed type. The compiler is not allowed to choose unsigned types for any of the literals used in your expression. The compiler is required to use either int, long int or long long int (the last one - in C++11), depending on the value of the literal. The compiler is allowed to use implementation-dependent extended integral types, but only if they are signed. And if all signed types are too small, the behavior of your program is undefined.
If we assume that we are working with typical real-life platforms with integer types having the "traditional" 16, 32 or 64 bits in width, then there are only two possibilities here
If on your platform all signed integer types are too small to represent 4294966336, then your program has undefined behavior. End of story.
If at least one signed integer type is large enough for 4294966336, there should be no evaluation overflow and your expression should evaluate to 2236963.
So, the only real language-level explanation for that 1 result is that you are running into undefined behavior, because all signed types are too small to represent the literals you used in your expression.
If on your platform some signed integer type is actually sufficiently large to represent 4294966336 (i.e. some type has at least 64 bits), then the result can only be explained by the fact the your compiler is broken.
P.S. Note that the possibility of unsigned int being used for the evaluation only existed in old C language - C89/90. That would produce the third explanation for the result you obtained. But, again, that explanation would only apply to C89/90 compilers, not to C++ compilers or C99 compilers. And your question is tagged [C++].
I meant to leave this as a comment, but do not have enough reputation. I wanted to share this anyway, because I think that there are two components to tmighty's question and one of them might not be answered quite correctly, from what I understand.
First, you need to be explicit about the data type you are using, i.e. use a suffix such as UL or explicit type conversion. Other answers have made this clear enough, I think.
Second, you need to then pick a data type that is large enough.
You have already stated "I thought that unsigned long could handle this" and other answers seem to confirm this, but from what I know - and double-checking just now to try to make sure - unsigned long may not be large enough. It is guaranteed to be at least 4294967295, which would be too small for your use case. Specifically, I have just checked under Windows using VC++ and both compilation as 32 bit and 64 bit define ULONG_MAX to be 4294967295.
What you may need to do instead is use the "ULL" suffix and use unsigned long long, since its maximum seems to be at least 18446744073709551615. Even if there are platforms that define unsigned long to be large enough for your purpose, I think you might want to use a data type that is actually guaranteed to be large enough.

does in c++ the conversion from unsigned int to int always preserve the bit pattern?

From the standard (4.7) it looks like the conversion from int to unsigned int, when they both use the same number of bits, is purely conceptual:
If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2 n where n
is the number of bits used to represent the unsigned type). [ Note: In
a two’s complement representation, this conversion is conceptual and
there is no change in the bit pattern (if there is no truncation). —
end note ]
So in this direction the conversion preserves the bitmask. I am not sure the standard guarantees the same for the conversion from unsigned int to int (again, assuming the same number of bits are used). The standard here says:
If the destination type is signed, the value is unchanged if it can be
represented in the destination type (and bit-field width); otherwise,
the value is implementation-defined.
What does it exactly mean "the destination type" here? For instance 2^32-1 cannot be represented by a 32 bit int. Does that mean that it cannot be represented in the destination type and therefore it cannot be assumed that the bit pattern will stay the same?
You cannot assume anything.
The first quote doesn't state that the bitmask remains the same. It may be the same in two's complement, but not in one's complement or other representations.
Second, implementation-defined means implementation-defined, you can't assume anything in general.
In theory, the representation can be completely different after each conversion. That's it.
If you look at it in a realistic way things come more concrete. Usually, int's are stored in two's complement and signed->unsigned preserves the pattern as unsigned->signed does (since the value can be implementation-defined, the cheapest way is doing nothing).
int is the destination type in this case. As you say 2^32-1 cannot be represented so in this case so it is implementation-specific. Although, I've only ever seen it preserve bit patterns.
EDIT: I should add that in the embedded world often whats done when one storage location needs multiple representations that are bit-for-bit identical we often use unions.
so in this case
union FOO {
int32_t signedVal;
uint32_t unsignedVal;
} var;
var can be accessed as var.signedVal to get the 32 bits stored as a signed int and var.unsignedVal to get the 32 bits stored as an unsigned value. In this case bits will be preserved.
"Destination type" refers to the type you're assigning/casting to.
The whole paragraph means a 32 bit unsigned int converted to a 32 bit signed int will stay as-is, given the value fits into the signed int. If they don't fit, it depends on the implementation on what it does (e.g. truncate). That means it really depends on the implementation whether the bits stay or whether they're changed (there's no guarantee).
Or in other words: If the unsigned int uses its most significant bit, the result is no longer predictable. Otherwise there's no change (other than changing the "name on the box").

What do the C and C++ standards say about bit-level integer representation and manipulation?

I know the C and C++ standards don't dictate a particular representation for numbers (could be two's complement, sign-and-magnitude, etc.). But I don't know the standards well enough (and couldn't find if it's stated) to know if there are any particular restrictions/guarantees/reserved representations made when working with bits. Particularly:
If all the bits in an integer type are zero, does the integer as whole represent zero?
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Is there a guaranteed way to check if any bit is not set?
Is there a guaranteed way to check if any bit is set? (#3 and #4 kind of depend on #1 and #2, because I know how to set, for example the 5th bit (see #5) in some variable x, and I'd like to check a variable y to see if it's 5th bit is 1, I would like to know if if (x & y) will work (because as I understand, this relies on the value of the representation and not whether nor not that bit is actually 1 or 0))
Is there a guaranteed way to set the left-most and/or right-most bits? (At least a simpler way than taking a char c with all bits true (set by c = c | ~c) and doing c = c << (CHAR_BIT - 1) for setting the high-bit and c = c ^ (c << 1) for the low-bit, assuming I'm not making any assumptions I should't be, given these questions)
If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
I guess my overall question is: are there any restrictions/guarantees/reserved representations made by the C and C++ standards regarding bits and integers, despite the fact that an integer's representation is not mandated (and if the C and C++ standards differ in this regard, what's their difference)?
I came up with these questions while doing my homework which required me to do some bit manipulating (note these aren't questions from my homework, these are much more "abstract").
Edit: As to what I refer to as "bits," I mean "value forming" bits and am not including "padding" bits.
(1) If all the bits in an integer type are zero, does the integer as whole represent zero?
Yes, the bit pattern consisting of all zeroes always represents 0:
The representations of integral types shall define values by use of a pure binary numeration system.49 [§3.9.1/7]
49 A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position.
(2) If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. In fact, signed magnitude is specifically allowed:
[ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end
example ] [§3.9.1/7]
(3) Is there a guaranteed way to check if any bit is not set?
I believe the answer to this is "no," if you consider signed types. It is equivalent to equality testing with a bit pattern of all ones, which is only possible if you have a way to produce a signed number with bit pattern of all ones. For an unsigned number this representation is guaranteed, but casting from unsigned to signed is undefined if the number is unrepresentable:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined. [§4.7/3]
(4) Is there a guaranteed way to check if any bit is set?
I don't think so, because signed magnitude is allowed—0 would compare equal to −0. But it should be possible with unsigned numbers.
(5) Is there a guaranteed way to set the left-most and/or right-most bits?
Again, I believe the answer is "yes" for unsigned numbers, but "no" for signed numbers. Shifts are undefined for negative signed numbers:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. [§5.8/2]
You use the term "all bits" repeatedly, but you do not clarify what "all bits" you are referring to. Object representation of integer types in C/C++ might include value-forming bits and padding bits. The only integer type that is guaranteed not to have padding bits is [signed/unsigned] char.
The language always guaranteed that if all value-forming bits are zero, then the represented integer value is also zero.
As for padding bits, things are/were a bit more complicated. The original specification of C language (C89/90 as well as the original C99) did not guarantee that setting all object bits to zero produced a valid integer representation. It could've produced an invalid trap representation. I.e. in the original C (and even in C99 at first) using memset(..., 0, ...) on integer types did not guarantee that the objects will receive valid zero values (with the exception of [signed/unsigned] char). This was changed in later specifications, namely in one of the technical corrigendums for C99. Now it is required that all-zero bit pattern in an integer object (that involves all bits, including padding ones) represents a valid zero value.
I.e. in modern C it is legal to use memset(..., 0, ...) to set any integer objects to zero, but it became legal only after C99.
You already got some answers about the representation of integer values. There is exactly one way that is guaranteed to give you all the individual bits of any object that is represented in memory: view it as array of unsigned char. This is the only integral type that has no padding bits and is guaranteed to have no trap representation. So casting a pointer of type T* to your object to unsigned char* will always work, as long as you only access the first sizeof(T) bytes. By that you could inspect and set all bytes (and thus bits) to your liking.
If you are interested in more details, here I have written something up about the anatomy of integer types in C. C++ might differ a bit from that, in particular type puning through union as described there doesn't seem to be well defined in C++.
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. The standards for C and C++ don't rule out signed magnitude or one's complement, both of which have +0 and -0. While +0 and -0 do have to compare equal, but they do not have to have the same representation.
Good luck finding a machine nowadays that uses signed magnitude or one's complement.
If you want your brain to explode, consider this: If you interpret an int or long or long long as an array of unsigned char (which is the most reasonable thing to do if you want to see all the bits), you know that the order of bytes is not defined, for example "bigendian" vs. "littleendian". We all (hopefully) know that.
But it is worse: Each bit of an int could be stored in any of the bits of the array of char. So there are 32! ways how the bits of a 32 bit integer could be mapped to an array of four 8-bit unsigned chars by a truly bizarre implementation. Fortunately, I haven't encountered more than two ways myself (and I know of one more ordering in a real computer).
If all the bits in an integer type are zero, does the integer as whole represent zero?
Edit: since you have now clarified that you are not concerned with the padding bits, the answer to this is actually "yes". But I leave the original:
Not necessarily, it could be a trap representation. See C99 6.2.6.1:
For unsigned integer types other than unsigned char, the bits of the object
representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter)
The presence of padding bits allows for the possibility that all 0 is a trap representation. (As noted by Keith Thompson in the comment below, the more recent C11 makes explicit that such a representation is not a trap representation).
and
The values of any padding bits are unspecified
and
44) Some combinations of padding bits might generate trap representations
If you restrict the question to value and sign bits, the answer is yes, due to 6.2.6.2:
If there are N value bits, each bit shall represent a different
power of 2 between 1 and 2 N −1 , so that objects of that type shall be capable of representing values from 0 to 2 N − 1 using a pure binary representation; this shall be known as the value representation.
and
If the sign bit is zero, it shall not affect the resulting value.
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Not necessarily, and in fact sign-and-magnitude is explicitly supported in 6.2.6.2.
Is there a guaranteed way to check if any bit is not set?
If you do not care about padding and sign bits, you could just compare to 0, but this would not work with a 1's complement representation (which is allowed) seeing as all bits 0 and all bits 1 both represent the value 0.
Otherwise: you can read the value of each byte via an unsigned char *, and compare the result to 0:
Values stored in unsigned bit-fields and objects of type unsigned char
shall be represented using a pure binary notation
If you want to check a specific value bit, you could construct a suitable bitmask using (1u << n), but this will not necessarily let you inspect the sign bit.
Is there a guaranteed way to check if any bit is set?
The answer is essentially the same as to the previous question.
Is there a guaranteed way to set the left-most and/or right-most bits?
Do you mean left-most value bit? You could count the bits in INT_MAX or UINT_MAX or equivalent depending on the type, and use that to construct a value (via 1 << n) with which to OR the original value.
If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
You can do so using a bitmask which you left shift repeatedly, but you can check only the value bits this way and not the sign bit.
For the bitmanipulations you could make a struct with 8 one unsigned bit fields and let the pointer of that struct point to your char. In that way you can easily access each bit. But the compiler will probably do masking under the hood, so it is only a cleaner way for the programmer I think. You must check that your compiler doesn't change the order of the fields when doing this.
yourstruct* pChar=(yourstruct*)(&c)
pChar.Bit7=1;
Let me caveat this by saying I'm addressing C and C++ in general (e.g. C90 and lower, MS Visual C++, etc): the "greatest common denominator" (vs. the latest/greatest cx11 "standard").
Q: If all the bits in an integer type are zero, does the integer as whole represent zero?
A: Yes
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
A: Yes. This includes the sign bit, for a signed int.
I'm frankly not familiar with "magnitude"
Q: Is there a guaranteed way to check if any bit is not set?
A: "And'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to check if any bit is set?
A: Again, "and'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to set the left-most and/or right-most bits?
A: I believe you should always have a "MAX_INT" available for all implementations/all architectures to determine the leftmost bit.
I'm prepared to be flamed ... but I believe the above is accurate. And I hope it helps.
IMHO...