Bit shifting with leading 1 - c++

When I use the >> bitwise operator on 1000 in c++ it gives this result: 1100. I want the result to be 0100. When the 1 is in any other position this is exactly what happens, but with a leading 1 it goes wrong. Why is that and how can it be avoided?

The behavior you describe is coherent with what happens on some platforms when right-shifting a signed integer with the high bit set (so, negative values).
In this case, on many platforms compilers will emit code to perform an arithmetic shift, which propagates the sign bit; on platforms with 2's complement representation for negative integers (= virtually every current platform) this has the effect of giving the "x >> i = floor(x / 2^i)" behavior even on negative values. Notice that this is not contractual - as far as the C++ standard is concerned, right-shifting negative integers is implementation-defined behavior, so any compiler is free to implement different semantics for it.
To come to your question: to obtain the "regular" shift behavior (generally called "logical shift") you have to make sure you are working on unsigned integers. This can be achieved either by making sure that the variable you are shifting is of unsigned type (e.g. unsigned int) or, if it's a literal, by putting a U suffix on it (e.g. 1 is an int, 1U is an unsigned int).
If the data you have is of a signed type (e.g. int) you may cast it to the corresponding unsigned type before shifting without risk (conversion from a signed int to an unsigned one is well-defined by the standard, and doesn't change the bit values on 2's complement machines).
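For instance, a minimal sketch of both options (the signed result shown is what typical 2's complement platforms produce; the standard does not guarantee it):
#include <iostream>

int main()
{
    signed char s = -128;              // bit pattern 1000 0000
    unsigned char u = 0x80;            // same bit pattern, but unsigned

    std::cout << (s >> 4) << '\n';     // typically -8: arithmetic shift, sign bit propagated
    std::cout << (u >> 4) << '\n';     // always 8: logical shift, zeroes shifted in
}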
Historically, this comes from the fact that C strove to support even machines that didn't have "cheap" arithmetic shift functionality at hardware level and/or didn't use 2's complement representation.

As mentioned by others, when right shifting a signed int, it is implementation defined whether you will get 1s or 0s. In your case, because the leftmost bit in 1000 is a 1, the "replacement bits" are also 1. Assuming you must work with signed ints, you can get rid of them by applying a bitmask.
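For example, working with 8-bit values for brevity (the shifted value shown assumes a 2's complement platform that performs an arithmetic shift):
signed char x = -128;                  // 1000 0000
signed char shifted = x >> 1;          // typically 1100 0000, i.e. -64
signed char masked = shifted & 0x7F;   // 0100 0000, i.e. 64: the duplicated sign bit is masked off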

Related

Is it safe to take the difference of two size_t objects?

I'm investigating a standard for my team around using size_t vs int (or long, etc). The biggest drawback I've seen pointed out is that taking the difference of two size_t objects can cause problems (I'm unsure of specific problems -- maybe something wasn't 2s complemented and the signed/unsigned angers the compiler). I wrote a quick program in C++ using the V120 VS2013 compiler that allowed me to do the following:
#include <cstddef>
#include <iostream>

int main()
{
    size_t a = 10;
    size_t b = 100;
    int result = a - b;
    std::cout << result << std::endl;
}
The program resulted in -90, which although correct, makes me nervous about type mismatches, signed/unsigned problems, or just plain undefined behavior if the size_t happens to get used in complex math.
My question is if it's safe to do math with size_t objects, specifically, taking the difference? I'm considering using size_t as a standard for things like indexes. I've seen some interesting posts on the topic here, but they don't address the math issue (or I missed it).
What type for subtracting 2 size_t's?
typedef for a signed type that can contain a size_t?
This is not guaranteed to work portably, but is not UB either. The code must run without error, but the resulting int value is implementation defined. So as long as you are working on platforms that guarantee the desired behavior, this is fine (as long as the difference can be represented by an int of course), otherwise, just use signed types everywhere (see last paragraph).
Subtracting two std::size_ts will yield a new std::size_t† and its value will be determined by wrapping. In your example, assuming 64 bit size_t, a - b will equal 18446744073709551526. This does not fit into a (commonly 32 bit) int, so an implementation defined value is assigned to result.
To be honest, I would recommend not using unsigned integers for anything but bit magic. Several members of the standard committee agree with me: https://channel9.msdn.com/Events/GoingNative/2013/Interactive-Panel-Ask-Us-Anything 9:50, 42:40, 1:02:50
Rule of thumb (paraphrasing Chandler Carruth from the above video): If you could count it yourself, use int, otherwise use std::int64_t.
†Unless its conversion rank is less than int, e.g. if std::size_t is unsigned short. In that case, the result is an int and everything will work fine (unless int is not wider than short). However
I do not know of any platform that does this.
This would still be platform specific, see first paragraph.
The size_t type is unsigned. The subtraction of any two size_t values is defined behavior.
However, firstly, the numeric result depends on the implementation if a larger value is subtracted from a smaller one: the result is the mathematical value, reduced modulo SIZE_MAX + 1 to the smallest nonnegative residue. For instance, if the largest value of size_t is 65535, and the result of subtracting two size_t values is -3, then the result will be 65536 - 3 = 65533. On a different compiler or machine with a different size_t, the numeric value will be different.
Secondly, a size_t value might be out of range of the type int. If that is the case, we get a second implementation-defined result arising from the forced conversion. In this situation, any behavior can apply; it just has to be documented by the implementation, and the conversion must not fail. For instance, the result could be clamped into the int range, producing INT_MAX. A common behavior seen on two's complement machines (virtually all) in the conversion of wider (or equal width) unsigned types to narrower signed types is simple bit truncation: enough bits are taken from the unsigned value to fill the signed value, including its sign bit.
Because of the way two's complement works, if the original arithmetically correct abstract result itself fits into int, then the conversion will produce that result.
For instance, suppose that the subtraction of a pair of 64 bit size_t values on a two's complement machine yields the abstract arithmetic value -3, which becomes the positive value 0xFFFFFFFFFFFFFFFD. When this is coerced into a 32 bit int, the common behavior seen in many compilers for two's complement machines is that the lower 32 bits are taken as the image of the resulting int: 0xFFFFFFFD. And, of course, that is just the value -3 in 32 bits.
So the upshot is that your code is de facto quite portable, because virtually all mainstream machines are two's complement with conversion rules based on sign extension and bit truncation, including between signed and unsigned.
Except that sign extension doesn't occur when a value is widened while converting from unsigned to signed. Thus the one problem is the rare situation in which int is wider than size_t. If a 16 bit size_t result is 65533, due to 4 being subtracted from 1, this will not produce a -3 when converted to a 32 bit int; it will produce 65533!
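A small sketch of the wrap-then-convert behaviour described above (the values assume a 64-bit size_t and a 32-bit int, which is typical but not guaranteed):
#include <cstddef>
#include <iostream>

int main()
{
    std::size_t a = 10, b = 100;

    std::size_t wrapped = a - b;              // well defined: wraps to 2^64 - 90 here
    int converted = static_cast<int>(a - b);  // out of range for int: implementation-defined, commonly -90

    std::cout << wrapped << '\n' << converted << '\n';
}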
If you don't use size_t, you are screwed: size_t is the one type that exists to be used for memory sizes, and which is consequently guaranteed to always be big enough for that purpose. (uintptr_t is quite similar, but it's neither the first such type, nor is it used by the standard libraries, nor is it available without including stdint.h.) If you use an int, you can get undefined behavior when your allocations exceed 2GiB of address space (or 32kiB if you are on a platform where int is only 16 bits!), even though the machine has more memory and you are executing in 64 bit mode.
If you need a difference of size_t that may become negative, use the signed variant ssize_t (a POSIX type; the standard alternative is std::ptrdiff_t).

Why prefer signed over unsigned in C++? [closed]

I'd like to understand better why choose int over unsigned?
Personally, I've never liked signed values unless there is a valid reason for them. Things like the count of items in an array, the length of a string, or the size of a memory block cannot possibly be negative, and a negative value there has no possible meaning. Why prefer int when it is misleading in all such cases?
I ask this because both Bjarne Stroustrup and Chandler Carruth gave the advice to prefer int over unsigned here (approx 12:30').
I can see the argument for using int over short or long - int is the "most natural" data width for the target machine architecture.
But signed over unsigned has always annoyed me. Are signed values genuinely faster on typical modern CPU architectures? What makes them better?
As per requests in comments: I prefer int instead of unsigned because...
it's shorter (I'm serious!)
it's more generic and more intuitive (i.e. I like to be able to assume that 1 - 2 is -1 and not some obscure huge number)
what if I want to signal an error by returning an out-of-range value?
Of course there are counter-arguments, but these are the principal reasons I like to declare my integers as int instead of unsigned. Of course, this is not always true, in other cases, an unsigned is just a better tool for a task, I am just answering the "why would anyone prefer defaulting to signed" question specifically.
Let me paraphrase the video, as the experts said it succinctly.
Andrei Alexandrescu:
No simple guideline.
In systems programming, we need integers of different sizes and signedness.
Many conversions and arcane rules govern arithmetic (like for auto), so we need to be careful.
Chandler Carruth:
Here's some simple guidelines:
Use signed integers unless you need two's complement arithmetic or a bit pattern
Use the smallest integer that will suffice.
Otherwise, use int if you think you could count the items, and a 64-bit integer if it's even more than you would want to count.
Stop worrying and use tools to tell you when you need a different type or size.
Bjarne Stroustrup:
Use int until you have a reason not to.
Use unsigned only for bit patterns.
Never mix signed and unsigned
Wariness about signedness rules aside, my one-sentence take away from the experts:
Use the appropriate type, and when you don't know, use an int until you do know.
Several reasons:
Arithmetic on unsigned always yields unsigned, which can be a problem when subtracting integer quantities that can reasonably result in a negative result — think subtracting money quantities to yield balance, or array indices to yield distance between elements. If the operands are unsigned, you get a perfectly defined, but almost certainly meaningless result, and a result < 0 comparison will always be false (of which modern compilers will fortunately warn you).
unsigned has the nasty property of contaminating the arithmetic where it gets mixed with signed integers. So, if you add a signed and unsigned and ask whether the result is greater than zero, you can get bitten, especially when the unsigned integral type is hidden behind a typedef.
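A minimal sketch of both pitfalls (the printed values assume a typical 32-bit unsigned int):
#include <iostream>

int main()
{
    unsigned int paid = 3, owed = 4;
    std::cout << (paid - owed) << '\n';       // 4294967295, not -1: unsigned arithmetic wraps
    std::cout << (paid - owed < 0) << '\n';   // 0: an unsigned value is never less than zero

    int balance = -1;
    unsigned int items = 1;
    std::cout << (balance < items) << '\n';   // 0: balance is converted to a huge unsigned value
}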
There are no reasons to prefer signed over unsigned, aside from purely sociological ones, i.e. some people believe that average programmers are not competent and/or attentive enough to write proper code in terms of unsigned types. This is often the main reasoning used by various "speakers", regardless of how respected those speakers might be.
In reality, competent programmers quickly develop and/or learn the basic set of programming idioms and skills that allow them to write proper code in terms of unsigned integral types.
Note also that the fundamental differences between signed and unsigned semantics are always present (in superficially different form) in other parts of C and C++, like pointer arithmetic and iterator arithmetic. This means that in the general case the programmer does not really have the option of avoiding the issues specific to unsigned semantics and the "problems" it brings with it. I.e. whether you want it or not, you have to learn to work with ranges that terminate abruptly at their left end, right at zero (not somewhere far off in the distance), even if you adamantly avoid unsigned integers.
Also, as you probably know, many parts of the standard library already rely on unsigned integer types quite heavily. Forcing signed arithmetic into the mix, instead of learning to work with the unsigned one, will only result in disastrously bad code.
The only real reason to prefer signed in some contexts that comes to mind is that in mixed integer/floating-point code signed integer formats are typically directly supported by the FPU instruction set, while unsigned formats are not supported at all, forcing the compiler to generate extra code for conversions between floating-point values and unsigned values. In such code signed types might perform better.
But at the same time in purely integer code unsigned types might perform better than signed types. For example, integer division often requires additional corrective code in order to satisfy the requirements of the language spec. The correction is only necessary in case of negative operands, so it wastes CPU cycles in situations when negative operands are not really used.
In my practice I devotedly stick to unsigned wherever I can, and use signed only if I really have to.
The integral types in C and many languages which derive from it have two general usage cases: to represent numbers, or represent members of an abstract algebraic ring. For those unfamiliar with abstract algebra, the primary notion behind a ring is that adding, subtracting, or multiplying two items of a ring should yield another item of that ring--it shouldn't crash or yield a value outside the ring. On a 32-bit machine, adding unsigned 0x12345678 to unsigned 0xFFFFFFFF doesn't "overflow"--it simply yields the result 0x12345677 which is defined for the ring of integers congruent mod 2^32 (because the arithmetic result of adding 0x12345678 to 0xFFFFFFFF, i.e. 0x112345677, is congruent to 0x12345677 mod 2^32).
Conceptually, both purposes (representing numbers, or representing members of the ring of integers congruent mod 2^n) may be served by both signed and unsigned types, and many operations are the same for both usage cases, but there are some differences. Among other things, an attempt to add two numbers should not be expected to quietly yield anything other than the correct arithmetic sum. While it's debatable whether a language should be required to generate the code necessary to guarantee that (e.g. by throwing an exception instead), one could argue that for code which uses integral types to represent numbers such behavior would be preferable to yielding an arithmetically-incorrect value, and compilers shouldn't be forbidden from behaving that way.
The implementers of the C standards decided to use signed integer types to represent numbers and unsigned types to represent members of the algebraic ring of integers congruent mod 2^n. By contrast, Java uses signed integers to represent members of such rings (though they're interpreted differently in some contexts; conversions among differently-sized signed types, for example, behave differently from among unsigned ones) and Java has neither unsigned integers nor any primitive integral types which behave as numbers in all non-exceptional cases.
If a language provided a choice of signed and unsigned representations for both numbers and algebraic-ring numbers, it might make sense to use unsigned numbers to represent quantities that will always be positive. If, however, the only unsigned types represent members of an algebraic ring, and the only types that represent numbers are the signed ones, then even if a value will always be positive it should be represented using a type designed to represent numbers.
Incidentally, the reason that (uint32_t)-1 is 0xFFFFFFFF stems from the fact that casting a signed value to unsigned is equivalent to adding unsigned zero, and adding an integer to an unsigned value is defined as adding or subtracting its magnitude to/from the unsigned value according to the rules of the algebraic ring which specify that if X=Y-Z, then X is the one and only member of that ring such X+Z=Y. In unsigned math, 0xFFFFFFFF is the only number which, when added to unsigned 1, yields unsigned zero.
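A tiny sketch of that last point (assuming a 32-bit unsigned int):
#include <cstdint>
#include <iostream>

int main()
{
    std::uint32_t x = static_cast<std::uint32_t>(-1);   // 0xFFFFFFFF
    std::cout << std::hex << x << '\n';                 // ffffffff
    std::cout << std::dec << (x + 1u) << '\n';          // 0: the one value that, plus 1, yields 0
}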
Speed is the same on modern architectures. The problem with unsigned int is that it can sometimes generate unexpected behavior. This can create bugs that wouldn't show up otherwise.
Normally when you subtract 1 from a value, the value gets smaller. Now, with both signed and unsigned int variables, there will be a time that subtracting 1 creates a value that is MUCH LARGER. The key difference between unsigned int and int is that with unsigned int the value that generates the paradoxical result is a commonly used value --- 0 --- whereas with signed the number is safely far away from normal operations.
As far as returning -1 for an error value --- modern thinking is that it's better to throw an exception than to test for return values.
It's true that if you properly defend your code you won't have this problem, and if you use unsigned religiously everywhere you will be okay (provided that you are only adding, and never subtracting, and that you never get near MAX_INT). I use unsigned int everywhere. But it takes a lot of discipline. For a lot of programs, you can get by with using int and spend your time on other bugs.
Use int by default: it plays nicer with the rest of the language
most common domain usage is regular arithmetic, not modular arithmetic
int main() {} // see an unsigned?
auto i = 0; // i is of type int
Only use unsigned for modulo arithmetic and bit-twiddling (in particular shifting)
has different semantics than regular arithmetic, make sure it is what you want
bit-shifting signed types is subtle (see comments by @ChristianRau)
if you need a > 2Gb vector on a 32-bit machine, upgrade your OS / hardware
Never mix signed and unsigned arithmetic
the rules for that are complicated and surprising (either one can be converted to the other, depending on the relative type sizes)
turn on -Wconversion -Wsign-conversion -Wsign-promo (gcc is better than Clang here)
the Standard Library got it wrong with std::size_t (quote from the GN13 video)
use range-for if you can,
for(auto i = 0; i < static_cast<int>(v.size()); ++i) if you must
Don't use short or large types unless you actually need them
current architectures' data flow caters well to 32-bit non-pointer data (but note the comment by @BenVoigt about cache effects for smaller types)
char and short save space but suffer from integral promotions
are you really going to count past what an int64_t can hold?
To answer the actual question: for the vast majority of things, it doesn't really matter. int can be a little easier to deal with for things like subtraction where the second operand is larger than the first, and you still get the "expected" result.
There is absolutely no speed difference in 99.9% of cases, because the ONLY instructions that are different for signed and unsigned numbers are:
Making the number longer (fill with the sign for signed or zero for unsigned) - it takes the same effort to do both.
Comparisons - for a signed number, the processor has to take into account whether either number is negative or not. But again, it's the same speed to make a compare with signed or unsigned numbers - it's just using a different instruction code to say "numbers that have the highest bit set are smaller than numbers with the highest bit not set" (essentially). [Pedantically, it's nearly always the operation using the RESULT of a comparison that is different - the most common case being a conditional jump or branch instruction - but either way, it's the same effort, just that the inputs are taken to mean slightly different things].
Multiply and divide. Obviously, sign conversion of the result needs to happen if it's a signed multiplication, whereas an unsigned multiplication should not change the sign of the result even if the highest bit of one of the inputs is set. And again, the effort is (as near as we care for) identical.
(I think there are one or two other cases, but the result is the same - it really doesn't matter if it's signed or unsigned, the effort to perform the operation is the same for both).
The int type more closely resembles the behavior of mathematical integers than the unsigned type.
It is naive to prefer the unsigned type simply because a situation does not require negative values to be represented.
The problem is that the unsigned type has a discontinuous behavior right next to zero. Any operation that tries to compute a small negative value, instead produces some large positive value. (Worse: one that is implementation-defined.)
Algebraic relationships such as that a < b implies that a - b < 0 are wrecked in the unsigned domain, even for small values like a = 3 and b = 4.
A descending loop like for (i = max - 1; i >= 0; i--) fails to terminate if i is made unsigned.
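For instance, a sketch of the failing loop next to one common idiom for a descending unsigned loop:
#include <cstddef>

int main()
{
    const std::size_t max = 10;
    int data[max] = {};

    // for (std::size_t i = max - 1; i >= 0; i--)   // never terminates: i >= 0 is always true
    for (std::size_t i = max; i-- > 0; )            // visits max-1 down to 0, then stops
        data[i] = 1;
}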
Unsigned quirks can cause a problem which will affect code regardless of whether that code expects to be representing only positive quantities.
The virtue of the unsigned types is that certain operations that are not portably defined at the bit level for the signed types are that way for the unsigned types. The unsigned types lack a sign bit, and so shifting and masking through the sign bit isn't a problem. The unsigned types are good for bitmasks, and for code that implements precise arithmetic in a platform-independent way. Unsigned operations will simulate two's complement semantics even on a non two's complement machine. Writing a multi-precision (bignum) library practically requires arrays of unsigned types to be used for the representation, rather than signed types.
The unsigned types are also suitable in situations in which numbers behave like identifiers and not as arithmetic types. For instance, an IPv4 address can be represented in a 32 bit unsigned type. You wouldn't add together IPv4 addresses.
int is preferred because it's most commonly used. unsigned is usually associated with bit operations. Whenever I see an unsigned, I assume it's used for bit twiddling.
If you need a bigger range, use a 64-bit integer.
If you're iterating over stuff using indexes, types usually have size_type, and you shouldn't care whether it's signed or unsigned.
Speed is not an issue.
For me, in addition to all the integers in the range of 0..+2,147,483,647 contained within the set of signed and unsigned integers on 32 bit architectures, there is a higher probability that I will need to use -1 (or smaller) than need to use +2,147,483,648 (or larger).
One good reason that I can think of is in case of detecting overflow.
For use cases such as the count of items in an array, the length of a string, or the size of a memory block, you can overflow an unsigned int and you may not notice a difference even when you take a look at the variable. If it is a signed int, the variable will be less than zero and clearly wrong.
You can simply check whether the variable is less than zero when you want to use it. This way, you do not have to check for overflow after every arithmetic operation, as is the case for unsigned ints.
It gives unexpected result when doing simple arithmetic operation:
unsigned int i;
i = 1 - 2;
//i is now 4294967295 (UINT_MAX, assuming a 32-bit unsigned int)
It gives unexpected result when doing simple comparison:
unsigned int j = 1;
std::cout << (j>-1) << std::endl;
//output 0 as false but 1 is greater than -1
This is because when doing the operations above, the signed ints are converted to unsigned, and the result wraps around to a really big number.

Why is unsigned integer overflow defined behavior but signed integer overflow isn't?

Unsigned integer overflow is well defined by both the C and C++ standards. For example, the C99 standard (§6.2.5/9) states
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
However, both standards state that signed integer overflow is undefined behavior. Again, from the C99 standard (§3.4.3/1)
An example of undefined behavior is the behavior on integer overflow
Is there an historical or (even better!) a technical reason for this discrepancy?
The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation it used. C implementations usually used the same representation used by the CPU - so the overflow behavior followed from the integer representation used by the CPU.
In practice, it is only the representations for signed values that may differ according to the implementation: one's complement, two's complement, sign-magnitude. For an unsigned type there is no reason for the standard to allow variation because there is only one obvious binary representation (the standard only allows binary representation).
Relevant quotes:
C99 6.2.6.1:3:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.
C99 6.2.6.2:2:
If the sign bit is one, the value shall be modified in one of the following ways:
— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value −(2^N) (two's complement);
— the sign bit has the value −(2^N − 1) (one's complement).
Nowadays, all processors use two's complement representation, but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.
Aside from Pascal's good answer (which I'm sure is the main motivation), it is also possible that some processors cause an exception on signed integer overflow, which of course would cause problems if the compiler had to "arrange for another behaviour" (e.g. use extra instructions to check for potential overflow and calculate differently in that case).
It is also worth noting that "undefined behaviour" doesn't mean "doesn't work". It means that the implementation is allowed to do whatever it likes in that situation. This includes doing "the right thing" as well as "calling the police" or "crashing". Most compilers, when possible, will choose "do the right thing", assuming that is relatively easy to define (in this case, it is). However, if you are having overflows in the calculations, it is important to understand what that actually results in, and that the compiler MAY do something other than what you expect (and that this may vary depending on compiler version, optimisation settings, etc).
First of all, please note that C11 3.4.3, like all examples and footnotes, is not normative text and is therefore not relevant to cite!
The relevant text that states that overflow of integers and floats is undefined behavior is this:
C11 6.5/5
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
A clarification regarding the behavior of unsigned integer types specifically can be found here:
C11 6.2.5/9
The range of nonnegative values of a signed integer type is a subrange
of the corresponding unsigned integer type, and the representation of
the same value in each type is the same. A computation involving
unsigned operands can never overflow, because a result that cannot be
represented by the resulting unsigned integer type is reduced modulo
the number that is one greater than the largest value that can be
represented by the resulting type.
This makes unsigned integer types a special case.
Also note that there is an exception if any type is converted to a signed type and the old value can no longer be represented. The behavior is then merely implementation-defined, although a signal may be raised.
C11 6.3.1.3
6.3.1.3 Signed and unsigned integers
When a value with integer
type is converted to another integer type other than _Bool, if the
value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Otherwise, the new type is signed and the value
cannot be represented in it; either the result is
implementation-defined or an implementation-defined signal is raised.
In addition to the other issues mentioned, having unsigned math wrap makes the unsigned integer types behave as abstract algebraic groups (meaning that, among other things, for any pair of values X and Y, there will exist some other value Z such that X+Z will, if properly cast, equal Y and Y-Z will, if properly cast, equal X). If unsigned values were merely storage-location types and not intermediate-expression types (e.g. if there were no unsigned equivalent of the largest integer type, and arithmetic operations on unsigned types behaved as though they were first converted to larger signed types), then there wouldn't be as much need for defined wrapping behavior; but it's difficult to do calculations in a type which doesn't have e.g. an additive inverse.
This helps in situations where wrap-around behavior is actually useful - for example with TCP sequence numbers or certain algorithms, such as hash calculation. It may also help in situations where it's necessary to detect overflow, since performing calculations and checking whether they overflowed is often easier than checking in advance whether they would overflow, especially if the calculations involve the largest available integer type.
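A sketch of that "check after the fact" style of overflow detection (the function and variable names here are only illustrative):
#include <cstdint>
#include <iostream>

// Returns true if a + b wrapped around; sum always receives the (possibly wrapped) result.
bool add_overflows(std::uint32_t a, std::uint32_t b, std::uint32_t& sum)
{
    sum = a + b;        // always well defined: reduced modulo 2^32
    return sum < a;     // wrapping occurred exactly when the result is smaller than an operand
}

int main()
{
    std::uint32_t s = 0;
    std::cout << add_overflows(0xFFFFFFF0u, 0x20u, s) << '\n';   // 1: the sum wrapped to 0x10
}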
Perhaps another reason why unsigned arithmetic is defined is that unsigned numbers form the integers modulo 2^n, where n is the width of the unsigned number. Unsigned numbers are simply integers represented using binary digits instead of decimal digits. Performing the standard operations in a modulus system is well understood.
The OP's quote refers to this fact, but also highlights the fact that there is only one, unambiguous, logical way to represent unsigned integers in binary. By contrast, Signed numbers are most often represented using two's complement but other choices are possible as described in the standard (section 6.2.6.2).
Two's complement representation allows certain operations to make more sense in binary format. E.g., incrementing negative numbers is the same as for positive numbers (except under overflow conditions). Some operations at the machine level can be the same for signed and unsigned numbers. However, when interpreting the result of those operations, some cases don't make sense - positive and negative overflow. Furthermore, the overflow results differ depending on the underlying signed representation.
The most technical reason of all, is simply that trying to capture overflow in an unsigned integer requires more moving parts from you (exception handling) and the processor (exception throwing).
C and C++ won't make you pay for that unless you ask for it by using a signed integer. This isn't a hard-and-fast rule, as you'll see near the end, but just how they proceed for unsigned integers. In my opinion, this makes signed integers the odd one out, not unsigned, but it's fine they offer this fundamental difference as the programmer can still perform well-defined signed operations with overflow. But to do so, you must cast for it.
Because:
unsigned integers have well defined overflow and underflow
casts from signed -> unsigned int are well defined: [uint's name]_MAX + 1 is conceptually added to negative values, to map them to the extended positive number range
casts from unsigned -> signed int are implementation-defined before C++20 (and modular, hence well defined, since): on 2's complement platforms [uint's name]_MAX + 1 is conceptually subtracted from positive values beyond the signed type's max, to map them to negative numbers
You can always perform arithmetic operations with well-defined overflow and underflow behavior, where signed integers are your starting point, albeit in a round-about way, by casting to unsigned integer first then back once finished.
#include <cstdint>

int32_t x = 10;
int32_t y = -50;
// writes -60 into z, this is well defined
int32_t z = int32_t(uint32_t(y) - uint32_t(x));
Casts between signed and unsigned integer types of the same width are free, if the CPU is using 2's complement (nearly all do). If for some reason the platform you're targeting doesn't use 2's complement for signed integers, you will pay a small conversion price when casting between uint32 and int32.
But be wary when using bit widths smaller than int
usually if you are relying on unsigned overflow, you are using a smaller word width, 8bit or 16bit. These will promote to signed int at the drop of a hat (C has absolutely insane implicit integer conversion rules, this is one of C's biggest hidden gotchas), consider:
unsigned char a = 0;
unsigned char b = 1;
printf("%i", a - b); // outputs -1, not 255 as you'd expect
To avoid this, you should always cast to the type you want when you are relying on that type's width, even in the middle of an operation where you think it's unnecessary. This will cast the temporary and get you the signedness AND truncate the value so you get what you expected. It's almost always free to cast, and in fact, your compiler might thank you for doing so as it can then optimize on your intentions more aggressively.
unsigned char a = 0;
unsigned char b = 1;
printf("%i", (unsigned char)(a - b)); // cast turns -1 to 255, outputs 255

What do the C and C++ standards say about bit-level integer representation and manipulation?

I know the C and C++ standards don't dictate a particular representation for numbers (could be two's complement, sign-and-magnitude, etc.). But I don't know the standards well enough (and couldn't find if it's stated) to know if there are any particular restrictions/guarantees/reserved representations made when working with bits. Particularly:
If all the bits in an integer type are zero, does the integer as whole represent zero?
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Is there a guaranteed way to check if any bit is not set?
Is there a guaranteed way to check if any bit is set? (#3 and #4 kind of depend on #1 and #2, because I know how to set, for example, the 5th bit (see #5) in some variable x, and I'd like to check a variable y to see if its 5th bit is 1, I would like to know if if (x & y) will work (because as I understand, this relies on the value of the representation and not whether or not that bit is actually 1 or 0))
Is there a guaranteed way to set the left-most and/or right-most bits? (At least a simpler way than taking a char c with all bits true (set by c = c | ~c) and doing c = c << (CHAR_BIT - 1) for setting the high-bit and c = c ^ (c << 1) for the low-bit, assuming I'm not making any assumptions I shouldn't be, given these questions)
If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
I guess my overall question is: are there any restrictions/guarantees/reserved representations made by the C and C++ standards regarding bits and integers, despite the fact that an integer's representation is not mandated (and if the C and C++ standards differ in this regard, what's their difference)?
I came up with these questions while doing my homework which required me to do some bit manipulating (note these aren't questions from my homework, these are much more "abstract").
Edit: As to what I refer to as "bits," I mean "value forming" bits and am not including "padding" bits.
(1) If all the bits in an integer type are zero, does the integer as whole represent zero?
Yes, the bit pattern consisting of all zeroes always represents 0:
The representations of integral types shall define values by use of a pure binary numeration system.49 [§3.9.1/7]
49 A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position.
(2) If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. In fact, signed magnitude is specifically allowed:
[ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end
example ] [§3.9.1/7]
(3) Is there a guaranteed way to check if any bit is not set?
I believe the answer to this is "no," if you consider signed types. It is equivalent to equality testing with a bit pattern of all ones, which is only possible if you have a way to produce a signed number with a bit pattern of all ones. For an unsigned number this representation is guaranteed, but casting from unsigned to signed yields an implementation-defined result if the number is unrepresentable:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined. [§4.7/3]
(4) Is there a guaranteed way to check if any bit is set?
I don't think so, because signed magnitude is allowed—0 would compare equal to −0. But it should be possible with unsigned numbers.
(5) Is there a guaranteed way to set the left-most and/or right-most bits?
Again, I believe the answer is "yes" for unsigned numbers, but "no" for signed numbers. Shifts are undefined for negative signed numbers:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. [§5.8/2]
You use the term "all bits" repeatedly, but you do not clarify what "all bits" you are referring to. Object representation of integer types in C/C++ might include value-forming bits and padding bits. The only integer type that is guaranteed not to have padding bits is [signed/unsigned] char.
The language always guaranteed that if all value-forming bits are zero, then the represented integer value is also zero.
As for padding bits, things are/were a bit more complicated. The original specification of C language (C89/90 as well as the original C99) did not guarantee that setting all object bits to zero produced a valid integer representation. It could've produced an invalid trap representation. I.e. in the original C (and even in C99 at first) using memset(..., 0, ...) on integer types did not guarantee that the objects will receive valid zero values (with the exception of [signed/unsigned] char). This was changed in later specifications, namely in one of the technical corrigendums for C99. Now it is required that all-zero bit pattern in an integer object (that involves all bits, including padding ones) represents a valid zero value.
I.e. in modern C it is legal to use memset(..., 0, ...) to set any integer objects to zero, but it became legal only after C99.
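A minimal example of what that guarantee permits (legal as of the C99 corrigendum / C11, as described above):
#include <cstring>

int main()
{
    long values[4];
    std::memset(values, 0, sizeof values);   // all-bits-zero is guaranteed to be a valid representation of 0
    return static_cast<int>(values[0]);      // 0
}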
You already got some answers about the representation of integer values. There is exactly one way that is guaranteed to give you all the individual bits of any object that is represented in memory: view it as array of unsigned char. This is the only integral type that has no padding bits and is guaranteed to have no trap representation. So casting a pointer of type T* to your object to unsigned char* will always work, as long as you only access the first sizeof(T) bytes. By that you could inspect and set all bytes (and thus bits) to your liking.
If you are interested in more details, here I have written something up about the anatomy of integer types in C. C++ might differ a bit from that, in particular type punning through a union as described there doesn't seem to be well defined in C++.
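A minimal sketch of inspecting an object's bytes through unsigned char, as described above:
#include <cstddef>
#include <cstdio>

int main()
{
    int x = 42;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&x);

    for (std::size_t i = 0; i < sizeof x; ++i)
        std::printf("%02x ", static_cast<unsigned>(p[i]));   // one byte of the object representation
    std::printf("\n");
}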
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. The standards for C and C++ don't rule out signed magnitude or one's complement, both of which have +0 and -0. While +0 and -0 do have to compare equal, they do not have to have the same representation.
Good luck finding a machine nowadays that uses signed magnitude or one's complement.
If you want your brain to explode, consider this: If you interpret an int or long or long long as an array of unsigned char (which is the most reasonable thing to do if you want to see all the bits), you know that the order of bytes is not defined, for example "bigendian" vs. "littleendian". We all (hopefully) know that.
But it is worse: each bit of an int could be stored in any of the bits of the array of char. So there are 32! ways in which the bits of a 32 bit integer could be mapped to an array of four 8-bit unsigned chars by a truly bizarre implementation. Fortunately, I haven't encountered more than two ways myself (and I know of one more ordering used in a real computer).
If all the bits in an integer type are zero, does the integer as whole represent zero?
Edit: since you have now clarified that you are not concerned with the padding bits, the answer to this is actually "yes". But I leave the original:
Not necessarily, it could be a trap representation. See C99 6.2.6.1:
For unsigned integer types other than unsigned char, the bits of the object
representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter)
The presence of padding bits allows for the possibility that all 0 is a trap representation. (As noted by Keith Thompson in the comment below, the more recent C11 makes explicit that such a representation is not a trap representation).
and
The values of any padding bits are unspecified
and
44) Some combinations of padding bits might generate trap representations
If you restrict the question to value and sign bits, the answer is yes, due to 6.2.6.2:
If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation.
and
If the sign bit is zero, it shall not affect the resulting value.
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Not necessarily, and in fact sign-and-magnitude is explicitly supported in 6.2.6.2.
Is there a guaranteed way to check if any bit is not set?
If you do not care about padding and sign bits, you could just compare to 0, but this would not work with a 1's complement representation (which is allowed) seeing as all bits 0 and all bits 1 both represent the value 0.
Otherwise: you can read the value of each byte via an unsigned char *, and compare the result to 0:
Values stored in unsigned bit-fields and objects of type unsigned char
shall be represented using a pure binary notation
If you want to check a specific value bit, you could construct a suitable bitmask using (1u << n), but this will not necessarily let you inspect the sign bit.
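For example:
unsigned int x = 0x12;                    // binary 1 0010
bool bit4_set = (x & (1u << 4)) != 0;     // true: bit 4 (value 0x10) is set
bool bit0_set = (x & (1u << 0)) != 0;     // false: bit 0 is clear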
Is there a guaranteed way to check if any bit is set?
The answer is essentially the same as to the previous question.
Is there a guaranteed way to set the left-most and/or right-most bits?
Do you mean left-most value bit? You could count the bits in INT_MAX or UINT_MAX or equivalent depending on the type, and use that to construct a value (via 1 << n) with which to OR the original value.
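For an unsigned type, a sketch of that approach:
#include <climits>

int main()
{
    unsigned int bits = 0;
    for (unsigned int v = UINT_MAX; v != 0; v >>= 1)
        ++bits;                                 // count the value bits in unsigned int
    unsigned int leftmost = 1u << (bits - 1);   // the left-most (highest) value bit
    return leftmost != 0;                       // 1
}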
If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
You can do so using a bitmask which you left shift repeatedly, but you can check only the value bits this way and not the sign bit.
For the bit manipulations you could make a struct with 8 one-bit unsigned bit-fields and let a pointer to that struct point at your char. That way you can easily access each bit. But the compiler will probably do masking under the hood, so it is only cleaner for the programmer, I think. You must check that your compiler doesn't change the order of the fields when doing this.
yourstruct* pChar = (yourstruct*)(&c);
pChar->Bit7 = 1;
Let me caveat this by saying I'm addressing C and C++ in general (e.g. C90 and lower, MS Visual C++, etc.): the "greatest common denominator" (vs. the latest/greatest C++11 "standard").
Q: If all the bits in an integer type are zero, does the integer as whole represent zero?
A: Yes
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
A: Yes. This includes the sign bit, for a signed int.
I'm frankly not familiar with "magnitude"
Q: Is there a guaranteed way to check if any bit is not set?
A: "And'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to check if any bit is set?
A: Again, "and'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to set the left-most and/or right-most bits?
A: I believe you should always have INT_MAX (or the equivalent maximum for the type) available on all implementations/all architectures to determine the leftmost bit.
I'm prepared to be flamed ... but I believe the above is accurate. And I hope it helps.
IMHO...

What is wrong with this bit-manipulation code from an interview question?

I was having a look over this page: http://www.devbistro.com/tech-interview-questions/Cplusplus.jsp, and didn't understand this question:
What’s potentially wrong with the following code?
long value;
//some stuff
value &= 0xFFFF;
Note: Hint to the candidate about the base platform they’re developing for. If the person still doesn’t find anything wrong with the code, they are not experienced with C++.
Can someone elaborate on it?
Thanks!
Several answers here state that if an int has a width of 16 bits, 0xFFFF is negative. This is not true. 0xFFFF is never negative.
A hexadecimal literal is represented by the first of the following types that is large enough to contain it: int, unsigned int, long, and unsigned long.
If int has a width of 16 bits, then 0xFFFF is larger than the maximum value representable by an int. Thus, 0xFFFF is of type unsigned int, which is guaranteed to be large enough to represent 0xFFFF.
When the usual arithmetic conversions are performed for evaluation of the &, the unsigned int is converted to a long. The conversion of a 16-bit unsigned int to long is well-defined because every value representable by a 16-bit unsigned int is also representable by a 32-bit long.
There's no sign extension needed because the initial type is not signed, and the result of using 0xFFFF is the same as the result of using 0xFFFFL.
Alternatively, if int is wider than 16 bits, then 0xFFFF is of type int. It is a signed, but positive, number. In this case both operands are signed, and long has the greater conversion rank, so the int is again promoted to long by the usual arithmetic conversions.
As others have said, you should avoid performing bitwise operations on signed operands because the numeric result is dependent upon how signedness is represented.
Aside from that, there's nothing particularly wrong with this code. I would argue that it's a style concern that value is not initialized when it is declared, but that's probably a nit-pick level comment and depends upon the contents of the //some stuff section that was omitted.
It's probably also preferable to use a fixed-width integer type (like uint32_t) instead of long for greater portability, but really that too depends on the code you are writing and what your basic assumptions are.
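If portability of the masking itself is the concern, a fixed-width sketch along those lines avoids the ambiguity entirely:
#include <cstdint>

int main()
{
    std::uint32_t value = 0xDEADBEEF;
    value &= 0xFFFFu;            // keeps the low 16 bits regardless of the width of int/long
    return value == 0xBEEF;      // 1
}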
I think depending on the size of a long the 0xffff literal (-1) could be promoted to a larger size and being a signed value it will be sign extended, potentially becoming 0xffffffff (still -1).
I'll assume it's because there's no predefined size for a long, other than it must be at least as big as the preceding size (int). Thus, depending on the size, you might either truncate value to a subset of bits (if long is more than 32 bits) or overflow (if it's less than 32 bits).
Yeah, longs (per the spec, and thanks for the reminder in the comments) must be able to hold at least -2147483647 to 2147483647 (LONG_MIN and LONG_MAX).
For one, value isn't initialized before doing the AND, so I think the behaviour is undefined; value could be anything.
long type size is platform/compiler specific.
What you can say here is:
It is signed.
We can't know the result of value &= 0xFFFF;, since it could be, for example, value &= 0x0000FFFF; and will not do what is expected.
While one could argue that since it's not a buffer-overflow or some other error that's likely to be exploitable, it's a style thing and not a bug, I'm 99% confident that the answer that the question-writer is looking for is that value is operated on before it's assigned to. The value is going to be arbitrary garbage, and that's unlikely to be what was meant, so it's "potentially wrong".
Using MSVC I think that the statement would perform what was most likely intended - that is: clear all but the least significant 16 bits of value, but I have encountered other platforms which would interpret the literal 0xffff as equivalent to (short)-1, then sign extend to convert to long, in which case the statement "value &= 0xFFFF" would have no effect.
"value &= 0x0FFFF" is more explicit and robust.