Why is 1 not greater than -0x80000000 [duplicate] - c++

This question already has answers here:
Why is 0 < -0x80000000?
(6 answers)
Closed 7 years ago.
Why is 1 not greater than -0x80000000? I know it has something to do with overflow, but can someone explain why? Isn't 0x80000000 a constant? I think it is.
assert(1 > -0x80000000);
The assert triggers in C++. Why is that?
I am grateful for the answers provided. But does the C++ standard say that the constant needs to be stored in a 32-bit integer? Why doesn't the compiler recognize that 0x80000000 isn't going to fit in a 32-bit integer and use 64 bits for it? I mean, the largest 32-bit int is 0x7FFFFFFF, and 0x80000000 is obviously larger than that. Why does the compiler still use 32 bits for it?

According to the C and C++ standards, -0x80000000 is not an integer constant. It's an expression, like 3 + 5. In this case, it's the constant 0x80000000, operated upon by the negation operator. For compilers which have 32-bit ints, 0x80000000 is not representable as an int, but is representable as an unsigned int, so that is the type it gets. Negating an unsigned integer is (weirdly) done in unsigned arithmetic, modulo 2^32, so the negation here effectively has no effect: the result is still 0x80000000. In the comparison, 1 is then converted to unsigned int as well, and 1 > 2147483648 is false.
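The typing rules described above can be checked at compile time. This is only a sketch, and it assumes a platform with 32-bit int (the first static_assert guards that assumption); the helper name one_is_greater is made up for illustration.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Assumption: int is 32 bits wide; other widths change the literal's type.
static_assert(sizeof(int) * CHAR_BIT == 32, "this sketch assumes 32-bit int");

// 0x80000000 does not fit in int, so the hex literal gets type unsigned int.
static_assert(std::is_same<decltype(0x80000000), unsigned int>::value,
              "hex literal too large for int becomes unsigned int");

// Unary minus keeps the unsigned type; the result is reduced modulo 2^32.
static_assert(std::is_same<decltype(-0x80000000), unsigned int>::value,
              "negation stays unsigned");
static_assert(-0x80000000 == 0x80000000u, "negation wraps to the same value");

// Hypothetical helper: in `1 > -0x80000000` the int 1 is converted to
// unsigned int, so the comparison is 1u > 2147483648u, which is false.
constexpr bool one_is_greater() { return 1 > -0x80000000; }
static_assert(!one_is_greater(), "1 is not greater than 2147483648u");
```

Some compilers warn about applying unary minus to an unsigned operand here, which is a useful hint that the expression does not mean what it appears to mean.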

One way to fix this is to give the constant a type that is guaranteed to be able to represent the value, so that the negation happens in a signed type:
assert(1 > -0x80000000LL);
On platforms where long is 64 bits wide, this also works:
assert(1 > -0x80000000L);
but where long is only 32 bits (for example, on Windows), 0x80000000L has type unsigned long and the assertion still fails.
Both forms rely on the standard suffixes for integer literals. The relevant ones are u, l and ll, along with their uppercase equivalents U, L and LL, which mean the same thing.
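To see why the suffix helps, here is a small compile-time sketch of the LL form (C++11 or later). The helper name one_is_greater_ll is invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// With the LL suffix the literal is a long long (at least 64 bits), so
// 0x80000000LL is representable and its negation is an ordinary negative number.
static_assert(std::is_same<decltype(0x80000000LL), long long>::value,
              "LL-suffixed literal is long long");
static_assert(-0x80000000LL == -2147483648LL, "negation is a real negative value");

// Hypothetical helper: the comparison now happens in long long.
constexpr bool one_is_greater_ll() { return 1 > -0x80000000LL; }
static_assert(one_is_greater_ll(), "1 is greater than -2147483648");
```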

Related

Why large positive number stored as negative numbers in computer memory? [duplicate]

This question already has answers here:
C++ integer overflow
(4 answers)
Closed 1 year ago.
I am using C++14. Size of int is 4 bytes.
Here is my code:
#include<iostream>
using namespace std;
int main()
{
int a=4294967290;
int b=-6;
if(b==a)
cout<<"Equal numbers";
}
This prints Equal numbers, which means that 4294967290 is equal to -6 in memory in binary format.
Then how are large positive numbers distinguished from negative numbers?
Is this only with C++ or with any other programming language?
Bits is bits. What the bits mean is up to you.
Let's talk about 8-bit quantities to make it easier on us. Consider the bit pattern
1 0 0 0 0 0 0 0
What does that 'mean'?
If you want to consider it as an unsigned binary integer, it's 128 (equals 2 to the 7th power).
If you want to consider it as a signed binary integer in two's-complement representation, it's -128.
If you want to treat it as a signed binary integer in sign-and-magnitude representation (which nobody does any more), it's -0. Which is one reason we don't do that.
In short, the way large positive numbers are distinguished from negative numbers is that the programmer knows what he intends the bits to mean. It's something that does not exist in the bits themselves.
Languages like C/C++ have signed and unsigned types to help (by defining whether, for example, 1000 0000 is greater or less than 0000 0000), but there will always be pitfalls you need to be aware of, because integers in computer hardware are finite, unlike the real world.
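As a rough illustration of "the same bits, two meanings", the sketch below reads one 8-bit pattern both ways and also reproduces the 4294967290 versus -6 case from the question. The function names are made up, and the code assumes the fixed-width types from <cstdint> exist on the platform (they do on essentially every current one).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read the 8 bits as an unsigned value.
inline unsigned as_unsigned(std::uint8_t bits) { return bits; }

// Read the same 8 bits as a signed value by copying the object representation
// (memcpy avoids relying on how out-of-range conversions behave).
inline int as_signed(std::uint8_t bits) {
    std::int8_t s;
    std::memcpy(&s, &bits, sizeof s);
    return s;
}

// Converting a negative value to an unsigned type is defined as reduction
// modulo 2^32, which is why -6 and 4294967290 share one bit pattern.
constexpr std::uint32_t to_u32(std::int32_t v) {
    return static_cast<std::uint32_t>(v);
}
static_assert(to_u32(-6) == 4294967290u, "-6 and 4294967290 are congruent mod 2^32");
```

With the pattern 1000 0000 (0x80), as_unsigned gives 128 and as_signed gives -128; the bits are identical, only the type used to read them differs.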

What exactly is a bit vector in C++? [duplicate]

This question already has answers here:
C/C++ Bit Array or Bit Vector
(5 answers)
Closed 7 years ago.
So, I was reading a question in Cracking the Coding Interview: 5th Edition where it says to implement a bit vector with 4 billion bits. It defines a bit vector as an array that compactly stores boolean values by using an array of ints. Each int stores a sequence of 32 bits, or boolean values. I am somewhat confused by this definition. Can someone explain what exactly it means?
I couldn't really understand the question that has been marked as a duplicate, since there is no associated example. The second answer does have an example, but it's not really understandable. It would be great if someone could add an example, even if only for a small size. Thanks!
The bool type is at least 1 byte. It means it's at least 8 bits.
An int, on a system with 32-bit ints, has 32 bits.
With an int you therefore get 32 booleans in 4 bytes, instead of the minimum of 32 bytes if you use the bool type.
You can store 32 booleans in an int using basic bit operations: &, | and ~.
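Since the question asked for a small example, here is a minimal sketch of the idea. The class name BitVector and its member names are invented for illustration; it uses std::uint32_t rather than int so that each word is exactly 32 bits and the shifts are on an unsigned type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal bit vector: bit i lives in word i / 32, at position i % 32.
class BitVector {
public:
    // Round up so that nbits bits always fit.
    explicit BitVector(std::size_t nbits) : words_((nbits + 31) / 32, 0u) {}

    void set(std::size_t i)        { words_[i / 32] |=  (std::uint32_t{1} << (i % 32)); }
    void clear(std::size_t i)      { words_[i / 32] &= ~(std::uint32_t{1} << (i % 32)); }
    bool test(std::size_t i) const { return (words_[i / 32] >> (i % 32)) & 1u; }

    std::size_t word_count() const { return words_.size(); }

private:
    std::vector<std::uint32_t> words_;
};
```

A BitVector of 40 bits needs only two 32-bit words (8 bytes), whereas 40 separate bool objects would take at least 40 bytes. No bounds checking is done here, so callers must keep i below the size they asked for.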

What do the C and C++ standards say about bit-level integer representation and manipulation?

I know the C and C++ standards don't dictate a particular representation for numbers (could be two's complement, sign-and-magnitude, etc.). But I don't know the standards well enough (and couldn't find if it's stated) to know if there are any particular restrictions/guarantees/reserved representations made when working with bits. Particularly:
If all the bits in an integer type are zero, does the integer as whole represent zero?
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Is there a guaranteed way to check if any bit is not set?
Is there a guaranteed way to check if any bit is set? (#3 and #4 kind of depend on #1 and #2, because I know how to set, for example, the 5th bit (see #5) in some variable x, and I'd like to check a variable y to see if its 5th bit is 1. I would like to know if if (x & y) will work (because as I understand, this relies on the value of the representation and not whether or not that bit is actually 1 or 0))
Is there a guaranteed way to set the left-most and/or right-most bits? (At least a simpler way than taking a char c with all bits true (set by c = c | ~c) and doing c = c << (CHAR_BIT - 1) for setting the high-bit and c = c ^ (c << 1) for the low-bit, assuming I'm not making any assumptions I shouldn't be, given these questions)
If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
I guess my overall question is: are there any restrictions/guarantees/reserved representations made by the C and C++ standards regarding bits and integers, despite the fact that an integer's representation is not mandated (and if the C and C++ standards differ in this regard, what's their difference)?
I came up with these questions while doing my homework which required me to do some bit manipulating (note these aren't questions from my homework, these are much more "abstract").
Edit: As to what I refer to as "bits," I mean "value forming" bits and am not including "padding" bits.
(1) If all the bits in an integer type are zero, does the integer as whole represent zero?
Yes, the bit pattern consisting of all zeroes always represents 0:
The representations of integral types shall define values by use of a pure binary numeration system. [§3.9.1/7, footnote 49]
Footnote 49: A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position.
(2) If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. In fact, signed magnitude is specifically allowed:
[ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end example ] [§3.9.1/7]
(3) Is there a guaranteed way to check if any bit is not set?
I believe the answer to this is "no," if you consider signed types. It is equivalent to equality testing with a bit pattern of all ones, which is only possible if you have a way to produce a signed number with a bit pattern of all ones. For an unsigned number this representation is guaranteed, but the result of converting from unsigned to signed is implementation-defined if the number is unrepresentable:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined. [§4.7/3]
(4) Is there a guaranteed way to check if any bit is set?
I don't think so, because signed magnitude is allowed—0 would compare equal to −0. But it should be possible with unsigned numbers.
(5) Is there a guaranteed way to set the left-most and/or right-most bits?
Again, I believe the answer is "yes" for unsigned numbers, but "no" for signed numbers. Shifts are undefined for negative signed numbers:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. [§5.8/2]
You use the term "all bits" repeatedly, but you do not clarify what "all bits" you are referring to. Object representation of integer types in C/C++ might include value-forming bits and padding bits. The only integer type that is guaranteed not to have padding bits is [signed/unsigned] char.
The language always guaranteed that if all value-forming bits are zero, then the represented integer value is also zero.
As for padding bits, things are/were a bit more complicated. The original specification of C language (C89/90 as well as the original C99) did not guarantee that setting all object bits to zero produced a valid integer representation. It could've produced an invalid trap representation. I.e. in the original C (and even in C99 at first) using memset(..., 0, ...) on integer types did not guarantee that the objects will receive valid zero values (with the exception of [signed/unsigned] char). This was changed in later specifications, namely in one of the technical corrigendums for C99. Now it is required that all-zero bit pattern in an integer object (that involves all bits, including padding ones) represents a valid zero value.
I.e. in modern C it is legal to use memset(..., 0, ...) to set any integer objects to zero, but it became legal only after C99.
You already got some answers about the representation of integer values. There is exactly one way that is guaranteed to give you all the individual bits of any object that is represented in memory: view it as array of unsigned char. This is the only integral type that has no padding bits and is guaranteed to have no trap representation. So casting a pointer of type T* to your object to unsigned char* will always work, as long as you only access the first sizeof(T) bytes. By that you could inspect and set all bytes (and thus bits) to your liking.
If you are interested in more details, here I have written something up about the anatomy of integer types in C. C++ might differ a bit from that; in particular, type punning through a union as described there doesn't seem to be well defined in C++.
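A rough sketch of the "view it as an array of unsigned char" approach described above could look like this in C++. The function name object_bits is hypothetical; it prints bytes in memory order, so the output for multi-byte types depends on the platform's byte order.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the object representation of `obj` as a string
// of '0'/'1' characters, byte by byte in memory order, with the most
// significant bit of each byte first.
template <typename T>
std::string object_bits(const T& obj) {
    // Accessing any object through unsigned char* is always permitted.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        for (int b = CHAR_BIT - 1; b >= 0; --b) {
            out += ((p[i] >> b) & 1u) ? '1' : '0';
        }
    }
    return out;
}
```

For an unsigned char holding 5, the string ends in 101. For wider types the same call shows every byte of the object, including any padding bits an implementation might have.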
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
No. The standards for C and C++ don't rule out signed magnitude or one's complement, both of which have +0 and -0. While +0 and -0 do have to compare equal, they do not have to have the same representation.
Good luck finding a machine nowadays that uses signed magnitude or one's complement.
If you want your brain to explode, consider this: If you interpret an int or long or long long as an array of unsigned char (which is the most reasonable thing to do if you want to see all the bits), you know that the order of bytes is not defined, for example "bigendian" vs. "littleendian". We all (hopefully) know that.
But it is worse: Each bit of an int could be stored in any of the bits of the array of char. So there are 32! ways how the bits of a 32 bit integer could be mapped to an array of four 8-bit unsigned chars by a truly bizarre implementation. Fortunately, I haven't encountered more than two ways myself (and I know of one more ordering in a real computer).
If all the bits in an integer type are zero, does the integer as whole represent zero?
Edit: since you have now clarified that you are not concerned with the padding bits, the answer to this is actually "yes". But I leave the original:
Not necessarily, it could be a trap representation. See C99 6.2.6.1:
For unsigned integer types other than unsigned char, the bits of the object
representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter)
The presence of padding bits allows for the possibility that all 0 is a trap representation. (As noted by Keith Thompson in the comment below, the more recent C11 makes explicit that such a representation is not a trap representation).
and
The values of any padding bits are unspecified
and
44) Some combinations of padding bits might generate trap representations
If you restrict the question to value and sign bits, the answer is yes, due to 6.2.6.2:
If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation.
and
If the sign bit is zero, it shall not affect the resulting value.
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Not necessarily, and in fact sign-and-magnitude is explicitly supported in 6.2.6.2.
Is there a guaranteed way to check if any bit is not set?
If you do not care about padding and sign bits, you could just compare to 0, but this would not work with a 1's complement representation (which is allowed) seeing as all bits 0 and all bits 1 both represent the value 0.
Otherwise: you can read the value of each byte via an unsigned char *, and compare the result to 0:
Values stored in unsigned bit-fields and objects of type unsigned char
shall be represented using a pure binary notation
If you want to check a specific value bit, you could construct a suitable bitmask using (1u << n), but this will not necessarily let you inspect the sign bit.
Is there a guaranteed way to check if any bit is set?
The answer is essentially the same as to the previous question.
Is there a guaranteed way to set the left-most and/or right-most bits?
Do you mean left-most value bit? You could count the bits in INT_MAX or UINT_MAX or equivalent depending on the type, and use that to construct a value (via 1 << n) with which to OR the original value.
If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?
You can do so using a bitmask which you left shift repeatedly, but you can check only the value bits this way and not the sign bit.
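The mask-based checks mentioned in this answer can be sketched as follows for unsigned int, where every value bit is reachable and shifting is fully defined. All names are invented for the example, and n is assumed to be in the range 0 to kValueBits - 1.

```cpp
#include <cassert>
#include <limits>

// Number of value bits in unsigned int (padding bits, if any, are excluded).
constexpr int kValueBits = std::numeric_limits<unsigned int>::digits;

// Hypothetical helper: is value bit n (0 = least significant) set?
constexpr bool bit_is_set(unsigned int x, int n) { return (x & (1u << n)) != 0; }

// Hypothetical helper: count set bits by shifting a one-bit mask to the left.
// For unsigned types the mask becomes 0 once it is shifted past the top bit.
inline int count_set_bits(unsigned int x) {
    int count = 0;
    for (unsigned int mask = 1u; mask != 0; mask <<= 1) {
        if (x & mask) ++count;
    }
    return count;
}

// The left-most value bit, built without assuming a particular width.
constexpr unsigned int kHighBit = 1u << (kValueBits - 1);
```

Using std::numeric_limits<unsigned int>::digits plays the same role as counting the bits in UINT_MAX by hand.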
For bit manipulation you could define a struct with eight one-bit unsigned bit-fields and point a pointer to that struct at your char. That way you can access each bit by name. The compiler will probably do the masking under the hood anyway, so I think this is only a cleaner notation for the programmer. The order in which bit-fields are laid out is implementation-defined, so check which bit each field maps to on your compiler.
yourstruct* pChar = (yourstruct*)(&c);
pChar->Bit7 = 1;
Let me caveat this by saying I'm addressing C and C++ in general (e.g. C90 and lower, MS Visual C++, etc): the "greatest common denominator" (vs. the latest/greatest cx11 "standard").
Q: If all the bits in an integer type are zero, does the integer as whole represent zero?
A: Yes
Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
A: Yes. This includes the sign bit, for a signed int.
I'm frankly not familiar with "magnitude"
Q: Is there a guaranteed way to check if any bit is not set?
A: "And'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to check if any bit is set?
A: Again, "and'ing" a bitmask is always guaranteed.
Q: Is there a guaranteed way to set the left-most and/or right-most bits?
A: I believe you should always have INT_MAX (from <limits.h>) available on all implementations and architectures to determine the leftmost bit.
I'm prepared to be flamed ... but I believe the above is accurate. And I hope it helps.
IMHO...

What is wrong with this bit-manipulation code from an interview question?

I was having a look over this page: http://www.devbistro.com/tech-interview-questions/Cplusplus.jsp, and didn't understand this question:
What’s potentially wrong with the following code?
long value;
//some stuff
value &= 0xFFFF;
Note: Hint to the candidate about the base platform they’re developing for. If the person still doesn’t find anything wrong with the code, they are not experienced with C++.
Can someone elaborate on it?
Thanks!
Several answers here state that if an int has a width of 16 bits, 0xFFFF is negative. This is not true. 0xFFFF is never negative.
A hexadecimal literal without a suffix has the first of the following types that is large enough to contain it: int, unsigned int, long, and unsigned long (C++11 adds long long and unsigned long long to the end of that list).
If int has a width of 16 bits, then 0xFFFF is larger than the maximum value representable by an int. Thus, 0xFFFF is of type unsigned int, which is guaranteed to be large enough to represent 0xFFFF.
When the usual arithmetic conversions are performed for evaluation of the &, the unsigned int is converted to a long. The conversion of a 16-bit unsigned int to long is well-defined because every value representable by a 16-bit unsigned int is also representable by a 32-bit long.
There's no sign extension needed because the initial type is not signed, and the result of using 0xFFFF is the same as the result of using 0xFFFFL.
Alternatively, if int is wider than 16 bits, then 0xFFFF is of type int. It is a signed, but positive, number. In this case both operands are signed, and long has the greater conversion rank, so the int is again promoted to long by the usual arithmetic conversions.
As others have said, you should avoid performing bitwise operations on signed operands because the numeric result is dependent upon how signedness is represented.
Aside from that, there's nothing particularly wrong with this code. I would argue that it's a style concern that value is not initialized when it is declared, but that's probably a nit-pick level comment and depends upon the contents of the //some stuff section that was omitted.
It's probably also preferable to use a fixed-width integer type (like uint32_t) instead of long for greater portability, but really that too depends on the code you are writing and what your basic assumptions are.
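The claims in this answer can be spot-checked at compile time. The following is a sketch, not part of the original interview question; low16 is a made-up helper name.

```cpp
#include <cassert>
#include <type_traits>

// 0xFFFF is never negative: it is int where int is wider than 16 bits and
// unsigned int where int is exactly 16 bits.
static_assert(0xFFFF > 0, "0xFFFF is a positive value");
static_assert(std::is_same<decltype(0xFFFF), int>::value ||
              std::is_same<decltype(0xFFFF), unsigned int>::value,
              "0xFFFF is int or unsigned int");

// Hypothetical helper: keep only the low 16 bits of a long.
// The literal is converted to long (value 65535) before the & is applied.
constexpr long low16(long value) { return value & 0xFFFF; }
static_assert(low16(0x12345678L) == 0x5678L, "upper bits are cleared");
```

No sign extension takes place in either case, so the mask behaves the same as 0xFFFFL.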
I think depending on the size of a long the 0xffff literal (-1) could be promoted to a larger size and being a signed value it will be sign extended, potentially becoming 0xffffffff (still -1).
I'll assume it's because there's no predefined size for a long, other than it must be at least as big as the preceding size (int). Thus, depending on the size, you might either truncate value to a subset of bits (if long is more than 32 bits) or overflow (if it's less than 32 bits).
Yeah, longs (per the spec, and thanks for the reminder in the comments) must be able to hold at least -2147483647 to 2147483647 (LONG_MIN and LONG_MAX).
For one, value isn't initialized before the AND is applied, so I think the behaviour is undefined; value could be anything.
long type size is platform/compiler specific.
What you can say here is:
It is signed.
We can't know the result of value &= 0xFFFF; since it could be, for example, value &= 0x0000FFFF; and will not do what is expected.
While one could argue that since it's not a buffer-overflow or some other error that's likely to be exploitable, it's a style thing and not a bug, I'm 99% confident that the answer that the question-writer is looking for is that value is operated on before it's assigned to. The value is going to be arbitrary garbage, and that's unlikely to be what was meant, so it's "potentially wrong".
Using MSVC I think that the statement would perform what was most likely intended - that is: clear all but the least significant 16 bits of value, but I have encountered other platforms which would interpret the literal 0xffff as equivalent to (short)-1, then sign extend to convert to long, in which case the statement "value &= 0xFFFF" would have no effect.
"value &= 0x0FFFF" is more explicit and robust.

Unsigned negative primitives?

In C++ we can make primitives unsigned. But they are always positive. Is there also a way to make unsigned negative variables? I know the word unsigned means "without sign", so also not a minus (-) sign. But I think C++ must provide it.
No. unsigned can only contain nonnegative numbers.
If you need a type that only represent negative numbers, you need to write a class yourself, or just interpret the value as negative in your program.
(But why do you need such a type?)
unsigned integers are never negative. From 3.9.1 paragraph 3 of the 2003 C++ standard:
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same.
The main purpose of the unsigned integer types is to support modulo arithmetic. From 3.9.1 paragraph 4:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
You are free, of course, to treat them as negative if you wish, you'll just have to keep track of that yourself somehow (perhaps with a Boolean flag).
I think you are thinking it the wrong way.
If you want a negative number, then you must not declare the variable as unsigned.
If your problem is the value range, and you want that one more bit, then you could use a "bigger" data type (int 64 for example...).
Then if you are using legacy code, creating a new struct can solve your problem, but this is that way because of your specific situation, C++ shouldn't handle it.
Don't be fooled by the name: unsigned is often misunderstood as non-negative, but the rules for the language are different... probably a better name would have been "bitmask" or "modulo_integer".
If you think that unsigned means non-negative, then for example the implicit conversion rules are total nonsense (why should the difference between two non-negative values be non-negative? Why should the sum of a non-negative value and an integer be non-negative?).
It's very unfortunate that C++ standard library itself fell in that misunderstanding because for example vector.size() is unsigned (absurd if you mean it as the language itself does in terms of bitmask or modulo_integer). This choice for sizes has more to do with the old 16-bit times than with unsigned-ness and it was in my opinion a terrible choice that we're still paying as bugs.
But I think C++ must provide it.
Why? You think that C++ must provide the ability to make non-negative numbers negative? That's silly.
Unsigned values in C++ are defined to always be non-negative. They even wrap around — rather than underflowing — at zero! (And the same at the other end of the range)
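The wrap-around behaviour is easy to confirm with a few compile-time checks. This is just a sketch; the helper names are invented.

```cpp
#include <cassert>
#include <limits>

constexpr unsigned int kMax = std::numeric_limits<unsigned int>::max();

// Hypothetical helpers; the arithmetic is done modulo 2^n, so neither overflows.
constexpr unsigned int minus_one(unsigned int x) { return x - 1u; }
constexpr unsigned int plus_one(unsigned int x)  { return x + 1u; }

static_assert(minus_one(0u) == kMax, "0 - 1 wraps to the largest value");
static_assert(plus_one(kMax) == 0u, "max + 1 wraps to 0");
static_assert(2u - 3u == kMax, "the difference of two unsigned values is never negative");
```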