In this blog post the author has suggested the following as the bug fix:
int mid = (low + high) >>> 1;
Does anyone know what this >>> operator is? It certainly isn't on the following operator reference lists:
http://msdn.microsoft.com/en-us/library/x04xhy0h%28v=vs.71%29.aspx
http://www.cplusplus.com/doc/tutorial/operators/
What is it and how does that solve the overflow problem?
>>> is not a part of C++. The blog contains code in Java.
Check out Java online tutorial here on Bitwise shift operators. It says
The unsigned right shift operator ">>>" shifts a zero into the leftmost position, while the leftmost position after ">>" depends on sign extension.
>>> is the logical right shift operator in Java.
It shifts in a zero on the left rather than preserving the sign bit. The author of the blog post even provides a C++ implementation:
mid = ((unsigned int)low + (unsigned int)high) >> 1;
... if you right-shift unsigned numbers, preserving the sign bit doesn't make any sense (since there is no sign bit) so the compiler obviously uses logical shifts rather than arithmetic ones.
The above code exploits the MSB (the 32nd bit, assuming 32-bit integers): adding low and high, which are both nonnegative and thus fit into 31 bits, never overflows the full 32 bits, though the sum may carry into the MSB. Shifting right effectively divides the 32-bit number by two and clears the 32nd bit again, so the result is nonnegative.
The truth is that the >>> operator in Java is just a workaround for the fact that the language does not provide unsigned data types.
The >>> operator is in a Java code snippet, and it is the unsigned right shift operator. It differs from the >> operator in its treatment of signed values: the >> operator applies sign extension during the shift, while the >>> operator just inserts a zero in the bit positions "emptied" by the shift.
Sadly, in C++ there's no such thing as sign-preserving and unsigned right shift, we have only the >> operator, whose behavior on negative signed values is implementation-defined. To emulate a behavior like the one of >>> you have to perform some casts to unsigned int before applying the shift (as shown in the code snippet immediately following the one you posted).
The Java expression x >>> y is more or less equivalent to the C++ expression unsigned(x) >> y.
>>> is not a C++ operator. I think it's an operator in the Java language. I'm not sure though!
EDIT:
Yes, it is a Java operator. Check the link to the article you provided: the article is written in Java!
It is a java operator, not related to C++.
However, all the blog author does is replace the division by 2 with a bitwise right shift (right shifting a value by 1 is equivalent to dividing it by 2^1 for nonnegative values).
Same functionality, different machine code output (bit shifting operations are almost always faster than multiplication/division on most architectures).
Related
I know that it is possible to use the left shift to implement multiplication by the power of two (x << 4 = x * 16).
Also, it is trivial to replace the right shift by division by a power of two (x >> 5 = x / 32).
I am wondering: is it possible to replace the right shift with multiplication?
It seems to be not possible in the general case, but my question is limited to modulo 2^32 and 2^64 arithmetic (unsigned 32-bit and 64-bit values). Also, maybe it can be done if we can add other cheap instructions like + and - in addition to * to emulate the right bit shift?
I assume exotic architecture where the right shift is more expensive than other arithmetic (similar to division).
uint64_t foo(uint64_t x) {
return x >> 3; // how to avoid using right shift here?
}
There is a similar question How to perform right shifting binary multiplication? that asks how to replace multiplication of two unsigned numbers by right shift. Basically, it uses a loop internally. However, maybe if the second number is a constant, this loop can be avoided (or at least unrolled to a shorter fragment)?
"Multiply-high" aka high-mul, hmul, mulh, etc, can be used to emulate a shift-right with a constant count. Usually that's not a good trade. It's also hardly related to C++.
Normal multiplication (putting floating point stuff aside) cannot be used to implement a shift-right.
my question is limited to modulo 2^32 and 2^64 arithmetic
It doesn't help. You can use that property to "unmultiply" (sort of like divide, except not really) by odd numbers, for example if b = 5 * a then a = b * 0xCCCCCCCD, using the modular multiplicative inverse. The number being inverted must be coprime to the modulus. Since the modulus is a power of two, the "divisor" here cannot be a power of two (except 1, but that does nothing), so a shift-right cannot be done this way.
Another way to look at it (probably simpler) is that what a multiplication does is conditionally add together a bunch of left-shifted versions of the multiplicand. Only left-shifted versions, not right-shifted versions. Which of those shifted versions are selected by the multiplier doesn't matter; there are no right-shifted versions to select.
When I use the >> bitwise operator on 1000 in C++ it gives this result: 1100. I want the result to be 0100. When the 1 is in any other position this is exactly what happens, but with a leading 1 it goes wrong. Why is that, and how can it be avoided?
The behavior you describe is consistent with what happens on some platforms when right-shifting a signed integer with the high bit set (i.e., a negative value).
In this case, on many platforms compilers will emit code to perform an arithmetic shift, which propagates the sign bit; on platforms with two's complement representation for negative integers (virtually every current platform) this has the effect of giving the "x >> i = floor(x / 2^i)" behavior even on negative values. Notice that this is not contractual: as far as the C++ standard is concerned, shifting negative integers is implementation-defined behavior, so any compiler is free to implement different semantics for it.
To come to your question, to obtain the "regular" shift behavior (generally called "logical shift") you have to make sure to work on unsigned integers. This can be obtained either making sure that the variable you are shifting is of unsigned type (e.g. unsigned int) or, if it's a literal, by putting an U suffix to it (e.g. 1 is an int, 1U is an unsigned int).
If the data you have is of a signed type (e.g. int) you may cast it to the corresponding unsigned type before shifting without risks (conversion from a signed int to an unsigned one is well-defined by the standard, and doesn't change the bit values on 2's complement machines).
Historically, this comes from the fact that C strove to support even machines that didn't have "cheap" arithmetic shift functionality at hardware level and/or didn't use 2's complement representation.
As mentioned by others, when right shifting on a signed int, it is implementation defined whether you will get 1s or 0s. In your case, because the left most bit in 1000 is a 1, the "replacement bits" are also 1. Assuming you must work with signed ints, in order to get rid of it, you can apply a bitmask.
I'm aware of the two's complement representation. I was wondering what the specific differences are, in terms of implementation, between int and unsigned int. I would then say that:
Comparison is different (the sign bit will change how the comparison is performed).
Multiplication is different (take the absolute values, multiply them, and negate the result if the operands' signs differ).
Division is different (same reason of multiplication).
Addition and subtraction look the same.
Are there any other differences that maybe I'm not aware of?
I'm assuming two's complement arithmetic, as this is the most common.
There are plenty of explanations of twos complement arithmetic out there. For example, links in the comments, and here for multiplication: http://pages.cs.wisc.edu/~smoler/cs354/beyond354/int.mult.html
1: Correct. Comparison is typically implemented the same as subtraction - but the results of the subtraction are discarded and only the status bits are used. Nit-pick: "<" and ">" are different, but "==" and "!=" are the same.
2, 3: Yes, multiplication and division are different.
4: Well, sort of. The bit-pattern of the result is the same, but there are important differences. The add/sub instructions on a typical processor set status flags for overflow, carry, negative and zero. So the difference I suppose is how you interpret the results rather than the results themselves. These status bits are not available to a C/C++ program, but are used by the code generated by the compiler.
5: Extension. Casting to a wider type is different. For unsigned integrals, they are "zero extended", while for signed integrals they are "sign extended". Sign extension means that it will copy the high order bit (the sign bit) of the narrow type to fill in the additional bits of the wide type.
6: Range: For example, the range of values for an unsigned 8-bit is 0...255, while for a signed 8-bit value it is -128...+127.
7: bit-wise operations "&", "|", "~", and "^" are the same
8: bit-shift operations "<<" and ">>": Left shift is the same, but right-shift is different as signed values right shifted do sign extension.
How can I represent bitwise AND and OR operations and shift operations using pseudocode?
Please help me.
You said bitwise, so why not & (AND) | (OR) and << (left shift) or >> (right shift)?
Fairly universal C-style syntax.
If you want to represent implementations of bitwise operations, you can introduce "bit enumerator" (some kind of object to iterate over that will emit bits from the less significant one to the most significant for given number) and use mapping for arguments to result (like AND and OR definition suggests).
It's my understanding that in C/C++ bitwise operators are supposed to be endian independent and behave the way you expect. I want to make sure that I'm truly getting the most significant and least significant words out of a 64-bit value and not worry about endianness of the machine. Here's an example:
uint64_t temp;
uint32_t msw, lsw;
msw = (temp & 0xFFFFFFFF00000000ULL) >> 32;
lsw = temp & 0x00000000FFFFFFFFULL;
Will this work?
6.5.7 Bitwise shift operators
4. The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
So, yes -- guaranteed by the standard.
It will work, but the strange propensity of some authors for doing bit-masking before bit-shifting has always puzzled me.
In my opinion, a much more elegant approach would be the one that does the shift first
msw = (temp >> 32) & 0xFFFFFFFF;
lsw = temp & 0xFFFFFFFF;
at least because it uses the same "magic" bit-mask constant every time.
Now, if your target type is unsigned and already has the desired bit-width, masking becomes completely unnecessary:
msw = temp >> 32;
lsw = temp;
Yes, that should work. When you're retrieving the msw, your mask isn't really accomplishing much though -- the bits you mask to zero will be discarded when you do the shift anyway. Personally, I'd probably use something like this:
uint32_t lsw = -1, msw = -1;
lsw &= temp;
msw &= temp >> 32;
Of course, to produce a meaningful result, temp has to be initialized, which it wasn't in your code.
Yes.
It should work.
Just a thought I would like to share: perhaps you could get around the endianness of a value by using the functions or macros found in <arpa/inet.h> to convert from network to host byte order and vice versa. They are usually mentioned in conjunction with sockets, but they could be used here to guarantee that a value such as 0xABCD from another processor is still 0xABCD on an Intel x86, instead of resorting to hand-coded custom functions to deal with the endian architecture.
Edit: Here's an article about endianness on CodeProject; the author developed macros to deal with 64-bit values.
Hope this helps,
Best regards,
Tom.
Endianness is about memory layout. Shifting is about bits (and bit layout). Word significance is about bit layout, not memory layout. So endianness has nothing to do with word significance.
I think what you are saying is quite true, but where does this get you?
If you have some literal values hanging around, then you know which end is which. But if you find yourself with values that have come from outside the program, then you can't be sure, unless they have been encoded in some way.
In addition to the other responses, I shall add that you should not worry about endianness in C. Endianness trouble comes only from looking at some bytes under a different type than the one used to write those bytes in the first place. When you do that, you are very close to having aliasing issues, which means that your code may break when using another compiler or another optimization flag.
As long as you do not try to do such trans-type accesses, your code should be endian-neutral, and run flawlessly on both little-endian and big-endian architectures. Or, in other words, if you have endianness issues, then other kinds of bigger trouble are also lurking nearby.