Replace right shift by multiplication - c++

I know that it is possible to use the left shift to implement multiplication by the power of two (x << 4 = x * 16).
Also, it is trivial to replace the right shift by division by a power of two (x >> 5 = x / 32).
I am wondering whether it is possible to replace the right shift with multiplication?
It seems not to be possible in the general case, but my question is limited to modulo 2^32 and 2^64 arithmetic (unsigned 32-bit and 64-bit values). Also, maybe it can be done if we can add other cheap instructions like + and - in addition to * to emulate the right bit shift?
I assume an exotic architecture where the right shift is more expensive than other arithmetic (similar to division).
uint64_t foo(uint64_t x) {
return x >> 3; // how to avoid using right shift here?
}
There is a similar question How to perform right shifting binary multiplication? that asks how to replace multiplication of two unsigned numbers by right shift. Basically, it uses a loop internally. However, maybe if the second number is a constant, this loop can be avoided (or at least unrolled to a shorter fragment)?

"Multiply-high" aka high-mul, hmul, mulh, etc, can be used to emulate a shift-right with a constant count. Usually that's not a good trade. It's also hardly related to C++.
Normal multiplication (putting floating point stuff aside) cannot be used to implement a shift-right.
my question is limited to modulo 2^32 and 2^64 arithmetic
It doesn't help. You can use that property to "unmultiply" (sort of like divide, except not really) by odd numbers, for example if b = 5 * a then a = b * 0xCCCCCCCD, using the modular multiplicative inverse. The number being inverted must be relatively prime to the modulus. Since the modulus is a power of two, the "divisor" here cannot be a power of two (except 1, which does nothing), so a shift-right cannot be done this way.
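A quick sketch of that "unmultiply" property (0xCCCCCCCD is the inverse of 5 modulo 2^32, as above); note it only recovers a when b really is a multiple of 5, so it is not a general division:
#include <cassert>
#include <cstdint>
int main() {
    uint32_t a = 12345;
    uint32_t b = 5u * a;         // multiply by an odd constant
    uint32_t inv = 0xCCCCCCCDu;  // modular inverse of 5 (mod 2^32)
    assert(b * inv == a);        // exact recovery, with no division
}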
Another way to look at it (probably simpler) is that what a multiplication does is conditionally add together a bunch of left-shifted versions of the multiplicand. Only left-shifted versions, not right-shifted versions. Which of those shifted versions the multiplier selects doesn't matter; there are no right-shifted versions to select.

Related

What's the best multiplication algorithm for fixed point where precision is necessary

I know, I know, people are probably going to say "just switch to floating point", but currently that is not an option due to the nature of the project that I am working on. I am helping write a programming language in C++, and I am currently having difficulty trying to get a very accurate algorithm for multiplication. I have a VM with operations for mod/smod, div/sdiv (i.e. signed numbers are not a concern here), mul, a halving number for fully fractional numbers, and a pushed shift number that I multiply and divide by to create my shifting. For simplicity, let's say I'm working with a 32 byte space. My algorithms work fine for pretty much anything involving integers; it's just that when my fractional portion gets over 16 bytes I run into problems with precision, and if I were to round it, the number would be fairly accurate, but I want it as accurate as possible, even willing to sacrifice a tad of performance for it, so long as it stays fixed point and doesn't go into floating point land. The algorithms I'm concerned with I will map out in a sort of pseudocode below. I would love any insight into how I could make this better, or any reasoning as to why, by the laws of computational science, what I'm asking for is a fruitless endeavor.
For fully fractional numbers (all bytes are fractional):
A = num1 / halfShift //truncates the number down to 16 so that when multiplied, we get a full 32 byte num
B = num2 / halfShift
finalNum = A * B
For the rest of my numbers that are larger than 16 bytes I use this algorithm:
This algorithm can essentially be broken down into the int.frac form: A.B * C.D takes the mathematical form of
D*B/shift + C*A*shift + D*A + C*B
if the fractional numbers are larger than the integer, I halve them, then multiply them together in my D*B/shift
just like in the fully fractional example above
Is there some kind of "magic" rounding method that I should be aware of? Please let me know.
You get the most accurate result if you do the multiplication first and scale afterwards. Of course that means, that you need to store the result of the multiplication in a 64-bit int type.
If that is not an option, your approach of shifting in advance makes sense. But you certainly lose precision.
Either way, you can increase accuracy a little if you round instead of truncate.
I support Aconcagua's recommendation to round to nearest.
For that you need to add the highest bit which is going to be truncated before you apply the division.
In your case that would look like this:
A = (num1 + (1 << (halfshift-1))) >> halfshift
B = (num2 + (1 << (halfshift-1))) >> halfshift
finalNum = A * B
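As a concrete sketch, the same round-to-nearest shift wrapped in a helper (the name is illustrative; note the parentheses, since + binds tighter than << in C++):
#include <cstdint>
// Right shift with round-to-nearest: add half of the amount that will
// be truncated, then shift. Assumes shift >= 1.
uint64_t rshift_round(uint64_t num, unsigned shift) {
    return (num + (1ull << (shift - 1))) >> shift;
}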
EDIT:
An example of how to dynamically scale the factors and the result depending on the values of the factors (this improves resolution and therefore the accuracy of the result):
shiftA and shiftB need to be set such that A and B are 16 byte fractionals each and therefore the 32 byte result cannot overflow. If shiftA and shiftB is not known in advance, it can be determined by counting the leading zeros of num1 and num2.
A = (num1 + (1 << (shiftA-1))) >> shiftA
B = (num2 + (1 << (shiftB-1))) >> shiftB
finalNum = (A * B) >> (fullshift - (shiftA + shiftB))
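If shiftA and shiftB have to be determined at run time, counting leading zeros is one way to do it; a sketch using the GCC/Clang builtin (an assumption about the toolchain):
#include <cstdint>
// Smallest shift that brings num down to at most half_bits significant
// bits. The "| 1" avoids the undefined behavior of clz on zero.
unsigned needed_shift(uint64_t num, unsigned half_bits) {
    unsigned used = 64 - __builtin_clzll(num | 1); // significant bits in num
    return used > half_bits ? used - half_bits : 0;
}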
The number of fractional digits of a product equals the sum of the numbers of fractional digits in the operands. You have to carry out the multiplication to that precision and then round or truncate according to the desired target precision.

How to fix the position of the binary point in an unsigned N-bit integer?

I am working on developing a fixed point algorithm in C++. I know that, for an N-bit integer, the fixed point binary integer is represented as U(a,b). For example, for an 8-bit integer (i.e. 256 possible values), if we represent it in the form U(6,2), it means that the binary point is to the left of the 2nd bit starting from the right, of the form:
b5 b4 b3 b2 b1 b0 . b(-1) b(-2)
Thus, it has 6 integer bits and 2 fractional bits. In C++, I know there are some bit shift operators I can use, but they are basically used for shifting the bits of the input stream. My question is: how do I define a binary fixed point integer of the form fix<6,2> or U(6,2)? All the major processing operations will be carried out on the fractional part, and I am just looking for a way to do this fix in C++. Any help regarding this would be appreciated. Thanks!
Example: Suppose I have an input discrete signal with 1024 sample points on the x-axis (for now, just think of this input signal as coming from some sensor). Each of these sample points has a particular amplitude. Say the sample at time 2 (x-axis) has an amplitude of 3.67 (y-axis). Now I have a variable "int *input;" that takes the sample 2, which in binary is 0000 0100. So basically I want to make this 00000.100 by performing the U(5,3) fix on the sample 2 in C++, so that I can perform interpolation operations on fractions of the input sampling period or time.
PS - I don't want to create a separate class or use external libraries for this. I just want to take each 8 bits from my input signal, perform the U(a,b) fix on it, with the rest of the operations done on the fractional part.
Short answer: left shift.
Long answer:
Fixed point numbers are stored as integers, usually int, which is the fastest integer type for a particular platform.
Normal integers without fractional bits are usually called Q0, Q.0 or QX.0, where X is the total number of bits of the underlying storage type (usually int).
To convert between different Q.X formats, left or right shift. For example, to convert 5 in Q0 to 5 in Q4, left shift it 4 bits, or multiply it by 16.
Usually it's useful to find or write a small fixed point library that does basic calculations, like a*b>>q and (a<<q)/b. Because you will do Q.X=Q.Y*Q.Z and Q.X=Q.Y/Q.Z a lot and you need to convert formats when doing calculations. As you may have observed, using normal * operator will give you Q.(X+Y)=Q.X*Q.Y, so in order to fit the result into Q.Z format, you need to right shift the result by (X+Y-Z) bits.
Division is similar: you get Q.(X-Y) = Q.X / Q.Y from the standard / operator, and to get the result in Q.Z format you shift the dividend before the division. What's different is that division is an expensive operation, and it's not trivial to write a fast one from scratch.
Be aware of double-word support on your platform; it will make your life a lot easier. With double-word arithmetic, the result of a*b can be twice the size of a or b, so you don't lose range by doing a*b>>c. Without double-word, you have to limit the input ranges of a and b so that a*b doesn't overflow. This is not obvious when you first start, but soon you will find you need more fractional bits or range to get the job done, and you will finally need to dig into the reference manual of your processor's ISA.
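A minimal sketch of such helpers, with the 64-bit intermediate playing the role of the double word (the names qmul/qdiv are illustrative):
#include <cstdint>
// Q.X * Q.Y -> Q.Z: multiply into 64 bits, then shift right by (X+Y-Z).
// Assumes x + y >= z.
int32_t qmul(int32_t a, int32_t b, int x, int y, int z) {
    return (int32_t)(((int64_t)a * b) >> (x + y - z));
}
// Q.X / Q.Y -> Q.Z: pre-shift the dividend so the quotient lands in Q.Z.
// Assumes z >= x - y.
int32_t qdiv(int32_t a, int32_t b, int x, int y, int z) {
    return (int32_t)(((int64_t)a << (z - x + y)) / b);
}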
example:
float a = 0.1f;  // 0.1
int aQ16 = a * 65536;  // 0.1 in Q16 format
int bQ16 = 4 << 16;  // 4 in Q16 format
int cQ16 = (int)(((int64_t)aQ16 * bQ16) >> 16);  // result = 0.399963378906250 in Q16 = 26212,
                                                 // not 0.4 in Q16 = 26214, because of truncation error
If this is your question:
Q. Should I define my fixed-binary-point integer as a template, U<int a, int b>(int number), or not, U(int a, int b)
I think your answer to that is: "Do you want to define operators that take two fixed-binary-point integers? If so make them a template."
The template is just a little extra complexity if you're not defining operators. So I'd leave it out.
But if you are defining operators, you don't want to be able to add U<4, 4> and U<6, 2>. What would you define your result as? The templates will give you a compile time error should you try to do that.

Which is the better option to use for dividing an integer number by 2?

Which of the following techniques is the best option for dividing an integer by 2 and why?
Technique 1:
x = x >> 1;
Technique 2:
x = x / 2;
Here x is an integer.
Use the operation that best describes what you are trying to do.
If you are treating the number as a sequence of bits, use bitshift.
If you are treating it as a numerical value, use division.
Note that they are not exactly equivalent. They can give different results for negative integers. For example:
-5 / 2 = -2
-5 >> 1 = -3
(ideone)
Does the first one look like dividing? No. If you want to divide, use x / 2. The compiler can optimise it to use a bit-shift if possible (it's called strength reduction), which makes it a useless micro-optimisation if you do it on your own.
To pile on: there are so many reasons to favor using x = x / 2; Here are some:
it expresses your intent more clearly (assuming you're not twiddling register bits or something)
the compiler will reduce this to a shift operation anyway
even if the compiler didn't reduce it and chose a slower operation than the shift, the likelihood that this ends up affecting your program's performance in a measurable way is itself vanishingly small (and if it does affect it measurably, then you have an actual reason to use a shift)
if the division is going to be part of a larger expression, you're more likely to get the precedence right if you use the division operator:
x = x / 2 + 5;
x = x >> 1 + 5; // not the same as above
signed arithmetic might complicate things even more than the precedence problem mentioned above
to reiterate - the compiler will already do this for you anyway. In fact, it'll convert division by a constant to a series of shifts, adds, and multiplies for all sorts of numbers, not just powers of two. See this question for links to even more information about this.
In short, you buy nothing by coding a shift when you really mean to multiply or divide, except maybe an increased possibility of introducing a bug. It's been a lifetime since compilers weren't smart enough to optimize this kind of thing to a shift when appropriate.
Which one is the best option and why for dividing the integer number by 2?
Depends on what you mean by best.
If you want your colleagues to hate you, or to make your code hard to read, I'd definitely go with the first option.
If you want to divide a number by 2, go with the second one.
The two are not equivalent, they don't behave the same if the number is negative or inside larger expressions - bitshift has lower precedence than + or -, division has higher precedence.
You should write your code to express what its intent is. If performance is your concern, don't worry, the optimizer does a good job at these sort of micro-optimizations.
Just use divide (/), presuming it is clearer. The compiler will optimize accordingly.
I agree with other answers that you should favor x / 2 because its intent is clearer, and the compiler should optimize it for you.
However, another reason for preferring x / 2 over x >> 1 is that the behavior of >> is implementation-dependent if x is a signed int and is negative.
From section 6.5.7, bullet 5 of the ISO C99 standard:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
x / 2 is clearer, and x >> 1 is not much faster (according to a micro-benchmark, about 30% faster for a Java JVM). As others have noted, for negative numbers the rounding is slightly different, so you have to consider this when you want to process negative numbers. Some compilers may automatically convert x / 2 to x >> 1 if they know the number cannot be negative (even though I could not verify this).
Even x / 2 may not use the (slow) division CPU instruction, because some shortcuts are possible, but it is still slower than x >> 1.
(This is a C / C++ question; other programming languages have more operators. For Java there is also the unsigned right shift, x >>> 1, which is again different. It makes it possible to correctly calculate the mean (average) of two values, so that (a + b) >>> 1 will return the mean even for very large values of a and b. This is required for example for binary search if the array indices can get very large. There was a bug in many versions of binary search, because they used (a + b) / 2 to calculate the average, which doesn't work correctly when a + b overflows. The correct solution is to use (a + b) >>> 1 instead.)
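In C++, where there is no >>> on signed types, the usual way to sidestep the same overflow bug is to rearrange the expression instead; a small sketch:
#include <cstddef>
// Overflow-free midpoint for binary search: (low + high) / 2 can
// overflow, but low + (high - low) / 2 cannot, given low <= high.
size_t midpoint(size_t low, size_t high) {
    return low + (high - low) / 2;
}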
Knuth said:
Premature optimization is the root of all evil.
So I suggest using x /= 2;
This way the code is easy to understand, and I also think that hand-optimizing this operation won't make a big difference to the processor.
Take a look at the compiler output to help you decide. I ran this test on x86-64 with
gcc (GCC) 4.2.1 20070719 [FreeBSD]
Also see compiler outputs online at godbolt.
What you see is that the compiler uses a sarl (arithmetic right-shift) instruction in both cases, so it does recognize the similarity between the two expressions. If you use the divide, the compiler also needs to adjust for negative numbers. To do that, it shifts the sign bit down to the lowest-order bit and adds that to the result. This fixes the off-by-one issue when shifting negative numbers, compared to what a divide would do.
Since the divide case does 2 shifts, while the explicit shift case only does one, we can now explain some of the performance differences measured by other answers here.
C code with assembly output:
For divide, your input would be
int div2signed(int a) {
return a / 2;
}
and this compiles to
movl %edi, %eax
shrl $31, %eax # (unsigned)x >> 31
addl %edi, %eax # tmp = x + (x<0)
sarl %eax # (x + 0 or 1) >> 1 arithmetic right shift
ret
similarly for shift
int shr2signed(int a) {
return a >> 1;
}
with output:
sarl %edi
movl %edi, %eax
ret
Other ISAs can do this about as efficiently, if not more so. For example, GCC for AArch64 uses:
add w0, w0, w0, lsr 31 // x += (unsigned)x>>31
asr w0, w0, 1 // x >>= 1
ret
Just an added note -
x *= 0.5 will often be faster in some VM-based languages -- notably actionscript, as the variable won't have to be checked for divide by 0.
Use x = x / 2; or x /= 2;, because it is possible that a new programmer will work on it in the future, and it will be easier for them to find out what is going on in that line of code. Not everyone is aware of such optimizations.
I am speaking here for the purpose of programming competitions. They generally have very large inputs, where division by 2 takes place many times, and it is known whether the input is positive or negative.
x>>1 will be better than x/2. I checked on ideone.com by running a program where more than 10^10 division-by-2 operations took place. x/2 took nearly 5.5s, whereas x>>1 took nearly 2.6s for the same program.
I would say there are several things to consider.
Bitshift should be faster, as no special computation is really needed to shift the bits; however, as pointed out, there are potential issues with negative numbers. If you are sure to have positive numbers and are looking for speed, I would recommend bitshift.
The division operator is very easy for humans to read. So if you are looking for code readability, you could use it. Note that the field of compiler optimization has come a long way, so making code easy to read and understand is good practice.
Depending on the underlying hardware, operations may have different speeds. Amdahl's law is to make the common case fast. So you may have hardware that can perform different operations faster than others. For example, multiplying by 0.5 may be faster than dividing by 2. (Granted, you may need to take the floor of the multiplication if you wish to enforce integer division.)
If you are after pure performance, I would recommend creating some tests that do the operations millions of times. Sample the execution several times (your sample size) to determine which one is statistically best for your OS/hardware/compiler/code.
As far as the CPU is concerned, bit-shift operations are faster than division operations. However, the compiler knows this and will optimize appropriately to the extent that it can, so you can code in the way that makes the most sense and rest easy knowing that your code is running efficiently. But remember that an unsigned int can (in some cases) be optimized better than an int, for the reasons previously pointed out. If you don't need signed arithmetic, then don't include the sign bit.
x = x / 2; is the suitable code to use, but which operation is appropriate depends on your own program and the output you want to produce.
Make your intentions clearer... for example, if you want to divide, use x / 2, and let the compiler optimize it to a shift operation (or anything else).
On today's processors, these micro-optimizations won't have any noticeable impact on the performance of your programs.
The answer to this will depend on the environment you're working under.
If you're working on an 8-bit microcontroller or anything without hardware support for multiplication, bit shifting is expected and commonplace, and while the compiler will almost certainly turn x /= 2 into x >>= 1, the presence of a division symbol will raise more eyebrows in that environment than using a shift to effect a division.
If you're working in a performance-critical environment or section of code, or your code could be compiled with compiler optimization off, x >>= 1 with a comment explaining its reasoning is probably best just for clarity of purpose.
If you're not under one of the above conditions, make your code more readable by simply using x /= 2. Better to save the next programmer who happens to look at your code the 10 second double-take on your shift operation than to needlessly prove you knew the shift was more efficient sans compiler optimization.
All of this assumes unsigned integers. The simple shift is probably not what you want for signed numbers. Also, DanielH brings up a good point about using x *= 0.5 for certain languages like ActionScript.
Take the value mod 2 and test whether it equals 1 (I don't know the exact syntax in C), but this may be fastest.
Generally, the right shift divides:
q = i >> n; is the same as: q = i / 2**n; (where 2**n is pseudocode for 2 to the power n, not C)
This is sometimes used to speed up programs at the cost of clarity. I don't think you should do it. The compiler is smart enough to perform the speedup automatically. This means that putting in a shift gains you nothing at the expense of clarity.
Take a look at this page from Practical C++ Programming.
Obviously, if you are writing your code for the next guy who reads it, go for the clarity of "x/2".
However, if speed is your goal, try it both ways and time the results. A few months ago I worked on a bitmap convolution routine which involved stepping through an array of integers and dividing each element by 2. I did all kinds of things to optimize it including the old trick of substituting "x>>1" for "x/2".
When I actually timed both ways I discovered to my surprise that x/2 was faster than x>>1
This was using Microsoft VS2008 C++ with the default optimizations turned on.
In terms of performance, the CPU's shift operations are significantly faster than its divide opcodes, so dividing by two, multiplying by 2, etc. all benefit from shift operations.
As to the look and feel: as engineers, when did we become so attached to cosmetics? :)
x / 2 is the correct one. The >> shift operator is used to shift bits, while the division operator (/) is for dividing an integer. Either of these will do:
x = x / 2;
x /= 2;

Avoiding overflow in integer multiplication followed by division

I have two integral variables a and b and a constant s (resp. d). I need to calculate the value of (a*b)>>s (resp. a*b/d). The problem is that the multiplication may overflow and the final result will not be correct, even though a*b/d could fit in the given integral type.
How could that be solved efficiently? The straightforward solution is to expand the variable a or b to a larger integral type, but there may not be a larger integral type. Is there any better way to solve the problem?
If there isn't a larger type, you will either need to find a big-int style library, or deal with it manually, using long multiplication.
For instance, assume a and b are 16-bit. Then you can rewrite them as a = (1<<8)*aH + aL, and b = (1<<8)*bH + bL (where all the individual components are 8-bit numbers). Then you know that the overall result will be:
(a*b) = (1<<16)*aH*bH
+ (1<<8)*aH*bL
+ (1<<8)*aL*bH
+ aL*bL
Each of these 4 components will fit a 16-bit register. You can now perform e.g. right-shifts on each of the individual components, being careful to deal with carries appropriately.
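A sketch of that decomposition in code (integer promotions make the 8x8 partial products safe in C++; the function name is made up):
#include <cstdint>
// Full 32-bit product of two 16-bit values from four 8x8 partial
// products, without any multiply wider than the operands themselves.
void mul16x16(uint16_t a, uint16_t b, uint16_t* hi, uint16_t* lo) {
    uint16_t aH = a >> 8, aL = a & 0xFF;
    uint16_t bH = b >> 8, bL = b & 0xFF;
    uint16_t ll = aL * bL;            // bits 0..15
    uint16_t lh = aL * bH;            // bits 8..23
    uint16_t hl = aH * bL;            // bits 8..23
    uint16_t hh = aH * bH;            // bits 16..31
    uint16_t mid = lh + hl;           // may wrap: that wrap is a carry
    uint16_t c1 = mid < lh;           // carry out of the middle sum
    *lo = ll + (uint16_t)(mid << 8);
    uint16_t c2 = *lo < ll;           // carry out of the low half
    *hi = hh + (mid >> 8) + (c1 << 8) + c2;
}
A shift such as (a*b)>>s then operates on the hi:lo pair, moving bits from hi into lo.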
I haven't exhaustively tested this, but could you do the division first, then account for the remainder, at the expense of extra operations? Since d is a power of two, all the divisions can be reduced to bitwise operations.
For example, always assume a > b (you want to divide the larger number first). Then a * b / d = ((a / d) * b) + (((a % d) * b) / d)
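A sketch of that identity for d = 2^s, where the division and remainder reduce to a shift and a mask:
#include <cstdint>
// a*b/d == (a/d)*b + ((a%d)*b)/d holds exactly for d = 1 << s.
// This avoids forming the full a*b, though (a % d) * b must still fit.
uint32_t muldiv_pow2(uint32_t a, uint32_t b, unsigned s) {
    uint32_t q = a >> s;              // a / d
    uint32_t r = a & ((1u << s) - 1); // a % d
    return q * b + ((r * b) >> s);
}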
If the larger type is just 64 bits, then the straightforward solution will most likely result in efficient code. On x86 CPUs, any multiplication of two 32-bit numbers puts the upper half of the result in another register. So if your compiler understands that, it can generate efficient code for Int64 result = (Int64)a * (Int64)b.
I had the same problem in C#, and the compiler generated pretty good code. And C++ compilers typically create better code than the .net JIT.
I recommend writing the code with the casts to the larger types and then inspect the generated assembly code to check if it's good.
In certain cases (historically LCG random number generators with selected constants), it is possible to do what you want, for some values of a and d.
This is called Schrage's method; see, e.g., there.

Are there any good reasons to use bit shifting except for quick math?

I understand bitwise operations and how they might be useful for different purposes, e.g. permissions. However, I don't seem to understand what use the bit shift operators are. I understand how they work, but I can't think of any scenarios where I might want to use them unless I want to do some really quick multiplication or division. Are there any other reasons to use bit-shifting?
There are many reasons, here are some:
Let's say you represent a black-and-white image as a sequence of bits and you want to set a single pixel in this image generically. For example, your byte offset may be x>>3 and your bit offset may be x & 0x7, and you can set that bit by: byte = byte | (1 << (x & 0x7)); (fleshed out in the sketch after this list).
Implementing data compression algorithms where you deal with variable-length bit sequences, e.g. Huffman coding.
You are interacting with some hardware, e.g. a serial communication device, and you need to read or set some control bits.
For those and other reasons most processors have bit shift and/or rotation instructions as well as other logic instructions (and/or/xor/not).
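Here is the pixel example from the first bullet as a complete sketch (the function name is illustrative):
#include <cstddef>
#include <cstdint>
// Set pixel x in a packed 1-bit-per-pixel image: x >> 3 selects the
// byte, x & 0x7 selects the bit within it.
void set_pixel(uint8_t* image, size_t x) {
    image[x >> 3] |= (uint8_t)(1u << (x & 0x7));
}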
Historically multiplication and division were significantly slower as they are more complex operations and some CPUs didn't have those at all.
Also see here:
Have you ever had to use bit shifting in real projects?
As you indicate, a left shift is the same thing as a multiplication by two. At least it is when we're talking about unsigned quantities. The meaning of a "left shift" of a signed quantity is ... language dependent.
With modern compilers, there's really no difference between writing "i = x*2;" and "i = x << 1;" The compiler will generate the most efficient code. So in that sense there's no reason to prefer shift over multiply.
Some algorithms work by shifting a quantity left by one bit and then setting the low bit to either 0 or 1. Some simple compression algorithms work this way. For example, if your accumulated value is in the variable x, and the current value (0 or 1) is in y, then it makes more sense to write "x = (x << 1) | y", rather than "x = (x * 2) + y". Both do the same thing, but the first is more notationally correct. You don't have to think, "oh, right, multiply by two is the same as a left shift."
Also, when you're talking about algorithms that shift bits, it's more convenient to shift left or right by a particular number of bits than to figure out what multiple of 2 you want to multiply or divide by.
So, whereas there's typically no performance benefit to shifting rather than multiplying--at least not when working with high level languages--there are times when having the ability to shift makes what you're doing more easily understood.
There are lots of places where bit shift operations are regularly used outside of numerical computations. For example, a Bitboard is a data structure commonly used in board games for board representation. Some of the strongest chess engines use this data structure, mainly for speed and ease of move generation and evaluation. These programs use bit operations heavily, and bit-shift operations specifically are used in a lot of contexts, such as finding bit masks, generating new moves on the board, computing logarithms very quickly, etc. There are even very advanced numerical computations that can be done elegantly by clever use of bit operations. Check out this site for bit twiddling hacks; a lot of those algorithms use shift operators. Bit shift operations are regularly used in device driver programming, codec development, embedded systems programming and so on.
Shifting allows accessing specific bits within a variable. The expression (n >> p) & ((1 << m) - 1) retrieves an m-bit portion of the variable n with an offset of p bits from the right.
This allows your program to use integers that aren't multiples of 8 bits, which is useful for data compression.
For example, I used it in my Netflix Prize programs to pack records (22-bit user ID + 15-bit movie ID + 12-bit date + 3-bit rating) into a uint64_t (with 12 bits to spare).
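A sketch of that kind of packing, with field offsets chosen to match the widths above (the helper names are made up):
#include <cstdint>
// 22-bit user + 15-bit movie + 12-bit date + 3-bit rating = 52 bits.
uint64_t pack(uint64_t user, uint64_t movie, uint64_t date, uint64_t rating) {
    return (user << 30) | (movie << 15) | (date << 3) | rating;
}
// Extract m bits at offset p, as in the expression above.
uint64_t extract(uint64_t n, unsigned p, unsigned m) {
    return (n >> p) & ((1ull << m) - 1);
}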
A very common special case is to pack 8 bool variables into each byte. (Unix file permissions, black-and-white bitmaps, CPU flags registers, etc.)
Also, bit manipulation is used in UTF-8, which is a very popular character encoding. Unicode characters are represented by distributing their bits across 1, 2, 3, or 4 bytes.
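For instance, a two-byte UTF-8 sequence (code points U+0080 through U+07FF) splits the 11 payload bits with shifts and masks; a minimal sketch:
#include <cstdint>
// Encode a code point from the two-byte UTF-8 range.
void utf8_encode2(uint32_t cp, uint8_t out[2]) {
    out[0] = (uint8_t)(0xC0 | (cp >> 6));   // 110xxxxx: the top 5 bits
    out[1] = (uint8_t)(0x80 | (cp & 0x3F)); // 10xxxxxx: the low 6 bits
}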