Is there any performance gain/loss by using unsigned integers over signed integers?
If so, does this go for short and long as well?
Division by powers of 2 is faster with unsigned int, because it can be optimized into a single shift instruction. With signed int, it usually requires more machine instructions, because division rounds towards zero, but shifting to the right rounds down. Example:
int foo(int x, unsigned y)
{
    x /= 8;
    y /= 8;
    return x + y;
}
Here is the relevant x part (signed division):
movl 8(%ebp), %eax
leal 7(%eax), %edx
testl %eax, %eax
cmovs %edx, %eax
sarl $3, %eax
And here is the relevant y part (unsigned division):
movl 12(%ebp), %edx
shrl $3, %edx
In C++ (and C), signed integer overflow is undefined, whereas unsigned integer overflow is defined to wrap around. Note that in gcc, for example, you can use the -fwrapv flag to make signed overflow defined (to wrap around).
Undefined signed integer overflow allows the compiler to assume that overflows don't happen, which may introduce optimization opportunities. See e.g. this blog post for discussion.
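As a minimal sketch of the kind of simplification this enables (compiler behavior varies; inspect the generated assembly to confirm):

int always_true(int x)
{
    // Signed overflow is undefined, so the compiler may assume x + 1
    // never wraps and fold this whole function to "return 1".
    return x + 1 > x;
}

int must_compare(unsigned x)
{
    // Unsigned overflow wraps, so x + 1 > x is false when x == UINT_MAX;
    // the comparison has to actually be performed.
    return x + 1 > x;
}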
unsigned leads to the same or better performance than signed.
Some examples (a small code sketch follows this list):
Division by a constant which is a power of 2 (see also the answer from FredOverflow)
Division by a constant number (for example, my compiler implements division by 13 using 2 asm instructions for unsigned, and 6 instructions for signed)
Checking whether a number is even (I have no idea why my MS Visual Studio compiler implements it with 4 instructions for signed numbers; gcc does it with 1 instruction, just like in the unsigned case)
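Here is a minimal sketch of the division-by-13 case (function names are mine); compile with optimizations and compare the assembly:

unsigned udiv13(unsigned x)
{
    return x / 13;  // typically a multiply by a "magic" constant plus a shift
}

int sdiv13(int x)
{
    return x / 13;  // same idea, plus extra instructions to fix up the sign
}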
short usually leads to the same or worse performance than int (assuming sizeof(short) < sizeof(int)). Performance degradation happens when you assign a result of an arithmetic operation (which is usually int, never short) to a variable of type short, which is stored in the processor's register (which is also of type int). All the conversions from short to int take time and are annoying.
Note: some DSPs have fast multiplication instructions for the signed short type; in this specific case short is faster than int.
As for the difference between int and long, I can only guess (I am not familiar with 64-bit architectures). Of course, if int and long have the same size (on 32-bit platforms), their performance is also the same.
A very important addition, pointed out by several people:
What really matters for most applications is the memory footprint and utilized bandwidth. You should use the smallest necessary integers (short, maybe even signed/unsigned char) for large arrays.
This will give better performance, but the gain is nonlinear (i.e. not by a factor of 2 or 4) and somewhat unpredictable - it depends on cache size and the relationship between calculations and memory transfers in your application.
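For instance, a sketch of the array case (the actual speedup depends on cache size and access pattern, as noted above):

#include <stdint.h>
#include <stddef.h>

// One byte per element instead of four: a quarter of the memory traffic.
uint32_t sum_u8(const uint8_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];  // each element is still widened for the addition
    return s;
}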
This will depend on the exact implementation. In most cases, however, there will be no difference. If you really care, you have to try all the variants you are considering and measure performance.
This is pretty much dependent on the specific processor.
On most processors, there are instructions for both signed and unsigned arithmetic, so the difference between using signed and unsigned integers comes down to which one the compiler uses.
If either of the two is faster, it's completely processor-specific, and most likely the difference is minuscule, if it exists at all.
The performance difference between signed and unsigned integers is actually more general than the accepted answer suggests. Division of an unsigned integer by any constant can be made faster than division of a signed integer by a constant, regardless of whether the constant is a power of two. See http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html
At the end of his post, he includes the following section:
A natural question is whether the same optimization could improve signed division; unfortunately it appears that it does not, for two reasons:
The increment of the dividend must become an increase in the magnitude, i.e. increment if n > 0, decrement if n < 0. This introduces an additional expense.
The penalty for an uncooperative divisor is only about half as much in signed division, leaving a smaller window for improvements.
Thus it appears that the round-down algorithm could be made to work in signed division, but will underperform the standard round-up algorithm.
Not only is division by powers of 2 faster with unsigned types; division by any other value is also faster with unsigned types. If you look at Agner Fog's instruction tables, you'll see that unsigned divisions have similar or better performance than the signed versions.
For example, with the AMD K7:

Instruction  Operands  Ops  Latency  Reciprocal throughput
DIV          r8/m8     32   24       23
DIV          r16/m16   47   24       23
DIV          r32/m32   79   40       40
IDIV         r8        41   17       17
IDIV         r16       56   25       25
IDIV         r32       88   41       41
IDIV         m8        42   17       17
IDIV         m16       57   25       25
IDIV         m32       89   41       41
The same thing applies to the Intel Pentium:

Instruction  Operands  Clock cycles
DIV          r8/m8     17
DIV          r16/m16   25
DIV          r32/m32   41
IDIV         r8/m8     22
IDIV         r16/m16   30
IDIV         r32/m32   46
Of course those are quite ancient. Newer architectures, with more transistors, might close the gap, but the basics still apply: generally you need more micro-ops, more logic, and more latency to do a signed division.
In short, don't bother before the fact. But do bother after.
If you want performance, you have to use the performance optimizations of a compiler, which may work against common sense. One thing to remember is that different compilers compile code differently, and they each have different sorts of optimizations. If we're talking about the g++ compiler, and about maxing out its optimization level by using -Ofast, or at least the -O3 flag, in my experience it can compile the long type into code with even better performance than any unsigned type, or even just int.
This is from my own experience, and I recommend that you first write your full program and care about such things only afterwards, when you have your actual code in hand and can compile it with optimizations to try and pick the types that actually perform best. This is also good general advice about optimizing for performance: write quickly first, try compiling with optimizations, and tweak things to see what works best. You should also try compiling your program with different compilers and choose the one that outputs the most performant machine code.
An optimized multi-threaded linear algebra calculation program can easily have a >10x performance difference finely optimized vs unoptimized. So this does matter.
Optimizer output contradicts logic in plenty of cases. For example, I had a case when a difference between a[x]+=b and a[x]=b changed program execution time almost 2x. And no, a[x]=b wasn't the faster one.
Here is, for example, NVIDIA stating that for programming their GPUs:
Note: As was already the recommended best practice, signed arithmetic
should be preferred over unsigned arithmetic wherever possible for
best throughput on SMM. The C language standard places more
restrictions on overflow behavior for unsigned math, limiting compiler
optimization opportunities.
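A sketch of where this matters (a hypothetical loop; the point is the induction variable's type):

// With a signed counter, overflow is undefined, so the compiler may assume
// i * stride grows monotonically and strength-reduce it to a running pointer.
// With an unsigned counter it must preserve the defined wraparound behavior.
float sum_strided(const float *p, int n, int stride)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += p[i * stride];
    return s;
}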
Traditionally int is the native integer format of the target hardware platform. Any other integer type may incur performance penalties.
EDIT:
Things are slightly different on modern systems:
int is in fact 32-bit on most 64-bit systems for compatibility reasons; this is the case on Windows (LLP64) as well as on Linux and macOS (LP64).
Modern compilers may implicitly use int when performing computations for shorter types in some cases.
IIRC, on x86 signed/unsigned shouldn't make any difference. Short/long, on the other hand, is a different story, since the amount of data that has to be moved to/from RAM is bigger for longs (other reasons may include cast operations like extending a short to long).
Signed and unsigned integers will always both operate as single-clock instructions and have the same read-write performance, but according to Dr. Andrei Alexandrescu, unsigned is preferred over signed. The reason for this is that you can fit twice the amount of numbers in the same number of bits, because you're not wasting the sign bit, and you will use fewer instructions checking for negative numbers, yielding performance increases from the decreased ROM. In my experience with the Kabuki VM, which features an ultra-high-performance Script implementation, it is rare that you actually require a signed number when working with memory. I've spent many years doing pointer arithmetic with signed and unsigned numbers, and I've found no benefit to signed when no sign bit is needed.
Where signed may be preferred is when using bit shifting to perform multiplication and division by powers of 2, because with signed 2's complement integers the arithmetic shift can also handle negative values. Please see some more YouTube videos from Andrei for more optimization techniques. You can also find some good info in my article about the world's fastest Integer-to-String conversion algorithm.
Unsigned integers are advantageous in that you can store and treat them as a plain bitstream, just data without a sign, so multiplication and division by powers of 2 become easier (faster) with bit-shift operations.
Related
Check out this simple code:
#include <cmath>
float foo(float in) {
    return sqrtf(in);
}
With -ffast-math, clang generates sqrtss, as it is expected. But, if I use -fstack-protector-all as well, it changes sqrtss to rsqrtss, as you can see at godbolt. Why?
The short and sweet:
rsqrtss is an approximation and, as a result, faster but less accurate.
sqrtss is exact and, as a result, slower but safer.
Why is rsqrtss less accurate?
It computes only an approximate reciprocal of the square root, with a relative error of at most about 1.5 * 2^-12.
Why is rsqrtss faster?
Because the hardware approximation takes far fewer cycles than computing a full-precision square root.
Why does rsqrtss use a reciprocal?
In a pinch, the reciprocal of a square root can be approximated faster and with less circuitry, and the compiler can recover sqrt(x) as x * rsqrt(x), typically refined with a Newton-Raphson step.
Pico-spelenda: Lots of math.
The long and bitter:
Research
What does -ffast-math do?
-ffast-math
Enable fast-math mode. This defines the __FAST_MATH__ preprocessor
macro, and lets the compiler make aggressive, potentially-lossy
assumptions about floating-point math. These include:
Floating-point math obeys regular algebraic rules for real numbers (e.g. + and * are associative, x/y == x * (1/y), and (a + b) * c == a * c + b * c),
operands to floating-point operations are not equal to NaN and Inf, and
+0 and -0 are interchangeable.
What does -fstack-protector-all do?
This answer can be found here.
Basically, it "forces the usage of stack protectors for all functions".
What is a "stack protector"?
A nice article for you.
The blissfully short, quite terribly succinct SparkNotes version is:
A "stack protector" is used to prevent exploitation of stack overwrites.
the stack protector as implemented in gcc and clang adds an additional guard
variable to each function’s stack area.
Interesting Drawback To Note:
"Adding these checks will lead to a little runtime overhead: More stack
space is needed, but that is negligible except for really constrained
systems...Do you aim for maximum security at the cost of
performance? -fstack-protector-all is for you."
What is sqrtss?
According to godbolt:
Computes the square root of the low single-precision floating-point value
in the second source operand and stores the single-precision floating-point
result in the destination operand. The second source operand can be an XMM
register or a 32-bit memory location. The first source and destination
operands are an XMM register.
What is a "source operand"?
A tutorial can be found here
In essence, an operand is a location of data in a computer. Imagine the simple instruction x + x = y. You need to know what 'x' is, which is the source operand, and where the result will be stored, 'y', which is the destination operand. Notice how the '+' symbol, commonly called the 'operation', can be set aside, because it doesn't matter in this example.
What is an "XMM register"?
An explanation can be found here.
It's just a specific type of register. It's primarily used for floating-point math (which, surprisingly enough, is the math you are trying to do).
What is rsqrtss?
Again, according to godbolt:
Computes an approximate reciprocal of the square root of the low
single-precision floating-point value in the source operand (second operand)
stores the single-precision floating-point result in the destination operand.
The source operand can be an XMM register or a 32-bit memory location. The
destination operand is an XMM register. The three high-order doublewords of
the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and
IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration
of a scalar single-precision floating-point operation.
What is a "doubleword"?
A simple definition.
It is a unit of measurement of computer memory, just like 'bit' or 'byte'. However, unlike 'bit' or 'byte', it is not universal and depends on the architecture of the computer.
What does "Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1" look like?
Here you go.
Disclaimer:
Most of this knowledge comes from outside sources. I literally installed clang just now to help answer your question. I'm not an expert.
From the answers I got from this question, it appears that C++ inherited this requirement for conversion of short into int when performing arithmetic operations from C. May I pick your brains as to why this was introduced in C in the first place? Why not just do these operations as short?
For example (taken from dyp's suggestion in the comments):
short s = 1, t = 2;
auto x = s + t;
x will have type int.
If we look at the Rationale for International Standard—Programming Languages—C in section 6.3.1.8 Usual arithmetic conversions it says (emphasis mine going forward):
The rules in the Standard for these conversions are slight
modifications of those in K&R: the modifications accommodate the added
types and the value preserving rules. Explicit license was added to
perform calculations in a “wider” type than absolutely necessary,
since this can sometimes produce smaller and faster code, not to
mention the correct answer more often. Calculations can also be
performed in a “narrower” type by the as if rule so long as the same
end result is obtained. Explicit casting can always be used to obtain
a value in a desired type.
Section 6.3.1.8 from the draft C99 standard covers the Usual arithmetic conversions which is applied to operands of arithmetic expressions for example section 6.5.6 Additive operators says:
If both operands have arithmetic type, the usual arithmetic
conversions are performed on them.
We find similar text in section 6.5.5 Multiplicative operators as well. In the case of a short operand, first the integer promotions are applied from section 6.3.1.1 Boolean, characters, and integers which says:
If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions.48) All other types are
unchanged by the integer promotions.
The discussion in section 6.3.1.1 of the Rationale for International Standard—Programming Languages—C on integer promotions is actually more interesting; I am going to quote selectively, because it is too long to quote fully:
Implementations fell into two major camps which may be characterized
as unsigned preserving and value preserving.
[...]
The unsigned preserving approach calls for promoting the two smaller
unsigned types to unsigned int. This is a simple rule, and yields a
type which is independent of execution environment.
The value preserving approach calls for promoting those types to
signed int if that type can properly represent all the values of the
original type, and otherwise for promoting those types to unsigned
int. Thus, if the execution environment represents short as something
smaller than int, unsigned short becomes int; otherwise it becomes
unsigned int.
This can have some rather unexpected results in some cases, as Inconsistent behaviour of implicit conversion between unsigned and bigger signed types demonstrates, and there are plenty more examples like it. Although in most cases this results in the operations working as expected.
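One classic surprise, as a sketch (assuming 16-bit short and 32-bit int):

#include <limits.h>

unsigned int surprise(void)
{
    unsigned short a = USHRT_MAX;  // 0xFFFF with a 16-bit short
    unsigned short b = USHRT_MAX;
    // Value preservation promotes both operands to signed int, so the
    // multiplication is done as int; 0xFFFF * 0xFFFF (= 0xFFFE0001)
    // overflows a 32-bit int: undefined behavior, not wraparound.
    return a * b;
}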
It's not a feature of the language as much as it is a limitation of the physical processor architectures on which the code runs. The int type in C is usually the size of your standard CPU register. More silicon takes up more space and more power, so in many cases arithmetic can only be done on the "natural size" data types. This is not universally true, but most architectures still have this limitation. In other words, when adding two 8-bit numbers, what actually goes on in the processor is some type of 32-bit arithmetic followed by either a simple bit mask or another appropriate type conversion.
short and char types are considered by the standard to be sort of "storage types", i.e. sub-ranges that you can use to save some space but that are not going to buy you any speed, because their size is "unnatural" for the CPU.
On certain CPUs this is not true but good compilers are smart enough to notice that if you e.g. add a constant to an unsigned char and store the result back in an unsigned char then there's no need to go through the unsigned char -> int conversion.
For example with g++ the code generated for the inner loop of
void incbuf(unsigned char *buf, int size) {
    for (int i = 0; i < size; i++) {
        buf[i] = buf[i] + 1;
    }
}
is just
.L3:
addb $1, (%rdi,%rax)
addq $1, %rax
cmpl %eax, %esi
jg .L3
.L1:
where you can see that an unsigned char addition instruction (addb) is used.
The same happens if you're doing your computations between short ints and storing the result in short ints.
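A sketch of the analogous short case (on x86-64, gcc can typically use a 16-bit add here, though auto-vectorization may change the picture):

void incshorts(short *buf, int size)
{
    for (int i = 0; i < size; i++)
        buf[i] = buf[i] + 1;  // promoted to int, but the int-to-short store
                              // lets the compiler keep it as a 16-bit add
}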
The linked question seems to cover it pretty well: the CPU just doesn't. A 32-bit CPU has its native arithmetic operations set up for 32-bit registers. The processor prefers to work in its favorite size, and for operations like this, copying a small value into a native-size register is cheap. (For the x86 architecture, the 32-bit registers are named as if they are extended versions of the 16-bit registers (eax to ax, ebx to bx, etc); see x86 integer instructions).
For some extremely common operations, particularly vector/float arithmetic, there may be specialized instructions that operate on a different register type or size. For something like a short, padding with (up to) 16 bits of zeroes has very little performance cost and adding specialized instructions is probably not worth the time or space on the die (if you want to get really physical about why; I'm not sure they would take actual space, but it does get way more complex).
The C99 standard introduces the following datatypes. The documentation can be found here for the AVR stdint library.
uint8_t means it's an 8-bit unsigned type.
uint_fast8_t means it's the fastest unsigned int with at least 8
bits.
uint_least8_t means it's an unsigned int with at least 8 bits.
I understand uint8_t, and I know what uint_fast8_t is (though I don't know how it's implemented at the register level).
1. Can you explain the meaning of "it's an unsigned int with at least 8 bits"?
2. How do uint_fast8_t and uint_least8_t help increase efficiency/code space compared to uint8_t?
uint_least8_t is the smallest type that has at least 8 bits.
uint_fast8_t is the fastest type that has at least 8 bits.
You can see the differences by imagining exotic architectures. Imagine a 20-bit architecture. Its unsigned int has 20 bits (one register), and its unsigned char has 10 bits. So sizeof(int) == 2, but using char types requires extra instructions to cut the registers in half. Then:
uint8_t: is undefined (no 8 bit type).
uint_least8_t: is unsigned char, the smallest type that is at least 8 bits.
uint_fast8_t: is unsigned int, because in my imaginary architecture, a half-register variable is slower than a full-register one.
uint8_t means: give me an unsigned int of exactly 8 bits.
uint_least8_t means: give me the smallest type of unsigned int which has at least 8 bits. Optimize for memory consumption.
uint_fast8_t means: give me an unsigned int of at least 8 bits. Pick a larger type if it will make my program faster, because of alignment considerations. Optimize for speed.
Also, unlike the plain int types, the signed version of the above stdint.h types are guaranteed to be 2's complement format.
The theory goes something like:
uint8_t is required to be exactly 8 bits but it's not required to exist. So you should use it where you are relying on the modulo-256 assignment behaviour* of an 8 bit integer and where you would prefer a compile failure to misbehaviour on obscure architectures.
uint_least8_t is required to be the smallest available unsigned integer type that can store at least 8 bits. You would use it when you want to minimise the memory use of things like large arrays.
uint_fast8_t is supposed to be the "fastest" unsigned type that can store at least 8 bits; however, it's not actually guaranteed to be the fastest for any given operation on any given processor. You would use it in processing code that performs lots of operations on the value.
The practice is that the "fast" and "least" types aren't used much.
The "least" types are only really useful if you care about portability to obscure architectures with CHAR_BIT != 8 which most people don't.
The problem with the "fast" types is that "fastest" is hard to pin down. A smaller type may mean less load on the memory/cache system but using a type that is smaller than native may require extra instructions. Furthermore which is best may change between architecture versions but implementers often want to avoid breaking ABI in such cases.
From looking at some popular implementations, it seems that the definitions of uint_fastn_t are fairly arbitrary. glibc seems to define them as being at least the "native word size" of the system in question, taking no account of the fact that many modern processors (especially 64-bit ones) have specific support for fast operations on items smaller than their native word size. iOS apparently defines them as equivalent to the fixed-size types. Other platforms may vary.
All in all, if performance of tight code with tiny integers is your goal, you should be benchmarking your code on the platforms you care about with different sized types to see what works best.
* Note that unfortunately modulo-256 assignment behaviour does not always imply modulo-256 arithmetic, thanks to C's integer promotion misfeature.
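A sketch of that footnote's point:

#include <stdint.h>

void footnote_demo(void)
{
    uint8_t a = 200, b = 100;
    uint8_t sum = a + b;   // the assignment wraps modulo 256: sum == 44
    if (a + b > 255) {     // but here a and b are promoted to int first,
        /* taken! */       // so a + b evaluates to 300 and the branch is taken
    }
    (void)sum;
}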
Some processors cannot operate as efficiently on smaller data types as on large ones. For example, given:
uint32_t foo(uint32_t x, uint8_t y)
{
    x += y;
    y += 2;
    x += y;
    y += 4;
    x += y;
    y += 6;
    x += y;
    return x;
}
if y were uint32_t, a compiler for the ARM Cortex-M3 could simply generate:
add r0,r0,r1,asl #2 ; x+=(y<<2)
add r0,r0,#12 ; x+=12
bx lr ; return x
but since y is uint8_t, the compiler would instead have to generate:
add r0,r0,r1 ; x+=y
add r1,r1,#2 ; Compute y+2
and r1,r1,#255 ; y=(y+2) & 255
add r0,r0,r1 ; x+=y
add r1,r1,#4 ; Compute y+4
and r1,r1,#255 ; y=(y+4) & 255
add r0,r0,r1 ; x+=y
add r1,r1,#6 ; Compute y+6
and r1,r1,#255 ; y=(y+6) & 255
add r0,r0,r1 ; x+=y
bx lr ; return x
The intended purpose of the "fast" types was to allow compilers to replace smaller types which couldn't be processed efficiently with faster ones. Unfortunately, the semantics of "fast" types are rather poorly specified, which in turn leaves murky questions of whether expressions will be evaluated using signed or unsigned math.
1. Can you explain the meaning of "it's an unsigned int with at least 8 bits"?
That ought to be obvious. It means that it's an unsigned integer type, and that its width is at least 8 bits. In effect, it can hold at least the numbers 0 through 255, it definitely cannot hold negative numbers, and it may be able to hold numbers higher than 255.
Obviously you should not use any of these types if you plan to store any number outside the range 0 through 255 (and you want it to be portable).
2. How do uint_fast8_t and uint_least8_t help increase efficiency/code space compared to uint8_t?
uint_fast8_t is required to be faster, so you should use that if your requirement is that the code be fast. uint_least8_t, on the other hand, requires that there is no candidate of lesser size, so you would use that if size is the concern.
And of course, use plain uint8_t only when you absolutely require it to be exactly 8 bits. Using uint8_t may make the code non-portable, as uint8_t is not required to exist (because such a small integer type does not exist on certain platforms).
The "fast" integer types are defined to be the fastest integer available with at least the amount of bits required (in your case 8).
A platform can define uint_fast8_t as uint8_t; then there will be absolutely no difference in speed.
The reason is that there are platforms that are slower when not using their native word length.
As the name suggests, uint_least8_t is the smallest type that has at least 8 bits, uint_fast8_t is the fastest type that has at least 8 bits. uint8_t has exactly 8 bits, but it is not guaranteed to exist on all platforms, although this is extremely uncommon.
In most cases, uint_least8_t = uint_fast8_t = uint8_t = unsigned char. The only exception I have seen is the C2000 DSP from Texas Instruments: it is 32-bit, but its minimum data width is 16 bits. It does not have uint8_t; you can only use uint_least8_t and uint_fast8_t, which are defined as unsigned int (16-bit).
I'm using the fast types (uint_fast8_t) for local variables and function parameters, and the normal ones (uint8_t) in arrays and structures that are used frequently, where the memory footprint is more important than the few cycles that could be saved by not having to clear or sign-extend the upper bits.
Works great, except with MISRA checkers. They go nuts from the fast types. The trick is that the fast types are used through derived types that can be defined differently for MISRA builds and normal ones.
I think these types are great to create portable code, that's efficient on both low-end microcontrollers and big application processors. The improvement might be not huge, or totally negligible with good compilers, but better than nothing.
Some guessing in this thread.
"fast": The compiler should place "fast" type vars in IRAM (local processor RAM) which requires fewer cycles to access and write than vars stored in the hinterlands of RAM. "fast" is used if you need quickest possible action on a var, such as in an Interrupt Service Routine (ISR). Same as declaring a function to have an IRAM_ATTR; this == faster access. There is limited space for "fast" or IRAM vars/functions, so only use when needed, and never persist unless they qualify for that. Most compilers will move "fast" vars to general RAM if processor RAM is all allocated.
Which of the following techniques is the best option for dividing an integer by 2 and why?
Technique 1:
x = x >> 1;
Technique 2:
x = x / 2;
Here x is an integer.
Use the operation that best describes what you are trying to do.
If you are treating the number as a sequence of bits, use bitshift.
If you are treating it as a numerical value, use division.
Note that they are not exactly equivalent. They can give different results for negative integers. For example:
-5 / 2 = -2
-5 >> 1 = -3
(ideone)
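You can check this yourself with a tiny program (the result for x >> 1 assumes the common arithmetic-shift behavior for negative signed values, which is implementation-defined):

#include <stdio.h>

int main(void)
{
    int x = -5;
    printf("%d %d\n", x / 2, x >> 1);  // prints "-2 -3" on typical targets
    return 0;
}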
Does the first one look like dividing? No. If you want to divide, use x / 2. The compiler can optimise it to use a bit shift if possible (this is called strength reduction), which makes it a useless micro-optimisation if you do it on your own.
To pile on: there are so many reasons to favor using x = x / 2; Here are some:
it expresses your intent more clearly (assuming you're not dealing with bit twiddling register bits or something)
the compiler will reduce this to a shift operation anyway
even if the compiler didn't reduce it and chose a slower operation than the shift, the likelihood that this ends up affecting your program's performance in a measurable way is itself vanishingly small (and if it does affect it measurably, then you have an actual reason to use a shift)
if the division is going to be part of a larger expression, you're more likely to get the precedence right if you use the division operator:
x = x / 2 + 5;
x = x >> 1 + 5; // not the same as above
signed arithmetic might complicate things even more than the precedence problem mentioned above
to reiterate - the compiler will already do this for you anyway. In fact, it'll convert division by a constant to a series of shifts, adds, and multiplies for all sorts of numbers, not just powers of two. See this question for links to even more information about this.
In short, you buy nothing by coding a shift when you really mean to multiply or divide, except maybe an increased possibility of introducing a bug. It's been a lifetime since compilers weren't smart enough to optimize this kind of thing to a shift when appropriate.
Which one is the best option and why for dividing the integer number by 2?
Depends on what you mean by best.
If you want your colleagues to hate you, or to make your code hard to read, I'd definitely go with the first option.
If you want to divide a number by 2, go with the second one.
The two are not equivalent; they don't behave the same way for negative numbers or inside larger expressions: bitshift has lower precedence than + or -, while division has higher precedence.
You should write your code to express what its intent is. If performance is your concern, don't worry, the optimizer does a good job at these sort of micro-optimizations.
Just use divide (/), presuming it is clearer. The compiler will optimize accordingly.
I agree with other answers that you should favor x / 2 because its intent is clearer, and the compiler should optimize it for you.
However, another reason for preferring x / 2 over x >> 1 is that the behavior of >> is implementation-dependent if x is a signed int and is negative.
From section 6.5.7, bullet 5 of the ISO C99 standard:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has
an unsigned type or if E1 has a signed type and a nonnegative value,
the value of the result is the integral part of the quotient of E1 / 2^E2.
If E1 has a signed type and a negative value, the resulting value
is implementation-defined.
x / 2 is clearer, and x >> 1 is not much faster (according to a micro-benchmark, about 30% faster for a Java JVM). As others have noted, for negative numbers the rounding is slightly different, so you have to consider this when you want to process negative numbers. Some compilers may automatically convert x / 2 to x >> 1 if they know the number cannot be negative (even though I could not verify this).
Even x / 2 may not use the (slow) division CPU instruction, because some shortcuts are possible, but it is still slower than x >> 1.
(This is a C/C++ question; other programming languages have more operators. Java also has the unsigned right shift, x >>> 1, which is again different. It allows one to correctly calculate the mean (average) of two values, so that (a + b) >>> 1 returns the mean even for very large values of a and b. This is required, for example, for binary search if the array indices can get very large. There was a bug in many versions of binary search because they used (a + b) / 2 to calculate the average; the sum can overflow, so this doesn't work correctly. The correct solution is to use (a + b) >>> 1 instead.)
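In C or C++, which have no >>> operator, the usual fix is to restructure the midpoint computation instead (a sketch):

#include <stddef.h>

// Overflow-safe midpoint for binary search: never computes lo + hi directly.
size_t midpoint(size_t lo, size_t hi)  // assumes lo <= hi
{
    return lo + (hi - lo) / 2;
}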
Knuth said:
Premature optimization is the root of all evil.
So I suggest using x /= 2;
This way the code is easy to understand, and I also think that hand-optimizing this operation won't make a big difference for the processor.
Take a look at the compiler output to help you decide. I ran this test on x86-64 with
gcc (GCC) 4.2.1 20070719 [FreeBSD]
Also see compiler outputs online at godbolt.
What you see is that the compiler uses a sarl (arithmetic right-shift) instruction in both cases, so it does recognize the similarity between the two expressions. If you use the divide, the compiler also needs to adjust for negative numbers. To do that, it shifts the sign bit down to the lowest-order bit and adds that to the result. This fixes the off-by-one issue when shifting negative numbers, compared to what a divide would do.
Since the divide case does 2 shifts, while the explicit shift case only does one, we can now explain some of the performance differences measured by other answers here.
C code with assembly output:
For divide, your input would be
int div2signed(int a) {
    return a / 2;
}
and this compiles to
movl %edi, %eax
shrl $31, %eax # (unsigned)x >> 31
addl %edi, %eax # tmp = x + (x<0)
sarl %eax # (x + 0 or 1) >> 1 arithmetic right shift
ret
similarly for shift
int shr2signed(int a) {
    return a >> 1;
}
with output:
sarl %edi
movl %edi, %eax
ret
Other ISAs can do this about as efficiently, if not more so. For example, GCC for AArch64 uses:
add w0, w0, w0, lsr 31 // x += (unsigned)x>>31
asr w0, w0, 1 // x >>= 1
ret
Just an added note -
x *= 0.5 will often be faster in some VM-based languages, notably ActionScript, as the variable won't have to be checked for division by 0.
Use x = x / 2; or x /= 2; because it is possible that a new programmer will work on it in the future. It will be easier for them to figure out what is going on in that line of code. Everyone may not be aware of such optimizations.
I am speaking from the perspective of programming competitions. Generally, they have very large inputs, where division by 2 takes place many times, and it is known whether the input is positive or negative.
x>>1 will be better than x/2. I checked on ideone.com by running a program where more than 10^10 division-by-2 operations took place. x/2 took nearly 5.5 s, whereas x>>1 took nearly 2.6 s for the same program.
I would say there are several things to consider.

Bitshift should be faster, as no special computation is really needed to shift the bits; however, as pointed out, there are potential issues with negative numbers. If you are guaranteed to have positive numbers and are looking for speed, then I would recommend bitshift.

The division operator is very easy for humans to read. So if you are looking for code readability, you could use it. Note that the field of compiler optimization has come a long way, so making code easy to read and understand is good practice.

Depending on the underlying hardware, operations may have different speeds. Amdahl's law is to make the common case fast. So you may have hardware that can perform different operations faster than others. For example, multiplying by 0.5 may be faster than dividing by 2. (Granted, you may need to take the floor of the multiplication if you wish to enforce integer division.)

If you are after pure performance, I would recommend creating some tests that run the operations millions of times. Sample the execution several times (your sample size) to determine which one is statistically best with your OS/hardware/compiler/code.
As far as the CPU is concerned, bit-shift operations are faster than division operations.
However, the compiler knows this and will optimize appropriately to the extent that it can,
so you can code in the way that makes the most sense and rest easy knowing that your code is
running efficiently. But remember that an unsigned int can (in some cases) be optimized better than an int for reasons previously pointed out.
If you don't need signed arithmetic, then don't include the sign bit.
x = x / 2; is the suitable code to use, but which operation you choose depends on your own program and the output you want to produce.
Make your intentions clearer... for example, if you want to divide, use x / 2, and let the compiler optimize it to a shift operator (or anything else).
On today's processors, these micro-optimizations won't have any measurable impact on the performance of your programs.
The answer to this will depend on the environment you're working under.
If you're working on an 8-bit microcontroller or anything without hardware support for multiplication, bit shifting is expected and commonplace, and while the compiler will almost certainly turn x /= 2 into x >>= 1, the presence of a division symbol will raise more eyebrows in that environment than using a shift to effect a division.
If you're working in a performance-critical environment or section of code, or your code could be compiled with compiler optimization off, x >>= 1 with a comment explaining its reasoning is probably best just for clarity of purpose.
If you're not under one of the above conditions, make your code more readable by simply using x /= 2. Better to save the next programmer who happens to look at your code the 10 second double-take on your shift operation than to needlessly prove you knew the shift was more efficient sans compiler optimization.
All these assume unsigned integers. The simple shift is probably not what you want for signed. Also, DanielH brings up a good point about using x *= 0.5 for certain languages like ActionScript.
Take the value mod 2 and test for a result equal to 1 (I don't know the syntax in C offhand), but this may be fastest.
Generally, the right shift divides:
q = i >> n; is the same as: q = i / 2**n; (where 2**n means 2 to the nth power)
This is sometimes used to speed up programs at the cost of clarity. I don't think you should do it. The compiler is smart enough to perform the speedup automatically. This means that putting in a shift gains you nothing at the expense of clarity.
Take a look at this page from Practical C++ Programming.
Obviously, if you are writing your code for the next guy who reads it, go for the clarity of "x/2".
However, if speed is your goal, try it both ways and time the results. A few months ago I worked on a bitmap convolution routine which involved stepping through an array of integers and dividing each element by 2. I did all kinds of things to optimize it including the old trick of substituting "x>>1" for "x/2".
When I actually timed both ways, I discovered to my surprise that x/2 was faster than x>>1!
This was using Microsoft VS2008 C++ with the default optimizations turned on.
In terms of performance, a CPU's shift operations are significantly faster than its divide opcodes.
So dividing by two, multiplying by 2, etc., all benefit from shift operations.
As to the look and feel: as engineers, when did we become so attached to cosmetics that even beautiful ladies don't use them! :)
x / 2 is the correct one; >> is the shifting operator. If we want to divide an integer, we can use the division operator (/). The shift operator is used to shift bits.
x = x / 2;
x /= 2;
We can write it either of these ways.
For an integer that is never expected to take negative values, one could use unsigned int or int.
From a compiler perspective, or a pure CPU-cycle perspective, is there any difference on x86_64?
It depends. It might go either way, depending on what you are doing with that int as well as on the properties of the underlying hardware.
An obvious example in unsigned ints' favor would be the integer division operation. In C/C++ integer division is required to round towards zero, while the cheap replacements for division by a constant (right shifts and the like) round towards negative infinity. So, in order to satisfy the standard's requirements, the compiler is forced to adjust the signed integer division results with additional machine instructions. In the case of unsigned integer division this problem does not arise, which is why such division generally works much faster for unsigned types than for signed types.
For example, consider this simple expression
rand() / 2
The code generated for this expression by the MSVC compiler will generally look as follows:
call rand
cdq
sub eax,edx
sar eax,1
Note that instead of a single shift instruction (sar) we see a whole bunch of instructions here; our sar is preceded by two extra instructions (cdq and sub). These extra instructions are there just to "adjust" the division in order to force it to generate the "correct" (from the C language's point of view) result. The compiler does not know that your value will always be positive, so it has to generate these instructions unconditionally. They never do anything useful, thus wasting CPU cycles.
Now take a look at the code for
(unsigned) rand() / 2
It is just
call rand
shr eax,1
In this case a single shift did the trick, providing us with astronomically faster code (for the division alone).
On the other hand, when you are mixing integer arithmetic and FPU floating-point arithmetic, signed integer types might work faster, since the FPU instruction set contains instructions for loading/storing signed integer values, but has none for unsigned integer values.
To illustrate this one can use the following simple function
double zero() { return rand(); }
The generated code will generally be very simple
call rand
mov dword ptr [esp],eax
fild dword ptr [esp]
But if we change our function to
double zero() { return (unsigned) rand(); }
the generated code will change to
call rand
test eax,eax
mov dword ptr [esp],eax
fild dword ptr [esp]
jge zero+17h
fadd qword ptr [__real#41f0000000000000 (4020F8h)]
This code is noticeably larger because the FPU instruction set does not work with unsigned integer types, so the extra adjustments are necessary after loading an unsigned value (which is what that conditional fadd does).
There are other contexts and examples that can be used to demonstrate that it works either way. So, again, it all depends. But generally, all this will not matter in the big picture of your program's performance. I generally prefer to use unsigned types to represent unsigned quantities. In my code 99% of integer types are unsigned. But I do it for purely conceptual reasons, not for any performance gains.
Signed types are inherently more optimizable in most cases because the compiler can ignore the possibility of overflow and simplify/rearrange arithmetic in whatever ways it sees fit. On the other hand, unsigned types are inherently safer because the result is always well-defined (even if not to what you naively think it should be).
The one case where unsigned types are better optimizable is when you're writing division/remainder by a power of two. For unsigned types this translates directly to bitshift and bitwise and. For signed types, unless the compiler can establish that the value is known to be positive, it must generate extra code to compensate for the off-by-one issue with negative numbers (according to C, -3/2 is -1, whereas algebraically and by bitwise operations it's -2).
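A sketch of the remainder case (compile with optimizations and compare the assembly):

unsigned urem8(unsigned x)
{
    return x % 8;  // typically compiles to a single AND with 7
}

int srem8(int x)
{
    return x % 8;  // needs sign fixup: C requires -3 % 8 == -3, but
                   // (-3) & 7 == 5, so a plain mask would be wrong
}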
It will almost certainly make no difference, but occasionally the compiler can play games with the signedness of types in order to shave a couple of cycles; to be honest, though, it is probably a negligible change overall.
For example suppose you have an int x and want to write:
if(x >= 10 && x < 200) { /* ... */ }
You (or better yet, the compiler) can transform this a little to do one less comparison:
if((unsigned int)(x - 10) < 190) { /* ... */ }
This makes the assumption that int is represented in 2's complement, so that if (x - 10) is less than 0 it becomes a huge value when viewed as an unsigned int. For example, on a typical x86 system, (unsigned int)-1 == 0xffffffff, which is clearly bigger than the 190 being tested.
This is micro-optimization at best and is best left to the compiler; instead, you should write code that expresses what you mean, and if it is too slow, profile and decide where it really is necessary to get clever.
I don't imagine it would make much difference in terms of CPU or the compiler. One possible case would be if it enabled the compiler to know that the number would never be negative and optimize away code.
However it IS useful to a human reading your code so they know the domain of the variable in question.
From the ALU's point of view, adding (or whatever) signed or unsigned values doesn't make any difference, since they're both represented by a group of bits. 0100 + 1011 is always 1111, but you choose whether that means 4 + (-5) = -1 or 4 + 11 = 15.
So I agree with @Mark: you should choose the best data type to help others understand your code.