#include <iostream>
using std::cout;
using std::hex;

int main() {
    unsigned int num1 = 0x65764321;
    unsigned int num2 = 0x23657432;
    unsigned int sum = num1 + num2;   // unsigned addition wraps modulo 2^32
    cout << hex << sum;               // prints 88dbb753
    return 0;
}
If I have two unsigned integers, say num1 and num2, and then I write
unsigned int sum = num1 + num2;
what method does the computer use to add them? Would it be two's complement? And would the sum variable be printed in two's complement?
2's complement addition is identical to unsigned addition as far as the actual bits are concerned. In the actual hardware, the design will be something more complicated, like a carry-lookahead adder (https://en.wikipedia.org/wiki/Carry-lookahead_adder), so it can be low latency: not having to wait for the carry to ripple across 32 or 64 bits, because that would be too many gate delays for add to have single-cycle latency.
One's complement and sign/magnitude are the other signed-integer representations that C++ allows implementations to use, and their wrap-around behaviour is different from unsigned.
For example, one's complement addition has to wrap the carry-out back into the low bit. See this article about optimizing TCP checksum calculation for how you implement one's complement addition on hardware that only provides 2's complement / unsigned addition (specifically x86).
C++ leaves signed overflow as undefined behaviour, but real one's complement and sign/magnitude hardware does have specific documented behaviour. reinterpret_casting an unsigned bit pattern to a signed integer gives a result that depends on what kind of hardware you're running on. (All modern hardware is 2's complement, though.)
Since the bitwise operation is the same for unsigned or 2's complement, it's all about how you interpret the results. On CPU architectures like x86 that set flags based on the results of an instruction, the overflow flag is only relevant for the signed interpretation, and the carry flag is only relevant for the unsigned interpretation. The hardware produces both from a single instruction, instead of having separate signed/unsigned add instructions that do the same thing.
See http://teaching.idallen.com/dat2343/10f/notes/040_overflow.txt for a great write-up about unsigned carry vs. signed overflow, and x86 flags.
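For example, here is a minimal C++ sketch (not from the original question; it reuses its inputs) that recovers what CF and OF would be for that ADD, using only standard unsigned arithmetic:

#include <cstdint>
#include <iostream>

int main() {
    uint32_t num1 = 0x65764321, num2 = 0x23657432;
    uint32_t sum = num1 + num2;                        // wraps modulo 2^32, like the hardware ADD
    bool cf = sum < num1;                              // unsigned carry-out: the sum wrapped past 2^32
    bool of = ((~(num1 ^ num2) & (num1 ^ sum)) & 0x80000000u) != 0;  // same input signs, different result sign
    std::cout << std::hex << sum << " CF=" << cf << " OF=" << of << '\n';   // 88dbb753 CF=0 OF=1
}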
On other architectures, like MIPS, there is no FLAGS register. You have to use a compare or test instruction to figure out what happened (carry or zero or whatever). The add instruction doesn't set flags. See this MIPS Q&A about add-with-carry for a 64-bit add on 32-bit MIPS.
For signed overflow, MIPS add raises an exception on overflow (where x86 would just set OF), so you use addu for both signed and unsigned addition if you don't want it to fault on signed overflow.
Now, the overflow flag here is 1 (it's an example given by our instructor), meaning there is overflow, but there is no carry. So how can there be overflow here?
You have a C++ program, not an x86 assembly language program! C++ doesn't have a carry or overflow flag.
If you compiled this program for x86 with a non-optimizing compiler, and it used the ADD instruction with your two inputs, you would get OF=1 and CF=0 from that ADD instruction.
But the compiler might use lea edi, [rax+rdx] to do the sum without overwriting either input, and LEA doesn't set flags.
Or if the compiler did the addition at compile time, your code would compile the same as source like this:
cout << hex << 0x88dbb753U;
and no addition of your numbers would take place at run-time. (There will of course be lots of addition in the iostream library functions, and maybe even an add instruction in main() as part of making a stack frame, if your compiler chooses to emit code that sets up a stack frame.)
I have two unsigned integers
What method does the computer use to add them
Whatever method is available on the target CPU architecture. Most have an instruction named ADD.
Would the sum variable be printed in two's complement?
Two's complement is a way to represent an integer type in binary. It is not a way to print numbers.
Related
I want to add 2 unsigned vectors using AVX2
__m256i i1 = _mm256_loadu_si256((__m256i *) si1);
__m256i i2 = _mm256_loadu_si256((__m256i *) si2);
__m256i result = _mm256_adds_epu16(i2, i1);
However, I need wrap-around overflow instead of the saturation that _mm256_adds_epu16 does, so the result is identical to the non-vectorized code. Is there any solution for that?
Use normal binary wrapping _mm256_add_epi16 instead of saturating adds.
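For instance, a minimal sketch (the function name and the assumption that si1/si2/out each point at 16 uint16_t elements are mine, not from the question); compile with AVX2 enabled:

#include <immintrin.h>
#include <cstdint>

void add16_wrap(const uint16_t *si1, const uint16_t *si2, uint16_t *out) {
    __m256i i1 = _mm256_loadu_si256((const __m256i *) si1);
    __m256i i2 = _mm256_loadu_si256((const __m256i *) si2);
    __m256i result = _mm256_add_epi16(i2, i1);    // vpaddw: each 16-bit lane wraps modulo 2^16
    _mm256_storeu_si256((__m256i *) out, result); // same wrap-around a scalar uint16_t sum would have
}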
Two's complement and unsigned addition/subtraction are the same binary operation, that's one of the reasons modern computers use two's complement. As the asm manual entry for vpaddw mentions, the instructions can be used on signed or unsigned integers. (The intrinsics guide entry doesn't mention signedness at all, so is less helpful at clearing up this confusion.)
Compares like _mm_cmpgt_epi32 are sensitive to signedness, but math operations (and cmpeq) aren't.
The intrinsic names Intel chose might look like they're for signed integers specifically, but they always use epi or si for things that work equally on signed and unsigned elements. epu implies a specifically unsigned operation, while epi can mean a specifically signed operation or one that works equally on signed and unsigned (or where signedness is irrelevant).
For example, _mm_and_si128 is pure bitwise. _mm_srli_epi32 is a logical right shift, shifting in zeros, like an unsigned C shift. Not copies of the sign bit, that's _mm_srai_epi32 (shift right arithmetic by immediate). Shuffles like _mm_shuffle_epi32 just move data around in chunks.
Non-widening multiplication like _mm_mullo_epi16 and _mm_mullo_epi32 are also the same for signed or unsigned. Only the high-half _mm_mulhi_epu16 or widening multiplies _mm_mul_epu32 have unsigned forms as counterparts to their specifically signed epi16/32 forms.
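To see why concretely, here is a tiny scalar sketch of the same fact (the SIMD instructions just do this independently in every 16-bit lane):

#include <cassert>
#include <cstdint>

int main() {
    uint16_t a = 0xFFFE, b = 0x0003;                            // as signed: -2 and 3
    uint16_t lo_unsigned = (uint16_t)(a * b);                   // 65534 * 3, keep the low 16 bits
    uint16_t lo_signed   = (uint16_t)((int16_t)a * (int16_t)b); // -2 * 3 = -6, same low 16 bits (0xFFFA)
    assert(lo_unsigned == lo_signed);                           // the low half never depends on signedness
}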
That's also why 386 only added a scalar integer imul ecx, esi form, not also a mul ecx, esi, because only the FLAGS setting would differ, not the integer result. And SIMD operations don't even have FLAGS outputs.
The intrinsics guide unhelpfully describes _mm_mullo_epi16 as sign-extending and producing a 32-bit product, then truncating to the low 16 bits. The asm manual for pmullw also describes it as signed that way, seemingly treating it as the companion to the signed pmulhw. (And it has some bugs, like describing the AVX1 VPMULLW xmm1, xmm2, xmm3/m128 form as multiplying 32-bit dword elements, probably a copy/paste error from pmulld.)
And sometimes Intel's naming scheme is limited: _mm_maddubs_epi16 is a u8 x i8 => 16-bit widening multiply, adding pairs horizontally (with signed saturation). I usually have to look up the intrinsic for pmaddubsw to remind myself that they named it after the output element width, not the inputs. The inputs have different signedness, so if they have to pick one side, I guess it makes sense to name it for the output, with the signed saturation that can happen with some inputs, as for pmaddwd.
When I use the >> bitwise operator on (binary) 1000 in C++, it gives this result: 1100. I want the result to be 0100. When the 1 is in any other position this is exactly what happens, but with a leading 1 it goes wrong. Why is that, and how can it be avoided?
The behavior you describe is consistent with what happens on some platforms when right-shifting a signed integer with the high bit set (i.e. a negative value).
In this case, compilers on many platforms will emit an arithmetic shift, which propagates the sign bit; on platforms with 2's complement representation for negative integers (virtually every current platform) this has the effect of giving x >> i == floor(x / 2^i) even for negative values. Notice that this is not contractual: as far as the C++ standard is concerned, right-shifting a negative integer is implementation-defined behavior, so any compiler is free to implement different semantics for it.
To come to your question: to obtain the "regular" shift behavior (generally called a "logical shift") you have to make sure you are working on unsigned integers. This can be done either by making sure that the variable you are shifting is of unsigned type (e.g. unsigned int) or, if it's a literal, by adding a U suffix to it (e.g. 1 is an int, 1U is an unsigned int).
If the data you have is of a signed type (e.g. int) you may cast it to the corresponding unsigned type before shifting without risks (conversion from a signed int to an unsigned one is well-defined by the standard, and doesn't change the bit values on 2's complement machines).
Historically, this comes from the fact that C strove to support even machines that didn't have "cheap" arithmetic shift functionality at hardware level and/or didn't use 2's complement representation.
As mentioned by others, when right-shifting a signed int it is implementation-defined whether you will get 1s or 0s. In your case, because the leftmost bit in 1000 is a 1, the "replacement bits" are also 1. Assuming you must work with signed ints, you can get rid of them by applying a bitmask.
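A small sketch pulling both suggestions together (assuming 32-bit int and the usual arithmetic-shift behaviour for signed right shifts):

#include <cstdint>
#include <cstdio>

int main() {
    int32_t x = INT32_MIN;                         // bit pattern 1000...0

    printf("%08x\n", (unsigned)(x >> 1));          // c0000000: arithmetic shift copies the sign bit in
    printf("%08x\n", (uint32_t)x >> 1);            // 40000000: convert to unsigned first, zeros shift in
    printf("%08x\n", (x >> 1) & 0x7fffffffu);      // 40000000: or mask off the replicated sign bit
}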
First of all: I really tried to find a matching answer for this, but I just wasn't successful.
I am currently working on a little 8086 emulator. What I still haven't figured out is how the Overflow and Auxiliary flags are best calculated for addition and subtraction.
As far as I know, the Auxiliary Flag corresponds to the Overflow Flag but only uses 4 bits, while the Overflow Flag uses the whole operand size. So if I am adding two signed 1-byte integers, the OF would check for 1-byte signed overflow, while the Auxiliary Flag would only look at the lower 4 bits of the two integers.
Are there any generic algorithms or "magic bitwise operations" for calculating the signed overflow for 4-, 8- and 16-bit addition and subtraction? (I don't mind what language they are written in.)
Remark: I need to store the values in unsigned variables internally, so I can only work with unsigned values or bitwise calculations.
Might one solution that works for addition and subtraction be to check whether the Sign Flag (or bit 4, for the Auxiliary flag) has changed after the calculation is done?
Thanks in advance!
The Overflow Flag indicates whether the signed result is too large/too small to fit in the destination operand, whatever its size.
The Auxiliary Flag indicates a carry or borrow out of the low four bits of the result (the low nibble); it exists for BCD adjustment.
Edit: for how to determine AF, see "Explain how the AF flag works in x86 instructions?".
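For what it's worth, the usual "magic bitwise" formulas the question asks about look like this; a sketch for 8-bit operands (the helper names are my own, and the same masks work for 16-bit if you change 0x80 to 0x8000):

#include <cstdint>

struct Flags { bool of, af; };

// Flags for r = a + b. OF: both inputs have the same sign, but the result's sign differs.
// AF: carry out of bit 3, which is bit 4 of (a ^ b ^ r).
Flags add_flags8(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a + b);
    return { ((~(a ^ b) & (a ^ r)) & 0x80) != 0,
             ((a ^ b ^ r) & 0x10) != 0 };
}

// Flags for r = a - b. OF: inputs have different signs and the result's sign differs from a's.
// AF: borrow out of the low nibble, again bit 4 of (a ^ b ^ r).
Flags sub_flags8(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    return { (((a ^ b) & (a ^ r)) & 0x80) != 0,
             ((a ^ b ^ r) & 0x10) != 0 };
}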
For an integer that is never expected to take negative values, one could use either unsigned int or int.
From a compiler perspective, or a pure CPU-cycle perspective, is there any difference on x86_64?
It depends. It might go either way, depending on what you are doing with that int as well as on the properties of the underlying hardware.
An obvious example in unsigned int's favor would be the integer division operation. In C/C++, integer division is required to round towards zero, while the cheap shift-based replacements for division by a power of two round towards negative infinity for negative values. So, in order to satisfy the standard's requirements, the compiler is forced to adjust the signed-division result with additional machine instructions. For unsigned integer division this problem does not arise, which is why division by a constant is generally much faster for unsigned types than for signed types.
For example, consider this simple expression
rand() / 2
The code generated for this expression by the MSVC compiler will generally look as follows:
call rand
cdq
sub eax,edx
sar eax,1
Note that instead of a single shift instruction (sar) we are seeing a whole bunch of instructions here, i.e. our sar is preceded by two extra instructions (cdq and sub). These extra instructions are there just to "adjust" the division in order to force it to produce the "correct" (from the C language's point of view) result. The compiler does not know that your value will always be non-negative, so it has to generate these instructions unconditionally. They will never do anything useful, thus wasting CPU cycles.
Now take a look at the code for
(unsigned) rand() / 2
It is just
call rand
shr eax,1
In this case a single shift did the trick, giving us dramatically faster code (for the division alone).
On the other hand, when you are mixing integer arithmetic and x87 FPU floating-point arithmetic, signed integer types might work faster, since the FPU instruction set has instructions for loading/storing signed integer values but none for unsigned integer values.
To illustrate this one can use the following simple function
double zero() { return rand(); }
The generated code will generally be very simple
call rand
mov dword ptr [esp],eax
fild dword ptr [esp]
But if we change our function to
double zero() { return (unsigned) rand(); }
the generated code will change to
call rand
test eax,eax
mov dword ptr [esp],eax
fild dword ptr [esp]
jge zero+17h
fadd qword ptr [__real#41f0000000000000 (4020F8h)]
This code is noticeably larger because the FPU instruction set cannot load unsigned integer types: fild loads the value as signed, so the extra test/jge and the fadd of 2^32 are needed to correct the result when the top bit was set.
There are other contexts and examples that can be used to demonstrate that it works either way. So, again, it all depends. But generally, all this will not matter in the big picture of your program's performance. I generally prefer to use unsigned types to represent unsigned quantities. In my code 99% of integer types are unsigned. But I do it for purely conceptual reasons, not for any performance gains.
Signed types are inherently more optimizable in most cases because the compiler can ignore the possibility of overflow and simplify/rearrange arithmetic in whatever ways it sees fit. On the other hand, unsigned types are inherently safer because the result is always well-defined (even if not to what you naively think it should be).
The one case where unsigned types are better optimizable is when you're writing division/remainder by a power of two. For unsigned types this translates directly to a bit shift and a bitwise AND. For signed types, unless the compiler can establish that the value is known to be non-negative, it must generate extra code to compensate for the off-by-one issue with negative numbers (according to C, -3/2 is -1, whereas algebraically and by bitwise operations it's -2).
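A tiny sketch of that off-by-one issue, assuming the common two's complement arithmetic right shift:

#include <cstdio>

int main() {
    int s = -3;
    unsigned u = 6u;
    printf("%d\n", s / 2);      // -1: C/C++ division truncates toward zero
    printf("%d\n", s >> 1);     // -2 on typical implementations: the shift rounds toward negative infinity
    printf("%u\n", u / 2);      // 3: for unsigned operands the compiler can emit a plain shift, no fix-up
}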
It will almost certainly make no difference, but occasionally the compiler can play games with the signedness of types in order to shave a couple of cycles, but to be honest it probably is a negligible change overall.
For example suppose you have an int x and want to write:
if(x >= 10 && x < 200) { /* ... */ }
You (or better yet, the compiler) can transform this a little to do one less comparison:
if((unsigned int)(x - 10) < 190) { /* ... */ }
This makes the assumption that int is represented in 2's complement, so that if (x - 10) is less than 0 it becomes a huge value when viewed as an unsigned int. For example, on a typical x86 system, (unsigned int)-1 == 0xffffffff, which is clearly bigger than the 190 being tested.
This is micro-optimization at best and best left up to the compiler; instead, you should write code that expresses what you mean, and if it is too slow, profile and decide where it is really necessary to get clever.
I don't imagine it would make much difference in terms of CPU or the compiler. One possible case would be if it enabled the compiler to know that the number would never be negative and optimize away code.
However it IS useful to a human reading your code so they know the domain of the variable in question.
From the ALU's point of view, adding (or whatever) signed or unsigned values doesn't make any difference, since they're both represented by a group of bits. 0100 + 1011 is always 1111, but you choose whether that is 4 + (-5) = -1 or 4 + 11 = 15.
So I agree with @Mark: you should choose the best data type to help others understand your code.
I have a few questions about divide overflow errors on x86 or x86_64 architecture. Lately I've been reading about integer overflows. Usually, when an arithmetic operation results in an integer overflow, the carry bit or overflow bit in the FLAGS register is set. But apparently, according to this article, overflows resulting from division operations don't set the overflow bit, but rather trigger a hardware exception, similar to when you divide by zero.
Now, integer overflows resulting from division are a lot rarer than, say, those from multiplication. There are only a few ways to even trigger a division overflow. One way would be to do something like:
int16_t a = -32768;
int16_t b = -1;
int16_t c = a / b;
In this case, due to the two's complement representation of signed integers, you can't represent positive 32768 in a signed 16-bit integer, so the division operation overflows, resulting in the erroneous value of -32768.
A few questions:
1) Contrary to what this article says, the above did NOT cause a hardware exception. I'm using an x86_64 machine running Linux, and when I divide by zero the program terminates with a Floating point exception. But when I cause a division overflow, the program continues as usual, silently ignoring the erroneous quotient. So why doesn't this cause a hardware exception?
2) Why are division errors treated so severely by the hardware, as opposed to other arithmetic overflows? Why should a multiplication overflow (which is much more likely to accidentally occur) be silently ignored by the hardware, but a division overflow is supposed to trigger a fatal interrupt?
=========== EDIT ==============
Okay, thanks everyone for the responses. I've gotten responses saying basically that the above 16-bit integer division shouldn't cause a hardware fault because the quotient is still less than the register size. I don't understand this. In this case, the register storing the quotient is 16-bit - which is too small to store signed positive 32768. So why isn't a hardware exception raised?
Okay, let's do this directly in GCC inline assembly and see what happens:
int16_t a = -32768;
int16_t b = -1;
__asm__
(
"xorw %%dx, %%dx;" // Clear the DX register (upper-bits of dividend)
"movw %1, %%ax;" // Load lower bits of dividend into AX
"movw %2, %%bx;" // Load the divisor into BX
"idivw %%bx;" // Divide a / b (quotient is stored in AX)
"movw %%ax, %0;" // Copy the quotient into 'b'
: "=rm"(b) // Output list
:"ir"(a), "rm"(b) // Input list
:"%ax", "%dx", "%bx" // Clobbered registers
);
printf("%d\n", b);
This simply outputs an erroneous value: -32768. Still no hardware exception, even though the register storing the quotient (AX) is too small to fit the quotient. So I don't understand why no hardware fault is raised here.
In the C language, arithmetic operations are never performed on types smaller than int. Any time you attempt arithmetic on smaller operands, they are first subjected to the integral promotions, which convert them to int. If on your platform int is, say, 32 bits wide, then there's no way to force a C program to perform 16-bit division; the compiler will generate 32-bit division instead. This is probably why your C experiment does not produce the expected overflow on division. If your platform does indeed have a 32-bit int, then your best bet would be to try the same thing with 32-bit operands (i.e. divide INT_MIN by -1). I'm pretty sure that way you'll eventually be able to reproduce the overflow exception even in C code.
In your assembly code you are using 16-bit division, since you specified BX as the operand for idiv. 16-bit division on x86 divides the 32-bit dividend stored in DX:AX pair by the idiv operand. This is what you are doing in your code. The DX:AX pair is interpreted as one composite 32-bit register, meaning that the sign bit in this pair is now actually the highest-order bit of DX. The highest-order bit of AX is not a sign bit anymore.
And what did you do with DX? You simply cleared it. You set it to 0. But with DX set to 0, your dividend is interpreted as positive! From the machine's point of view, such a DX:AX pair actually represents the positive value +32768. I.e. in your assembly-language experiment you are dividing +32768 by -1. The result is -32768, as it should be. Nothing unusual here.
If you want to represent -32768 in the DX:AX pair, you have to sign-extend it, i.e. you have to fill DX with all-one bit pattern, instead of zeros. Instead of doing xor DX, DX you should have initialized AX with your -32768 and then done cwd. That would have sign-extended AX into DX.
For example, in my experiment (not GCC) this code
__asm {
mov AX, -32768
cwd
mov BX, -1
idiv BX
}
causes the expected exception, because it does indeed attempt to divide -32768 by -1.
When you get an integer overflow with integer 2's complement add/subtract/multiply you still have a valid result - it's just missing some high order bits. This behaviour is often useful, so it would not be appropriate to generate an exception for this.
With integer division however the result of a divide by zero is useless (since, unlike floating point, 2's complement integers have no INF representation).
Contrary to what this article says, the above did NOT cause a hardware exception
The article did not say that. It says:
... they generate a division error if the source operand (divisor) is zero or if the quotient is too large for the designated register
The register size is definitely greater than 16 bits (32 or 64).
From the relevant section on integer overflow:
Unlike the add, mul, and imul instructions, the Intel division instructions div and idiv do not set the overflow flag; they generate a division error if the source operand (divisor) is zero or if the quotient is too large for the designated register.
The size of a register on a modern platform is either 32 or 64 bits; 32768 will fit into one of those registers. However, the following code will very likely throw an integer overflow exception (it does on my Core Duo laptop with VC8):
int x= INT_MIN;
int y= -1;
int z= x/y;
The reason your example did not generate a hardware exception is due to C's integer promotion rules. Operands smaller than int get automatically promoted to ints before the operation is performed.
As to why different kinds of overflows are handled differently, consider that at the x86 machine level there's really no such thing as a multiplication overflow. When you multiply AX by some other register, the result goes into the DX:AX pair, so there is always room for the result and thus no occasion to signal an overflow exception. However, in C and other languages the product of two ints is supposed to fit in an int, so there is such a thing as overflow at the C level. The x86 does sometimes set OF (the overflow flag) on MUL, but that just means the high part of the result is non-zero.
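The same "there is always room for the full product" idea can be expressed in C by multiplying into a wider type; a small sketch:

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t a = 0xFFFFFFFFu, b = 0xFFFFFFFFu;
    uint64_t full = (uint64_t)a * b;                 // full 64-bit product, analogous to the EDX:EAX result of mul
    printf("%016llx\n", (unsigned long long)full);   // fffffffe00000001: nothing was lost
}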
On an implementation with 32-bit int, your example does not result in a divide overflow. It results in a perfectly representable int, 32768, which then gets converted to int16_t in an implementation-defined manner when you make the assignment. This is due to the default promotions specified by the C language, and as a result, an implementation which raised an exception here would be non-conformant.
If you want to try to cause an exception (which still may or may not actually happen, it's up to the implementation), try:
int a = INT_MIN, b = -1, c = a/b;
You might have to do some tricks to prevent the compiler from optimizing it out at compile-time.
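One such trick (a sketch; the fault is not guaranteed by the C++ standard, but it does occur on typical x86/x86-64 implementations) is to make the operands opaque with volatile so the division must happen at run time:

#include <climits>
#include <cstdio>

int main() {
    volatile int a = INT_MIN;   // volatile prevents constant-folding the division away
    volatile int b = -1;
    int c = a / b;              // executes idiv; INT_MIN / -1 raises #DE, seen as SIGFPE on Linux
    printf("%d\n", c);          // never reached on hardware that traps
}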
I would conjecture that on some old computers, attempting to divide by zero would cause severe problems (e.g. putting the hardware into an endless cycle of subtracting, trying to make the remainder smaller than the divisor, until an operator came along to fix things), and this started a tradition of divide overflows being regarded as more severe faults than integer overflows.
From a programming standpoint, there's no reason that an unexpected divide overflow should be any more or less serious than an unexpected integer overflow (signed or unsigned). Given the cost of division, the marginal cost of checking an overflow flag afterward would be pretty slight. Tradition is the only reason I can see for having a hardware trap.