I just simply wanted to know: who is responsible for dealing with mathematical overflow cases in a computer?
For example, in the following C++ code:
short x = 32768;
std::cout << x;
Compiling and running this code on my machine gave me a result of -32767.
A short variable's size is 2 bytes, and we know 2 bytes can hold a maximum decimal value of 32767 (if signed). So when I assigned 32768 to x, after exceeding its maximum value of 32767, it seems to start counting again from -32767 up to 32767, and so on.
What exactly happened so that the value -32767 was produced in this case?
That is, what binary calculations were done in the background that resulted in this value?
So, who decides that this happens? I mean, who is responsible for deciding that when a mathematical overflow happens in my program, the value of the variable simply starts again from its minimum value, or an exception is thrown, or the program simply freezes, etc.?
Is it the language standard, the compiler, my OS, my CPU, or something else?
And how does it deal with that overflow situation? (A simple explanation or a link explaining it in detail would be appreciated :) )
Also, who decides what the size of a short int on my machine would be? Is that also the language standard, the compiler, the OS, the CPU, etc.?
Thanks in advance! :)
Edit:
OK, so I understood from here: Why is unsigned integer overflow defined behavior but signed integer overflow isn't?
that it's the processor that defines what happens in an overflow situation (for example, on my machine it started from -32767 all over again), depending on the processor's representation of signed values, i.e. whether it is sign-magnitude, ones' complement or two's complement...
Is that right?
And in my case (when the result looked like it started from the minimum value -32767 all over again), how do you suppose my CPU is representing signed values, and how did the value -32767 come up? (Again, the binary calculations that lead to this, please :) )
It doesn't start at its min value per se. It just truncates the value, so for a 4-bit number, you can count up to 1111 (binary, = 15 decimal). If you increment by one, you get 10000, but there is no room for that, so the leading digit is dropped and 0000 remains. If you calculated 1111 + 0010, you'd get 0001 (= 1).
You can add them up as you would on paper:
1111
0010
---- +
10001
But instead of adding up the entire number, the processor will just add up to (in this case) 4 bits. After that, there is no more room to add any more, but if there is still a 1 to carry, it sets a flag (the carry flag), so you can check whether the last addition it did overflowed.
Processors have basic instructions to add numbers, and they have them for smaller and larger values. A 64-bit processor can add 64-bit numbers (actually, they usually don't add two numbers into a third; they add the second number to the first, modifying the first, but that's not really important for the story).
But apart from 64 bits, they can often also add 32-, 16- and 8-bit numbers. That's partly because it can be efficient to add only 8 bits if you don't need more, but also to stay backwards compatible with older programs written for a previous version of the processor that could add 32-bit but not 64-bit numbers.
Such a program uses an instruction to add 32-bit numbers, and the same instruction must also exist on the 64-bit processor, with the same behavior on overflow; otherwise the program wouldn't run properly on the newer processor.
Apart from adding with the core instructions of the processor, you can also add in software. You could make an inc function that treats a big chunk of bits as a single value. To increment it, you let the processor increment the first 64 bits. The result is stored in the first part of your chunk. If the carry flag is set in the processor, you take the next 64 bits and increment those too. This way you can extend the limitation of the processor and handle large numbers in software.
And the same goes for the way an overflow is handled. The processor just sets the flag. Your application can decide whether to act on it or not. If you want a counter that just increments to 65535 and then wraps to 0, you (your program) don't need to do anything with the flag.
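To make the software-extension idea concrete, here is a minimal sketch in C++ (my own illustration, not how any particular CPU or library does it) of a 128-bit counter built from two 64-bit halves, where the wrap of the low half plays the role the hardware carry flag plays in the description above:

#include <cstdint>

// A 128-bit counter made of two 64-bit "limbs".
struct U128 {
    std::uint64_t low;
    std::uint64_t high;
};

// Increment the counter in software: bump the low half, and if it wrapped
// around to zero (the condition the hardware carry flag would report),
// propagate the carry into the high half.
void increment(U128& value) {
    ++value.low;
    if (value.low == 0) {
        ++value.high;
    }
}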
Related
I am using MinGW64 (with the -m64 flag) with Code::Blocks and want to know how to perform 64-bit calculations without having to cast a really big number to int64_t before multiplying it. For example, this does not result in overflow:
int64_t test = int64_t(2123123123) * 17; //Returns 36093093091
Without the cast, the calculation overflows, like so:
int64_t test = 2123123123 * 17; //Returns 1733354723
A VirusTotal scan confirms that my executable is x64.
Additional Information: OS is Windows 7 x64.
The default int type is still 32 bits even in 64-bit compilations, for compatibility reasons.
The "shortest" version I guess would be to add the ll suffix to the number
int64_t test = 2123123123ll * 17;
Another way would be to store the numbers in their own variables of type int64_t (or long long) and multiply the variables. It's usually rare anyway for a program to have many "magic numbers" hard-coded into the codebase.
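For illustration, a minimal sketch of that variable-based approach (the variable names are just examples):

#include <cstdint>
#include <iostream>

int main() {
    const std::int64_t base   = 2123123123;
    const std::int64_t factor = 17;
    const std::int64_t test   = base * factor;  // multiplication happens in 64 bits, no overflow
    std::cout << test << '\n';                  // prints 36093093091
    return 0;
}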
Some background:
Once upon a time, most computers had 8-bit arithmetic logic units and a 16-bit address bus. We called them 8-bit computers.
One of the first things we learned was that no real-world arithmetic problem can be expressed in 8-bits. It's like trying to reason about space flight with the arithmetic abilities of a chimpanzee. So we learned to write multi-word add, multiply, subtract and divide sequences. Because in most real-world problems, the numerical domain of the problem was bigger than 255.
Then we briefly had 16-bit computers (where the same problem applied; 65535 is just not enough to model things) and then, quite quickly, 32-bit arithmetic logic built into chips. Gradually, the address bus caught up (20 bits, 24 bits, 32 bits if designers were feeling extravagant).
Then an interesting thing happened. Most of us didn't need to write multi-word arithmetic sequences any more. It turns out that most(tm) real world integer problems could be expressed in 32 bits (up to 4 billion).
Then we started producing more data at a faster rate than ever before, and we perceived the need to address more memory. The 64-bit computer eventually became the norm.
But still, most real-world integer arithmetic problems could be expressed in 32 bits. 4 billion is a big (enough) number for most things.
So, presumably through statistical analysis, your compiler writers decided that on your platform, the most useful size for an int would be 32 bits. Any smaller would be inefficient for 32-bit arithmetic (which we have needed from day 1) and any larger would waste space/registers/memory/cpu cycles.
Expressing an integer literal in c++ (and c) yields an int - the natural arithmetic size for the environment. In the present day, that is almost always a 32-bit value.
The c++ specification says that multiplying two ints yields an int. If it didn't then multiplying two ints would need to yield a long. But then what would multiplying two longs yield? A long long? Ok, that's possible. Now what if we multiply those? A long long long long?
So that's that.
int64_t x = 1 * 2; will do the following:
take the integer (32 bits) of value 1.
take the integer (32 bits) of value 2.
multiply them together, storing the result in an integer. If the arithmetic overflows, so be it. That's your lookout.
cast the resulting integer (whatever that may now be) to int64_t (probably a long long int on your system).
So in a nutshell, no. There is no shortcut to spelling out the type of at least one of the operands in the code snippet in the question. You can, of course, specify a literal. But there is no guarantee that a long long (LL literal suffix) on your system is the same as int64_t. If you want an int64_t, and you want the code to be portable, you must spell it out.
For what it's worth:
In a post-c++11 world all the worrying about extra keystrokes and non-DRYness can disappear:
definitely an int64:
auto test = int64_t(2123123123) * 17;
definitely a long long:
auto test = 2'123'123'123LL * 17;
definitely int64, definitely initialised with a (possibly narrowing, but that's ok) long long:
auto test = int64_t(36'093'093'091LL);
Since you're most likely in an LLP64 environment (64-bit Windows), where int and long are only 32 bits, you have to be careful about literal constants in expressions. The easiest way to do this is to get into the habit of using the proper suffix on literal constants, so you would write the above as:
int64_t test = 2123123123LL * 17LL;
2123123123 is an int (usually 32 bits).
Add an L to make it a long: 2123123123L (32 bits on Windows even in 64-bit mode, 64 bits on typical LP64 platforms such as Linux).
Add another L to make it a long long: 2123123123LL (at least 64 bits, standard since C++11).
Note that you only need to add the suffix to constants that exceed the size of an int. Integral conversion will take care of producing the right result*.
(2123123123LL * 17) // 17 is automatically converted to long long, the result is long long
* But beware: even if individual constants in an expression fit into an int, the whole operation can still overflow like in
(1024 * 1024 * 1024 * 10)
In that case you should make sure the arithmetic is performed at sufficient width (taking operator precedence into account):
(1024LL * 1024 * 1024 * 10)
- will perform all 3 operations in 64 bits, with a 64-bit result.
Edit: Literal constants (A.K.A. magic numbers) are frowned upon, so the best way to do it would be to use symbolic constants (const int64_t value = 5). See What is a magic number, and why is it bad? for more info. It's best that you don't read the rest of this answer, unless you really want to use magic numbers for some strange reason.
Also, you can use intptr_t and uintptr_t from #include <cstdint> to let the compiler choose whether to use int or __int64 (they are pointer-sized, so 32 bits in 32-bit builds and 64 bits in 64-bit builds).
For those who stumble upon this question, `LL` at the end of a number can do the trick, but it isn't recommended, as Richard Hodges told me that `long long` may not always be 64 bits and could increase in size in the future, although that's not likely. See Richard Hodges' answer and the comments on it for more information.
The reliable way would be to put `using QW = int64_t;` at the top and use `QW(5)` instead of `5LL`.
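For illustration, a small sketch of that alias approach (QW is just the alias name suggested above):

#include <cstdint>

using QW = std::int64_t;        // one alias at the top of the file

// The literal is widened through the alias instead of with an LL suffix.
QW test = QW(2123123123) * 17;  // 36093093091, no 32-bit overflow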
Personally, I think there should be an option to define all literals as 64-bit without having to add any suffixes or functions to them, and to use `int32_t(5)` when necessary, because some programs are unaffected by this change, for example programs that only use numbers for normal calculations rather than relying on integer overflow to do their work. The problem is going from 64 bits down to 32 bits rather than from 32 up to 64, as the upper 4 bytes are cut off.
I have access to a program which I'm running which SHOULD be guessing a very low number for certain things and outputting it (probably 0 or 1). However, 0.2% of the time, when it should be outputting 0, it outputs a number from 4,294,967,286 to 4,294,967,295. (Note: the latter is the maximum value an unsigned 32-bit integer can hold.)
What I GUESS is happening is that the function is computing the value to be less than 0 (i.e. somewhere around -1 to -9), and when it assigns that number to an unsigned int, the number wraps around to the max (or close to the max) value.
I therefore assumed the program is written in C (I do not have access to the source code) and then tested, in Visual Studio .NET 2012, what would happen if I assigned a variety of negative numbers to an unsigned integer. Unfortunately, nothing seemed to happen: it would still output the number to the console as a negative integer. I'm wondering whether this is MSVS 2012 trying to be smart, or whether there is some other reason.
Anyway, am I correct in assuming that this is in fact what is happening, and the reason why the program outputs the max value of an unsigned int? Or are there other valid reasons why this might be happening?
Edit: All I want to know is whether it's valid to assume that assigning a negative number to an unsigned integer can result in the integer being set to the max value, aka 4,294,967,295. If this is IMPOSSIBLE, then okay. I'm not looking for SPECIFICS on exactly why this is happening with the program, as I do not have access to the code. All I want to know is whether it's possible, and therefore a possible explanation for the results I'm getting.
In C and C++ assigning -1 to an unsigned number will give you the maximum unsigned value.
This is guaranteed by the standard, and all compilers I know of (even VC) implement this part correctly. Probably your C example has some other problem that prevents it from showing this result (I cannot say without seeing the code).
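To illustrate (a minimal sketch, not the asker's program): the conversion is well defined, and if a test still prints a negative number, the value was most likely printed through a signed type or a signed format specifier.

#include <iostream>

int main() {
    unsigned int u = -1;                       // -1 is converted modulo 2^32
    std::cout << u << '\n';                    // prints 4294967295
    std::cout << static_cast<int>(u) << '\n';  // typically prints -1: same bits viewed as signed
    return 0;
}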
You can think of negative numbers as having their most significant bit count as negative.
A 4-bit integer would look like this:
Binary        HEX    INT4            UINT4
(in memory)          (as decimal)    (as decimal)
0000          0x0     0               0 (UINT4_MIN)
0001          0x1     1               1
0010          0x2     2               2
0100          0x4     4               4
0111          0x7     7 (INT4_MAX)    7
1000          0x8    -8 (INT4_MIN)    8
1111          0xF    -1              15 (UINT4_MAX)
It may be that the header of a library lies to you and the value is negative.
If the library has no other means of telling you about errors this may be a deliberate error value. I have seen "nonsensical" values used in that manner before.
The error could be calculated as (UINT4_MAX - error) or always UINT4_MAX if an error occurs.
Really, without any source code this is a guessing game.
EDIT:
I expanded the illustrating table a bit.
If you want to log a number like that, you may want to log it in hexadecimal form. The hex view lets you peek into memory a bit more quickly once you are used to it.
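For example (a small sketch, not taken from the original program), logging the same value in both decimal and hexadecimal makes the wrap-around obvious:

#include <cstdio>

int main() {
    unsigned int value = static_cast<unsigned int>(-6);  // wraps to 4294967290
    std::printf("%u = 0x%08X\n", value, value);          // prints: 4294967290 = 0xFFFFFFFA
    return 0;
}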
First of all: I really tried to find a matching answer for this, but I just wasn't successful.
I am currently working on a little 8086 emulator. What I still haven't figured out is how the Overflow and Auxiliary flags are best calculated for addition and subtraction.
As far as I know, the Auxiliary flag works like the Overflow flag but only uses 4 bits, while the Overflow flag uses the whole operand size. So if I am adding two signed 1-byte integers, the OF would check for 1-byte signed overflow, while the Auxiliary flag would only look at the lower 4 bits of the two integers.
Are there any generic algorithms or "magic bitwise operations" for calculating the signed overflow for 4-, 8- and 16-bit addition and subtraction? (I don't mind what language they are written in.)
Remark: I need to store the values in unsigned variables internally, so I can only work with unsigned values or bitwise calculations.
Might one solution that works for addition and subtraction be to check whether the Sign flag (or bit 4, for the Auxiliary flag) has changed after the calculation is done?
Thanks in advance!
The Overflow flag indicates whether the result is too large/too small to fit in the destination operand, whatever size that operand is.
The Auxiliary flag indicates whether the lowest four bits produced a carry or borrow, i.e. whether their result was too large/too small to fit in four bits.
Edit: for how to determine AF, see Explain how the AF flag works in an x86 instructions?.
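For what it's worth, here is a hedged sketch (my own illustration, not taken from any particular emulator) of how CF, OF and AF can be computed for 8-bit addition and subtraction using only unsigned values and bitwise operations; widening the types and using 0x8000 as the sign bit gives the 16-bit versions:

#include <cstdint>

struct Flags {
    bool carry;      // CF: unsigned overflow (carry/borrow out of the full width)
    bool overflow;   // OF: signed overflow of the full width
    bool auxiliary;  // AF: carry/borrow out of the low 4 bits
};

Flags add8(std::uint8_t a, std::uint8_t b, std::uint8_t& result) {
    result = static_cast<std::uint8_t>(a + b);
    Flags f;
    f.carry     = result < a;                                 // the 8-bit sum wrapped past 0xFF
    f.overflow  = ((a ^ result) & (b ^ result) & 0x80) != 0;  // operands agree in sign, result differs
    f.auxiliary = ((a ^ b ^ result) & 0x10) != 0;             // a carry crossed from bit 3 into bit 4
    return f;
}

Flags sub8(std::uint8_t a, std::uint8_t b, std::uint8_t& result) {
    result = static_cast<std::uint8_t>(a - b);
    Flags f;
    f.carry     = b > a;                                      // a borrow was needed
    f.overflow  = ((a ^ b) & (a ^ result) & 0x80) != 0;       // operand signs differ, result sign flipped
    f.auxiliary = ((a ^ b ^ result) & 0x10) != 0;             // a borrow crossed out of the low nibble
    return f;
}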
I am just wondering whether differences will stay correct through an overflow. As an example, I am trying to use a Windows high-resolution timer with QueryPerformanceFrequency(&local).
The starting value of that counter is undefined. However, the interesting bit is only the difference from the starting point. So at the beginning you record the value, and afterwards you always look at the difference. Now, if I can guarantee that the difference won't be larger than what a LARGE_INTEGER can hold, is that sufficient?
Say, e.g., one has 4 bits. That allows for 0...15. If the counter now starts at 14 and stops at 2, and I do 2 - 14, I should be getting 4, shouldn't I? So I needn't worry about an overflow as long as the difference is smaller?
Thanks
Since you are using a Windows-specific structure, your problem is easier since it only needs to run on machines that support Windows. Windows requires twos-complement arithmetic, and twos-complement arithmetic behaves on overflow in the manner you expect (overflows are computed mod 2^n).
I'm not going to answer the general question but rather the specific one: do you need to worry about overflows from QueryPerformanceCounter?
If you have a performance counter that is incrementing at 4 GHz, it will take 73 years for a 63-bit signed integer to wrap around to a negative number. No need to worry about overflow.
On my computer at least, the definition of LARGE_INTEGER is:
typedef union _LARGE_INTEGER {
    struct {
        DWORD LowPart;
        LONG  HighPart;
    };
    LONGLONG QuadPart;
} LARGE_INTEGER;
The tricky part is that all of those are signed. So if you have four bits, the range is [-8, 7]. Then if you start at 6 and stop at 0 (after the counter wraps), you get a difference of -6.
However, if you cast the LONGLONG to an unsigned long long (either before or after the subtraction; either is fine), then you should get the correct answer. Converting -6 to unsigned (modulo 16 in the 4-bit analogy) results in 10, the correct difference.
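As a sketch of that idea (the helper name is made up; it assumes the real elapsed interval itself fits in 64 bits): subtract the two QuadPart samples in unsigned 64-bit arithmetic, so a counter that wrapped still yields the correct tick count.

#include <cstdint>

// start_quadpart and stop_quadpart are the QuadPart members of two LARGE_INTEGER
// samples. The unsigned subtraction is performed modulo 2^64, so the result is
// correct even if the counter wrapped between the two samples.
std::uint64_t elapsed_ticks(std::int64_t start_quadpart, std::int64_t stop_quadpart) {
    return static_cast<std::uint64_t>(stop_quadpart) -
           static_cast<std::uint64_t>(start_quadpart);
}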
Using 2's complement (the way integers are represented in computers), you can add or subtract multiple numbers -- and the result will be correct as long as it fits in the number of bits allocated. The temporary results need not fit in the allocated number of bits.
So yes, if you use an integer of N bits, you'll get the correct result as long as the difference is less than 2^N.
I'm trying to write a function in assembly (but let's assume the question is language agnostic).
How can I use bitwise operators to set all bits of a passed in number to 1?
I know that I can use the bitwise "or" with a mask of the bits I wish to set, but I don't know how to construct a mask based on a binary number of size N.
~(x & 0)
x & 0 will always result in 0, and ~ will flip all the bits to 1s.
Set it to 0, then flip all the bits to 1 with a bitwise-NOT.
You're going to find that in assembly language you have to know the size of a "passed in number". And in assembly language it really matters which machine the assembly language is for.
Given that information, you might be asking either
How do I set an integer register to all 1 bits?
or
How do I fill a region in memory with all 1 bits?
To fill a register with all 1 bits, on most machines the efficient way takes two instructions:
Clear the register, using either a special-purpose clear instruction, or load immediate 0, or xor the register with itself.
Take the bitwise complement of the register.
Filling memory with 1 bits then requires 1 or more store instructions...
You'll find a lot more bit-twiddling tips and tricks in Hank Warren's wonderful book Hacker's Delight.
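Since the question allows a language-agnostic view, here is a rough C++ analogue of the two cases above (the buffer size is arbitrary, chosen only for illustration):

#include <cstdint>
#include <cstring>

int main() {
    // "Register" case: clear, then take the bitwise complement.
    std::uint32_t reg = 0;
    reg = ~reg;                               // all 32 bits set: 0xFFFFFFFF

    // "Memory" case: store the all-ones byte pattern across a region.
    unsigned char buffer[64];
    std::memset(buffer, 0xFF, sizeof buffer);

    return reg == 0xFFFFFFFFu ? 0 : 1;
}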
Set it to -1. This is usually represented by all bits being 1.
Set x to 1
While x < number
x = x * 2
Answer = number or x - 1.
The code assumes your input is called "number". It should work fine for positive values. Note that for negative values (in two's complement) the approach makes no sense, as the high bit will always be one.
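As a minimal sketch, the pseudocode above could be transcribed into C++ like this (it assumes a positive input whose top bit is not already set, so the doubling cannot wrap):

#include <cstdint>

std::uint32_t set_all_bits_up_to(std::uint32_t number) {
    std::uint32_t x = 1;
    while (x < number) {
        x *= 2;                  // smallest power of two >= number
    }
    return number | (x - 1);     // x - 1 is a mask of ones covering every lower bit
}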
Use T(~T(0)).
where T is the type name (if we are talking about C++).
This prevents the unwanted promotion to int if the type is smaller than int.
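To illustrate why the cast back to T matters, here is a small sketch using uint8_t as the example type:

#include <cstdint>
#include <type_traits>

int main() {
    using T = std::uint8_t;
    auto promoted = ~T(0);     // ~ promotes its operand to int, so this is an int holding -1
    auto masked   = T(~T(0));  // casting back yields a T with all 8 bits set (255)
    static_assert(std::is_same_v<decltype(promoted), int>);
    static_assert(std::is_same_v<decltype(masked), T>);
    return masked == 255 ? 0 : 1;
}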