Overflow When Calculating Average? - c++

Given 2 integer numbers we can calculate their average like this:
return (a+b)/2;
which isn't safe since (a+b) can cause overflow (Side Note: can someone tell me the correct term for this case maybe memory overflow?)
So we write:
return a+(b-a)/2;
can the same trick be implemented over n numbers and how?

Note that there are several different averages. I assume that you're asking about the arithmetic mean.
overflow (Side Note: can someone tell me the correct term for this case maybe memory overflow?)
The correct term is arithmetic overflow, or just overflow. Not memory overflow.
a+(b-a)/2;
b-a can also overflow. This isn't quite as easy to solve as it may seem.
The standard library (since C++20) has a function template that does this correctly without overflow: std::midpoint.
I checked an implementation of std::midpoint, and they do what you suggested for integers, except the operands are first converted to the corresponding unsigned type. Then the result is converted back. A mathematician may explain how that works, but I guess that it has something to do with the magic of modular arithmetic.
For floats, they do a / 2 + b / 2 (if the inputs are normal).
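To illustrate the unsigned-conversion idea described above, here is a minimal sketch for int only. The name midpoint_sketch is hypothetical and this is not the library's actual source; it just writes out the same technique, assuming a two's-complement int whose unsigned counterpart has the same width.

```cpp
// Hypothetical helper, not the standard library implementation.
// Returns the midpoint of a and b, rounded toward a, without signed overflow.
constexpr int midpoint_sketch(int a, int b)
{
    // Unsigned arithmetic wraps modulo 2^N, so the difference below is
    // well defined even when b - a would overflow as a signed int.
    unsigned lo = static_cast<unsigned>(a);
    unsigned hi = static_cast<unsigned>(b);
    int sign = 1;
    if (a > b)
    {
        // Always measure the distance from the smaller to the larger value.
        sign = -1;
        lo = static_cast<unsigned>(b);
        hi = static_cast<unsigned>(a);
    }
    // hi - lo is the true distance; half of it always fits in an int.
    const int half = static_cast<int>((hi - lo) / 2);
    return a + sign * half;
}
```

The result always lies between a and b, so the final addition cannot overflow.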
can the same trick be implemented over n numbers and how?
The simplest solution that works with all inputs without overflow and without imprecision is probably to use arbitrary-precision arithmetic.
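Short of a full arbitrary-precision library, one alternative is to split every element into a quotient and a remainder by n, so that no intermediate sum gets large. Note that this is a different technique from the one named above, and the helper name mean_no_overflow is hypothetical. The sketch returns a double, so the final value is still subject to ordinary double rounding.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: arithmetic mean of ints without overflowing the sum.
double mean_no_overflow(const std::vector<int>& v)
{
    assert(!v.empty());
    const long long n = static_cast<long long>(v.size());
    long long whole = 0; // running sum of the quotients x / n
    long long rest = 0;  // running sum of the remainders, kept inside (-n, n)
    for (int x : v)
    {
        whole += x / n;
        rest += x % n;
        // Fold a full n out of the remainder sum back into the quotient sum.
        if (rest >= n) { rest -= n; ++whole; }
        else if (rest <= -n) { rest += n; --whole; }
    }
    return static_cast<double>(whole) + static_cast<double>(rest) / static_cast<double>(n);
}
```

For example, mean_no_overflow({2147483647, 2147483647}) yields 2147483647, although the plain sum would not fit in a 32-bit int.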

One way of getting the average of multiple numbers is to compute the Cumulative Moving Average, or CMA:
CMA_{n+1} = CMA_n + (x_{n+1} - CMA_n) / (n + 1)
Your code a + (b - a) / 2 can also be derived from this equation for n + 1 == 2.
Translating the above equation to code, you would get something similar to:
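std::vector<int> vec{10, 5, 8, 3, 2, 8}; // average is 6
double average = 0.0;
// std::size_t matches the type of vec.size() and avoids a signed/unsigned comparison
for (std::size_t n = 0; n < vec.size(); ++n)
{
    // CMA update: move the running mean toward the new element
    average += (vec[n] - average) / (n + 1);
}
std::cout << average; // prints 6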
Alternatively, you can also use std::accumulate:
std::cout << std::accumulate(vec.begin(), vec.end(), 0.0,
[n = 0](auto cma, auto i) mutable {
return cma + (i - cma) / ++n;
});
Do note that floating-point division can produce an imprecise result, especially when it is repeated many times. For more on this imprecision, see: Is floating point math broken?

Related

Loss of precision with pow function when surpassing 10^10 limit?

Doing one of my first homework assignments at uni, and have run into this problem:
Task: Find the sum of all numbers that have n digits (n=1 means 1, 2, 3... 8, 9, for example, so the answer is 45)
Problem: The code I wrote has gotten all the test answers correctly up to 10 to the power of 9, but when it reaches 10 to the power of 10 territory, then the answers start being wrong, it's really close to what I should be getting, but not quite there (For example, my output = 49499999995499995136, expected result = 49499999995500000000)
Would really appreciate some help/insights, am guessing it's something to do with the variable types, but not quite sure of a possible solution..
#include <iostream>
#include <cmath>
#include <iomanip>
using namespace std;
int main()
{
int n;
double ats = 0, maxi, mini;
cin >> n;
maxi = pow(10, n) - 1;
mini = pow(10, n-1) - 1;
ats = (maxi * (maxi + 1)) / 2 - (mini * (mini + 1)) / 2;
cout << setprecision(0) << fixed << ats;
}
The main source of the problem is the pow() function. It works with double, not int, and the loss of accuracy is the price of representing huge numbers.
There are 3 ways to solve the problem:
1. For small n you can write your own long long int pow(int x, int pow) function. The catch is that even long long int can overflow.
2. Use long (arbitrary-precision) arithmetic, as @rustyx said. You can write your own with a vector, or find a library and include it.
3. There is a mathematical solution specific to this task. It sidesteps the big-number problem.
You can write your formula as
(((10^n) - 1) * (10^n) - (10^m - 1) * (10^m)) / 2, where m = n-1
Then multiply out the numerator, regroup the terms, and factor out the common multiple 10^(n-1). You can then see that the answer has the structure
X9...9Y0...0 for big enough n, where X and Y are constants.
So you can just print the answer as a string without calculating anything.
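As a sketch of the string approach: expanding the formula gives 495 * 10^(2n-3) - 45 * 10^(n-2) for n >= 2, which for n >= 3 always prints as 494, then n-3 nines, then 55, then n-2 zeros. The function name n_digit_sum is hypothetical, and the values X = 494 and Y = 55 come from this expansion rather than from the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: sum of all n-digit positive integers, as a decimal string.
std::string n_digit_sum(int n)
{
    assert(n >= 1);
    if (n == 1) return "45";   // 1 + 2 + ... + 9
    if (n == 2) return "4905"; // 10 + 11 + ... + 99
    // For n >= 3 the digit pattern is fixed, so no arithmetic is needed.
    const std::string nines(static_cast<std::size_t>(n - 3), '9');
    const std::string zeros(static_cast<std::size_t>(n - 2), '0');
    return "494" + nines + "55" + zeros;
}
```

For n = 10 this returns 49499999995500000000, the expected result quoted in the question.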
I think you're stretching floating points beyond their precision. Let me explain:
The C pow() function takes doubles as arguments. You're passing ints, so the compiler adds code to convert them to doubles before they reach pow(). (And you're storing the return value in a double anyway, since you declared it that way.)
Floating-point numbers are called that precisely because the point "floats". Inside a double there's a sign bit, 52 bits for the mantissa and 11 bits for the exponent. In binary, multiplying by a power of two is equivalent to moving the fractional point to the right (or to the left if the power is negative). So basically the exponent says where the fractional point is, in binary. The great advantage of this kind of in-memory representation is that you get a lot of precision for numbers close to 0, and gradually lose precision as numbers become bigger.
That last thing is exactly what's happening to you. Your number is too large to be stored exactly: a double holds only 53 significant bits, so not every integer above 2^53 (about 9 * 10^15) is representable. The result is rounded to a nearby sum of powers of two (powers of two are the numbers that have all zeroes to the right in binary).
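To make the loss of precision concrete, the sketch below measures the distance from a double to the next larger representable double. The helper name gap_above is hypothetical, and the figures in the note assume the usual IEEE 754 64-bit double.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: distance from x to the next larger representable double.
double gap_above(double x)
{
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

Near 1.0 the gap is std::numeric_limits<double>::epsilon(), but near 4.95e19 (the size of the result in the question) it is 8192, so most integers of that magnitude cannot be stored exactly.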
Quick experiment: press F12 in your browser, open the javascript console and type 49499999995499995136. In my case, in chrome, I reproduce the same problem.
If you really really really want precision with such big numbers then you can try some of these libraries, but that's too advanced for a student program, you don't need it. Just add an if block and print an error message if the number that the user typed is too big (professors love that, which is actually quite correct).

nan output due to maclaurin series expansion of sine, console crashes

Here is my code:
#include <iostream>
#include <cmath>
using namespace std;
int factorial(int);
int main()
{
for(int k = 0; k < 100000; k++)
{
static double sum = 0.0;
double term;
term = (double)pow(-1.0, k) * (double)pow(4.0, 2*k+1) / factorial(2*k+1);
sum = sum + term;
cout << sum << '\n';
}
}
int factorial(int n)
{
if(n == 0)
{
return 1;
}
return n*factorial(n-1);
}
I'm just trying to calculate the value of sine(4) using the Maclaurin expansion of sine. For each console output, the value reads 'nan'. The console gives an error and shuts down after about 10 seconds. I don't get any errors in the IDE.
There are multiple problems with your approach.
Your factorial function can't return an int. The return value becomes far too big for an int very quickly (13! already exceeds the range of a 32-bit int).
Using pow(-1, value) to get an alternating positive/negative one is very inefficient and risks an incorrect value. You should pick 1.0 or -1.0 depending on k's parity.
When you sum a long series of terms, you want to sum the terms with the least magnitude first. Otherwise you lose precision, because the limited number of mantissa bits means small terms are partly or wholly lost when added to a large running sum. In your case, the power of four is dominated by the factorial, so you sum the highest-magnitude values first. You'd probably get better precision starting from the other end.
Algorithmically, if you're going to raise 4 to the 2k+1 power and then divide by (2k+1)!, you should keep both lists of factors (4, 4, 4, 4, ...) and (2, 3, 4, 5, 6, 7, 8, 9, ...) and simplify both sides. There are plenty of fours to cancel between the numerator and the denominator.
Even with those four changes, I'm not sure you can get anywhere close to the 100000 target you set without specialized code.
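The factor-cancelling idea from the list above can be pushed one step further: each term of the series is the previous term multiplied by -x^2 / ((2k)(2k+1)), so neither the power nor the factorial ever has to be computed on its own. The function name sine_series and its terms parameter are hypothetical; this is a sketch, not a replacement for std::sin.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: partial sum of the Maclaurin series of sin(x).
double sine_series(double x, int terms)
{
    assert(terms >= 1);
    double term = x; // k = 0 term: x^1 / 1!
    double sum = x;
    for (int k = 1; k < terms; ++k)
    {
        // term_k = term_(k-1) * (-x^2) / ((2k) * (2k + 1))
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum += term;
    }
    return sum;
}
```

With x = 4.0, a few dozen terms are already enough to agree with std::sin(4.0) to many digits, which suggests the 100000 iterations in the question are unnecessary.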
As already stated by others, the intermediate results you get for large k are orders of magnitude too large to fit into a double. From a certain k on, pow as well as factorial will return infinity. This is simply what happens for very large doubles. And as you then divide one infinity by another, you get NaN.
One common trick to deal with too large numbers is using logarithms for intermediate results and only in the end apply the exponential function once.
Some mathematical knowledge of logarithms is required here. To understand what I am doing here you need to know exp(log(x)) == x, log(a^b) == b*log(a), and log(a/b) == log(a) - log(b).
In your case you can rewrite
pow(4, 2*k+1)
to
exp((2*k+1)*log(4))
Then there is still the factorial. The lgamma function can help with factorial(n) == gamma(n+1) and log(factorial(n)) == lgamma(n+1). In short, lgamma gives you the log of a factorial without huge intermediate results.
So summing up, replace
pow(4, 2*k+1) / factorial(2*k+1)
with
exp((2*k+1)*log(4) - lgamma(2*k+2))
This should help you with your NaNs. Also, this should increase performance as lgamma operates in O(1) whereas your factorial is in O(k).
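Put together, the log-space replacement might look like the sketch below. The function name sine_term_log is hypothetical, it assumes x > 0 so that log(x) is defined, and it picks the sign from the parity of k instead of calling pow(-1, k).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: k-th Maclaurin term of sin(x) for x > 0,
// i.e. (-1)^k * x^(2k+1) / (2k+1)!, evaluated through logarithms.
double sine_term_log(double x, int k)
{
    assert(x > 0.0 && k >= 0);
    const double n = 2.0 * k + 1.0;
    // log(x^n / n!) == n * log(x) - lgamma(n + 1)
    const double magnitude = std::exp(n * std::log(x) - std::lgamma(n + 1.0));
    return (k % 2 == 0) ? magnitude : -magnitude;
}
```

For very large k the exponent is hugely negative, so the term underflows quietly to zero instead of turning into infinity divided by infinity.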
Note, however, that I have still very little confidence that your result will be numerically accurate.
A double still has a limited precision of roughly 16 decimal digits. Your 100000 iterations are very likely worthless, probably even harmful.

distribute two integers according to ratio

Say I have some integer n and would like to subdivide it into two other integers according to some ratio. I have an approach, and I am wondering whether it works or not.
For example: 20 with ratio 70% should be subdivided into 14,6.
The obvious solution would be:
int n = 20;
double ratio = .7;
int n1 = static_cast<int>(n * ratio);
int n2 = static_cast<int>(n * (1 - ratio));
Since the cast always truncates, however, I usually underestimate my result. If I use std::round, there are still cases that do not work. For example, if the first decimal place is a 5, then both numbers will be rounded up.
Some colleagues suggested: ceil the first number and floor the second one. In most of my tests, this works, however:
1) Does it really always work, also taking into account the rounding errors that naturally occur when multiplying numbers? What I have in mind: 20*.7 could be 14, while 20*.3 could be 5.999999. So my sum might be 14 + 5 = 19. This is just my guess, however; I do not know whether these kinds of results can or cannot occur (otherwise the answer would simply be that this rounding scheme does not work)
2) Even if it does work... Why?
(I have in mind that I could just calculate number 1 by n * ratio and calculate number 2 by n - n * ratio, but I would still be interested in the answer to this question)
How about this?
int n = 20;
double ratio = .7;
int n1 = static_cast<int>(n * ratio);
int n2 = n - n1;
Here is an example that confirms your suspicion and shows that the ceil+floor method doesn't always work. It is caused by the finite precision of floating-point numbers:
#include <iostream>
#include <cmath>
int main() {
int n = 10;
double ratio = 0.7;
int n1 = static_cast<int>(floor(n * ratio));
int n2 = static_cast<int>(ceil(n * (1.0 - ratio)));
std::cout << n1 << " " << n2 << std::endl;
}
Output:
7 4
7 + 4 is 11, so it's wrong.
Your solution doesn't always work: take a ratio of 77% and you'll get 15 and 4 (see on coliru).
Welcome to the domain of numerical analysis.
First, your computer can't always store a floating-point number exactly. As you can see in the example, .77 is stored as 0.77000000000000001776 (an approximation of the number by a sum of powers of 2).
When doing floating-point calculations, you will always have some loss of precision. The relative size of this error is given by std::numeric_limits<double>::epsilon(), the gap between 1.0 and the next representable double.
Moreover, you lose still more precision when converting from a floating-point number to an integer, and in your case the difference is big enough to give you an inconsistent result.
The solution provided by @ToniBig and in your last sentence has the advantage of "hiding" this loss and keeping the data consistent.
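A minimal sketch of that "compute one part, subtract for the other" approach is shown below. The function name split_by_ratio is hypothetical, and rounding the first part with std::lround (rather than truncating) is an assumption of this sketch, not something taken from the answers above.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper: split n into two integers that always sum to n.
std::pair<int, int> split_by_ratio(int n, double ratio)
{
    assert(n >= 0 && ratio >= 0.0 && ratio <= 1.0);
    // Round only once; the second part absorbs whatever rounding error remains.
    const int first = static_cast<int>(std::lround(n * ratio));
    return {first, n - first};
}
```

Because the second part is derived by integer subtraction, the two parts sum to n no matter how n * ratio was rounded.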

C++ Modulus returning wrong answer

Here is my code :
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
int n, i, num, m, k = 0;
cout << "Enter a number :\n";
cin >> num;
n = log10(num);
while (n > 0) {
i = pow(10, n);
m = num / i;
k = k + pow(m, 3);
num = num % i;
--n;
cout << m << endl;
cout << num << endl;
}
k = k + pow(num, 3);
return 0;
}
When I input 111 it gives me this
1
12
1
2
I am using codeblocks. I don't know what is wrong.
Whenever I use pow expecting an integer result, I add .5 so I use (int)(pow(10,m)+.5) instead of letting the compiler automatically convert pow(10,m) to an int.
I have read many places telling me others have done exhaustive tests of some of the situations in which I add that .5 and found zero cases where it makes a difference. But accurately identifying the conditions in which it isn't needed can be quite hard. Using it when it isn't needed does no real harm.
If it makes a difference, it is a difference you want. If it doesn't make a difference, it had a tiny cost.
In the posted code, I would adjust every call to pow that way, not just the one I used as an example.
There is no equally easy fix for your use of log10, but it may be subject to the same problem. Since you expect a non-integer answer and want that answer truncated down to an integer, adding .5 would be very wrong. So you may need to find some more complicated workaround for the fundamental problem of working with floating point. I'm not certain, but assuming 32-bit integers, I think adding 1e-10 to the result of log10 before converting to int is never enough to change log10(10^n-1) into log10(10^n), but always enough to correct an error that might have done the reverse.
pow does floating-point exponentiation.
Floating-point functions and operations are inexact; you cannot rely on them to give you the exact value that they would appear to compute, unless you are an expert on the fine details of IEEE floating point representations and the guarantees given by your library functions.
(Furthermore, floating-point numbers might even be incapable of representing the integers you want exactly.)
This is particularly problematic when you convert the result to an integer, because the result is truncated toward zero: int x = 0.999999; sets x == 0, not x == 1. Even the tiniest error in the wrong direction completely spoils the result.
You could round to the nearest integer, but that has problems too; e.g. with sufficiently large numbers, your floating point numbers might not have enough precision to be near the result you want. Or if you do enough operations (or unstable operations) with the floating point numbers, the errors can accumulate to the point you get the wrong nearest integer.
If you want to do exact, integer arithmetic, then you should use functions that do so. e.g. write your own ipow function that computes integer exponentiation without any floating-point operations at all.
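A minimal sketch of such an integer-only approach follows. The names ipow and digit_cube_sum are hypothetical; the second function computes the same sum of digit cubes as the code in the question, but peels the digits off from the right with % and /, so neither pow nor log10 is needed.

```cpp
#include <cassert>

// Hypothetical helper: base^exp using only integer multiplication.
// No overflow checking; the caller must keep the result within long long.
long long ipow(long long base, int exp)
{
    assert(exp >= 0);
    long long result = 1;
    for (int i = 0; i < exp; ++i)
        result *= base;
    return result;
}

// Hypothetical helper: sum of the cubes of the decimal digits of num.
long long digit_cube_sum(int num)
{
    assert(num >= 0);
    long long sum = 0;
    do
    {
        sum += ipow(num % 10, 3); // cube of the last digit
        num /= 10;                // drop the last digit
    } while (num > 0);
    return sum;
}
```

For the input 111 from the question, digit_cube_sum returns 3, since every digit is extracted exactly.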

Can float values add to a sum of zero? [duplicate]

Closed 11 years ago.
Possible Duplicate:
Most effective way for float and double comparison
I have two values(floats) I am attempting to add together and average. The issue I have is that occasionally these values would add up to zero, thus not requiring them to be averaged.
The situation I am in specifically contains the values "-1" and "1", yet when added together I am given the value "-1.19209e-007" which is clearly not 0. Any information on this?
I'm sorry but this doesn't make sense to me.
Two floating-point values that are exactly the same but with opposite sign will always produce exactly 0 when added. This is how floating-point operations work.
float a = 0.2f;
float b = -0.2f;
float f = (a + b) / 2; // a and b cancel exactly
printf("%f %d\n", f, f != 0); // prints 0.000000 0
The result is always 0, even if the compiler doesn't optimize the code.
There is no rounding error to take into account if a and b have the same value but opposite sign! That is, if the highest (sign) bit of a is 0, the highest bit of b is 1, and all other bits are the same, the result cannot be anything other than 0.
But if a and b are slightly different, of course, the result can be non-zero.
One possible way to handle this is to use a tolerance...
float f = (a + b) / 2;
if (std::fabs(f) < 0.000001f) // std::fabs from <cmath>; plain abs may resolve to the int overload
    f = 0;
We are using a simple tolerance to check whether our value is near zero.
A nice example code to show this is...
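#include <cstdio>
int main(int argc, char**)
{
    for (int i = -10000000; i <= 10000000 * argc; ++i)
    {
        if (i != 0)
        {
            float a = 3.14159265f / i;
            float b = -a + (argc - 1); // argc == 1 when run without arguments, so b == -a
            float f = (a + b) / 2;
            if (f != 0)
                std::printf("%f %f\n", a, f); // f is a float, so print it with %f, not %d
        }
    }
    std::printf("completed\n");
    return 0;
}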
I'm using "argc" here as a trick to force the compiler to not optimize out our code.
At least right off, this sounds like typical floating point imprecision.
The usual way to deal with it is to round your numbers to the correct number of significant digits. In this case, your average would be about -5.96e-08 (i.e., -0.0000000596). Rounded to (say) six or seven decimal places, that is zero.
Take the sum of all your numbers and divide by your count. Round off your answer to something reasonable before you print it, report it, compare it, or whatever you're doing with it.
Again, do some searching on this, but here is the basic explanation...
The computer represents floating-point numbers in base 2 instead of base 10. This means that, for example, 0.2 (when converted to binary) is actually 0.001100110011... on forever. Since the computer cannot store these digits forever, it must approximate the number.
Because of these approximations, we lose "precision" in calculations (hence "single" and "double" precision floating-point numbers). This is why you never test for a float to be exactly 0. Instead, you test whether it is below some threshold that you want to treat as zero.
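The threshold test described above can be sketched as a small helper. The name nearly_zero and the default tolerance of 1e-6f are assumptions made for illustration; a suitable tolerance depends on the magnitude of the values involved.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: treat values closer to zero than tolerance as zero.
bool nearly_zero(float value, float tolerance = 1e-6f)
{
    assert(tolerance >= 0.0f);
    return std::fabs(value) < tolerance;
}
```

With this helper, the -1.19209e-007 from the question counts as zero, while a value such as 0.5f does not.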