Prevent long-running averaging from overflowing? - c++

Suppose I want to calculate the average value of a data set, such as:
class Averager {
    float total = 0;
    size_t count = 0;
public:
    float addData(float value) {
        this->total += value;
        return this->total / ++this->count;
    }
};
Sooner or later the total or the count will overflow, so I change it so that it doesn't remember the total value:
class Averager {
    float currentAverage = 0;
    size_t count = 0;
public:
    float addData(float value) {
        ++count;
        this->currentAverage = (this->currentAverage * (count - 1) + value) / count;
        return this->currentAverage;
    }
};
It seems this will take longer to overflow, but the multiplication between the average and the count leads to the same overflow problem, so the next solution is:
class Averager {
    float currentAverage = 0;
    size_t count = 0;
public:
    float addData(float value) {
        this->currentAverage += (value - this->currentAverage) / ++count;
        return this->currentAverage;
    }
};
This seems better; the next problem is how to prevent count from overflowing?

Aggregated buckets.
We pick a bucket size that's comfortably less than the square root of MAXINT. To keep it simple, let's pick 10.
Each new value is added to the current bucket, and the moving average can be computed as you describe.
When the bucket is full, start a new bucket, remembering the average of the full bucket. We can safely calculate the overall average by combining the averages of the full buckets with the current, partial bucket. When we get to 10 full buckets, we create a bigger bucket, with capacity 100.
To compute the total average we first compute the average of the "10s" and then combine that with the "100s". This pattern repeats for "1,000s", "10,000s" and so on. At each stage we only need to consider two levels, one 10x bigger than the other.
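A rough sketch of that scheme, assuming a fixed bucket size of 10 (the class name BucketAverager and the weighted recombination in average() are my own illustration, not a reference implementation):

#include <cstddef>
#include <vector>

class BucketAverager {
    // levels[k] holds averages that each stand for 10^k raw values.
    std::vector<std::vector<double>> levels;
public:
    void addData(double value) {
        if (levels.empty()) levels.emplace_back();
        levels[0].push_back(value);
        // When a level reaches 10 entries, collapse it into one entry one level up.
        for (std::size_t k = 0; k < levels.size(); ++k) {
            if (levels[k].size() < 10) break;
            double avg = 0;
            for (double v : levels[k]) avg += v / 10.0;
            levels[k].clear();
            if (k + 1 == levels.size()) levels.emplace_back();
            levels[k + 1].push_back(avg);
        }
    }
    double average() const {
        double weighted = 0, weight = 0, scale = 1;   // scale = 10^k
        for (const auto& lvl : levels) {
            for (double v : lvl) { weighted += v * scale; weight += scale; }
            scale *= 10;
        }
        return weight > 0 ? weighted / weight : 0;
    }
};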

Use double total; unsigned long long count;. You should still worry about accuracy, but it will be much less of a problem than with float.
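For illustration, a minimal sketch of the same running average with the wider types:

class Averager {
    double total = 0.0;
    unsigned long long count = 0;
public:
    double addData(double value) {
        total += value;              // double gives ~15-16 significant digits instead of ~7
        return total / ++count;      // unsigned long long will not overflow in any realistic run
    }
};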

What about using arbitrary-precision arithmetic?
There's a list of libraries you could use on Wikipedia: http://en.wikipedia.org/wiki/Bignum#Libraries
Most arbitrary-precision arithmetic libraries will not overflow until the stored digits fill the available memory (which is quite unlikely).
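As a sketch, assuming Boost.Multiprecision (one of the libraries on that list) is available; cpp_dec_float_50 gives 50 decimal digits, and the class layout here is mine:

#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iostream>

namespace mp = boost::multiprecision;

class BigAverager {
    mp::cpp_dec_float_50 total = 0;   // 50 decimal digits of working precision
    unsigned long long count = 0;
public:
    mp::cpp_dec_float_50 addData(double value) {
        total += value;
        ++count;
        return total / count;
    }
};

int main() {
    BigAverager a;
    a.addData(1e30);
    std::cout << a.addData(2e30) << '\n';   // prints roughly 1.5e+30
}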

You want to use Kahan's summation algorithm:
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
See also the section about errors in summation in
"What Every Computer Scientist Should Know About Floating-Point Arithmetic"
http://docs.sun.com/source/806-3568/ncg_goldberg.html#1262
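For example, a small sketch of an averager whose running total is kept with Kahan compensation (the class and member names are mine, not from the linked articles):

#include <cstddef>

class KahanAverager {
    double sum = 0.0;          // compensated running total
    double c = 0.0;            // running compensation for lost low-order bits
    std::size_t count = 0;
public:
    double addData(double value) {
        double y = value - c;
        double t = sum + y;
        c = (t - sum) - y;     // recovers the part of y that was lost in t
        sum = t;
        return sum / ++count;
    }
};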

You could use these arbitrary-precision datatypes, where integers can grow until your RAM is full.

I was just thinking about this also. I think this solution works in terms of the new value 'moving the needle': it only moves it by an amount scaled down by the number of previous values that contributed to the average so far (plus 1 for itself). It will lose accuracy as the number of inputs grows, but on average it should be practically acceptable.
Here's some Java code that seems to work. I used floats and ints here to demonstrate that it will work with those limitations but you could use double to gain accuracy. This is just to give you an idea of how to average an array of near-max integers. You would need to keep track of the total number of inputs and the current average, but not the total sum of the inputs. If your total number of inputs approaches MAX_INT, this eventually won't work and you should use the bucket suggestion above, but that is pretty drastic in most cases.
public float calcAverageContinuous(int[] integers)
{
    float ave = 0;
    for (int i = 0; i < integers.length; i++) {
        ave += (((float)integers[i] - ave) / (float)(i + 1));
    }
    return ave;
}

Related

Make all elements of the array equal under given conditions

Given an array of size n, where each element denotes the work assigned to some student. Taking some amount of work from one student and assigning it to another student increases that amount by a factor of k.
Now we have to redistribute the work so that every student does equal work. Determine the minimum possible work value, rounded off to two decimal places.
Example: A = {2, 8}, K = 1.5. You can take 2.4 from 8 and give it to 2: A = {2 + 2.4*1.5, 8 - 2.4} = {5.6, 5.6}. The answer is 5.60.
How can we approach this question? For n = 2 I am able to do it simply by solving equations, but for n > 2 how can we approach it? I tried binary search, but I am getting Time Limit Exceeded.
sort(a.begin(), a.end());
double low = a[0];
double high = a[n-1];
double res = INT_MAX;
double mid;
int i, j;
while (low <= high) {
    mid = (low + high) / 2.0;
    i = 0, j = n - 1;
    while (i <= j) {
        if ((a[i] + k*a[j]) == (k+1)*mid) {
            i++;
            j--;
        }
        else if ((a[i] + k*a[j]) < (k+1)*mid) {
            high = mid;
            break;
        }
        else {
            low = mid;
            break;
        }
    }
    if (i > j) {
        res = mid;
        high = mid;
    }
}
return res;
Please give me suggestions on how I can overcome the Time Limit Exceeded.
If I'm understanding correctly, you have an array of doubles and you want to make all elements the same, but with the condition that when you subtract you subtract normally, while when you add you add the amount multiplied by 1.5. It's basically calculating an average with a condition, so you want to make a guess at the average and see whether it's close enough or not. I made a new variable, difference, that checks whether the average we used is bigger or smaller than the real one; note that difference doesn't give any exact value, just positive or negative.
sort(a.begin(), a.end());
double low = a[0];
double high = a[n - 1];
double average = (low + high) / 2;
double difference = 0;
do
{
    if (difference > 0)
    {
        low = average;
        average = (average + high) / 2;
    }
    else if (difference < 0)
    {
        high = average;
        average = (average + low) / 2;
    }
    difference = 0;
    for (int i = 0; i < n; i++)
        if (a[i] < average)
            difference = difference + (a[i] - average) * 1.5;
        else
            difference = difference + a[i] - average;
} while (difference > 0.01 || difference < -0.01);
The loop is done when the difference is under 0.01, which means the average is within 0.01 (you could say roughly 0.01/n).
I hope I answered your question; it's my first time answering a question here.
I'd start with some math.
If you graph the values, and you found the target value X, then you would have valleys below X and mountains above X. The volume of the mountains above X, times k, must equal the volume of the valleys.
If you sort the elements, then calculating the net mountain-valley volumes is a linear process. Starting at a_0, all there is is mountain; this is a function of the sum of the a_is and k.
Going from a_i to a_{i+1} makes the a_0 to a_i valleys deeper by a_{i+1}-a_i, and makes the a_{i+1} to a_{last} mountains shallower by the same amount.
Within that region the valley/mountain transformation is linear; if the zero lies within that region you can do linear math to find it.
Walk from least to greatest, and find the point where the valleys match the mountains.
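A hypothetical sketch of that sweep (the function name equalWork and the 1e-9 tolerance are mine); it solves the linear balance equation region by region on the sorted array:

#include <algorithm>
#include <iomanip>
#include <iostream>
#include <vector>

// Find X such that k * (total taken from elements above X) == (total received below X).
double equalWork(std::vector<double> a, double k) {
    std::sort(a.begin(), a.end());
    const int n = static_cast<int>(a.size());
    double total = 0;
    for (double v : a) total += v;

    double prefix = 0;                            // sum of a[0..i]
    for (int i = 0; i < n; ++i) {
        prefix += a[i];
        const double suffix = total - prefix;     // sum of a[i+1..n-1]
        const int below = i + 1, above = n - i - 1;
        // In the region [a[i], a[i+1]] the balance is linear in X:
        // k * (suffix - above*X) == below*X - prefix  =>  solve for X.
        const double x = (k * suffix + prefix) / (k * above + below);
        const double hi = (i + 1 < n) ? a[i + 1] : x;
        if (x >= a[i] - 1e-9 && x <= hi + 1e-9)
            return x;
    }
    return a[n - 1];                              // not reached for valid input
}

int main() {
    std::cout << std::fixed << std::setprecision(2)
              << equalWork({2.0, 8.0}, 1.5) << '\n';  // 5.60, matching the example
}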

Summing an array of doubles with a large value span: proper algorithm

I have an algorithm where I need to sum (many times) double numbers ranging from 1e-40 to 1e+40.
Array Example (randomly dumped from real application):
-2.06991e-05
7.58132e-06
-3.91367e-06
7.38921e-07
-5.33143e-09
-4.13195e-11
4.01724e-14
6.03221e-17
-4.4202e-20
6.58873
-1.22257
-0.0606178
0.00036508
2.67599e-07
0
-627.061
-59.048
5.92985
0.0885884
0.000276455
-2.02579e-07
It goes without saying that I am aware of the rounding effects this will cause; I am trying to keep them under control: the final result should not have any missing information in the fractional part of the double or, if that is not avoidable, the result should be at least n-digit accurate (with n defined). The end result needs something like 5 digits plus the exponent.
After some decent thinking, I ended up with following algorithm :
Sort the array so that the largest absolute value comes first, closest to zero last.
Add everything in a loop
The idea is that in this case, any cancellation of large values (negatives and positive) will not impact latter smaller values.
In short :
(10e40 - 10e40) + 1 = 1 : result is as expected
(1 + 10e40) - 10e40 = 0 : not good
I ended up using std::multiset (a benchmark on my PC gave 20% higher speed with long double compared to normal double - I am fine with double resolution) with a custom sort function using std::fabs.
It's still quite slow (it takes 5 seconds to do the whole thing) and I still have this feeling of "you missed something in your algo". Any recommendation:
for speed optimization. Is there a better way to sort the intermediate products? Sorting a set of 40 intermediate results (typically) takes about 70% of the total execution time.
for missed issues. Is there a chance to still lose critical data (data that should have been in the fractional part of the final result)?
On a bigger picture, I am implementing real-coefficient polynomial classes of a pure imaginary variable (electrical impedances: Z(jw)). Z is a big polynomial representing a user-defined system, with coefficient exponents ranging very far.
The "big" comes from adding things like Zc1 = 1/jC1w to Zc2 = 1/jC2w :
Zc1 + Zc2 = (C1C2(jw)^2 + 0(jw))/(C1+C2)(jw)
In this case, with C1 and C2 in nanofarad (10e-9), C1C2 is already in 10e-18 (and it only started...)
My sort function uses the Manhattan distance of the complex values (because mine are either purely real or purely imaginary):
struct manhattan_complex_distance
{
    bool operator() (std::complex<long double> a, std::complex<long double> b)
    {
        return std::fabs(std::real(a) + std::imag(a)) > std::fabs(std::real(b) + std::imag(b));
    }
};
and my multiset in action:
std::complex<long double> get_value(std::vector<std::complex<long double>>& frequency_vector)
{
    // frequency_vector is precalculated once and for all to have at index n the value (jw)^n.
    std::multiset<std::complex<long double>, manhattan_complex_distance> temp_list;
    for (std::size_t i = 0; i < m_coeficients.size(); ++i)
    {
        // element of : ℝ * ℂ
        temp_list.insert(m_coeficients[i] * frequency_vector[i]);
    }
    std::complex<long double> ret = 0;
    for (auto i : temp_list)
    {
        // It is VERY important to start adding the big values before adding the small ones.
        // In floating point, 10^60 - 10^60 + 1 = 1, while 1 + 10^60 - 10^60 = 0; of course you'd expect 1, not 0.
        ret += i;
    }
    return ret;
}
The project is C++11 enabled (mainly for the improved math library and complex number tools).
PS: I refactored the code to make it easy to read; in reality all the complex and long double names are template parameters, so I can change the polynomial type in no time or use the class for regular polynomials over ℝ.
As GuyGreer suggested, you can use Kahan summation:
double sum = 0.0;
double c = 0.0;
for (double value : values) {
    double y = value - c;
    double t = sum + y;
    c = (t - sum) - y;
    sum = t;
}
EDIT: You should also consider using Horner's method to evaluate the polynomial.
double value = coeffs[degree];
for (auto i = degree; i-- > 0;) {
    value *= x;
    value += coeffs[i];
}
Sorting the data is on the right track. But you definitely should be summing from smallest magnitude to largest, not from largest to smallest. Summing from largest to smallest, by the time you get to the smallest, aligning the next value with the current sum is liable to cause most or all of the bits of the next value to 'fall off the end'. Summing instead from smallest to largest, the smallest values get a chance to accumulate a decent-sized sum, so more of their bits make it into the final result. Combined with Kahan summation, that should yield a fairly accurate sum.
First: have your math keep track of error. Replace your doubles with error-aware types, and when you add or multiply two of them together, also calculate the maximum error.
This is about the only way you can guarantee that your code produces accurate results while being reasonably fast.
Second, don't use a multiset. The associative containers are not for sorting, they are for maintaining a sorted collection, while being able to incrementally add or remove elements from it efficiently.
The ability to add/remove elements incrementally means it is node-based, and node-based means it is slow in general.
If you simply want a sorted collection, start with a vector then std::sort it.
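For instance, a sketch of the question's get_value() using a sorted vector instead of a multiset (the function name and the lambda reproducing the question's ordering are mine):

#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

std::complex<long double> get_value_sorted(
    const std::vector<std::complex<long double>>& coeficients,
    const std::vector<std::complex<long double>>& frequency_vector)
{
    std::vector<std::complex<long double>> terms;
    terms.reserve(coeficients.size());
    for (std::size_t i = 0; i < coeficients.size(); ++i)
        terms.push_back(coeficients[i] * frequency_vector[i]);

    // One sort instead of maintaining a balanced tree on every insert.
    std::sort(terms.begin(), terms.end(),
              [](const std::complex<long double>& a, const std::complex<long double>& b) {
                  return std::fabs(std::real(a) + std::imag(a)) >
                         std::fabs(std::real(b) + std::imag(b));
              });

    std::complex<long double> ret = 0;
    for (const auto& t : terms)
        ret += t;
    return ret;
}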
Next, to minimize error, keep a list of positive and negative elements. Start with zero as your sum. Now pick the smallest of either the positive or negative elements such that the total of your sum and that element is closest to zero.
Do so with elements that calculate their error bounds.
At the end, determine if you have 5 digits of precision, or not.
These error-propagating doubles should ideally be used as early in the algorithm as possible.
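A minimal sketch of such an error-aware type, assuming a simple half-ulp-per-operation rounding bound (the type name ErrDouble and the bound formula are my own simplification, not a library type):

#include <cmath>
#include <limits>

struct ErrDouble {
    double value;
    double error;   // upper bound on |true value - value|

    ErrDouble(double v = 0.0, double e = 0.0) : value(v), error(e) {}

    friend ErrDouble operator+(ErrDouble a, ErrDouble b) {
        double r = a.value + b.value;
        double rounding = std::fabs(r) * std::numeric_limits<double>::epsilon() / 2;
        return ErrDouble(r, a.error + b.error + rounding);
    }
    friend ErrDouble operator*(ErrDouble a, ErrDouble b) {
        double r = a.value * b.value;
        double rounding = std::fabs(r) * std::numeric_limits<double>::epsilon() / 2;
        // Propagate the input errors through the product, then add the new rounding error.
        return ErrDouble(r, std::fabs(a.value) * b.error + std::fabs(b.value) * a.error
                            + a.error * b.error + rounding);
    }
};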

How to calculate the sum of the bitwise xor values of all the distinct combinations of the given numbers efficiently?

Given n (n <= 1000000) positive integers (each number is smaller than 1000000), the task is to calculate the sum of the bitwise xor (^ in C/C++) values of all distinct pairs of the given numbers.
Time limit is 1 second.
For example, if 3 integers are given as 7, 3 and 5, answer should be 7^3 + 7^5 + 3^5 = 12.
My approach is:
#include <bits/stdc++.h>
using namespace std;
int num[1000001];
int main()
{
    int n, i, sum, j;
    scanf("%d", &n);
    sum = 0;
    for (i = 0; i < n; i++)
        scanf("%d", &num[i]);
    for (i = 0; i < n-1; i++)
    {
        for (j = i+1; j < n; j++)
        {
            sum += (num[i] ^ num[j]);
        }
    }
    printf("%d\n", sum);
    return 0;
}
But my code failed to run in 1 second. How can I write my code in a faster way, so that it can run in 1 second?
Edit: Actually this is an Online Judge problem and I am getting CPU Limit Exceeded with my above code.
You need to compute around 1e12 xors in order to brute force this. Modern processors can do around 1e10 such operations per second. So brute force cannot work; therefore they are looking for you to figure out a better algorithm.
So you need to find a way to determine the answer without computing all those xors.
Hint: can you think of a way to do it if all the input numbers were either zero or one (one bit)? And then extend it to numbers of two bits, three bits, and so on?
When optimising your code you can go 3 different routes:
Optimising the algorithm.
Optimising the calls to language and library functions.
Optimising for the particular architecture.
There may very well be a quicker mathematical way of xoring every pair combination and then summing them up, but I know it not. In any case, on the contemporary processors you'll be shaving off microseconds at best; that is because you are doing basic operations (xor and sum).
Optimising for the architecture also makes little sense. It normally becomes important in repetitive branching, you have nothing like that here.
The biggest problem in your algorithm is reading from the standard input. Despite the fact that "scanf" takes only 5 characters in your computer code, in machine language this is the bulk of your program. Unfortunately, if the data will actually change each time you run your code, there is no way around the requirement of reading from stdin, and there will be no difference whether you use scanf, std::cin >>, or even attempt to implement your own method to read characters from input and convert them into ints.
All this assumes that you don't expect a human being to enter thousands of numbers in less than one second. I guess you can be running your code via: myprogram < data.
This function grows quadratically (thanks @rici). At around 25,000 positive integers, each being 999,999 (worst case), the for loop calculation alone takes approximately a second. Trying to make this work with the input as you have specified, for 1 million positive integers, just doesn't seem possible.
With the hint in Alan Stokes's answer, you may have a linear complexity instead of quadratic with the following:
std::size_t xor_sum(const std::vector<std::uint32_t>& v)
{
    std::size_t res = 0;
    for (std::size_t b = 0; b != 32; ++b) {
        const std::size_t count_1 =
            std::count_if(v.begin(), v.end(),
                          [b](std::uint32_t n) { return (n >> b) & 0x01; });
        const std::size_t count_0 = v.size() - count_1;
        res += count_0 * count_1 << b;
    }
    return res;
}
Explanation:
x^y = Sum_b((x&b)^(y&b)) where b is a single bit mask (from 1<<0 to 1<<31).
For a given bit, with count_0 and count_1 the respective numbers of values with that bit cleared or set, there are count_0 * (count_0 - 1) / 2 pairs contributing 0^0, count_0 * count_1 pairs contributing 0^1, and count_1 * (count_1 - 1) / 2 pairs contributing 1^1 (and since 0^0 and 1^1 are 0, only the count_0 * count_1 term matters).

What's the numerically best way to calculate the average

What's the best way to calculate the average? With this question I want to know which algorithm for calculating the average is best in a numerical sense. It should have the least rounding error, should not be sensitive to over- or underflow, and so on.
Thank you.
Additional information: incremental approaches are preferred, since the values may not all fit into RAM (several parallel calculations on files larger than 4 GB).
If you want an O(N) algorithm, look at Kahan summation.
You can have a look at http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.3535 (Nick Higham, "The accuracy of floating point summation", SIAM Journal of Scientific Computation, 1993).
If I remember it correctly, compensated summation (Kahan summation) is good if all numbers are positive, at least as good as sorting them and adding them in ascending order (unless there are very, very many numbers). The story is much more complicated if some numbers are positive and some are negative, so that you get cancellation. In that case, there is an argument for adding them in descending order.
Sort the numbers in ascending order of magnitude. Sum them, low magnitude first. Divide by the count.
I always use the following pseudocode:
float mean = 0.0; // could use double
int n = 0;        // could use long
for each x in data:
    ++n;
    mean += (x - mean) / n;
I don't have formal proofs of its stability but you can see that we won't have problems with numerical overflow, assuming that the data values are well behaved. It's referred to in Knuth's The Art of Computer Programming
Just to add one possible answer for further discussion:
Incrementally calculate the average for each step:
AVG_n = AVG_(n-1) * (n-1)/n + VALUE_n / n
or pairwise combination
AVG_(n_a + n_b) = (n_a * AVG_a + n_b * AVG_b) / (n_a + n_b)
(I hope the formulas are clear enough)
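A small sketch of both formulas in code (the struct name PartialMean is mine); the pairwise combination is handy when partial results from parallel runs have to be merged:

#include <iostream>

struct PartialMean {
    double mean = 0.0;
    unsigned long long count = 0;

    // AVG_n = AVG_(n-1) + (VALUE_n - AVG_(n-1)) / n, which equals AVG_(n-1)*(n-1)/n + VALUE_n/n
    void add(double x) {
        ++count;
        mean += (x - mean) / count;
    }
};

// AVG_(n_a + n_b) = (n_a * AVG_a + n_b * AVG_b) / (n_a + n_b), written with a
// weight in [0,1] to avoid forming the potentially huge products directly.
PartialMean combine(const PartialMean& a, const PartialMean& b) {
    PartialMean r;
    r.count = a.count + b.count;
    if (r.count == 0) return r;
    double wa = static_cast<double>(a.count) / static_cast<double>(r.count);
    r.mean = a.mean * wa + b.mean * (1.0 - wa);
    return r;
}

int main() {
    PartialMean a, b;
    for (int i = 1; i <= 1000; ++i) a.add(i);        // mean 500.5
    for (int i = 1001; i <= 2000; ++i) b.add(i);     // mean 1500.5
    std::cout << combine(a, b).mean << '\n';         // 1000.5
}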
A very late post, but since I don't have enough reputation to comment: @Dave's method is the one used (as of December 2020) by the GNU Scientific Library.
Here is the code, extracted from mean_source.c:
double FUNCTION (gsl_stats, mean) (const BASE data[], const size_t stride, const size_t size)
{
    /* Compute the arithmetic mean of a dataset using the recurrence relation
       mean_(n) = mean(n-1) + (data[n] - mean(n-1))/(n+1) */
    long double mean = 0;
    size_t i;
    for (i = 0; i < size; i++)
    {
        mean += (data[i * stride] - mean) / (i + 1);
    }
    return mean;
}
GSL uses the same algorithm to calculate the variance, which is, after all, just a mean of squared differences from a given number.

In which order should floats be added to get the most precise result?

This was a question I was asked at a recent interview and I want to know the answer (I don't actually remember the theory of numerical analysis, so please help me :)
If we have some function, which accumulates floating-point numbers:
std::accumulate(v.begin(), v.end(), 0.0);
v is a std::vector<float>, for example.
Would it be better to sort these numbers before accumulating them?
Which order would give the most precise answer?
I suspect that sorting the numbers in ascending order would actually make the numerical error less, but unfortunately I can't prove it myself.
P.S. I do realize this probably has nothing to do with real world programming, just being curious.
Your instinct is basically right, sorting in ascending order (of magnitude) usually improves things somewhat. Consider the case where we're adding single-precision (32 bit) floats, and there are 1 billion values equal to 1 / (1 billion), and one value equal to 1. If the 1 comes first, then the sum will come to 1, since 1 + (1 / 1 billion) is 1 due to loss of precision. Each addition has no effect at all on the total.
If the small values come first, they will at least sum to something, although even then I have 2^30 of them, whereas after 2^25 or so I'm back in the situation where each one individually isn't affecting the total any more. So I'm still going to need more tricks.
That's an extreme case, but in general adding two values of similar magnitude is more accurate than adding two values of very different magnitudes, since you "discard" fewer bits of precision in the smaller value that way. By sorting the numbers, you group values of similar magnitude together, and by adding them in ascending order you give the small values a "chance" of cumulatively reaching the magnitude of the bigger numbers.
Still, if negative numbers are involved it's easy to "outwit" this approach. Consider three values to sum, {1, -1, 1 billionth}. The arithmetically correct sum is 1 billionth, but if my first addition involves the tiny value then my final sum will be 0. Of the 6 possible orders, only 2 are "correct" - {1, -1, 1 billionth} and {-1, 1, 1 billionth}. All 6 orders give results that are accurate at the scale of the largest-magnitude value in the input (0.0000001% out), but for 4 of them the result is inaccurate at the scale of the true solution (100% out). The particular problem you're solving will tell you whether the former is good enough or not.
In fact, you can play a lot more tricks than just adding them in sorted order. If you have lots of very small values, a middling number of middling values, and a small number of large values, then it might be most accurate to first add up all the small ones, then separately total the middling ones, add those two totals together, then add the large ones. It's not at all trivial to find the most accurate combination of floating-point additions, but to cope with really bad cases you can keep a whole array of running totals at different magnitudes, add each new value to the total that best matches its magnitude, and when a running total starts to get too big for its magnitude, add it into the next total up and start a new one. Taken to its logical extreme, this process is equivalent to performing the sum in an arbitrary-precision type (so you'd do that). But given the simplistic choice of adding in ascending or descending order of magnitude, ascending is the better bet.
It does have some relation to real-world programming, since there are some cases where your calculation can go very badly wrong if you accidentally chop off a "heavy" tail consisting of a large number of values each of which is too small to individually affect the sum, or if you throw away too much precision from a lot of small values that individually only affect the last few bits of the sum. In cases where the tail is negligible anyway you probably don't care. For example if you're only adding together a small number of values in the first place and you're only using a few significant figures of the sum.
There is also an algorithm designed for this kind of accumulation operation, called Kahan Summation, that you should probably be aware of.
According to Wikipedia,
The Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).
In pseudocode, the algorithm is:
function kahanSum(input)
    var sum = input[1]
    var c = 0.0              // A running compensation for lost low-order bits.
    for i = 2 to input.length
        y = input[i] - c     // So far, so good: c is zero.
        t = sum + y          // Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y    // (t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
        sum = t              // Algebraically, c should always be zero. Beware eagerly optimising compilers!
    next i                   // Next time around, the lost low part will be added to y in a fresh attempt.
    return sum
I tried out the extreme example in the answer supplied by Steve Jessop.
#include <iostream>
#include <iomanip>
#include <cmath>

int main()
{
    long billion = 1000000000;
    double big = 1.0;
    double small = 1e-9;
    double expected = 2.0;

    double sum = big;
    for (long i = 0; i < billion; ++i)
        sum += small;
    std::cout << std::scientific << std::setprecision(1) << big << " + " << billion << " * " << small << " = " <<
        std::fixed << std::setprecision(15) << sum <<
        " (difference = " << std::fabs(expected - sum) << ")" << std::endl;

    sum = 0;
    for (long i = 0; i < billion; ++i)
        sum += small;
    sum += big;
    std::cout << std::scientific << std::setprecision(1) << billion << " * " << small << " + " << big << " = " <<
        std::fixed << std::setprecision(15) << sum <<
        " (difference = " << std::fabs(expected - sum) << ")" << std::endl;

    return 0;
}
I got the following result:
1.0e+00 + 1000000000 * 1.0e-09 = 2.000000082740371 (difference = 0.000000082740371)
1000000000 * 1.0e-09 + 1.0e+00 = 1.999999992539933 (difference = 0.000000007460067)
The error in the first line is more than ten times bigger than in the second.
If I change the doubles to floats in the code above, I get:
1.0e+00 + 1000000000 * 1.0e-09 = 1.000000000000000 (difference = 1.000000000000000)
1000000000 * 1.0e-09 + 1.0e+00 = 1.031250000000000 (difference = 0.968750000000000)
Neither answer is even close to 2.0 (but the second is slightly closer).
Using the Kahan summation (with doubles) as described by Daniel Pryden:
#include <iostream>
#include <iomanip>
#include <cmath>

int main()
{
    long billion = 1000000000;
    double big = 1.0;
    double small = 1e-9;
    double expected = 2.0;

    double sum = big;
    double c = 0.0;
    for (long i = 0; i < billion; ++i) {
        double y = small - c;
        double t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    std::cout << "Kahan sum = " << std::fixed << std::setprecision(15) << sum <<
        " (difference = " << std::fabs(expected - sum) << ")" << std::endl;

    return 0;
}
I get exactly 2.0:
Kahan sum = 2.000000000000000 (difference = 0.000000000000000)
And even if I change the doubles to floats in the code above, I get:
Kahan sum = 2.000000000000000 (difference = 0.000000000000000)
It would seem that Kahan is the way to go!
There is a class of algorithms that solve this exact problem, without the need to sort or otherwise re-order the data.
In other words, the summation can be done in one pass over the data. This also makes such algorithms applicable in situations where the dataset is not known in advance, e.g. if the data arrives in real time and the running sum needs to be maintained.
Here is the abstract of a recent paper:
We present a novel, online algorithm for exact summation of a stream
of floating-point numbers. By “online” we mean that the algorithm
needs to see only one input at a time, and can take an arbitrary
length input stream of such inputs while requiring only constant
memory. By “exact” we mean that the sum of the internal array of our
algorithm is exactly equal to the sum of all the inputs, and the
returned result is the correctly-rounded sum. The proof of correctness
is valid for all inputs (including nonnormalized numbers but modulo
intermediate overflow), and is independent of the number of summands
or the condition number of the sum. The algorithm asymptotically needs
only 5 FLOPs per summand, and due to instruction-level parallelism
runs only about 2--3 times slower than the obvious, fast-but-dumb
“ordinary recursive summation” loop when the number of summands is
greater than 10,000. Thus, to our knowledge, it is the fastest, most
accurate, and most memory efficient among known algorithms. Indeed, it
is difficult to see how a faster algorithm or one requiring
significantly fewer FLOPs could exist without hardware improvements.
An application for a large number of summands is provided.
Source: Algorithm 908: Online Exact Summation of Floating-Point Streams.
Building on Steve's answer of first sorting the numbers in ascending order, I'd introduce two more ideas:
Decide on a difference in exponent between two numbers above which you would lose too much precision.
Then add the numbers up in order until the exponent of the accumulator is too large for the next number, then put the accumulator onto a temporary queue and start the accumulator with the next number. Continue until you exhaust the original list.
You repeat the process with the temporary queue (having sorted it) and with a possibly larger difference in exponent.
I think this will be quite slow if you have to calculate exponents all the time.
I had a quick go with a program and the result was 1.99903
I think you can do better than sorting the numbers before you accumulate them, because during the process of accumulation, the accumulator gets bigger and bigger. If you have a large amount of similar numbers, you will start to lose precision quickly. Here is what I would suggest instead:
while the list has multiple elements
    remove the two smallest elements from the list
    add them and put the result back in
the single element in the list is the result
Of course this algorithm will be most efficient with a priority queue instead of a list. C++ code:
template <typename Queue>
void reduce(Queue& queue)
{
    typedef typename Queue::value_type vt;
    while (queue.size() > 1)
    {
        vt x = queue.top();
        queue.pop();
        vt y = queue.top();
        queue.pop();
        queue.push(x + y);
    }
}
driver:
#include <iterator>
#include <queue>

template <typename Iterator>
typename std::iterator_traits<Iterator>::value_type
reduce(Iterator begin, Iterator end)
{
    typedef typename std::iterator_traits<Iterator>::value_type vt;
    std::priority_queue<vt> positive_queue;
    positive_queue.push(0);
    std::priority_queue<vt> negative_queue;
    negative_queue.push(0);
    for (; begin != end; ++begin)
    {
        vt x = *begin;
        if (x < 0)
        {
            negative_queue.push(x);
        }
        else
        {
            positive_queue.push(-x);
        }
    }
    reduce(positive_queue);
    reduce(negative_queue);
    return negative_queue.top() - positive_queue.top();
}
The numbers in the queue are negative because top yields the largest number, but we want the smallest. I could have provided more template arguments to the queue, but this approach seems simpler.
This doesn't quite answer your question, but a clever thing to do is to run the sum twice, once with rounding mode "round up" and once with "round down". Compare the two answers, and you know /how/ inaccurate your results are, and if you therefore need to use a cleverer summing strategy. Unfortunately, most languages don't make changing the floating point rounding mode as easy as it should be, because people don't know that it's actually useful in everyday calculations.
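A minimal sketch of that double-rounding check using <cfenv> (assuming the platform honors fesetround; some compilers additionally need FENV_ACCESS enabled or specific flags):

#include <cfenv>
#include <iostream>
#include <vector>

// May be needed for the rounding mode to be respected by the optimizer; support varies.
#pragma STDC FENV_ACCESS ON

double sum_with_rounding(const std::vector<double>& v, int mode) {
    std::fesetround(mode);
    double s = 0.0;
    for (double x : v) s += x;
    std::fesetround(FE_TONEAREST);   // restore the default rounding mode
    return s;
}

int main() {
    std::vector<double> v = {1.0, 1e-9, -0.3, 2.5e-8};
    double down = sum_with_rounding(v, FE_DOWNWARD);
    double up   = sum_with_rounding(v, FE_UPWARD);
    // The true sum lies between down and up; their gap bounds the accumulated rounding error.
    std::cout.precision(17);
    std::cout << "down = " << down << "\nup   = " << up
              << "\ngap  = " << (up - down) << '\n';
}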
Take a look at Interval arithmetic where you do all maths like this, keeping highest and lowest values as you go. It leads to some interesting results and optimisations.
The simplest sort that improves accuracy is to sort by ascending absolute value. That lets the smallest magnitude values have a chance to accumulate or cancel before interacting with larger magnitude values that would trigger a loss of precision.
That said, you can do better by tracking multiple non-overlapping partial sums. Here is a paper describing the technique and presenting a proof-of-accuracy: www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps
That algorithm and other approaches to exact floating point summation are implemented in simple Python at: http://code.activestate.com/recipes/393090/ At least two of those can be trivially converted to C++.
For IEEE 754 single or double precision or known format numbers, another alternative is to use an array of numbers (passed by caller, or in a class for C++) indexed by the exponent. When adding numbers into the array, only numbers with the same exponent are added (until an empty slot is found and the number stored). When a sum is called for, the array is summed from smallest to largest to minimize truncation. Single precision example:
/* clear array */
void clearsum(float asum[256])
{
    size_t i;
    for (i = 0; i < 256; i++)
        asum[i] = 0.f;
}

/* add a number into array */
void addtosum(float f, float asum[256])
{
    size_t i;
    while (1) {
        /* i = exponent of f */
        i = ((size_t)((*(unsigned int *)&f) >> 23)) & 0xff;
        if (i == 0xff) {          /* max exponent, could be overflow */
            asum[i] += f;
            return;
        }
        if (asum[i] == 0.f) {     /* if empty slot store f */
            asum[i] = f;
            return;
        }
        f += asum[i];             /* else add slot to f, clear slot */
        asum[i] = 0.f;            /* and continue until empty slot */
    }
}

/* return sum from array */
float returnsum(float asum[256])
{
    float sum = 0.f;
    size_t i;
    for (i = 0; i < 256; i++)
        sum += asum[i];
    return sum;
}
double precision example:
/* clear array */
void clearsum(double asum[2048])
{
    size_t i;
    for (i = 0; i < 2048; i++)
        asum[i] = 0.;
}

/* add a number into array */
void addtosum(double d, double asum[2048])
{
    size_t i;
    while (1) {
        /* i = exponent of d */
        i = ((size_t)((*(unsigned long long *)&d) >> 52)) & 0x7ff;
        if (i == 0x7ff) {         /* max exponent, could be overflow */
            asum[i] += d;
            return;
        }
        if (asum[i] == 0.) {      /* if empty slot store d */
            asum[i] = d;
            return;
        }
        d += asum[i];             /* else add slot to d, clear slot */
        asum[i] = 0.;             /* and continue until empty slot */
    }
}

/* return sum from array */
double returnsum(double asum[2048])
{
    double sum = 0.;
    size_t i;
    for (i = 0; i < 2048; i++)
        sum += asum[i];
    return sum;
}
Your floats should be added in double precision. That will give you more additional precision than any other technique can. For a bit more precision and significantly more speed, you can create say four sums, and add them up at the end.
If you are adding double precision numbers, use long double for the sum - however, this will only have a positive effect in implementations where long double actually has more precision than double (typically x86, PowerPC depending on compiler settings).
Regarding sorting, it seems to me that if you expect cancellation then the numbers should be added in descending order of magnitude, not ascending. For instance:
((-1 + 1) + 1e-20) will give 1e-20
but
((1e-20 + 1) - 1) will give 0
In the first equation the two large numbers cancel out, whereas in the second the 1e-20 term gets lost when added to 1, since there is not enough precision to retain it.
Also, pairwise summation is pretty decent for summing lots of numbers.
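For completeness, a minimal sketch of pairwise (cascade) summation (the function name is mine):

#include <cstddef>
#include <iostream>
#include <vector>

// Split the range in half, sum each half recursively, then add the two partial
// sums. The worst-case error grows like O(log n) instead of O(n) for the naive
// left-to-right loop.
double pairwise_sum(const double* data, std::size_t n) {
    if (n <= 8) {                      // small base case: plain loop
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i) s += data[i];
        return s;
    }
    std::size_t half = n / 2;
    return pairwise_sum(data, half) + pairwise_sum(data + half, n - half);
}

int main() {
    std::vector<double> v(1000000, 1e-6);   // sums to about 1.0
    v.push_back(1.0);
    std::cout.precision(15);
    std::cout << pairwise_sum(v.data(), v.size()) << '\n';   // about 2.0
}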