Dividing two integers without casting to double - C++

I have two integer variables, partial and total. They represent progress: partial starts at zero and goes up one by one to the value of total.
If I want to get a fractional value indicating the progress (from 0.0 to 1.0), I may do the following:
double fraction = double(partial)/double(total);
But if total is too big, the conversion to double may lose information.
Actually, the amount of lost information is tolerable, but I was wondering if there is an algorithm or a std function to get the fraction between two values while losing less information.

The obvious answer is to multiply partial by some scaling factor;
100 is a frequent choice, since the division then gives the percent as
an integral value (rounded down). The problem is that if the values are
so large that they can't be represented precisely in a double, there's
also a good chance that the multiplication by the scaling factor will
overflow. (For that matter, if they're that big, the initial values
will overflow an int on most machines.)
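For inputs that still fit in 32 bits, a minimal sketch of that scaling approach, widening to 64 bits before multiplying so the factor of 100 cannot itself overflow (the function name is just illustrative):
#include <cstdint>

// Integer percentage, rounded down. Assumes total > 0; the 64-bit
// intermediate keeps partial * 100 from overflowing for 32-bit inputs.
int percent_done(std::uint32_t partial, std::uint32_t total)
{
    return static_cast<int>(static_cast<std::uint64_t>(partial) * 100 / total);
}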

Yes, there is an algorithm that loses less information. Assuming you want to find the double value closest to the mathematical value of the fraction, you need an integer type capable of holding total << 53. You can create your own or use a library like GMP for that. Then
scale partial so that (total << 52) <= numerator < (total << 53), where numerator = (partial << m)
let q be the integer quotient numerator / total and r = numerator % total
let mantissa = q if 2*r < total, and q+1 if 2*r > total; if 2*r == total, take q+1 to round half up, q to round half down, or whichever of the two is even to round half to even
result = scalbn(mantissa, -m)
Most of the time you get the same value as for (double)partial / (double)total; differences of one least significant bit are probably not too rare, two or three LSBs of difference wouldn't surprise me either but would be rare, and a bigger difference is unlikely (that said, somebody will probably give an example soon).
Now, is it worth the effort? Usually not.
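For what it's worth, a rough sketch of those steps, assuming 64-bit inputs with partial <= total (as in a progress counter) and a compiler that provides unsigned __int128 (GCC/Clang); ties are rounded half to even:
#include <cmath>
#include <cstdint>
#include <cstdio>

double exact_fraction(std::uint64_t partial, std::uint64_t total)
{
    if (partial == 0) return 0.0;
    using u128 = unsigned __int128;

    // Scale partial so that (total << 52) <= numerator < (total << 53).
    u128 numerator = partial;
    int m = 0;
    while (numerator < (u128(total) << 52)) { numerator <<= 1; ++m; }

    u128 q = numerator / total;
    u128 r = numerator % total;
    std::uint64_t mantissa = static_cast<std::uint64_t>(q);
    if (2 * r > total || (2 * r == total && (mantissa & 1)))
        ++mantissa;                                  // round half to even

    return std::scalbn(static_cast<double>(mantissa), -m);
}

int main()
{
    std::printf("%.17g\n", exact_fraction(1, 3));    // 0.33333333333333331
}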

If you want a precise representation of the fraction, you'd have some sort of structure containing the numerator and the denominator as integers and, for a unique representation, you'd factor out the greatest common divisor (with a special case for zero). If you are just worried that after repeated operations the floating point representation might not be accurate enough, you need to find some courses on numerical analysis, as that issue isn't strictly a programming issue. There are better ways than others to calculate certain results, but I can't really go into them (I've never done the coursework, just read about it).

Related

Looking for the fastest way to divide by 2

I've searched half the day and found some very interesting things about using fixed point data types and bit shifting in C++ to accomplish division operations while avoiding floating point math. However, I have only been able to understand a small fraction of it and I can't seem to get anything to work.
All I'm wanting to do is to take two integers, add them up, and divide by two to get the average. I need to be able to do this very quickly though, since I'm interpolating camera pixel data on an Arduino and I also have other operations to do.
So I'm confused about shifting in general. Say the integer I want to divide by two is 27. Half of 27 is 13.5. But no matter what fixed point datatype I try, I can only get 13 as an output. For example:
uint8_t x = 27;
Serial.println( x >> 1 );
returns 13
There's got to be some simple way to do this, right?
Fixed point does give you a way to represent 13.5. The Wikipedia article on the Q number format is informative: https://en.wikipedia.org/wiki/Q_(number_format)
Think of it this way: You keep using integers, but instead of taking them at face value, divide them all implicitly by a power of 2 to obtain their semantic value.
So, if using an unsigned byte as your base type (values between 0 and 255, inclusive), you might implicitly divide by 2^3 (8). Now to represent 27, you need an integer set to 27 * 8 = 216. To divide by two, you shift one to the right; now your integer is 108, which when divided by the implicit denominator of 8 gives 13.5, the value you're expecting.
You have to realize that fixed-point number systems (and floating point too, though it's less immediately evident) still have limits, of course; certain operations will overflow no matter what you do, and some operations cause a loss of precision. This is a normal consequence of working with limited-size types.
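As a concrete illustration of that implicit denominator, here is a small sketch assuming a Q5.3 layout (3 fractional bits in an 8-bit value):
#include <cstdint>
#include <cstdio>

int main()
{
    const int FRAC_BITS = 3;                  // implicit denominator of 2^3 = 8

    std::uint8_t x = 27 << FRAC_BITS;         // 27.0 stored as 216
    std::uint8_t half = x >> 1;               // shift right by one: 108, i.e. 13.5

    // Split back into integer part and fractional part (in thousandths) for display.
    std::printf("%d.%03d\n",
                half >> FRAC_BITS,
                (half & ((1 << FRAC_BITS) - 1)) * 1000 / (1 << FRAC_BITS));   // 13.500
}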
Say the integer I want to divide by two is 27. Half of 27 is 13.5. But no matter what fixed point data type I try, I can only get 13 as an output.
From the Wikipedia article on fixed-point arithmetic:
The scaling factor is usually a power of 10 (for human convenience) or a power of 2 (for computational efficiency).
You actually mentioned fixed point data types, and I think that is the best approach. But nothing you tried worked? Perhaps we have different understandings of fixed-point arithmetic.
while avoiding floating point math.
Another worthwhile goal, though one of diminishing value. Even in embedded systems, I have seldom had to deal with a processor that did not have floating point hardware, and that hardware has gotten reasonably good.
Anyway, using fixed point avoids any need for floating point, even for display purposes.
I think I need to proceed with a few examples.
Fixed point Example 1: Dollars and pennies
The unit of American money is based on the dollar. The Dollar is a fixed point data type.
So, if you have 27 dollars, how do you split it with your sibling?
One way (of several) that you all know is to convert 27 dollars into 2700 pennies. Dividing this value by 2 is trivial. Now you and your sibling can each get 1350 pennies. (i.e. the penny is a fixed point data type that easily converts to/from dollars, and vice versa)
Note that this is completely integer arithmetic: adding two integers, then dividing by 2 (any modern compiler will choose the fastest implementation, either an integer divide or a right shift). On my desktop these two actions take less than a microsecond to complete.
Waste no more time measuring the relative performance of those two options (divide vs. right shift); simply enable -O3 once your code tests correct. Your compiler should be able to choose correctly.
The choice of units in any problem is based on a scale factor that covers the range of values (in your problem) AND the understandable and quickly implemented conversion between units. And note that uint64_t can describe a large amount of cash, even in pennies. (challenge to the student.)
In General, about fixed point:
Given
uint8_t x = 27;
and the desire to divide by 2 evenly and quickly... is there any scale factor that serves your needs? I say yes.
Fixed point example 2: 50-cent coins and a dollar
How about we try, for example, a simple scale factor of 2, i.e. the unit is a hu, or half unit. (analogous to the 50-cent-coin)
uint8_t x = 27 * 1/hu; (hu = 1/2)
This means that 54 hu represents 27 units. (ie, it takes 54 50-cent-coins to add up to 27 dollars)
The fixed point solution is to scale your integer values to achieve the arithmetic required. If you scale to even values, all your integers will divide evenly to the hu units.
Fixed point example 3: nickels and a dollar
Another possible scale might be 20, both decimal (for readability) and binary for performance. (note that there are 20 nickels in a dollar)
uint16_t x = 27 * 1/tu; (tu = 1/20)
Now 540 represents a scaled 27, i.e. 540 nickels.
All examples are fully integer, provide exact answers, and there is a trivial mechanism to convert the values for presentation to the user: whichever fixed point is used, convert to the analogue of pennies, and thus 1350 pennies.
Display the penny count as dollars and cents (this needs <iomanip> for the zero padding):
std::cout << (pennyCount / 100) << "." << std::setw(2) << std::setfill('0') << (pennyCount % 100) << std::endl;
I think this should look something like (untested)
13.50
Now your challenge is to make it look nice on the output.
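Putting the pennies example together as a minimal standalone sketch (plain C++ iostreams here rather than Serial, purely for illustration; the zero padding is the "make it look nice" part):
#include <cstdint>
#include <iomanip>
#include <iostream>

int main()
{
    std::uint64_t dollars = 27;
    std::uint64_t pennies = dollars * 100;   // work in the smaller unit: 2700 pennies
    std::uint64_t share   = pennies / 2;     // split with the sibling: 1350 each

    // Display as dollars, zero-padding the cents.
    std::cout << share / 100 << '.'
              << std::setw(2) << std::setfill('0') << share % 100 << '\n';   // 13.50
}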
The reason you only get 13 is that you are actually cutting off the least significant bit when you bit shift. Since you are cutting it off, there is no remainder to check. If you are interested in what your remainder is, you could do something like:
uint8_t x = 27;
Serial.println(x - (x >> 1) - (x >> 1));
(x - (x >> 1)) should give 14 here.
It would be pretty simple to add .5 to the number once you determine whether the remainder is 1.
The following should work and should be fast:
float y = (x >> 1) + (0.5 * (x & 0x01));
What it does
(x >> 1) Divide by 2 using the bit shift
(0.5 * (x & 0x01)) Add 0.5 if the last bit was 1 (odd number)

What's the best multiplication algorithm for fixed point where precision is necessary

I know, I know, people are probably going to say "just switch to floating point", but currently that is not an option due to the nature of the project that I am working on. I am helping write a programming language in C++, and I am currently having difficulty trying to get a very accurate algorithm for multiplication. I have a VM with operations for mod/smod, div/sdiv (i.e. signed numbers are not a concern here), mul, a halving number for fully fractional numbers, and a pushed shift number that I multiply and divide by to create my shifting. For simplicity, let's say I'm working with a 32 byte space.
My algorithms work fine for pretty much anything involving integers; it's just that when my fractional portion gets over 16 bytes I run into problems with precision. If I were to round it, the number would be fairly accurate, but I want it as accurate as possible, and I am even willing to sacrifice a tad of performance for it, so long as it stays fixed point and doesn't go into floating point land. The algorithms I'm concerned with I will map out in a sort of pseudocode below. I would love any insight into how I could make this better, or any reasoning as to why, by the laws of computational science, what I'm asking for is a fruitless endeavor.
For fully fractional numbers (all bytes are fractional):
A = num1 / halfShift //truncates the number down to 16 so that when multiplied, we get a full 32 byte num
B = num2 / halfShift
finalNum = A * B
For the rest of my numbers that are larger than 16 bytes I use this algorithm:
This algorithm can essentially be broken down into the int.frac form: essentially A.B * C.D takes the mathematical form of
D*B/shift + C*A*shift + D*A + C*B
If the fractional parts are larger than the integer parts, I halve them, then multiply them together for my D*B/shift term,
just like in the fully fractional example above.
Is there some kind of "magic" rounding method that I should be aware of? Please let me know.
You get the most accurate result if you do the multiplication first and scale afterwards. Of course that means that you need to store the result of the multiplication in a 64-bit int type.
If that is not an option, your approach of shifting in advance makes sense. But you certainly lose precision.
Either way, you can increase accuracy a little if you round instead of truncate.
I support Aconcagua's recommendation to round to nearest.
For that you need to add the highest bit which is going to be truncated before you apply the division.
In your case that would look like this:
A = (num1 + (1 << (halfshift-1))) >> halfshift
B = (num2 + (1 << (halfshift-1))) >> halfshift
finalNum = A * B
EDIT:
Example on how to dynamically scale the factors and the result depending on the values of the factors (this improves resolution and therefore the accuracy of the result):
shiftA and shiftB need to be set such that A and B are 16 byte fractionals each and therefore the 32 byte result cannot overflow. If shiftA and shiftB is not known in advance, it can be determined by counting the leading zeros of num1 and num2.
A = (num1 + (1 << (shiftA-1))) >> shiftA
B = (num2 + (1 << (shiftB-1))) >> shiftB
finalNum = (A * B) >> (fullshift - (shiftA + shiftB))
The number of fractional digits of a product equals the sum of the numbers of fractional digits in the operands. You have to carry out the multiplication to that precision and then round or truncate according to the desired target precision.
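To make "multiply first, scale afterwards" concrete, here is a small sketch using a hypothetical Q16.16 layout (32-bit values, 16 fractional bits) with a 64-bit intermediate and round-to-nearest; the names and the format are illustrative, not taken from the question:
#include <cmath>
#include <cstdint>
#include <cstdio>

using q16_16 = std::int32_t;          // 16 integer bits, 16 fractional bits
constexpr int FRAC = 16;

q16_16 from_double(double v) { return static_cast<q16_16>(std::lround(v * (1 << FRAC))); }
double to_double(q16_16 v)   { return static_cast<double>(v) / (1 << FRAC); }

// Multiply first in a 64-bit intermediate (a Q32.32 product), then round to
// nearest while shifting back down to Q16.16.
q16_16 mul_q16(q16_16 a, q16_16 b)
{
    std::int64_t wide = static_cast<std::int64_t>(a) * b;
    wide += std::int64_t{1} << (FRAC - 1);           // add half an LSB before truncating
    return static_cast<q16_16>(wide >> FRAC);
}

int main()
{
    q16_16 x = from_double(3.25), y = from_double(0.1875);
    std::printf("%f\n", to_double(mul_q16(x, y)));   // 0.609375
}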

Inconsistencies with double data type in C++

This may be something really simple that I'm just missing, however I am having trouble using the double data type. It states here that a double in C++ is accurate to ~15 digits. Yet in the code:
double n = pow(2,1000);
cout << fixed << setprecision(0) << n << endl;
n stores the exact value of 2^1000, something that is 302 decimal digits long according to WolframAlpha. Yet, when I attempt to compute 100!, with the function:
double factorial(int number)
{
    double product = 1.0;
    for (int i = 1; i <= number; i++) { product *= i; }
    return product;
}
Inaccuracies begin to appear at the 16th digit. Can someone please explain this behaviour, as well as provide me with some kind of solution.
Thanks
You will need eventually to read the basics in What Every Computer Scientist Should Know About Floating-Point Arithmetic. The most common standard for floating-point hardware is generally IEEE 754; the current version is from 2008, but most CPUs do not implement the new decimal arithmetic featured in the 2008 edition.
However, a floating point number (double or float) stores an approximation to the value, using a fixed-size mantissa (fractional part) and an exponent (power of 2). The dynamic range of the exponents is such that the values that can be represented range from about 10^-300 to 10^+300 in decimal, with about 16 significant (decimal) digits of accuracy. People persistently try to print more digits than can be stored, and get interesting and machine-dependent (and library-dependent) results when they do.
So, what you are seeing is intrinsic to the nature of floating-point arithmetic on computers using fixed-precision values. If you need greater precision, you need to use one of the arbitrary-precision libraries - there are many of them available.
The double actually stores an exponent to be applied to 2 in its internal representation. So of course it can store 2^1000 accurately. But try adding 1 to that.
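A quick sketch that demonstrates the point:
#include <cmath>
#include <iostream>

int main()
{
    double n = std::pow(2.0, 1000);       // a single set bit with exponent 1000: exactly representable
    std::cout << std::boolalpha
              << (n == n + 1.0) << '\n';  // true: the spacing between doubles near 2^1000 is 2^948,
                                          // so adding 1 rounds straight back to n
}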
IEEE 754 gives an algorithm for storing floating point data. Computers have a finite number of bits to store an infinite set of numbers, and this introduces error when storing digits.
In this form of representation, there is less room for precision the larger the represented number gets (larger meaning greater absolute distance from zero). That is probably where you are seeing the loss of precision, and as you get even larger numbers, the loss of precision will grow too.

Representing probability in C++

I'm trying to represent a simple set of 3 probabilities in C++. For example:
a = 0.1
b = 0.2
c = 0.7
(As far as I know probabilities must add up to 1)
My problem is that when I try to represent 0.7 in C++ as a float I end up with 0.69999999, which won't help when I am doing my calculations later. The same for 0.8, 0.80000001.
Is there a better way of representing numbers between 0.0 and 1.0 in C++?
Bear in mind that this relates to how the numbers are stored in memory so that when it comes to doing tests on the values they are correct, I'm not concerned with how they are display/printed out.
This has nothing to do with C++ and everything to do with how floating point numbers are represented in memory. You should never use the equality operator to compare floating point values, see here for better methods: http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
My problem is that when I try to represent 0.7 in C++ as a float I end up with 0.69999999, which won't help when I am doing my calculations later. The same for 0.8, 0.80000001.
Is it really a problem? If you just need more precision, use a double instead of a float. That should get you about 15 digits precision, more than enough for most work.
Consider your source data. Is 0.7 really significantly more correct than 0.69999999?
If so, you could use a rational number library such as:
http://www.boost.org/doc/libs/1_40_0/libs/rational/index.html
If the problem is that probabilities add up to 1 by definition, then store them as a collection of numbers, omitting the last one. Infer the last value by subtracting the sum of the others from 1.
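A minimal sketch of that idea, using the values from the question:
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<double> stored = {0.1, 0.2};   // a and b; c is never stored
    double c = 1.0 - std::accumulate(stored.begin(), stored.end(), 0.0);

    // The derived value absorbs the representation error, so the three
    // probabilities sum to 1 up to at most one final rounding.
    std::cout << c << '\n';                    // prints 0.7 at the default output precision
}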
How much precision do you need? You might consider scaling the values and quantizing them in a fixed-point representation.
The tests you want to do with your numbers will be incorrect.
There is no exact floating point representation in a base-2 number system for a number like 0.1, because it is an infinitely repeating fraction in base 2. Consider one third: it is exactly representable as 0.1 in a base-3 system, but is 0.333... in the base-10 system.
So any test you do with the number 0.1 in floating point is prone to be flawed.
A solution would be using rational numbers (Boost has a rational lib), which will always be exact for, erm, rationals, or using a self-made base-10 system by multiplying the numbers by a power of ten.
If you really need the precision, and are sticking with rational numbers, I suppose you could go with fixed point arithmetic. I've not done this before so I can't recommend any libraries.
Alternatively, you can set a threshold when comparing fp numbers, but you'd have to err on one side or another -- say
const float epsilon = 1e-6f;  // tolerance; pick it to suit the scale of your data
bool fp_cmp(float a, float b) {
    return (a < b + epsilon);
}
Note that excess precision is automatically truncated in each calculation, so you should take care when operating at many different orders of magnitude in your algorithm. A contrived example to illustrate:
a = 15434355e10 + 22543634e10
b = a / 1e20 + 1.1534634
c = b * 1e20
versus
c = a + 1.1534634e20
The two results will be very different. Using the first method a lot of the precision of the first two numbers will be lost in the divide by 1e20. Assuming that the final value you want is on the order of 1e20, the second method will give you more precision.
If you only need a few digits of precision then just use an integer. If you need better precision then you'll have to look to different libraries that provide guarantees on precision.
The issue here is that floating point numbers are stored in base 2. You can not exactly represent a decimal in base 10 with a floating point number in base 2.
Let's step back a second. What does .1 mean? Or .7? They mean 1x10^-1 and 7x10^-1. If you're using binary for your number, instead of base 10 as we normally do, .1 means 1x2^-1, or 1/2. .11 means 1x2^-1 + 1x2^-2, or 1/2 + 1/4, or 3/4.
Note how in this system, the denominator is always a power of 2. You cannot represent a number whose denominator is not a power of 2 in a finite number of digits. For instance, .1 (in decimal) means 1/10, but in binary that is an infinite repeating fraction, 0.000110011... (with the 0011 pattern repeating forever). This is similar to how in base 10, 1/3 is an infinite fraction, 0.3333...; base 10 can only exactly represent numbers whose denominator is a product of powers of 2 and 5. (As an aside, base 12 and base 60 are actually really convenient bases, since 12 is divisible by 2, 3, and 4, and 60 is divisible by 2, 3, 4, and 5; but for some reason we use decimal anyhow, and we use binary in computers.)
Since floating point numbers (or fixed point numbers) always have a finite number of digits, they cannot represent these infinite repeating fractions exactly. So, they either truncate or round the values to be as close as possible to the real value, but are not equal to the real value exactly. Once you start adding up these rounded values, you start getting more error. In decimal, if your representation of 1/3 is .333, then three copies of that will add up to .999, not 1.
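That rounding is easy to demonstrate directly; a small sketch:
#include <cstdio>

int main()
{
    // Neither 0.1 nor 0.2 has an exact binary representation, so their rounded
    // sum is not the double closest to 0.3.
    double a = 0.1, b = 0.2;
    std::printf("%.17g\n", a + b);        // 0.30000000000000004
    std::printf("%d\n", a + b == 0.3);    // 0 (the comparison is false)
}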
There are four possible solutions. If all you care about is exactly representing decimal fractions like .1 and .7 (as in, you don't care that 1/3 will have the same problem you mention), then you can represent your numbers as decimal, for instance using binary coded decimal, and manipulate those. This is a common solution in finance, where many operations are defined in terms of decimal. This has the downside that you will need to implement all of your own arithmetic operations yourself, without the benefits of the computer's FPU, or find a decimal arithmetic library. This also, as mentioned, does not help with fractions that can't be represented exactly in decimal.
Another solution is to use fractions to represent your numbers. If you use fractions, with bignums (arbitrarily large numbers) for your numerators and denominators, you can represent any rational number that will fit in the memory of your computer. Again, the downside is that arithmetic will be slower, and you'll need to implement arithmetic yourself or use an existing library. This will solve your problem for all rational numbers, but if you wind up with a probability that is computed based on π or √2, you will still have the same issues with not being able to represent them exactly, and need to also use one of the later solutions.
A third solution, if all you care about is getting your numbers to add up to 1 exactly, is for events where you have n possibilities, to only store the values of n-1 of those probabilities, and compute the probability of the last as 1 minus the sum of the rest of the probabilities.
And a fourth solution is to do what you always need to remember when working with floating point numbers (or any inexact numbers, such as fractions being used to represent irrational numbers), and never compare two numbers for equality. Again in base 10, if you add up 3 copies of 1/3, you will wind up with .999. When you want to compare that number to 1, you have to instead compare to see if it is close enough to 1; check that the absolute value of the difference, 1-.999, is less than a threshold, such as .01.
Binary machines always round decimal fractions (except .0 and .5, .25, .75, etc) to values that don't have an exact representation in floating point. This has nothing to do with the language C++. There is no real way around it except to deal with it from a numerical perspective within your code.
As for actually producing the probabilities you seek:
float pr[3] = {0.1, 0.2, 0.7};
float accPr[3];
float prev = 0.0;
int i = 0;
for (i = 0; i < 3; i++) {
    accPr[i] = prev + pr[i];
    prev = accPr[i];
}
float frand = rand() / (RAND_MAX + 1.0f);   // floating point division, so frand is in [0, 1)
for (i = 0; i < 2; i++) {
    if (frand < accPr[i]) break;
}
return i;
I'm sorry to say there's not really an easy answer to your problem.
It falls into a field of study called "Numerical Analysis" that deals with these types of problems (which goes far beyond just making sure you don't check for equality between 2 floating point values). And by field of study, I mean there are a slew of books, journal articles, courses etc. dealing with it. There are people who do their PhD thesis on it.
All I can say is that I'm thankful I don't have to deal with these issues very much, because the problems and the solutions are often very non-intuitive.
What you might need to do to deal with representing the numbers and calculations you're working on is very dependent on exactly what operations you're doing, the order of those operations and the range of values that you expect to deal with in those operations.
Depending on the requirements of your applications any one of several solutions could be best:
1. You live with the inherent lack of precision and use floats or doubles. You cannot test either for equality, and this implies that you cannot test the sum of your probabilities for equality with 1.0.
2. As proposed before, you can use integers if you require a fixed precision. You represent 0.7 as 7, 0.1 as 1, 0.2 as 2, and they will add up perfectly to 10, i.e., 1.0. If you have to calculate with your probabilities, especially if you do division and multiplication, you need to round the results correctly. This will introduce imprecision again.
3. Represent your numbers as fractions with a pair of integers (1,2) = 1/2 = 0.5. Precise and more flexible than 2), but you don't want to do calculations with those by hand.
4. You can go all the way and use a library that implements rational numbers (e.g. GMP). Precise, with arbitrary precision, you can calculate with it, but slow.
Yeah, I'd scale the numbers (0-100, 0-1000, or whatever fixed scale you need) if you're worried about such things. It also makes for faster math computation in most cases. Back in the bad old days, we'd define entire cos/sine tables and other such bleh in integer form to reduce floating fuzz and increase computation speed.
I do find it a bit interesting that a "0.7" fuzzes like that on storage.

Accurate evaluation of the series 1/1 + 1/2 + ... + 1/n

I need to evaluate the sum of the series 1/1 + 1/2 + 1/3 + ... + 1/n. Considering that in C++ evaluations are not completely accurate, the order of summation plays an important role. The expression 1/n + 1/(n-1) + ... + 1/2 + 1/1 gives a more accurate result.
So I need to find out the order of summation, which provides the maximum accuracy.
I don't even know where to begin.
Preferred language of realization is C++.
Sorry for my English, if there are any mistakes.
For large n you'd better use asymptotic formulas, like the ones at http://en.wikipedia.org/wiki/Harmonic_number.
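For instance, a sketch using the first terms of the standard asymptotic expansion (the constant is the Euler-Mascheroni constant, given here to double precision):
#include <cmath>

// H_n is approximately ln(n) + gamma + 1/(2n) - 1/(12n^2); the truncation
// error is O(1/n^4), so this is already very accurate for moderately large n.
double harmonic_asymptotic(double n)
{
    const double gamma = 0.5772156649015329;   // Euler-Mascheroni constant
    return std::log(n) + gamma + 1.0 / (2.0 * n) - 1.0 / (12.0 * n * n);
}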
Another way is to use exp-log transformation. Basically:
H_n = 1 + 1/2 + 1/3 + ... + 1/n = log(exp(1 + 1/2 + 1/3 + ... + 1/n)) = log(exp(1) * exp(1/2) * exp(1/3) * ... * exp(1/n)).
Exponents and logarithms can be calculated pretty quickly and accurately by your standard library. Using multiplication you should get much more accurate results.
If this is your homework and you are required to use simple addition, you'd better add from the smallest term to the largest, as others suggested.
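For illustration, a direct transcription of that exp-log idea (a later answer questions whether it actually gains accuracy, so treat it as a sketch only):
#include <cmath>

double harmonic_explog(int n)
{
    // exp(1) * exp(1/2) * ... * exp(1/n) == exp(H_n); take the log at the end.
    double product = 1.0;
    for (int i = 1; i <= n; ++i)
        product *= std::exp(1.0 / i);
    return std::log(product);
}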
The reason for the lack of accuracy is the precision of the float, double, and long double types. They only store so many "decimal" places. So adding a very small value to a large value has no effect, the small term is "lost" in the larger one.
The series you're summing has a "long tail", in the sense that the small terms should add up to a large contribution. But if you sum in descending order, then after a while each new small term will have no effect (even before that, most of its decimal places will be discarded). Once you get to that point you can add a billion more terms, and if you do them one at a time it still has no effect.
I think that summing in ascending order should give best accuracy for this kind of series, although it's possible there are some odd corner cases where errors due to rounding to powers of (1/2) might just so happen to give a closer answer for some addition orders than others. You probably can't really predict this, though.
I don't even know where to begin.
Here: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Actually, if you're doing the summation for large N, adding in order from smallest to largest is not the best way -- you can still get into a situation where the numbers you're adding are too small relative to the sum to produce an accurate result.
Look at the problem this way: You have N summations, regardless of ordering, and you wish to have the least total error. Thus, you should be able to get the least total error by minimizing the error of each summation -- and you minimize the error in a summation by adding values as nearly close to each other as possible. I believe that following that chain of logic gives you a binary tree of partial sums:
Sum[0,i] = value[i]
Sum[1,i/2] = Sum[0,i] + Sum[0,i+1]
Sum[j+1,i/2] = Sum[j,i] + Sum[j,i+1]
and so on until you get to a single answer.
Of course, when N is not a power of two, you'll end up with leftovers at each stage, which you need to carry over into the summations at the next stage.
(The margins of StackOverflow are of course too small to include a proof that this is optimal. In part because I haven't taken the time to prove it. But it does work for any N, however large, as all of the additions are adding values of nearly identical magnitude. Well, all but log(N) of them in the worst not-power-of-2 case, and that's vanishingly small compared to N.)
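A recursive sketch of that binary tree of partial sums (plain pairwise summation; the uneven split takes care of ranges that are not a power of two):
#include <cstddef>
#include <vector>

// Sum values[lo, hi) by halving the range recursively, so each addition
// combines partial sums of roughly similar magnitude.
double pairwise_sum(const std::vector<double>& values, std::size_t lo, std::size_t hi)
{
    if (hi == lo)     return 0.0;
    if (hi - lo == 1) return values[lo];
    std::size_t mid = lo + (hi - lo) / 2;
    return pairwise_sum(values, lo, mid) + pairwise_sum(values, mid, hi);
}

double harmonic_pairwise(int n)
{
    std::vector<double> terms(static_cast<std::size_t>(n));
    for (int i = 1; i <= n; ++i)
        terms[static_cast<std::size_t>(i) - 1] = 1.0 / i;
    return pairwise_sum(terms, 0, terms.size());
}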
http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic
You can find libraries with ready-to-use implementations for C/C++.
For example http://www.apfloat.org/apfloat/
Unless you use some accurate closed-form representation, a small-to-large ordered summation is likely to be most accurate simple solution (it's not clear to me why a log-exp would help - that's a neat trick, but you're not winning anything with it here, as far as I can tell).
You can further gain precision by realizing that after a while, the sum will become "quantized": Effectively, when you have 2 digits of precision, adding 1.3 to 41 results in 42, not 42.3 - but you achieve almost a precision doubling by maintaining an "error" term. This is called Kahan Summation. You'd compute the error term (42-41-1.3 == -0.3) and correct that in the next addition by adding 0.3 to the next term before you add it in again.
Kahan summation in addition to a small-to-large ordering is liable to be as accurate as you'll ever need. I seriously doubt you'll ever need anything better for the harmonic series - after all, even after 2^45 iterations (crazy many) you'd still only be dealing with numbers that are at least 1/2^45 in size, and a sum that's on the order of 45 (< 2^6), for an order-of-magnitude difference of 51 powers of two - i.e. still representable in a double precision variable even if you add in the "wrong" order.
If you go small-to-large, and use Kahan Summation, the sun's probably going to extinguish before today's processors reach a percent of error - and you'll run into other tricky accuracy issues just due to the individual term error on that scale first anyhow (being that a number of the order of 2^53 or larger cannot be represented accurately as a double at all anyhow.)
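A minimal sketch combining Kahan summation with the small-to-large ordering:
double harmonic_kahan(long long n)
{
    double sum = 0.0;
    double c = 0.0;                          // running compensation for lost low-order bits
    for (long long i = n; i >= 1; --i) {     // i = n .. 1, so terms are added smallest first
        double y = 1.0 / static_cast<double>(i) - c;
        double t = sum + y;
        c = (t - sum) - y;                   // recover what was lost in sum + y
        sum = t;
    }
    return sum;
}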
I'm not sure about the order of summation playing an important role; I haven't heard that before. I guess you want to do this in floating point arithmetic, so the first thing is to write it more along the lines of (1.0/1.0 + 1.0/2.0 + 1.0/3.0) - otherwise the compiler will do integer division.
To determine the order of evaluation, maybe use a for loop or brackets?
e.g.
float f = 0.0;
for (int i = n; i > 0; --i)
{
    f += 1.0 / static_cast<float>(i);
}
Oh, forgot to say: compilers will normally have switches to determine the floating point evaluation mode. This is maybe related to what you say about order of summation - in Visual C++ these are found in the code-generation compile settings, and in g++ there are floating point options (e.g. -ffast-math) that affect this.
Actually, the other answer is right - you should do the summation in order of smallest component first, so
1/n + 1/(n-1) + ... + 1/1
This is because the precision of a floating point number is linked to its scale: if you start at 1 you'll have 23 bits of precision relative to 1.0, but if you start at a smaller number, the precision is relative to that smaller value. Then, as the running sum gets bigger, rounding error will occur, but the overall error will be less than in the other direction.
As all your numbers are rationals, the easiest (and maybe also the fastest, as it will have to do fewer floating point operations) would be to do the computations with rationals (tuples of 2 integers p, q), and then do just one floating point division at the end.
Update: to use this technique effectively you will need to use bigints for p and q, as they grow quite fast...
A fast prototype in Lisp, which has built-in rationals, shows:
(defun sum_harmonic (n acc)
(if (= n 0) acc (sum_harmonic (- n 1) (+ acc (/ 1 n)))))
(sum_harmonic 10 0)
7381/2520
[2.9289682]
(sum_harmonic 100 0)
14466636279520351160221518043104131447711/278881500918849908658135235741249214272
[5.1873775]
(sum_harmonic 1000 0)
53362913282294785045591045624042980409652472280384260097101349248456268889497101
75750609790198503569140908873155046809837844217211788500946430234432656602250210
02784256328520814055449412104425101426727702947747127089179639677796104532246924
26866468888281582071984897105110796873249319155529397017508931564519976085734473
01418328401172441228064907430770373668317005580029365923508858936023528585280816
0759574737836655413175508131522517/712886527466509305316638415571427292066835886
18858930404520019911543240875811114994764441519138715869117178170195752565129802
64067621009251465871004305131072686268143200196609974862745937188343705015434452
52373974529896314567498212823695623282379401106880926231770886197954079124775455
80493264757378299233527517967352480424636380511370343312147817468508784534856780
21888075373249921995672056932029099390891687487672697950931603520000
[7.485471]
So, the next best option could be to maintain the list of floating point values and reduce it by summing the two smallest numbers in each step...