round double with N significant decimal digits in an overflow-safe manner - c++

I want an overflow-safe function that rounds a double like std::round but can also handle a given number of significant decimal digits.
For example:
round(-17.747, 2) -> -17.75
round(-9.97729, 2) -> -9.98
round(-5.62448, 2) -> -5.62
round(std::numeric_limits<double>::max(), 10) ...
My first attempt was
double round(double value, int precision)
{
double factor=pow(10.0, precision);
return floor(value*factor+0.5)/factor;
}
but this can easily overflow.
Assuming IEEE, it is possible to decrease the possibility of overflows, like this.
double round(double value, int precision)
{
// assuming IEEE 754 with 64 bit representation
// the number of significant digits varies between 15 and 17
precision=std::min(17, precision);
double factor=pow(10.0, precision);
return floor(value*factor+0.5)/factor;
}
But this still can overflow.
Even this performance disaster does not work.
double round(double value, int precision)
{
std::stringstream ss;
ss << std::setprecision(precision) << value;
std::string::size_type sz;
return std::stod(ss.str(), &sz);
}
round(std::numeric_limits<double>::max(), 2.0) // throws std::out_of_range
Note:
I'm aware of setprecision, but I need rounding not only for display purposes, so that is not a solution.
Unlike the post How to round a number to n decimal places in Java, my question is specifically about overflow safety in C++ (the answers in that topic are Java-specific or do not handle overflows).

I haven't heavily tested this code:
/* expects x in (-1, 1) */
double round_precision2(double x, int precision2) {
double iptr, factor = std::exp2(precision2);
double y = (x < 0) ? -x : x;
std::modf(y * factor + .5, &iptr);
return iptr/factor * ((x < 0) ? -1 : 1);
}
double round_precision(double x, int precision) {
int bits = precision * M_LN10 / M_LN2;
/* std::log2(std::pow(10., precision)); */
double iptr, frac = std::modf(x, &iptr);
return iptr + round_precision2(frac, bits);
}
The idea is to avoid overflow by only operating on the fractional part of the number.
We compute the number of binary bits to achieve the desired precision. You should be able to put a bound on them with the limits you describe in your question.
Next, we extract the fractional and integer parts of the number.
Then we add the integer part back to the rounded fractional part.
To compute the rounded fractional part, we compute the binary factor. Then we extract the integer part of the rounded number resulting from multiplying fractional part by the factor. Then we return the fraction by dividing the integral part by the factor.
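A different way to sidestep the overflow, shown here only as a sketch and not as part of the answer above, is to skip the scaling whenever it cannot change the value or would not stay finite. The name round_places_safe and the 2^52 cutoff (at or above which an IEEE 754 double has no fractional bits) are assumptions of this example, and the multiply-round-divide core keeps the usual binary rounding caveats.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: round to `digits` decimal places, returning the
// input unchanged whenever scaling would overflow or cannot change it.
double round_places_safe(double value, int digits) {
    if (!std::isfinite(value)) return value;
    // Assumes IEEE 754 binary64: at or above 2^52 every double is a whole number.
    if (std::fabs(value) >= 4503599627370496.0) return value;
    const double factor = std::pow(10.0, digits);
    const double scaled = value * factor;
    // If the factor or the product left the finite range, do not round at all.
    if (!std::isfinite(factor) || !std::isfinite(scaled)) return value;
    return std::round(scaled) / factor;
}
```

With this guard, round_places_safe(std::numeric_limits<double>::max(), 10) simply returns its argument instead of producing infinity.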

Related

Efficient way of checking the length of a double in C++

Say I have a number, 100000. I can use some simple maths to check its size, i.e. log(100000) -> 5 (base 10 logarithm). There's also another way of doing this, which is quite slow: std::string num = std::to_string(100000), num.size(). Is there a way to mathematically determine the length of a number? (Not just 100000, but for things like 2313455, 123876132, etc.)
Why not use ceil? It rounds up to the nearest whole number - you can just wrap that around your log function, and add a check afterwards to catch the fact that a power of 10 would return 1 less than expected.
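A closely related variant, sketched here with floor instead of ceil so that no separate power-of-10 check is needed: take floor(log10(n)) and add 1. The function name digits_by_log10 is made up for this example, and the sketch assumes std::log10 returns an exact integer for exact powers of ten, which common implementations do but the standard does not promise.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: decimal digit count of an unsigned 32-bit integer
// computed from the base-10 logarithm.
int digits_by_log10(std::uint32_t n) {
    if (n == 0) return 1;  // log10(0) is not finite, so treat 0 as one digit
    const double lg = std::log10(static_cast<double>(n));
    return static_cast<int>(std::floor(lg)) + 1;
}
```

Restricting the input to 32 bits keeps every value exactly representable as a double; for much larger integers the logarithm can round up to the next whole number and overcount.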
Here is a solution to the problem using single precision floating point numbers in O(1):
#include <cstdint> // uint32_t, uint8_t
#include <cstring> // std::memcpy
#include <iostream>
#include <string> // std::to_string
int main(){
float x = 500; // to be converted
uint32_t f;
std::memcpy(&f, &x, sizeof(uint32_t)); // Copy the float's bits into a manageable int
uint8_t exp = (f & (0b11111111 << 23)) >> 23; // get the biased exponent
exp -= 127; // floating point bias (values below 1 would wrap around in a uint8_t)
exp /= 3.32; // divide by log2(10); this truncates, but for this case it should be fine
std::cout << std::to_string(exp) << std::endl;
}
For a number in scientific notation a*10^e this will return e (when 1<=a<10), so the length of the number (if it has an absolute value larger than 1), will be exp + 1.
For double precision this works too, but you have to adapt it (the bias is 1023 and the bit layout is different).
This only works for floating point numbers, though so probably not very useful in this case. The efficiency in this case relative to the logarithm will also be determined by the speed at which int -> float conversion can occur.
Edit:
I just realised the question was about double. The modified result is:
int16_t getLength(double a){
uint64_t bits;
std::memcpy(&bits, &a, sizeof(uint64_t));
int16_t exp = (bits >> 52) & 0b11111111111; // There is no 11 bit long int so this has to do
exp -= 1023;
exp /= 3.32;
return exp + 1;
}
There are some changes so that it behaves better (and also less shifting).
You can also use frexp() to get the exponent without bias.
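As a rough sketch of the frexp() route: std::frexp splits a value into a fraction in [0.5, 1) and a power of two, so subtracting 1 from the returned exponent gives the same number as the stored exponent minus the bias for normal values. The helper name unbiased_exponent is an assumption of this example.

```cpp
#include <cmath>

// Hypothetical helper: floor(log2(|a|)) for finite nonzero a, without
// touching the bit layout of the floating-point type.
int unbiased_exponent(double a) {
    int e = 0;
    std::frexp(a, &e);  // a == m * 2^e with 0.5 <= |m| < 1 (e is 0 when a is 0)
    return e - 1;
}
```

Dividing this exponent by log2(10), as in the code above, then gives the same decimal estimate without any memcpy or masking.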
If the number is whole, keep dividing by 10, until you're at 0. You'd have to divide 100000 6 times, for example. For the fractional part, you need to keep multiplying by 10 until trunc(f) == f.
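For the whole-number case, the repeated division can be sketched as follows. The name count_digits is made up for this example, and it counts 0 as having one digit, which is a choice of the sketch rather than something stated above.

```cpp
#include <cstdint>

// Hypothetical helper: count decimal digits of a whole number by dividing
// by 10 until a single digit is left.
int count_digits(std::uint64_t n) {
    int digits = 1;
    while (n >= 10) {
        n /= 10;
        ++digits;
    }
    return digits;
}
```

This uses only integer arithmetic, so it avoids the rounding concerns of the logarithm at the cost of one division per digit.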

Rounding off floating numbers in cpp

For a particular question, I need to perform calculations on a floating number, round it off to 2 digits after the decimal place, and assign it to a variable for comparison purposes. I tried to find a solution to this but all I keep finding is how to print those rounded numbers (using printf or setprecision) instead of assigning them to a variable.
Please help.
I usually do something like that:
#include <cmath> // 'std::floor'
#include <limits> // 'std::numeric_limits'
// Round value to granularity
template<typename T> inline T round(const T x, const T gran)
{
//static_assert(gran!=0);
return gran * std::floor( x/gran + std::numeric_limits<T>::round_error() );
}
double rounded_to_cent = round(1.23456, 0.01); // Gives something near 1.23
Be sure you know how floating point types work though.
Addendum: I know that this topic has already been extensively covered in other questions, but let me put this small paragraph here.
Given a real number, you can represent it with -almost- arbitrary accuracy with a (base10) literal like 1.2345, that's a string that you can type with your keyboard.
When you store that value in a floating point type, let's say a double, you -almost- always lose accuracy because your number probably won't have an exact representation in the finite set of numbers representable by that type.
Nowadays double uses 64 bits, so it has at most 2^64 symbols to represent the uncountable infinity of real numbers: that's an H2O molecule in an infinity of infinite oceans.
The representation error is relative to the value; for example, in an IEEE 754 double, not all integer values above 2^53 can be represented.
So when someone tells that the "result is wrong" they're technically right; the "acceptable" result is application dependent.
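The 2^53 remark can be checked with a few lines. This is only an illustration assuming an IEEE 754 binary64 double with the default round-to-nearest mode; the function name next_integer_is_lost is hypothetical.

```cpp
// Hypothetical check: does adding 1.0 to x give back x itself?
// The volatile store forces the sum to be rounded to double precision
// even on targets that evaluate expressions in a wider format.
bool next_integer_is_lost(double x) {
    volatile double sum = x + 1.0;
    return sum == x;
}
```

For x = 9007199254740992.0 (2^53) the sum cannot be represented and rounds back to x, while for 2^53 - 1 the next integer is still exact.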
round it off to 2 digits after the decimal place, and assign it to a variable for comparison purposes
To avoid errors that creep in when using binary floating point in a decimal problem, consider alternatives.
The direct approach has corner-case errors due to double rounding and overflow. These errors may be tolerable for OP's larger goals.
// Errors:
// 1) x*100.0, round(x*100.0)/100.0 inexact.
// Select `x` values near "dddd.dd5" form an inexact product `x*100.0`
// and may lead to a double rounding error and then incorrect result when comparing.
// 2) x*100.0 may overflow.
int compare_hundredth1(double x, double ref) {
x = round(x*100.0)/100.0;
return (x > ref) - (x < ref);
}
We can do better.
When a wider floating point type exist:
int compare_hundredth2(double x, double ref) {
auto x_rounded = std::round(x*100.0L); // long double overload from <cmath>
auto ref_rounded = ref*100.0L;
return (x_rounded > ref_rounded) - (x_rounded < ref_rounded);
}
To use the same width floating point type takes more work:
All finite large values of x, ref are whole numbers and need no rounding to the nearest 0.01.
int compare_hundredth3(double x, double ref) {
double x_whole;
auto x_fraction = modf(x, &x_whole);
// If rounding needed ...
if (x_fraction != 0.0) {
if (x - 0.01 > ref) return 1; // x much more than ref
if (x + 0.01 < ref) return -1; // x much less than ref
// x, ref nearly the same
double ref_whole;
auto ref_fraction = modf(ref, &ref_whole); // split ref (not x) into whole and fractional parts
auto x100 = round((x - ref_whole)*100); // subtraction expected to be exact here; round to the nearest hundredth
auto ref100 = ref_fraction*100;
return (x100 > ref100) - (x100 < ref100);
}
return (x > ref) - (x < ref);
}
The above assumes ref is without error. If this is not so, consider using a scaled ref.
Note: The above sets aside not-a-number concerns.
More clean-up later.
Here's an example with a custom function that rounds up the floating number f to n decimal places. Basically, it multiplies the floating number by 10 to the power of N to separate the decimal places, then uses roundf to round the decimal places up or down, and finally divides back the floating number by 10 to the power of N (N is the amount of decimal places). Works for C and C++:
#include <stdio.h>
#include <math.h>
float my_round(float f, unsigned int n)
{
float p = powf(10.0f, (float)n);
f *= p;
f = roundf(f);
f /= p;
return f;
}
int main()
{
float f = 0.78901f;
printf("%f\n", f);
f = my_round(f, 2); /* Round with 2 decimal places */
printf("%f\n", f);
return 0;
}
Output:
0.789010
0.790000

How to round a floating point type to two decimals or more in C++? [duplicate]

How can I round a float value (such as 37.777779) to two decimal places (37.78) in C?
If you just want to round the number for output purposes, then the "%.2f" format string is indeed the correct answer. However, if you actually want to round the floating point value for further computation, something like the following works:
#include <math.h>
float val = 37.777779;
float rounded_down = floorf(val * 100) / 100; /* Result: 37.77 */
float nearest = roundf(val * 100) / 100; /* Result: 37.78 */
float rounded_up = ceilf(val * 100) / 100; /* Result: 37.78 */
Notice that there are three different rounding rules you might want to choose: round down (ie, truncate after two decimal places), rounded to nearest, and round up. Usually, you want round to nearest.
As several others have pointed out, due to the quirks of floating point representation, these rounded values may not be exactly the "obvious" decimal values, but they will be very very close.
For much (much!) more information on rounding, and especially on tie-breaking rules for rounding to nearest, see the Wikipedia article on Rounding.
Use %.2f in printf. It prints only 2 decimal places, rounding the value.
Example:
printf("%.2f", 37.777779);
Output:
37.78
Assuming you're talking about round the value for printing, then Andrew Coleson and AraK's answer are correct:
printf("%.2f", 37.777779);
But note that if you're aiming to round the number to exactly 37.78 for internal use (eg to compare against another value), then this isn't a good idea, due to the way floating point numbers work: you usually don't want to do equality comparisons for floating point, instead use a target value +/- a sigma value. Or encode the number as a string with a known precision, and compare that.
See the link in Greg Hewgill's answer to a related question, which also covers why you shouldn't use floating point for financial calculations.
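The "target value +/- a sigma value" comparison mentioned above could look roughly like this. The name nearly_equal and the choice of an absolute tolerance are assumptions of this sketch; a relative tolerance is often more appropriate when the magnitudes vary widely.

```cpp
#include <cmath>

// Hypothetical helper: treat two doubles as equal when they differ by no
// more than an absolute tolerance chosen by the caller.
bool nearly_equal(double a, double b, double tolerance) {
    return std::fabs(a - b) <= tolerance;
}
```

For example, nearly_equal(0.1 + 0.2, 0.3, 1e-9) holds even though the two sides are not bit-for-bit identical doubles.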
How about this:
float value = 37.777779;
float rounded = ((int)(value * 100 + .5) / 100.0);
printf("%.2f", 37.777779);
If you want to write to C-string:
char number[24]; // dummy size, you should take care of the size!
sprintf(number, "%.2f", 37.777779);
Always use the printf family of functions for this. Even if you want to get the value as a float, you're best off using snprintf to get the rounded value as a string and then parsing it back with atof:
#include <math.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
double dround(double val, int dp) {
int charsNeeded = 1 + snprintf(NULL, 0, "%.*f", dp, val);
char *buffer = malloc(charsNeeded);
snprintf(buffer, charsNeeded, "%.*f", dp, val);
double result = atof(buffer);
free(buffer);
return result;
}
I say this because the approach shown by the currently top-voted answer and several others here -
multiplying by 100, rounding to the nearest integer, and then dividing by 100 again - is flawed in two ways:
For some values, it will round in the wrong direction because the multiplication by 100 changes the decimal digit determining the rounding direction from a 4 to a 5 or vice versa, due to the imprecision of floating point numbers
For some values, multiplying and then dividing by 100 doesn't round-trip, meaning that even if no rounding takes place the end result will be wrong
To illustrate the first kind of error - the rounding direction sometimes being wrong - try running this program:
int main(void) {
// This number is EXACTLY representable as a double
double x = 0.01499999999999999944488848768742172978818416595458984375;
printf("x: %.50f\n", x);
double res1 = dround(x, 2);
double res2 = round(100 * x) / 100;
printf("Rounded with snprintf: %.50f\n", res1);
printf("Rounded with round, then divided: %.50f\n", res2);
}
You'll see this output:
x: 0.01499999999999999944488848768742172978818416595459
Rounded with snprintf: 0.01000000000000000020816681711721685132943093776703
Rounded with round, then divided: 0.02000000000000000041633363423443370265886187553406
Note that the value we started with was less than 0.015, and so the mathematically correct answer when rounding it to 2 decimal places is 0.01. Of course, 0.01 is not exactly representable as a double, but we expect our result to be the double nearest to 0.01. Using snprintf gives us that result, but using round(100 * x) / 100 gives us 0.02, which is wrong. Why? Because 100 * x gives us exactly 1.5 as the result. Multiplying by 100 thus changes the correct direction to round in.
To illustrate the second kind of error - the result sometimes being wrong due to * 100 and / 100 not truly being inverses of each other - we can do a similar exercise with a very big number:
int main(void) {
double x = 8631192423766613.0;
printf("x: %.1f\n", x);
double res1 = dround(x, 2);
double res2 = round(100 * x) / 100;
printf("Rounded with snprintf: %.1f\n", res1);
printf("Rounded with round, then divided: %.1f\n", res2);
}
Our number now doesn't even have a fractional part; it's an integer value, just stored with type double. So the result after rounding it should be the same number we started with, right?
If you run the program above, you'll see:
x: 8631192423766613.0
Rounded with snprintf: 8631192423766613.0
Rounded with round, then divided: 8631192423766612.0
Oops. Our snprintf method returns the right result again, but the multiply-then-round-then-divide approach fails. That's because the mathematically correct value of 8631192423766613.0 * 100, 863119242376661300.0, is not exactly representable as a double; the closest value is 863119242376661248.0. When you divide that back by 100, you get 8631192423766612.0 - a different number to the one you started with.
Hopefully that's a sufficient demonstration that using roundf for rounding to a number of decimal places is broken, and that you should use snprintf instead. If that feels like a horrible hack to you, perhaps you'll be reassured by the knowledge that it's basically what CPython does.
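For C++ callers, the same snprintf idea can be sketched with std::string managing the buffer. The name dround_cpp is made up here, and the sketch assumes (as the examples above do) that the C library formats and parses doubles with correct rounding.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical C++ variant of dround: format with `dp` decimals, then parse
// the decimal string back into a double.
double dround_cpp(double val, int dp) {
    const int len = std::snprintf(nullptr, 0, "%.*f", dp, val);
    if (len < 0) return val;  // formatting failed; leave the value untouched
    std::string buffer(static_cast<std::size_t>(len) + 1, '\0');
    std::snprintf(&buffer[0], buffer.size(), "%.*f", dp, val);
    return std::strtod(buffer.c_str(), nullptr);
}
```

Sizing the buffer from the first snprintf call means very large values, whose fixed-notation form runs to hundreds of characters, are handled without a hard-coded limit.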
Also, if you're using C++, you can just create a function like this:
string prd(const double x, const int decDigits) {
stringstream ss;
ss << fixed;
ss.precision(decDigits); // set # places after decimal
ss << x;
return ss.str();
}
You can then output any double myDouble with n places after the decimal point with code such as this:
std::cout << prd(myDouble,n);
There isn't a way to round a float to another float because the rounded float may not be representable (a limitation of floating-point numbers). For instance, say you round 37.777779 to 37.78, but the nearest representable number is 37.781.
However, you can "round" a float by using a format string function.
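To see the representability point concretely, one can measure how far the stored float lands from the intended decimal. This is a small sketch with a hypothetical name, float_storage_error; it uses a double as the reference, which is itself only an approximation of the decimal value.

```cpp
#include <cmath>

// Hypothetical helper: distance between a decimal value (held as a double)
// and the float that actually gets stored for it.
double float_storage_error(double intended) {
    const float stored = static_cast<float>(intended);
    return std::fabs(static_cast<double>(stored) - intended);
}
```

For 37.78 the result is small but nonzero, whereas a value such as 37.75, which is a sum of powers of two, is stored exactly.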
You can still use:
float ceilf(float x); // don't forget #include <math.h> and link with -lm.
example:
float valueToRound = 37.777779;
float roundedValue = ceilf(valueToRound * 100) / 100;
In C++ (or in C with C-style casts), you could create the function:
/* Function to control # of decimal places to be output for x */
double showDecimals(const double& x, const int& numDecimals) {
int y=x;
double z=x-y;
double m=pow(10,numDecimals);
double q=z*m;
double r=round(q);
return static_cast<double>(y)+(1.0/m)*r;
}
Then std::cout << showDecimals(37.777779,2); would produce: 37.78.
Obviously you don't really need to create all 5 variables in that function, but I leave them there so you can see the logic. There are probably simpler solutions, but this works well for me--especially since it allows me to adjust the number of digits after the decimal place as I need.
Use float roundf(float x).
"The round functions round their argument to the nearest integer value in floating-point format, rounding halfway cases away from zero, regardless of the current rounding direction." C11dr §7.12.9.5
#include <math.h>
float y = roundf(x * 100.0f) / 100.0f;
Depending on your float implementation, numbers that may appear to be half-way are not, as floating-point is typically base-2 oriented. Further, precisely rounding to the nearest 0.01 on all "half-way" cases is most challenging.
void r100(const char *s) {
float x, y;
sscanf(s, "%f", &x);
y = round(x*100.0)/100.0;
printf("%6s %.12e %.12e\n", s, x, y);
}
int main(void) {
r100("1.115");
r100("1.125");
r100("1.135");
return 0;
}
1.115 1.115000009537e+00 1.120000004768e+00
1.125 1.125000000000e+00 1.129999995232e+00
1.135 1.134999990463e+00 1.129999995232e+00
Although "1.115" is "half-way" between 1.11 and 1.12, when converted to float, the value is 1.115000009537... and is no longer "half-way", but closer to 1.12 and rounds to the closest float of 1.120000004768...
"1.125" is "half-way" between 1.12 and 1.13, when converted to float, the value is exactly 1.125 and is "half-way". It rounds toward 1.13 because round() rounds ties away from zero, and rounds to the closest float of 1.129999995232...
Although "1.135" is "half-way" between 1.13 and 1.14, when converted to float, the value is 1.134999990463... and is no longer "half-way", but closer to 1.13 and rounds to the closest float of 1.129999995232...
If code used
y = roundf(x*100.0f)/100.0f;
Although "1.135" is "half-way" between 1.13 and 1.14, when converted to float, the value is 1.134999990463... and is no longer "half-way", but closer to 1.13 but incorrectly rounds to float of 1.139999985695... due to the more limited precision of float vs. double. This incorrect value may be viewed as correct, depending on coding goals.
Code definition :
#define roundz(x,d) ((floor(((x)*pow(10,d))+.5))/pow(10,d))
Results :
a = 8.000000
sqrt(a) = r = 2.828427
roundz(r,2) = 2.830000
roundz(r,3) = 2.828000
roundz(r,5) = 2.828430
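#include <stdio.h> /* snprintf */
#include <stdlib.h> /* strtod */
double f_round(double dval, int n)
{
char l_buf[64];
/* "%.*f" takes the number of decimals as an argument, so no separate format buffer is needed; */
/* snprintf truncates instead of overflowing l_buf when the value is very large */
snprintf (l_buf, sizeof l_buf, "%.*f", n, dval);
return strtod(l_buf, NULL);
}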
Here n is the number of decimals
example:
double d = 100.23456;
printf("%f", f_round(d, 4));// result: 100.2346
printf("%f", f_round(d, 2));// result: 100.23
I made this macro for rounding float numbers. Note that the (int) cast truncates toward zero rather than rounding to nearest.
Add it to your header or at the beginning of your file:
#define ROUNDF(f, c) (((float)((int)((f) * (c))) / (c)))
Here is an example:
float x = ROUNDF(3.141592, 100);
x equals 3.14 :)
Let me first attempt to justify my reason for adding yet another answer to this question. In an ideal world, rounding is not really a big deal. However, in real systems, you may need to contend with several issues that can result in rounding that may not be what you expect. For example, you may be performing financial calculations where final results are rounded and displayed to users as 2 decimal places; these same values are stored with fixed precision in a database that may include more than 2 decimal places (for various reasons; there is no optimal number of places to keep...depends on specific situations each system must support, e.g. tiny items whose prices are fractions of a penny per unit); and, floating point computations performed on values where the results are plus/minus epsilon. I have been confronting these issues and evolving my own strategy over the years. I won't claim that I have faced every scenario or have the best answer, but below is an example of my approach so far that overcomes these issues:
Suppose 6 decimal places is regarded as sufficient precision for calculations on floats/doubles (an arbitrary decision for the specific application), using the following rounding function/method:
double Round(double x, int p)
{
if (x != 0.0) {
return ((floor((fabs(x)*pow(double(10.0),p))+0.5))/pow(double(10.0),p))*(x/fabs(x));
} else {
return 0.0;
}
}
Rounding to 2 decimal places for presentation of a result can be performed as:
double val;
// ...perform calculations on val
String(Round(Round(Round(val,8),6),2));
For val = 6.825, result is 6.83 as expected.
For val = 6.824999, result is 6.82. Here the assumption is that the calculation resulted in exactly 6.824999 and the 7th decimal place is zero.
For val = 6.8249999, result is 6.83. The 7th decimal place being 9 in this case causes the Round(val,6) function to give the expected result. For this case, there could be any number of trailing 9s.
For val = 6.824999499999, result is 6.83. Rounding to the 8th decimal place as a first step, i.e. Round(val,8), takes care of the one nasty case whereby a calculated floating point result calculates to 6.8249995, but is internally represented as 6.824999499999....
Finally, the example from the question...val = 37.777779 results in 37.78.
This approach could be further generalized as:
double val;
// ...perform calculations on val
String(Round(Round(Round(val,N+2),N),2));
where N is precision to be maintained for all intermediate calculations on floats/doubles. This works on negative values as well. I do not know if this approach is mathematically correct for all possibilities.
...or you can do it the old-fashioned way without any libraries:
float a = 37.777779;
int b = a; // b = 37
float c = a - b; // c = 0.777779
c *= 100; // c = 77.777863
int d = c; // d = 77;
a = b + d / (float)100; // a = 37.770000;
That of course if you want to remove the extra information from the number.
This function takes the number and the precision and returns the rounded-off number:
float roundoff(float num,int precision)
{
int temp=(int )(num*pow(10,precision));
int num1=num*pow(10,precision+1);
temp*=10;
temp+=5;
if(num1>=temp)
num1+=10;
num1/=10;
num1*=10;
num=num1/pow(10,precision+1);
return num;
}
It converts the floating point number into an int by shifting the decimal point to the right and checking whether the next digit is 5 or greater.

Round floating point number to smaller precision [duplicate]

Is there a reasonably portable way to round a floating point number to a smaller precision, e.g. reduce the precision by 3 binary digits?
For example, an IEEE double has 53 bits of precision. I would like to round (or perhaps truncate) it to 50 bits of precision.
Is this possible with standard library facilities, without having to assume too much about the floating point representation (e.g. having to know the layout of bits)? It is unclear to me if the standard library has sufficient information about the representation (FLT_RADIX, DBL_MANT_DIG, etc.) to make this feasible.
to round a floating point number to a smaller precision
Is this possible with standard library facilities, without having to assume too much about the floating point representation
If we assume the common FLT_RADIX == 2, easy enough to use frexp() to extract the significand, round (or truncate), and then reform.
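#include <assert.h> // assert (used by the simplified version)
#include <float.h> // FLT_RADIX, DBL_MANT_DIG
#include <math.h>
#include <stdio.h>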
double round3(double value) {
// Get significand
int exp;
printf("%-22a %.20f\n", value, value);
double normalized_fraction = frexp(value, &exp); // [0.5 ... 1.0) or zero,
printf("%-22a %.20f\n", normalized_fraction, normalized_fraction);
// Scale to a whole number range
double normalized_integer = normalized_fraction * pow(FLT_RADIX, DBL_MANT_DIG);
printf("%-22a %.20f %llX\n", normalized_integer, normalized_integer,
(unsigned long long) normalized_integer);
// .. but we really want it scaled by 3 bits less
double normalized_integer_3 = normalized_fraction * pow(FLT_RADIX, DBL_MANT_DIG - 3);
printf("%-22a %.20f\n", normalized_integer_3, normalized_integer_3);
// round
double rounded_normalized_integer_3 = round(normalized_integer_3);
printf("%-22a %.20f\n", rounded_normalized_integer_3, rounded_normalized_integer_3);
// reform
double y = ldexp(rounded_normalized_integer_3, exp - (DBL_MANT_DIG - 3));
printf("%-22a %.20f\n", y, y);
puts("");
return y;
}
Simplified
double round3(double value) {
assert(FLT_RADIX == 2);
int exp;
double normalized_fraction = frexp(value, &exp); // [0.5 ... 1.0) or zero,
double rounded_norm_integer_3 = round(normalized_fraction * pow(2, DBL_MANT_DIG-3));
return ldexp(rounded_norm_integer_3, exp - (DBL_MANT_DIG - 3));
}
Test
int main() {
round3(1.0/3.0);
}
Output
// v --- last hex digit in binary 0101
0x1.5555555555555p-2 0.33333333333333331483
0x1.5555555555555p-1 0.66666666666666662966
0x1.5555555555555p+52 6004799503160661.00000000000000000000 15555555555555
0x1.5555555555555p+49 750599937895082.62500000000000000000
0x1.5555555555558p+49 750599937895083.00000000000000000000
0x1.5555555555558p-2 0.33333333333333348136
// ^ --- last hex digit in binary 1000 (rounded)
To accommodate rare cases when FLT_RADIX != 2, other considerations are needed to maintain accuracy as frexp() and ldexp() work with powers-of-2.
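The same frexp()/ldexp() steps can be generalized to an arbitrary number of kept bits. The sketch below assumes FLT_RADIX == 2; the name round_to_bits and the decision to return the input unchanged for out-of-range bit counts are choices of this example, not of the answer above.

```cpp
#include <cfloat>
#include <cmath>

static_assert(FLT_RADIX == 2, "this sketch assumes a binary floating-point format");

// Hypothetical generalization of round3: keep `bits` bits of significand.
double round_to_bits(double value, int bits) {
    if (!std::isfinite(value) || bits <= 0 || bits >= DBL_MANT_DIG) return value;
    int exp = 0;
    const double fraction = std::frexp(value, &exp);  // |fraction| in [0.5, 1), or zero
    // Scale so that exactly `bits` significant bits sit left of the binary
    // point, round to a whole number, then undo the scaling.
    const double scaled = std::round(std::ldexp(fraction, bits));
    return std::ldexp(scaled, exp - bits);
}
```

With bits = DBL_MANT_DIG - 3 this follows the same steps as round3, using std::ldexp in place of pow for the scaling.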
For C++ users, sorry the output uses printf(), but that is only illustrative and can be removed.

Multiplication between big integers and doubles

I am managing some big (128~256 bits) integers with gmp. I have come to a point where I would like to multiply them by a double close to 1 (0.1 < double < 10), the result still being an approximated integer. A good example of the operation I need to do is the following:
int i = 1000000000000000000 * 1.23456789
I searched in the gmp documentation but I didn't find a function for this, so I ended up writing this code which seems to work well:
void mpz_mult_d(mpz_class & r, const mpz_class & i, double d, int prec=10) {
if (prec > 15) prec=15; //avoids overflows
uint_fast64_t m = (uint_fast64_t) floor(d);
r = i * m;
uint_fast64_t pos=1;
for (uint_fast8_t j=0; j<prec; j++) {
const double posd = (double) pos;
m = ((uint_fast64_t) floor(d * posd * 10.)) -
((uint_fast64_t) floor(d * posd)) * 10;
pos*=10;
r += (i * m) /pos;
}
}
Can you please tell me what you think? Do you have any suggestions to make it more robust or faster?
this is what you wanted:
// BYTE lint[_N] ... lint[0]=MSB, lint[_N-1]=LSB
void mul(BYTE *c,BYTE *a,double b) // c[_N]=a[_N]*b
{
int i; DWORD cc;
double q[_N+1],aa,bb;
for (q[0]=0.0,i=0;i<_N;) // mul,carry down
{
bb=double(a[i])*b; aa=floor(bb); bb-=aa;
q[i]+=aa; i++;
q[i]=bb*256.0;
}
cc=0; if (q[_N]>127.0) cc=1.0; // round
for (i=_N-1;i>=0;i--) // carry up
{
double aa,bb;
cc+=q[i];
c[i]=cc&255;
cc>>=8;
}
}
_N is the number of bits/8 per large int; a large int is an array of _N BYTEs where the first byte is the MSB (most significant BYTE) and the last BYTE is the LSB (least significant BYTE).
The function does not handle the sign, but that takes only one if and some xor/inc to add.
The trouble is that double has low precision even for your number 1.23456789! Due to precision loss the result is not exactly what it should be (1234387129122386944 instead of 1234567890000000000). I think my code is much quicker and even more precise than yours because I do not need to mul/mod/div numbers by 10; instead I use bit shifting where possible, and work not in base-10 digits but in base-256 digits (8 bits). If you need more precision, then use long arithmetic. You can speed up this code by using larger digits (16, 32, ... bits).
My long arithmetic for precise astro computations usually uses fixed-point 256.256-bit numbers consisting of 2*8 DWORDs + sign, but of course it is much slower and some goniometric functions are really tricky to implement. If you want just basic functions, then coding your own long arithmetic is not that hard.
Also, if you often want numbers in readable form, it is good to compromise between speed/size and consider using BCD-coded numbers instead of binary-coded numbers.
I am not familiar enough with either C++ or GMP to suggest source code without syntax errors, but what you are doing is more complicated than it should be and can introduce unnecessary approximation.
Instead, I suggest you write function mpz_mult_d() like this:
mpz_mult_d(mpz_class & r, const mpz_class & i, double d) {
d = ldexp(d, 52); /* exact, no overflow because 1 <= d <= 10 */
unsigned long long l = d; /* exact because d is an integer */
p = l * i; /* exact, in GMP */
(quotient, remainder) = p / 2^52; /* in GMP */
And now the next step depends on the kind of rounding you wish. If you wish the multiplication of d by i to give a result rounded toward -inf, just return quotient as result of the function. If you wish a result rounded to the nearest integer, you must look at remainder:
assert(0 <= remainder); /* proper Euclidean division */
assert(remainder < 2^52);
if (remainder < 2^51) return quotient;
if (remainder > 2^51) return quotient + 1; /* in GMP */
if (remainder == 2^51) return quotient + (quotient & 1); /* in GMP, round to “even” */
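The quotient/remainder rounding step can be tried out on ordinary 64-bit integers before porting it to GMP. The sketch below is only a stand-in: it uses std::uint64_t instead of mpz_class, and the name div_pow2_round_even and the parameter k (the shift amount, 52 in the pseudocode above) are assumptions of this example.

```cpp
#include <cstdint>

// Hypothetical stand-in for the GMP step: divide p by 2^k (1 <= k <= 63)
// and round to the nearest integer, with ties going to the even quotient.
std::uint64_t div_pow2_round_even(std::uint64_t p, unsigned k) {
    const std::uint64_t quotient = p >> k;
    const std::uint64_t remainder = p & ((std::uint64_t{1} << k) - 1);
    const std::uint64_t half = std::uint64_t{1} << (k - 1);
    if (remainder < half) return quotient;      // below the midpoint: round down
    if (remainder > half) return quotient + 1;  // above the midpoint: round up
    return quotient + (quotient & 1);           // exact tie: round to even
}
```

For instance, 10 / 4 = 2.5 is a tie and stays at the even quotient 2, while 14 / 4 = 3.5 goes up to 4.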
PS: I found your question by random browsing but if you had tagged it “floating-point”, people more competent than me could have answered it quickly.
Try this strategy:
Convert integer value to big float
Convert double value to big float
Make product
Convert result to integer
mpf_set_z(...)
mpf_set_d(...)
mpf_mul(...)
mpz_set_f(...)