It is hard to explain the question, but I would like to convert a double to an integer without rounding away the digits after the decimal point.
For example
double a = 123.456
I want to convert it to
int b = 123456
I want to know how many digits there are after the decimal point, so that after the calculation I can move the point back and get 123.456 again.
PS: I just want a purely mathematical method to solve this, without processing the characters of the number.
Sorry, there's no solution to your problem because the number 123.456 does not exist as a double. It's rounded to 123.4560000000000030695446184836328029632568359375, and this number obviously does not fit into any integer type after you remove the decimal point.
If you want 123.456 to be treated as the exact number 123.456, then the only remotely simple way to do this is to convert it to a string and remove the decimal point from the string. This can be achieved with something like
snprintf(buf, sizeof buf, "%.13f", 123.456);
Actually figuring out the number of places you want to print it to, however, is rather difficult. If you use too many, you'll end up picking up part of the exact value I showed above. If you use too few, then obviously you'll drop places you wanted to keep.
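A minimal sketch of the string route described above, assuming 15 significant digits (the most a double is guaranteed to preserve) is the right cut-off. The helper name strip_decimal_point is made up for illustration, and the code assumes the "C" locale and a value small enough that %g does not switch to exponent notation:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: formats `value` with 15 significant digits, drops the
// decimal point, and reports how many digits followed it.
// Assumes %g does not fall back to exponent notation and that the digits
// fit in a long long.
long long strip_decimal_point(double value, int* decimals) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.15g", value);  // %g drops trailing zeros
    std::string digits;
    bool seen_point = false;
    *decimals = 0;
    for (const char* p = buf; *p != '\0'; ++p) {
        if (*p == '.') {
            seen_point = true;          // skip the point itself
        } else {
            digits += *p;               // keep sign and digits
            if (seen_point) ++*decimals;
        }
    }
    return std::atoll(digits.c_str());
}
```

Calling strip_decimal_point(123.456, &n) should yield 123456 with n set to 3, so dividing by 10^n moves the point back.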
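Try this:
#include <stdio.h>
#include <stdlib.h>
double a = 123.456;
size_t i, j = 0;
char str[32];
char str2[32];
/* %g prints up to 6 significant digits and drops trailing zeros: "123.456" */
snprintf(str, sizeof str, "%g", a);
for (i = 0; str[i] != '\0'; i++)
{
    if (str[i] != '.')
    {
        str2[j++] = str[i];   /* copy everything except the decimal point */
    }
}
str2[j] = '\0';               /* terminate the copy before converting it */
int b = atoi(str2);           /* b == 123456 */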
I believe the canonical way to do this would be
#include <math.h>
#include <stdio.h>
int main()
{
double d = 123.456;
double int_part;
double fract_part;
fract_part = modf(d, &int_part);
/* lround instead of a plain cast: truncation could turn 455.999... into 455 */
int i = (int)int_part*1000 + (int)lround(fract_part*1000);
printf("%d", i);
}
where the literal 1000 is a constant determining the number of desired decimals.
If you have the text "123.456" you can simply remove the decimal point and convert the resulting text representation to an integer value. If you have already converted the text to a floating-point value (double a = 123.456;) then all bets are off: the floating-point value does not have a pre-set number of decimal digits, because it is represented as a binary fraction. It's sort of like 1/3 versus .3333 in ordinary usage: they do not have the same value, even though we usually pretend that .3333 means 1/3.
Multiply the original value by 10^i, increasing i each time, until abs(value' - round(value')) < epsilon for a very small epsilon. value' should be computed from the original each time, e.g.
value' = value * pow(10, i)
if ( abs(value' - round(value')) < epsilon ) then stop
Originally I suggested simply multiplying by ten repeatedly, but as R.. pointed out, the numerical error accumulates with each multiplication. As a result you might get e.g. 123.456999 for an epsilon = .0000001 instead of 123.456000 due to floating point math.
Please note that you might exceed int type boundaries this way and might want to handle infinity values as well.
As Ignacio Vazquez-Abrams noted this might lead to problems with scenarios where you want to convert 123.500 to 123500. You might solve it by adding a very small value first (and it should be smaller than epsilon). Adding such a value could lead to a numeric error though.
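The scaling loop could be sketched as follows. The function names, the digit limit and the epsilon are illustrative assumptions, not part of the answer above, and the result is only as good as the chosen epsilon:

```cpp
#include <cmath>

// Hypothetical helper: smallest i in [0, max_digits] for which
// value * 10^i lies within epsilon of an integer, or -1 if there is none.
int count_decimals(double value, int max_digits, double epsilon) {
    for (int i = 0; i <= max_digits; ++i) {
        // Always scale the original value so errors do not accumulate.
        double scaled = value * std::pow(10.0, i);
        if (std::fabs(scaled - std::round(scaled)) < epsilon) {
            return i;
        }
    }
    return -1;
}

// Hypothetical helper: the value with the decimal point moved `digits`
// places to the right, rounded to the nearest integer.
// Assumes the result fits in a long long.
long long scale_to_integer(double value, int digits) {
    return std::llround(value * std::pow(10.0, digits));
}
```

For 123.456 with epsilon 1e-7, count_decimals should report 3 and scale_to_integer(123.456, 3) should give 123456. A value such as 123.500 reports 1, which is the limitation noted above.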
Related
I have a class that internally represents some quantity in fixed point as 32-bit integer with somewhat arbitrary denominator (it is neither power of 2 nor power of 10).
For communicating with other applications the quantity is converted to plain old double on output and back on input. As code inside the class it looks like:
int32_t quantity;
double GetValue() { return double(quantity) / DENOMINATOR; }
void SetValue(double x) { quantity = x * DENOMINATOR; }
Now I need to ensure that if I output some value as double and read it back, I will always get the same value back. I.e. that
x.SetValue(x.GetValue());
will never change x.quantity (x is arbitrary instance of the class containing the above code).
The double representation has more digits of precision, so it should be possible. But it will almost certainly not be the case with the simplistic code above.
What rounding do I need to use and
How can I find the critical would-be corner cases to test that the rounding is indeed correct?
Any 32-bit integer will be represented exactly when you convert it to a double, but when you divide and then multiply by an arbitrary value you will get a similar value, not exactly the same one. You should lose at most one bit per operation, which means your double will be almost the same prior to casting back to an int.
However, since int casts are truncations, you will get the wrong result when very minor errors turn 2.000 into 1.999, so what you need to do is round before casting back.
You can use std::lround() for this if you have C++11, else you can write your own rounding function.
You probably don't care about fairness much here, so the common int(doubleVal+0.5) will work for positives. If as seems likely, you have negatives, try this:
int round(double d) { return d<0?d-0.5:d+0.5; }
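As a sketch of how the rounding fits into the class from the question: the struct name Quantity and the denominator 360 are placeholders chosen for illustration, and round_trips is a hypothetical test helper that can be run over whatever corner cases you care about (extremes, values around zero, or the whole 32-bit range in a loop):

```cpp
#include <cmath>
#include <cstdint>

// Placeholder denominator; the real one is "somewhat arbitrary".
constexpr std::int32_t DENOMINATOR = 360;

struct Quantity {
    std::int32_t quantity = 0;

    double GetValue() const {
        return static_cast<double>(quantity) / DENOMINATOR;
    }

    // Round to nearest instead of truncating, so an error of a few ulps
    // in x * DENOMINATOR cannot push the result to the neighbouring integer.
    void SetValue(double x) {
        quantity = static_cast<std::int32_t>(std::lround(x * DENOMINATOR));
    }
};

// Hypothetical test helper: does q survive GetValue() followed by SetValue()?
bool round_trips(std::int32_t q) {
    Quantity x;
    x.quantity = q;
    x.SetValue(x.GetValue());
    return x.quantity == q;
}
```

Because the error after one division and one multiplication is far smaller than 0.5, rounding to nearest should recover the original integer for every 32-bit value.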
The problem you describe is the same problem which exists with converting between binary and decimal representation just with different bases. At least it exists if you want to have the double representation to be a good approximation of the original value (otherwise you could just multiply the 32 bit value you have with your fixed denominator and store the result in a double).
Assuming you want the double representation be a good approximation of your actual value the conversions are nontrivial! The conversion from your internal representation to double can be done using Dragon4 ("How to print floating point numbers accurately", Steele & White) or Grisu ("How to print floating point numbers quickly and accurately", Loitsch; I'm not sure if this algorithm is independent from the base, though). The reverse can be done using Bellerophon ("How to read floating point numbers accurately", Clinger). These algorithms aren't entirely trivial, though...
I have a program in C++ where I divide two numbers, and I need to know if the answer is an integer or not. What I am using is:
if(fmod(answer,1) == 0)
I also tried this:
if(floor(answer)==answer)
The problem is that answer usually is a 5 digit number, but with many decimals. For example, answer can be: 58696.000000000000000025658 and the program considers that an integer.
Is there any way I can make this work?
I am dividing double a/double b= double answer
(sometimes there are more than 30 decimals)
Thanks!
EDIT:
a and b are numbers in the thousands (about 100,000) which are then raised to powers of 2 and 3, added together and divided (according to a complicated formula). So I am plugging in various a and b values and looking at the answer. I will only keep the a and b values that make the answer an integer. An example of what I got for one of the answers was: 218624 which my program above considered to be an integer, but it really was: 218624.00000000000000000056982 So I need a code that can distinguish integers with more than 20-30 decimals.
You can use std::modf from <cmath>:
double integral;
if(std::modf(answer, &integral) == 0.0)
The integral part of answer is stored in integral, and the return value of std::modf is the fractional part of answer, with the same sign as answer.
The usual solution is to check if the number is within a very short distance of an integer, like this:
bool isInteger(double a){
double b=round(a),epsilon=1e-9; //some small range of error
return (a<=b+epsilon && a>=b-epsilon);
}
This is needed because floating point numbers have limited precision, and numbers that indeed are integers may not be represented perfectly. For example, the following would fail if we do a direct comparison:
double d=sqrt(2); //square root of 2
double answer=2.0/(d*d); //2 divided by 2
Here, answer actually holds the value 0.99999..., so we cannot compare that to an integer, and we cannot check if the fractional part is close to 0.
In general, since the floating point representation of a number can be either a bit smaller or a bit bigger than the actual number, it is not good to check if the fractional part is close to 0. It may be a number like 0.99999999 or 0.000001 (or even their negatives), these are all possible results of a precision loss. That's also why I'm checking both sides (+epsilon and -epsilon). You should adjust that epsilon variable to fit your needs.
Also, keep in mind that the precision of a double is close to 15 digits. You may also use a long double, which may give you some extra digits of precision (or not, it is up to the compiler), but even that only gets you around 18 digits. If you need more precision than that, you will need to use an external library, like GMP.
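To make the precision limit concrete, here is a small sketch (the names distinguishable and kDoubleDigits are invented for this example). It checks whether adding a tiny amount to a double changes the stored value at all:

```cpp
#include <limits>

// True if x + delta is stored as a different double than x, i.e. the
// difference is large enough to be representable at x's magnitude.
bool distinguishable(double x, double delta) {
    return x + delta != x;
}

// Number of decimal digits a double is guaranteed to preserve
// (15 for the usual IEEE 754 64-bit format).
constexpr int kDoubleDigits = std::numeric_limits<double>::digits10;
```

With the figures from the question, distinguishable(218624.0, 5.6982e-22) should be false: a double simply cannot hold 218624.00000000000000000056982, so no test on a double can tell it apart from 218624.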
Floating point numbers are stored in memory using a very different bit format than integers. Because of this, comparing them for equality is not likely to work effectively. Instead, you need to test if the difference is smaller than some epsilon:
const double EPSILON = 0.00000000000000000001; // adjust for whatever precision is useful for you
double remainder = std::fmod(numer, denom);
if(std::fabs(0.0 - remainder) < EPSILON)
{
//...
}
Alternatively, if you want to include values that are close to integers (based on your desired precision), you can modify the if condition slightly (since the remainder returned by std::fmod will be in the range [0, 1)):
if (std::fabs(std::round(d) - d) < EPSILON)
{
// ...
}
A double is generally precise to about 15 significant digits, but because it is stored as a mantissa (fraction) and an exponent in binary, many decimal fractions are not stored exactly, and a calculation whose true result is an integer may come out slightly off. For example,
double d = 0.1; // d is actually 0.1000000000000000055511151231257827...
Because of this, you need to compare the difference of what you expect to some very small number that encompasses the precision you desire (we will call this value, epsilon):
double d = 2.0;
bool test = std::fabs(2 - d) < epsilon; // will return true
So when you are trying to compare the remainder from std::fmod, you need to check it against the difference from 0.0 (not for actual equality to 0.0), which is what is done above.
Also, the std::fabs call prevents you from having to do 2 checks by asserting that the value will always be positive.
If you desire a precision that is greater than 15-18 decimal places, you cannot use double or long double; you will need to use a high precision floating point library.
The printf function's %g is able to show the whole number 3 if the float is 3.00, and will show 3.01 if the float's value isn't a round number float.
How would you do this yourself through some code, without formatting the number as a string?
There isn't really a simple answer
Integral values do have exact representations in the float and double formats. So, if it's really already integral, you can use:
f == floor(f)
However, if your value is the result of a calculation which at one point involved any sort of non-zero fractional part, then you will need to be concerned that you may have something very close to an integer but which isn't really, exactly, to-the-last-bit the same. You probably want to consider that to be integral.
One way this might be done:
fabs(f - round(f)) < 0.000001
And while we are on the subject, for the purists, we should note that int i = f; truncates toward zero and rint(3) rounds according to the current FPU rounding mode, whereas round(3) rounds half-way cases away from zero.
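The different conversions can be compared side by side. This is only a sketch with made-up wrapper names; it assumes the default rounding mode (round to nearest, ties to even) is in effect:

```cpp
#include <cmath>

// Conversion to int discards the fractional part (truncates toward zero).
int truncate_to_int(double f) { return static_cast<int>(f); }

// std::nearbyint honours the current rounding mode
// (ties go to the even neighbour under the default mode).
double round_current_mode(double f) { return std::nearbyint(f); }

// std::round always rounds half-way cases away from zero.
double round_half_away(double f) { return std::round(f); }

// The tolerance test from the answer above, with the tolerance as a parameter.
bool is_near_integer(double f, double tolerance) {
    return std::fabs(f - std::round(f)) < tolerance;
}
```

For 2.5 these give 2, 2.0 and 3.0 respectively, which is why the choice of function matters for half-way cases.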
I have a function getSlope which takes as parameters 4 doubles and returns another double calculated using this given parameters in the following way:
double QSweep::getSlope(double a, double b, double c, double d){
double slope;
slope=(d-b)/(c-a);
return slope;
}
The problem is that when calling this function with arguments for example:
getSlope(2.71156, -1.64161, 2.70413, -1.72219);
the returned result is:
10.8557
and this is not a good result for my computations.
I have calculated the slope using Mathematica and the result for the slope for the same parameters is:
10.8452
or with more digits for precision:
10.845222072678331.
The result returned by my program is not good in my further computations.
Moreover, I do not understand how the program returns 10.8557 when the expected value is 10.845222072678331 (supposing that this is the approximate result of the division).
How can I get the good result for my division?
thank you in advance,
madalina
I print the result using the command line:
std::cout<<slope<<endl;
It may be that my parameters are not good, as I read them from another program (which computes a graph). After reading these parameters from the graph I just displayed them to see their values, but maybe the displayed values do not have the same precision as the internally calculated ones. I do not know; it is really strange. Some numerical errors appear.
When the graph from which I am reading my parameters is computed, some numerical libraries written in C++ (with templates) are used. No OpenGL is used for this computation.
thank you,
madalina
I've tried with float instead of double and I get 10.845110 as a result. It still looks better than madalina's result.
EDIT:
I think I know why you get these results. If you get the a, b, c and d parameters from somewhere else and print them, you see rounded values. If you then put those rounded values into Mathematica (or calc ;) ) it will give you a different result.
I tried changing a little bit one of your parameters. When I did:
double c = 2.7041304;
I get 10.845806. I only added 0.0000004 to c!
So I think your "errors" aren't errors. Print a, b, c and d with better precision and then put them into Mathematica.
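One way to print the parameters with enough digits is sketched below. The helper full_precision is hypothetical; std::numeric_limits<double>::max_digits10 (17 for an IEEE double) is the number of significant digits that is enough to reproduce any double exactly when read back:

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Same formula as in the question, as a free function.
double getSlope(double a, double b, double c, double d) {
    return (d - b) / (c - a);
}

// Hypothetical helper: formats x with enough significant digits that
// reading the text back gives the identical double.
std::string full_precision(double x) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << x;
    return out.str();
}
```

Printing full_precision(a) and so on for the real inputs, instead of relying on the default 6 significant digits of std::cout, should show whether they really are 2.71156, -1.64161, 2.70413 and -1.72219. With exactly those values getSlope returns about 10.8452; nudging c to 2.7041304 already moves it to about 10.8458.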
The following code:
#include <iostream>
using namespace std;
double getSlope(double a, double b, double c, double d){
double slope;
slope=(d-b)/(c-a);
return slope;
}
int main( ) {
double s = getSlope(2.71156, -1.64161, 2.70413, -1.72219);
cout << s << endl;
}
gives a result of 10.8452 with g++. How are you printing out the result in your code?
Could it be that you use DirectX or OpenGL in your project? If so they can turn off double precision and you will get strange results.
You can check your precision settings with
std::sqrt(x) * std::sqrt(x)
The result has to be pretty close to x.
I met this problem a long time ago and spent a month checking all the formulas. But then I found
D3DCREATE_FPU_PRESERVE
The problem here is that (c-a) is small, so the rounding errors inherent in floating point operations are magnified in this example. A general solution is to rework your equation so that you're not dividing by a small number; I'm not sure how you would do it here though.
EDIT:
Neil is right in his comment to this question, I computed the answer in VB using Doubles and got the same answer as mathematica.
The results you are getting are consistent with 32bit arithmetic. Without knowing more about your environment, it's not possible to advise what to do.
Assuming the code shown is what's running, ie you're not converting anything to strings or floats, then there isn't a fix within C++. It's outside of the code you've shown, and depends on the environment.
As Patrick McDonald and Treb both brought up the accuracy of your inputs and the error on a-c, I thought I'd take a look at that. One technique for looking at rounding errors is interval arithmetic, which makes the upper and lower bounds that a value represents explicit (they are implicit in floating point numbers, and are fixed to the precision of the representation). By treating each value as an upper and lower bound, and by extending the bounds by the error in the representation (approx x * 2 ^ -53 for a double value x), you get a result which gives the lower and upper bounds on the accuracy of a value, taking into account worst case precision errors.
For example, if you have a value in the range [1.0, 2.0] and subtract from it a value in the range [0.0, 1.0], then the result must lie in the range [below(0.0),above(2.0)] as the minimum result is 1.0-1.0 and the maximum is 2.0-0.0. below and above are equivalent to floor and ceiling, but for the next representable value rather than for integers.
Using intervals which represent worst-case double rounding:
getSlope(
a = [2.7115599999999995262:2.7115600000000004144],
b = [-1.6416099999999997916:-1.6416100000000002357],
c = [2.7041299999999997006:2.7041300000000005888],
d = [-1.7221899999999998876:-1.7221900000000003317])
(d-b) = [-0.080580000000000526206:-0.080579999999999665783]
(c-a) = [-0.0074300000000007129439:-0.0074299999999989383218]
to double precision [10.845222072677243474:10.845222072679954195]
So although c-a is small compared to c or a, it is still large compared to double rounding, so even with the worst imaginable double precision rounding you could trust that value to be precise to 12 figures - 10.8452220727. You've lost a few figures off double precision, but you're still working to more than your input's significance.
But if the inputs were only accurate to the number of significant figures shown, then rather than being the double value 2.71156 +/- eps, the input range would be [2.711555,2.711565], so you get the result:
getSlope(
a = [2.711555:2.711565],
b = [-1.641615:-1.641605],
c = [2.704125:2.704135],
d = [-1.722195:-1.722185])
(d-b) = [-0.08059:-0.08057]
(c-a) = [-0.00744:-0.00742]
to specified accuracy [10.82930108:10.86118598]
which is a much wider range.
But you would have to go out of your way to track the accuracy in the calculations, and the rounding errors inherent in floating point are not significant in this example - it's precise to 12 figures with the worst case double precision rounding.
On the other hand, if your inputs are only known to 6 figures, it doesn't actually matter whether you get 10.8557 or 10.8452. Both are within [10.82930108:10.86118598].
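The second calculation can be reproduced with a few lines of interval arithmetic. This is a simplified sketch: the type and function names are invented here, and it ignores the outward rounding of each bound that a real interval library would perform (negligible next to an input error of 0.000005):

```cpp
#include <algorithm>

struct Interval {
    double lo;
    double hi;
};

// [v - err, v + err]
Interval around(double v, double err) { return { v - err, v + err }; }

// Smallest result is x.lo - y.hi, largest is x.hi - y.lo.
Interval subtract(Interval x, Interval y) {
    return { x.lo - y.hi, x.hi - y.lo };
}

// Assumes y does not contain zero; the extremes are among the four
// quotients of the end points.
Interval divide(Interval x, Interval y) {
    double q[] = { x.lo / y.lo, x.lo / y.hi, x.hi / y.lo, x.hi / y.hi };
    return { *std::min_element(q, q + 4), *std::max_element(q, q + 4) };
}

// Bounds on (d-b)/(c-a) when every input is only known to +/- err.
Interval slope_bounds(double a, double b, double c, double d, double err) {
    return divide(subtract(around(d, err), around(b, err)),
                  subtract(around(c, err), around(a, err)));
}
```

slope_bounds(2.71156, -1.64161, 2.70413, -1.72219, 0.000005) should come out at roughly [10.8293, 10.8612], matching the range above and containing both 10.8452 and 10.8557.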
Better print out the arguments, too. If you are, as I guess, transferring the parameters in decimal notation, you will lose precision for each and every one of them. The problem is that 1/5 is an infinite series in binary, so e.g. 0.2 becomes .001100110011.... Also, decimals are chopped when converting a binary float to a textual representation in decimal.
Next to that, sometimes the compiler chooses speed over precision. This should be a documented compiler switch.
Patrick seems to be right about (c-a) being the main cause:
d-b = -1.72219 - (-1.64161) = -0.08058
c-a = 2.70413 - 2.71156 = -0.00743
S = (d-b)/(c-a) = -0.08058 / -0.00743 = 10.845222
You start out with six digits of precision; the subtraction reduces that to four and three digits. My best guess is that you lose additional precision because the number -0.00743 cannot be represented exactly in a double. Try using intermediate variables with a bigger precision, like this:
double QSweep::getSlope(double a, double b, double c, double d)
{
double slope;
long double temp1, temp2;
temp1 = (d-b);
temp2 = (c-a);
slope = temp1/temp2;
return slope;
}
While the academic discussion going on is great for learning about the limitations of programming languages, you may find the simplest solution to the problem is a data structure for arbitrary precision arithmetic.
This will have some overhead, but you should be able to find something with fairly guaranteeable accuracy.
I was wondering whether it is possible to limit the number of characters we enter in a float.
I couldn't seem to find any method. I have to read in data from an external interface which sends float data of the form xx.xx. As of now I am using conversion to char and vice-versa, which is a messy work-around. Can someone suggest inputs to improve the solution?
If you always want only 2 decimal places for your numbers, and absolute size is not a big issue, why not work internally with integers instead, with their meaning being "hundredths of the target unit"? At the end you just convert them back to a float by dividing by 100.0 and you're back to what you want.
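A minimal sketch of the integer approach, with hypothetical helper names. Note the rounding on the way in: the product value * 100.0 can land just below the intended integer (0.29 * 100.0 is typically 28.999999999999996), so a plain cast would be off by one:

```cpp
#include <cmath>

// Hypothetical helper: stores a value of the form xx.xx as an integer
// count of hundredths, rounding to the nearest hundredth.
long to_hundredths(double value) {
    return std::lround(value * 100.0);
}

// Hypothetical helper: converts back only when a floating-point value
// is needed, e.g. for output.
double from_hundredths(long hundredths) {
    return hundredths / 100.0;
}
```

All intermediate arithmetic (sums, comparisons, equality tests) can then be done exactly on the long values.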
This is a slight misunderstanding. You cannot think of a float or double as being a decimal number.
Most any attempt to use it as a fixed decimal number of precision, say, 2, will incur problems as some values will not be precisely equal to xxx.xx but only approximately so.
One solution that many apps use is to ensure that:
1) display of floating point numbers is well controlled using printf/sprintf to a certain number of significant digits,
2) one does not do exact comparison between floating point numbers, i.e. to compare to the 2nd decimal point of precision two numbers a, b : abs(a-b) <= epsilon should generally be used. Outright equality is dangerous as 0.01 might have multiple floating point values, e.g. 0.0101 and 0.0103 might result if you do arithmetic, but be indistinguishable to the user if values are truncated to 2 dp, and they may be logically equivalent to your application which is assuming 2dp precision.
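Point 2 can be wrapped in a small function. The name nearly_equal is made up for this sketch, and choosing epsilon (0.005 for "equal to 2 decimal places", for instance) is left to the application:

```cpp
#include <cmath>

// True if a and b differ by no more than epsilon.
bool nearly_equal(double a, double b, double epsilon) {
    return std::fabs(a - b) <= epsilon;
}
```

For example, nearly_equal(0.0101, 0.0103, 0.005) is true, while nearly_equal(0.01, 0.02, 0.005) is false.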
Lastly, I would suggest you use double instead of float. These days there is no real overhead as we aren't doing floating point without a maths coprocessor any more! And a 32-bit float has about 7 significant decimal digits of precision, while a double has about 15, and this is enough to be significant in many cases.
Rounding a float (that is, a binary floating-point number) to 2 decimal digits doesn't make much sense because you won't be able to round it exactly in some cases anyway, so you'll still get a small delta which will affect subsequent calculations. If you really need it to be precisely 2 places, then you need to use decimal arithmetic; for example, using IBM's decNumber++ library, which implements the ISO C/C++ TR 24733 draft.
You can limit the number of significant numbers to output:
http://www.cplusplus.com/reference/iostream/manipulators/setprecision/
but I don't think there is a function to actually lop off a certain number of digits. You could write a function that uses snprintf() (or stringstream) to convert to text, lops off a certain number of digits, and uses atof() (or stringstream) to convert back and return the result.
You should check the string rather than the converted float. It will be easier to check the number of digits.
Why don't you just round the floats to the desired precision?
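#include <cmath>
#include <iomanip>
#include <iostream>
using namespace std;
// named round_to so it cannot be confused with the standard round()
double round_to(double val, int decimalPlaces)
{
    double power_of_10 = pow(10.0, static_cast<double>(decimalPlaces));
    return floor(val * power_of_10 + 0.5) / power_of_10;
}
int main()
{
    double d;
    cin >> d;
    // round d to 3 decimal places...
    d = round_to(d, 3);
    // do something with d
    d *= 1.75;
    // fixed is needed: setprecision alone counts significant digits
    cout << fixed << setprecision(3) << d; // now output to 3 decimal places
}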
There exists no fixed-point decimal data type in C, but you can mimic Pascal's decimal with a struct of two ints.
If the need is to take 5 digits [ including or excluding the decimal point ], you could simply write like below.
scanf( "%5f", &a );
where a is declared as float.
For example:
If you enter 123.45, scanf will consider the first 5 characters i.e., 4 digits and the decimal point & will store 123.4
If entered 123456, the value of a will be 12345 [ ~ 12345.00 ]
With printf, we would be able to control how many characters can be printed after decimal as well.
printf( "%5.2f \n", a );
The value of 123.4 will be printed as 123.40 [ 2 digits after the decimal; the width of 5 is only a minimum, so all 6 characters are printed ]
But this has a limitation: if the value needs more than 5 characters, the actual value is displayed in full.
eg: The value of 123456.7 will be displayed as 123456.70.
This [ specifying the number of digits after the decimal, as mentioned for printf ] cannot be done with scanf: its conversion specifications accept only a maximum field width, not a precision.
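The width behaviour of both functions can be checked with sscanf and snprintf, which use the same format rules as scanf and printf. The wrapper names below are invented for this sketch:

```cpp
#include <cstdio>
#include <string>

// Reads at most 5 characters of `text` as a float.
// Returns false if nothing could be converted.
bool read_width5(const char* text, float* out) {
    return std::sscanf(text, "%5f", out) == 1;
}

// Formats with a minimum width of 5 and exactly 2 digits after the point.
std::string format_5_2(float value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%5.2f", value);
    return buf;
}
```

read_width5("123.45", &a) stores approximately 123.4, format_5_2(123.4f) gives "123.40", and a short value such as 1.5f is padded to " 1.50".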
Now, when it comes to taking data from an external interface, are you talking about serialization here, I mean transmission of data over a network?
Then, to my knowledge, your approach is fine.
We generally tend to read the data as char only, to make sure the application works for any format of data.
You can print a float with printf("%.2f", value), or something similar.