How to write a good round_double function in C++? - c++

I'm trying to write a good round_double function which will round a double to a specified precision:
1.
double round_double(double num, int prec)
{
for (int i = 0; i < abs(prec); ++i)
if(prec > 0)
num *= 10.0;
else
num /= 10.0;
double result = (long long)floor(num + 0.5);
for (int i = 0; i < abs(prec); ++i)
if(prec > 0)
result /= 10.0;
else
result *= 10.0;
return result;
}
2.
double round_double(double num, int prec)
{
double tmp = pow(10.0, prec);
double result = (long long)floor(num * tmp + 0.5);
result /= tmp;
return result;
}
These functions do what I want, but in my opinion they are not good enough: starting from a precision of about 13-14, they return bad results.
The reason I'm sure a good round_double can be written is that simply printing the number via cout at a specified precision (say 18) gives a better result than my function.
For example this part of code:
int prec = 18;
double num = 10.123456789987654321;
cout << setiosflags(ios::showpoint | ios::fixed)
<< setprecision(prec) << "round_double(" << num << ", "
<< prec << ") = " << round_double(num, prec) << endl;
This prints round_double(10.123456789987655000, 18) = -9.223372036854776500 for both the first and the second round_double.
How do I write a good round_double function in C++? Or does one already exist?

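Don't cast to long long: that forces a conversion to an integer type with limited range. A 64-bit long long holds at most about 9.2 × 10^18 (19 digits), so num * 10^prec overflows it once prec gets large enough; for a number with two integer digits like 10.12, that happens at prec = 18. Just calling floor should be enough.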
double round_double(double num, int prec)
{
double tmp = pow(10.0, prec);
double result = floor(num * tmp + 0.5);
result /= tmp;
return result;
}
Note that Mike is also correct: double itself has limited precision, so it isn't a great fit if you need clean decimal results. But the long long cast is the cause of your totally wacky numbers.
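As a hedged variation on the function above, here is a minimal sketch that also skips the rounding step when the scaled value is already too large to have a fractional part. The name round_double_sketch and the 2^53 cutoff are my own choices for illustration, and it uses std::round (which rounds halves away from zero) rather than floor(x + 0.5), so it behaves slightly differently for negative inputs.

```cpp
#include <cmath>

// Sketch only: rounds num to prec decimal places without any integer cast.
double round_double_sketch(double num, int prec)
{
    const double scale = std::pow(10.0, prec);
    const double scaled = num * scale;
    // From 2^53 upward every finite double is already an integer,
    // so there is nothing left to round; return the input unchanged.
    if (!std::isfinite(scaled) || std::fabs(scaled) >= 9007199254740992.0)
        return num;
    return std::round(scaled) / scale;
}
```

The result is still the nearest double to the decimal value, not the decimal value itself, so printing it with many digits can still show trailing noise.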

The problem is the floating-point representation. A binary representation cannot represent all decimal numbers exactly, and only has a finite precision.
double usually means a 64-bit binary representation as specified by IEEE754, with a 52-bit fractional part. This gives a precision of approximately 16 decimal digits.
If you need more precision than that, then the best option is probably to use an arbitrary-precision arithmetic library such as GMP. Your compiler may or may not offer a long double type with a higher precision than double.
EDIT: sorry, I didn't notice that you're getting completely incorrect results. As another answer says, this is due to the conversion to long long overflowing.
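The digit counts mentioned above can be queried from the implementation instead of being memorised. A small sketch, assuming double is IEEE 754 binary64 (the constant names are mine):

```cpp
#include <limits>

// Bits in the significand, including the implicit leading 1 bit.
constexpr int double_significand_bits = std::numeric_limits<double>::digits;
// Decimal digits that always survive a text -> double -> text round trip.
constexpr int double_safe_decimal_digits = std::numeric_limits<double>::digits10;
// Decimal digits needed so that double -> text -> double is lossless.
constexpr int double_roundtrip_digits = std::numeric_limits<double>::max_digits10;
```

On a typical IEEE 754 platform these are 53, 15 and 17, which is where the "approximately 16 decimal digits" figure comes from.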

Another approach is to round based on binary-digits of precision. Sample implementation below - not sure if it's useful to you, but since you got me playing I thought I'd throw it out there.
Notes:
this uses the ieee754.h header common on Linux systems: it could easily be ported to Windows, but this is undeniably bit hackery and whether it's appropriate in any given production code is a case-by-case call.
you could approximate some decimal near-equivalent, e.g. multiply the desired decimal precision by 10 and divide by 3 (based on 2^10 ~= 10^3).
The input number (10.1234...) with 1 bit of precision is 8; with 2 it's 10 etc..
Separately, IMHO decimal rounding is best done at output time, or when using a decimal-capable representation (e.g. storing an int mantissa and power-of-10 exponent).
#include <ieee754.h>
#include <iostream>
#include <iomanip>
double round_double(double d, int precision)
{
ieee754_double* p = reinterpret_cast<ieee754_double*>(&d);
std::cout << "mantissa 0:" << std::hex << p->ieee.mantissa0
<< ", 1:" << p->ieee.mantissa1 << '\n';
unsigned mask0 = precision < 20 ? 0x000FFFFF << (20 - precision) :
0x000FFFFF;
unsigned mask1 = precision < 20 ? 0 :
precision == 53 ? 0xFFFFFFFF :
0xFFFFFFFE << (32 + 20 - precision);
std::cout << "masks 0:" << mask0 << ", 1: " << mask1 << '\n';
p->ieee.mantissa0 &= mask0;
p->ieee.mantissa1 &= mask1;
std::cout << "mantissa' 0:" << p->ieee.mantissa0
<< ", 1:" << p->ieee.mantissa1 << '\n';
return d;
}
int main()
{
double num = 10.123456789987654321;
for (int prec = 1; prec <= 53; ++prec)
std::cout << std::setiosflags(std::ios::showpoint | std::ios::fixed)
<< std::setprecision(60)
<< "round_double(" << num << ", " << prec << ") = "
<< round_double(num, prec) << std::endl;
}
Output...
mantissa 0:43f35, 1:ba76eea7
masks 0:fff80000, 1: 0
mantissa' 0:0, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, 1) = 8.000000000000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:fffc0000, 1: 0
mantissa' 0:40000, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, 2) = 10.000000000000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:fffe0000, 1: 0
mantissa' 0:40000, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, 3) = 10.000000000000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:ffff0000, 1: 0
mantissa' 0:40000, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, 4) = 10.000000000000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:ffff8000, 1: 0
mantissa' 0:40000, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, 5) = 10.000000000000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:ffffc000, 1: 0
mantissa' 0:40000, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, 6) = 10.000000000000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:ffffe000, 1: 0
mantissa' 0:42000, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, 7) = 10.062500000000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:fffff000, 1: 0
mantissa' 0:43000, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, 8) = 10.093750000000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:7ffff800, 1: 0
mantissa' 0:43800, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, 9) = 10.109375000000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:3ffffc00, 1: 0
mantissa' 0:43c00, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, a) = 10.117187500000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:1ffffe00, 1: 0
mantissa' 0:43e00, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, b) = 10.121093750000000000000000000000000000000000000000000000000000
mantissa 0:43f35, 1:ba76eea7
masks 0:fffff00, 1: 0
mantissa' 0:43f00, 1:0
round_double(10.123456789987654858009591407608240842819213867187500000000000, c) = 10.123046875000000000000000000000000000000000000000000000000000
etc....

Related

How can I prevent a number from being rounded when adding?

I am trying to generate random numbers between 10 and 20 with decimals. I generate the integer part first and then the decimal part, so that the integer part has a better chance of varying. But when I add the two parts, the result gets rounded. I don't want it to be rounded, because I want an exact number of decimal places, in this case 7.
#include<iostream>
#include<math.h>
double random_decimal(int inicio, int fin, int numero_decimales)
{
double int_part = (rand() % (inicio + 1 - fin) + inicio);
int num = pow(10, numero_decimales);
double decimal_part = (double)(rand() % (1 - num)) / num;
cout << "Int: "<< int_part << " Dec: "<< decimal_part << " Sum: "<< int_part + decimal_part << endl;
return (double)(int_part + decimal_part);
}
int main()
{
int ed;
double ps, estat;
for(int i = 0; i < 10; i++)
{
double x = random_decimal(10, 20, 7);
}
}
For starters, don't use rand() if you can use one of the C++11 random classes instead: https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful
For example:
std::default_random_engine generator;
std::uniform_real_distribution<double> distribution(10.0,20.0);
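To show how those two pieces fit together, here is a minimal sketch. The helper name random_in_range and the choice of std::mt19937 as the engine are assumptions for illustration, not part of the original suggestion.

```cpp
#include <random>

// Draws one value from the range [lo, hi) using the supplied engine.
double random_in_range(std::mt19937& engine, double lo, double hi)
{
    std::uniform_real_distribution<double> distribution(lo, hi);
    return distribution(engine);
}
```

A caller would typically create one engine, for example std::mt19937 engine(std::random_device{}());, and reuse it for every draw rather than constructing a new one each time.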
As far as "rounding" or "exact number of decimal places" - I wouldn't worry about that. Specifically:
https://peter.bloomfield.online/decimal-places-in-a-floating-point-number/
Significant figures != decimal places
It’s easy to make the mistake of thinking that the code above answers
our decimal places question. However, consider the following numbers:
0.12345123451234512345
0.00000000000000012345
Both of them are written with 20 decimal places. However, the second
one would only require 5 significant figures (or the equivalent of 5
decimal digits in the mantissa). The extra leading zeroes after the
decimal point can be represented by simply decreasing the exponent
(i.e. making it more negative), leaving the entire mantissa available
for precision.
Read the rest of the article for more details. But I suspect "rounding" (as I think you're using the term) probably isn't an issue for your particular application.
Your numbers are only being rounded when you are printing them:
$ cat que.cpp
#include <iomanip>
#include <iostream>
#include <math.h>
using namespace std;
double random_decimal(int inicio, int fin, int numero_decimales) {
double int_part = (rand() % (inicio + 1 - fin) + inicio);
int num = pow(10, numero_decimales);
double decimal_part = (double)(rand() % (1 - num)) / num;
cout << setprecision(numero_decimales) << fixed;
cout << "Int: " << int_part << " Dec: " << decimal_part
<< " Sum: " << int_part + decimal_part << endl;
return (double)(int_part + decimal_part);
}
int main() {
int ed;
double ps, estat;
for (int i = 0; i < 10; i++) {
double x = random_decimal(10, 20, 7);
}
}
$ g++ -o que que.cpp && ./que
Int: 11.0000000 Dec: 0.6930970 Sum: 11.6930970
Int: 10.0000000 Dec: 0.4637086 Sum: 10.4637086
Int: 15.0000000 Dec: 0.4238377 Sum: 15.4238377
Int: 11.0000000 Dec: 0.9760656 Sum: 11.9760656
Int: 16.0000000 Dec: 0.9641539 Sum: 16.9641539
Int: 15.0000000 Dec: 0.0490162 Sum: 15.0490162
Int: 15.0000000 Dec: 0.2520169 Sum: 15.2520169
Int: 15.0000000 Dec: 0.7514122 Sum: 15.7514122
Int: 16.0000000 Dec: 0.0383580 Sum: 16.0383580
Int: 17.0000000 Dec: 0.3455866 Sum: 17.3455866

Why do I get so precise floating-point number from std::cout?

The program
#include <climits>
#include <iostream>
int main ()
{
long long ll = LLONG_MAX;
float f = ll;
std::cout << ll << '\n';
std::cout << std::fixed << f << '\n';
return 0;
}
gives:
9223372036854775807
9223372036854775808.000000
How is this possible? If a 23-bit mantissa can only hold a maximum value of 8,388,607, why does cout output a 64-bit number?
You stored 2^63-1 in a float, which was rounded to 2^63 = 9223372036854775808. The powers of 2 are exactly representable.
The next larger number that is exactly representable is 2^63 + 2^40 = 9223373136366403584.
long long for you is a 64-bit data type, which means LLONG_MAX has a value of 2^63 - 1. You are right that this can't be stored exactly in a float, which only has 23 explicit mantissa bits, but 2^63, which is one more than LLONG_MAX, is easily stored in a float: as a power of two it needs only a significand of 1.0 (all mantissa bits zero) and an exponent of 63, and there you have it.
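Both claims can be checked with a short sketch. It assumes float is IEEE 754 binary32 and that integer-to-float conversion rounds to nearest, which is the usual behaviour; the function names are made up for illustration.

```cpp
#include <climits>
#include <cmath>
#include <limits>

// LLONG_MAX (2^63 - 1) is not representable, so the conversion rounds it.
float llong_max_as_float()
{
    return static_cast<float>(LLONG_MAX);
}

// Next representable float above x, i.e. x plus one unit in the last place.
float next_float_above(float x)
{
    return std::nextafter(x, std::numeric_limits<float>::infinity());
}
```

Near 2^63 adjacent floats are 2^40 apart, so nothing between 2^63 and 2^63 + 2^40 can be stored.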

I implemented my own square root function in c++ to get precision upto 9 points but it's not working

I want to get the square root of a number to 9 decimal places, so I did something like the code below, but I am not getting the right precision. Here e is the tolerance, which is smaller than 10^-9, yet ans still shows only 5 decimal places. What am I doing wrong here?
#include <iostream>
using namespace std;
long double squareRoot(long double n)
{
long double x = n;
long double y = 1;
long double e = 0.00000000000001;
while (x - y > e)
{
x = (x + y) / 2;
y = n / x;
}
cout << x << "\n";
return x;
}
int main()
{
int arr[] = {2,3,4,5,6};
int size = sizeof(arr)/sizeof(arr[0]);
long double ans = 0.0;
for(int i=0; i<size; i++)
{
ans += squareRoot(arr[i]);
}
cout << ans << "\n";
return 0;
}
The output is
1.41421
1.73205
2
2.23607
2.44949
9.83182
What should I do to get precision up to 9 decimal places?
There are two places at which precision plays a role:
precision of the value itself
precision of the output stream
You can only get output in desired precision if both value and stream are precise enough.
In your case, the calculated value doesn't seem to be the problem. However, the default stream precision is only six significant digits, i.e. no matter how precise your value actually is, the stream will stop after six significant digits (five after the decimal point here), rounding the last one appropriately. So you'll need to increase the stream precision to the desired nine digits (std::setprecision is declared in <iomanip>):
std::cout << std::setprecision(9);
// or alternatively:
std::cout.precision(9);
Precision is kept until a new one is set, in contrast to e. g. std::setw, which only applies for next value.
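One detail worth a sketch: without std::fixed, the precision counts significant digits, while with std::fixed it counts digits after the decimal point. The helper below (the name to_fixed_string is hypothetical) formats into a string so the effect is easy to inspect.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats value with exactly `decimals` digits after the decimal point.
std::string to_fixed_string(long double value, int decimals)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << value;
    return out.str();
}
```

The same two manipulators can be streamed into std::cout directly if a string is not needed.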
try this
cout << setprecision(10) << x << "\n";
cout << setprecision(10) << ans << "\n";

How to convert Hex to IEEE 754 32 bit float in C++

I am trying to take hex values stored as int and convert them to floating-point numbers using the IEEE 754 32-bit rules. I am specifically struggling with getting the right values for the mantissa and exponent. The values are stored in a file as hex. I want four significant figures in the result. Below is my code.
float floatizeMe(unsigned int myNumba ) {
//// myNumba comes in as 32 bits or 8 byte
unsigned int sign = (myNumba & 0x007fffff) >>31;
unsigned int exponent = ((myNumba & 0x7f800000) >> 23)- 0x7F;
unsigned int mantissa = (myNumba & 0x007fffff) ;
float value = 0;
float mantissa2;
cout << endl<< "mantissa is : " << dec << mantissa << endl;
unsigned int m1 = mantissa & 0x00400000 >> 23;
unsigned int m2 = mantissa & 0x00200000 >> 22;
unsigned int m3 = mantissa & 0x00080000 >> 21;
unsigned int m4 = mantissa & 0x00040000 >> 20;
mantissa2 = m1 * (2 ^ -1) + m2*(2 ^ -2) + m3*(2 ^ -3) + m4*(2 ^ -4);
cout << "\nsign is: " << dec << sign << endl;
cout << "exponent is : " << dec << exponent << endl;
cout << "mantissa 2 is : " << dec << mantissa2 << endl;
// if above this number it is negative
if ( sign == 1)
sign = -1;
// if above this number it is positive
else {
sign = 1;
}
value = (-1^sign) * (1+mantissa2) * (2 ^ exponent);
cout << dec << "Float value is: " << value << "\n\n\n";
return value;
}
int main()
{
ifstream myfile("input.txt");
if (myfile.is_open())
{
unsigned int a, b,b1; // Hex
float c, d, e; // Dec
int choice;
unsigned int ex1 = 0;
unsigned int ex2 = 1;
myfile >> std::hex;
myfile >> a >> b ;
floatizeMe(a);
myfile.close();
return 0;
}
I suspect you mean for the ^ in
mantissa2 = m1 * (2 ^ -1) + m2*(2 ^ -2) + m3*(2 ^ -3) + m4*(2 ^ -4);
to mean "to the power of". There is no such operator in C or C++. The ^ operator is the bit-wise XOR operator.
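A short sketch of the difference, with hypothetical helper names: the first function shows what ^ actually computes, the second uses std::pow from <cmath> for a real power of two.

```cpp
#include <cmath>

// What the ^ operator does: bitwise XOR of the two operands.
int xor_of(int a, int b)
{
    return a ^ b;
}

// What was intended: 2 raised to the given (possibly negative) exponent.
double power_of_two(int exponent)
{
    return std::pow(2.0, exponent);
}
```

For example, xor_of(2, 3) is 1 (binary 10 XOR 11), whereas power_of_two(3) is 8.0.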
Considering your CPU follows the IEEE standard, you can also use union. Something like this
union
{
int num;
float fnum;
} my_union;
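Then store the integer value into my_union.num and read it back as a float through my_union.fnum. Note that reading a union member other than the one last written is well-defined in C but formally undefined behavior in C++, although common compilers such as GCC document support for it.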
We needed to convert IEEE-754 single and double precision numbers (using 32bit and 64bit encoding). We were using a C compiler (Vector CANoe/Canalyzer CAPL Script) with a restricted set of functions and ended up developing the function below (it can easily be tested using any on-line C compiler):
#include <stdio.h>
#include <math.h>
double ConvertNumberToFloat(unsigned long long number, int isDoublePrecision)
{
int mantissaShift = isDoublePrecision ? 52 : 23;
unsigned long long exponentMask = isDoublePrecision ? 0x7FF0000000000000ULL : 0x7f800000ULL;
int bias = isDoublePrecision ? 1023 : 127;
int signShift = isDoublePrecision ? 63 : 31;
int sign = (number >> signShift) & 0x01;
int exponent = ((number & exponentMask) >> mantissaShift) - bias;
int power = -1;
double total = 0.0;
for ( int i = 0; i < mantissaShift; i++ )
{
int calc = (number >> (mantissaShift-i-1)) & 0x01;
total += calc * pow(2.0, power);
power--;
}
double value = (sign ? -1 : 1) * pow(2.0, exponent) * (total + 1.0);
return value;
}
int main()
{
// Single Precision
unsigned int singleValue = 0x40490FDB; // 3.141592...
float singlePrecision = (float)ConvertNumberToFloat(singleValue, 0);
printf("IEEE754 Single (from 32bit 0x%08X): %.7f\n",singleValue,singlePrecision);
// Double Precision
unsigned long long doubleValue = 0x400921FB54442D18ULL; // 3.141592653589793...
double doublePrecision = ConvertNumberToFloat(doubleValue, 1);
printf("IEEE754 Double (from 64bit 0x%016llX): %.16f\n",doubleValue,doublePrecision);
}
Just do the following (but of course make sure you have the right endianness when reading bytes into the integer in the first place):
float int_bits_to_float(int32_t ieee754_bits) {
float flt;
*((int*) &flt) = ieee754_bits;
return flt;
}
Works for me... this of course assumes that float has 32 bits, and is in IEEE754 format, on your architecture (which is almost always the case).
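The pointer cast above usually works in practice, but it technically breaks the strict aliasing rules. A minimal sketch of the same idea using std::memcpy, which is well-defined, follows; the function name bits_to_float is my own, and it still assumes float is a 32-bit IEEE 754 type.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit pattern as a float by copying the bytes.
float bits_to_float(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Compilers generally optimise the memcpy away, so this should cost no more than the cast.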
There are a number of very basic errors in your code.
The most visible is repeatedly using ^ for "power of". ^ is the XOR-operator, and for "power" you must use the function pow(base, exponent) in math.h.
Next, "I want to have four significant figures" (presumably for the mantissa), but you only extract four bits. Four bits can encode only 0..15, which is about a digit-and-a-half. To get four significant digits, you'd need at least log(10,000)/log(2) ≈ 13.288, or at least 14 bits (but preferably 17, so you get one full extra digit to get better rounding).
You extract the wrong bit for sign, and then you use it the wrong way. Yes, if it is 0 then sign = 1 and if 1 then sign = -1, but you use it in the final calculation as
value = (-1^sign) * ...
(again with a ^, although even pow does not make any sense here). You ought to have used sign * .. straight away.
exponent was declared an unsigned int, but that fails for negative values. It needs to be signed for pow(2, exponent) (corrected from your (2 ^ exponent)).
On the positive side, (1+mantissa2) is indeed correct.
With all of those points taken together, and ignoring the fact that you actually ask for only 4 significant digits, I get the following code. Note that I rearranged the initial bit shifting and extracting for convenience – I shift mantissa to the left, rather than the right, so I can test against 0 in its calculation.
(Ah, I missed this!) Using sign straight away does not work because it was declared as an unsigned int. Therefore, where you think you give it the value -1, it actually gets the value 4294967295 (more precisely: the value of UINT_MAX from limits.h).
The easiest way to get rid of this is not multiplying by sign but only test it, and negate value if it is set.
float floatizeMe (unsigned int myNumba )
{
//// myNumba comes in as 32 bits (4 bytes)
unsigned int sign = myNumba >>31;
signed int exponent = ((myNumba >> 23) & 0xff) - 0x7F;
unsigned int mantissa = myNumba << 9;
float value = 0;
float mantissa2;
cout << endl << "input is : " << hex << myNumba << endl;
cout << endl << "mantissa is : " << hex << mantissa << endl;
value = 0.5f;
mantissa2 = 0.0f;
while (mantissa)
{
if (mantissa & 0x80000000)
mantissa2 += value;
mantissa <<= 1;
value *= 0.5f;
}
cout << "\nsign is: " << sign << endl;
cout << "exponent is : " << hex << exponent << endl;
cout << "mantissa 2 is : " << mantissa2 << endl;
/* REMOVE:
if above this number it is negative
if ( sign == 1)
sign = -1;
// if above this number it is positive
else {
sign = 1;
} */
/* value = sign * (1.0f + mantissa2) * (pow (2, exponent)); */
value = (1.0f + mantissa2) * (pow (2, exponent));
if (sign) value = -value;
cout << dec << "Float value is: " << value << "\n\n\n";
return value;
}
With the above, you get correct results for values such as 0x3e4ccccd (0.2000000030) and 0x40490FDB (3.1415927410).
All said and done, if your input is already in IEEE-754 format (albeit in hex), then a simple cast ought to be enough.
As well as being much simpler, this also avoids any rounding/precision errors.
float value = reinterpret_cast<float&>(myNumba);
If you still want to inspect the parts separately, use the library function std::frexp afterwards. Or if you don't like the type punning, at least use std::ldexp to apply the exponent rather than your explicit maths, which is vulnerable to rounding/precision errors and overflow.
An alternate to both of these is to use a union type, as described in this answer.
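For the std::ldexp route, a minimal sketch could look like the following. The function name decode_normal_float is hypothetical, and the sketch deliberately handles only normal numbers (exponent field 1 to 254); zero, denormals, infinities and NaN would need extra cases.

```cpp
#include <cmath>
#include <cstdint>

// Decodes a normal IEEE 754 binary32 bit pattern without type punning.
float decode_normal_float(std::uint32_t bits)
{
    const bool negative = (bits >> 31) != 0;
    const int exponent = static_cast<int>((bits >> 23) & 0xFF) - 127;
    const std::uint32_t fraction = bits & 0x007FFFFFu;
    // Put the implicit leading 1 back: the significand is (2^23 + fraction) / 2^23.
    // The integer is below 2^24, so converting it to float is exact.
    const float significand = static_cast<float>((1u << 23) | fraction);
    // ldexp scales by a power of two exactly; the -23 undoes the integer scaling.
    const float magnitude = std::ldexp(significand, exponent - 23);
    return negative ? -magnitude : magnitude;
}
```

Because every step is an exact power-of-two scaling, the result matches the bit pattern with no accumulated rounding error.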

C++ round a double up to 2 decimal places

I am having trouble rounding a GPA double to 2 decimal places (example of a GPA that needs to be rounded: 3.67924). I am currently using ceil to round up, but it outputs the result as a whole number (368).
Here is what I have right now:
if (cin >> gpa) {
if (gpa >= 0 && gpa <= 5) {
// valid number
gpa = ceil(gpa * 100);
break;
} else {
cout << "Please enter a valid GPA (0.00 - 5.00)" << endl;
cout << "GPA: ";
}
}
Using the above code with 3.67924 outputs 368 (which is what I want, just without the period between the whole number and the decimals). How can I fix this?
To round a double up to 2 decimal places, you can use:
#include <iostream>
#include <cmath>
int main() {
double value = 0.123;
value = std::ceil(value * 100.0) / 100.0;
std::cout << value << std::endl; // prints 0.13
return 0;
}
To round up to n decimal places, you can use:
double round_up(double value, int decimal_places) {
const double multiplier = std::pow(10.0, decimal_places);
return std::ceil(value * multiplier) / multiplier;
}
This method won't be particularly fast; if performance becomes an issue, you may need another solution.
If it is just a matter of writing to screen then to round the number use
std::cout.precision(3);
std::cout << gpa << std::endl;
Floating-point values are not represented exactly, so by rounding the value internally and then using the rounded value in your calculations you are increasing the inexactness.
When you want to keep a value to n decimal places in a variable, multiply the value by 10^n, round it, and then divide it by 10^n.
Afterward, use casts as needed when comparing it in the program.
Here is an example:
float a,b,c,d,sum;
cin>>a>>b>>c>>d; // reading decimal values
sum=(a*b*c*d);
sum=round(sum*100)/100; // here it is for 2 decimal points
if((float)sum < (float) 9.58)
cout<<"YES\n";
else
cout<<"NO\n";
You can't round doubles to two decimal places. Doubles don't have decimal places. They have binary places, and they aren't commensurable with decimal places.
If you want decimal places, you must use a decimal radix, e.g. when formatting for output with printf("%.2f", ...).
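As a sketch of doing the rounding at formatting time, the helper below (format_two_decimals is a made-up name) wraps std::snprintf with the "%.2f" conversion mentioned above. Note that it rounds to the nearest value rather than always up, so apply ceil first if rounding up is required.

```cpp
#include <cstdio>
#include <string>

// Formats value as decimal text with exactly two digits after the point.
std::string format_two_decimals(double value)
{
    char buffer[64];
    // snprintf never writes past the buffer; very large values would be truncated.
    std::snprintf(buffer, sizeof buffer, "%.2f", value);
    return buffer;
}
```

The double itself is left untouched; only its textual form has two decimal places.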
Try this. But note that your cout statement is in the else branch, so it won't print the desired output for 3.67924.
if (cin >> gpa)
{
if (gpa >= 0 && gpa <= 5) {
// valid number
gpa = ceil(gpa * 100);
gpa=gpa/100;
break;
}
else
{
cout << "Please enter a valid GPA (0.00 - 5.00)" << endl;
cout << "GPA: ";
}
}
Example: you want 56.899999999999 to be output as a string with 2 decimal places, which is 56.89.
First, convert it to a scaled integer:
value = 56.89 * 100 = 5689
factor = 100
- 1 decimal place = 10
- 2 decimal places = 100
- 3 decimal places = 1000
etc
#include <string>
std::string format_scaled(int value, int factor)
{
int integerValue = (value / factor) * factor; // (5689 / 100) * 100 = 5600
int decimal = value - integerValue; // 5689 - 5600 = 89
// result = "56" + "." + "89"
// note: a decimal part with leading zeros (e.g. 5 for .05) would still need zero padding
std::string result = std::to_string(value / factor) + "." + std::to_string(decimal);
return result; // lastly, print result
}
Not sure if this can help?
std::string precision_2(float number)
{
int decimal_part = (number * 100) - ((int)number * 100);
if (decimal_part >= 10) {
return std::to_string((int)number) + "." + std::to_string(decimal_part);
} else {
return std::to_string((int)number) + ".0" + std::to_string(decimal_part);
}
}
Works for positive floats. A minor modification will make it work for negative values as well.