C++ significant figures

How can I do math involving significant figures in C++? I want this to work correctly with measured data from chemistry and physics experiments. An example: 65 / 5 = 10. I would need to get rid of unneeded decimal places and replace some digits with 0s.
Thanks!

This should get you what you need:
std::cout.precision(x); // x would be the number of significant figures to output

This may not be the most efficient way, but you can create a custom sig fig data type.
class SigFigFloat
{
public:
    SigFigFloat(std::vector<short> digits, int decimalIndex, bool negative);
    SigFigFloat operator+(const SigFigFloat &value) const;
    SigFigFloat operator-(const SigFigFloat &value) const;
    //etc...
};
It can be a lot of work, but if you implement this right, it can be a really flexible way to represent and do calculations with sig figs.

It is hard because significant figures are a decimal concept, and computers speak binary. You can use decimal number classes (I don't know of any), or use Boost's interval library (boost::numeric::interval), which is probably the closest to what you want to achieve.

That depends on how you are displaying them. If you are using the printf family, you set the precision in the format string (sprintf(buffer, "%.2f", myfloat)). If you are using ostreams, you call the precision function, which sets the number of significant digits in the default notation and the number of decimal places when std::fixed is in effect. If you are looking for the more scientific notion of sig figs, you'll have to write a custom function that determines the precision based on the current value of the float.

You can also:
// requires <cmath> and <iostream>
#define SIGNIFICANT_DIGITS 3
const float SIGNIFICANT_DIGITS_PWR = powf(10.0f, SIGNIFICANT_DIGITS);
float f;
std::cin >> f; // assumes f > 0
// Number of digits before the decimal point: 1 for 1..9.99, 2 for 10..99.9, 0 for 0.1..0.99
int int_digits = (int)floorf(log10f(f)) + 1;
// Scale so that exactly SIGNIFICANT_DIGITS digits sit left of the decimal point
float prod = SIGNIFICANT_DIGITS_PWR / powf(10.0f, int_digits);
f = (float)(int)(f * prod) / prod; // truncates the remaining digits (does not round)
std::cout << f << '\n';
Output:
0.1234
> 0.123
12.34
> 12.3
1234
> 1230

Here is a quick C++11 solution that worked for me:
int sig_figs = 3;
double number = 1562.654478;
std::cout << "original number:" << number << std::endl;
number = ([number](int number_of_sig_figs)->double{
std::stringstream lStream;
lStream << std::setprecision(number_of_sig_figs) << number;
return std::stod(lStream.str());
})(sig_figs);
std::cout << "rounded number:" << number << std::endl;

There are good math functions in math.h (<cmath> in C++).
Storing your figures in floats, doubles, or long doubles gives you progressively more precision: a float offers about 7 significant decimal digits, while a double offers about 15 to 16.
When printing, people usually use printf or snprintf (_snprintf is the Microsoft-specific variant), and you can format those doubles and floats to the precision you want, like:
printf("Value %8.2f", floatVariable);
This requests a minimum field width of 8 characters, with 2 digits after the decimal point.
_snprintf(buffer, sizeof(buffer), "Value %.2f", floatVariable);
This example uses no minimum field width and prints 2 digits after the decimal point.

Related

Higher precision when parsing string to float

This is my first post here so sorry if it drags a little.
I'm assisting in some research for my professor, and I'm having some trouble with precision when I'm parsing some numbers that need to be precise to the 12th decimal place. For example, here is a number that I'm parsing from a string into a float, before it's parsed:
-82.636097527336
Here is the code I'm using to parse it, which I also found on this site (thanks for that!):
std::basic_string<char> str = prelim[i];
std::stringstream s_str( str );
float val;
s_str >> val;
degrees.push_back(val);
Where 'prelim[i]' is just the current number I'm on, and 'degrees' is my new vector that holds all of the numbers after they've been parsed to a float. My issue is that, after it's parsed and stored in 'degrees', I do an 'std::cout' command comparing both values side-by-side, and shows up like this (old value (string) on the left, new value (float) on the right):
-82.636097527336 -82.6361
Does anyone have any insight into how I could alleviate this issue and make my numbers more precise? I suppose I could go character by character and use a switch case, but I think that there's an easier way to do it with just a few lines of code.
Again, thank you in advance and any pointers would be appreciated!
(Edited for clarity regarding how I was outputting the value)
Change to a double to represent the value more accurately, and use std::setprecision(30) or more to show as much of the internal representation as is available.
Note that the internal storage isn't exact; using an Intel Core i7, I got the following values:
string: -82.636097527336
float: -82.63610076904296875
double: -82.63609752733600544161163270473480224609
So, as you can see, double correctly represents all of the digits of your original input string, but even so, it isn't quite exact, since it has a few more digits than your string.
There are two problems:
1. A 32-bit float does not have enough precision for 14 decimal digits. From a 32-bit float you can get about 7 decimal digits, because it has a 23-bit binary mantissa. A 64-bit float (double) has 52 bits of mantissa, which gives you about 16 decimal digits, just enough.
2. Printing with cout by default prints six significant digits.
Here is a little program to illustrate the difference:
#include <iomanip>
#include <iostream>
#include <sstream>
int main(int, const char**)
{
float parsed_float;
double parsed_double;
std::stringstream input("-82.636097527336 -82.636097527336");
input >> parsed_float;
input >> parsed_double;
std::cout << "float printed with default precision: "
<< parsed_float << std::endl;
std::cout << "double printed with default precision: "
<< parsed_double << std::endl;
std::cout << "float printed with 14 digits precision: "
<< std::setprecision(14) << parsed_float << std::endl;
std::cout << "double printed with 14 digits precision: "
<< std::setprecision(14) << parsed_double << std::endl;
return 0;
}
Output:
float printed with default precision: -82.6361
double printed with default precision: -82.6361
float printed with 14 digits precision: -82.636100769043
double printed with 14 digits precision: -82.636097527336
So you need to use a 64-bit float to be able to represent the input, but also remember to print with the desired precision with std::setprecision.
You cannot have precision up to the 12th decimal place using a simple float. The intuitive course of action would be to use double or long double, but even then you are not going to get exact decimal values.
The reason is the way real numbers are represented in memory. You have more information here.
For example, as a float, 0.02 is actually stored as 0.01999999...
If you need exact decimal values, you should use a dedicated arbitrary-precision library instead.
Hope this helps.

Why can't I see all the significant digits when displaying vector<long double>?

I have a console app with a function that divides integers of a Fibonacci series, demonstrating how the ratio in any Fibonacci series approaches Φ. I have similar code written in Go and in C++11. In Go (or a scientific calculator), the function returns values of int64 and the results show a precision of up to 16 digits in an Ubuntu terminal session, for example:
1.6180339937902115
In C++11 I can never see more than 5 digits of precision in the results using cout. The results are declared as long double in a function like this:
typedef unsigned long long int ULInt;
typedef std::vector<ULInt> ULIntV;
std::vector<long double> CalcSequenceRatio( const ULIntV& fib )
{
    std::vector<long double> result;
    for ( std::size_t i = 0; i != fib.size( ); i++ )
    {
        if ( i == ( fib.size( ) - 1 ) )
        {
            // The last entry has no successor; push_back avoids writing past the end of result.
            result.push_back( 0 );
            break;
        }
        long double n = fib[i + 1];
        long double n2 = fib[i];
        long double q = n / n2;
        result.push_back( q );
    }
    return result;
}
Although the vector fib passed into CalcSequenceRatio( const ULIntV& fib ) contains over 100 entries, after 16 entries, all values in the result set are displayed as
1.61803
The rest of the value is being rounded although in Go (or in a calculator), I can see that the actual values are extended to at least 16 digits of precision.
How can I make CalcSequenceRatio() return more precise values? Is there a problem because going from long long int to long double is a downcast? Do I need to pass the fib series as vector<long double>? What's wrong?
Edit:
This question has been marked a duplicate, but this is not really correct, because the question does not deal directly with cout: There are other factors that might have made a difference, although the analysis proves that cout is the problem. I posted the correct answer:
The problem is with cout, and here is the solution... as explained in the other question...
It sounds like you want to use: std::numeric_limits<T>::max_digits10 for distinct, 'round-trip' conversions - in conjunction with std::setprecision.
e.g., for float this is typically 9 (a 1.8 format: one digit before the decimal point and eight after); for double it is typically 17 (1.16).
A long double is typically implemented as an 80-bit extended-precision type on x86, or as a 128-bit quad-precision type, with max_digits10 values of 21 (1.20) and 36 (1.35) respectively. However, long double is only required to provide at least as much precision as double.
There's a good series of notes on related subjects here.
The problem here is with std::cout.
I fixed it using std::setprecision(50), as explained in How do I print a double value with full precision using cout? That shows me values like this:
1.6180339887498948482072100296669248109537875279784
To make it flexible, I gave the user the option to enter the desired level of precision:
void printGolden( const std::vector<long double>& golden )
{
    std::cout << "Enter desired precision:" << std::endl;
    int precision{};
    std::cin >> precision;
    std::cout << std::setprecision( precision );
    for ( auto i : golden )
    {
        std::cout << i << "; ";
    }
}

Losing Double Precision when multiplying by multiple of 10 [duplicate]

So I have the following code
int main()
{
    double d;
    cin >> d;
    while (d != 0.00)
    {
        cout << d << endl;
        double m = 100 * d;
        int n = m;
        cout << n << endl;
        cin >> d;
    }
    return 0;
}
When I enter the input 20.40 for d the value of n comes out to be 2039 instead of 2040.
I tried replacing int n = m with int n = (int) m but the result was the same.
Is there any way to fix this. Thanks in advance.
Your code truncates m but you need rounding. Include cmath and use int n = round(m).
Decimal values can, in general, not be represented exactly using binary floating points like double. Thus, the value 20.40 is represented as an approximation which can be used to restore the original value (20.4; the precision cannot be retained), e.g., when formatting the value. Doing computations with these approximated values will typically amplify the error.
As already mentioned in one of the comments, the relevant reference is the paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic". One potential way out of your trouble is to use decimal floating points which are, however, not yet part of the C++ standard.
Single- and double-precision floating-point numbers are not stored the same way as integers, so a result you expect to be a whole number (e.g. 5 or 10) may actually be stored as a nearby value (e.g. 4.9999999 or 10.000000001). Casting to an int simply discards the fractional part, so if the number is currently represented as 4.999999999, casting it to an int will give you 4. std::round will give you a better result most of the time (though if the number is 4.6 and you just want the whole-number portion, round will not do what you want). The bigger question is then: what are you hoping to accomplish by casting a double to an int?
In general, when dealing with floating-point numbers, you will want to use some epsilon value as your comparison tolerance. So if you wanted to compare 4.9999999 to 5, you would do (pseudo-code): if abs(5 - 4.9999999) < epsilon, return 5.
Example
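#include <cfloat>
#include <cmath>
#include <iostream>
int main()
{
    // Tolerance for "close enough to the next integer"; DBL_EPSILON is too small for values near 2040.
    const double tolerance = 1e-9;
    double d;
    std::cin >> d;
    while (std::fabs(d - 0.0) > DBL_EPSILON)
    {
        std::cout << d << std::endl;
        double m = 100 * d;
        int n = static_cast<int>(m);
        // If truncation left us just below the next integer, step up to it.
        if (std::fabs(static_cast<double>(n + 1) - m) < tolerance)
        {
            n++;
        }
        std::cout << n << std::endl;
        std::cin >> d;
    }
    return 0;
}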
Casting a double to an int truncates the value. Because double is not base 10, 20.40 is probably stored as something like 20.399999, so multiplying by 100 gives 2039.99..., which truncates to 2039. You can use the round() function, which does not truncate but gives you the nearest int.
int n = round(m);
Floating point numbers can't exactly represent all decimal numbers, sometimes an approximation is used. In your example the closest possible exact number is 20.39999999999999857891452847979962825775146484375. See IEEE-754 Analysis for a quick way to see exact values.
You can use rounding, but presumably you're really looking for the first two digits truncated. Just add a really small value, e.g. 0.0000000001 before or after you multiply.

How to set floating point precision inside a variable

I am currently working on a program where I need to round a value to only 2 digits after the decimal point.
Say, I have declared
float a;
If a = 3.555 then it would store a = 3.56, rounding up.
For a = 3.423, the value of a would be a = 3.423, no change.
I can do this when printing output, but what do I need to do to store the rounded value in a variable and use that variable in other calculations?
If you need two digits after the decimal point, don't use floating point. Use a fixed point number instead. For example, just use an integer that's 100 times larger than the decimal number you want to represent. Trying to fit a base 2 floating point number into rounding rules like this just isn't going to produce satisfactory results for you.
double d = 5000.23423;
d = ceil(d*100)/100;
cout << d << endl; // prints : 5000.24
double e = 5000.23423;
e = floor(e*100)/100;
cout << e << endl; // prints : 5000.23
You can do this:
a = roundf(a*100)/100;
How about
#include <cmath>
int main ()
{
    double a, f, i;
    a = 3.436;
    f = std::modf(a, &i); // i receives the integer part, f the fractional part
    a = i + std::round(f * 100.0) / 100.0;
    return 0;
}
Operates on doubles but avoids scaling the whole number.
Update: Added the missing division.

Maximum Width of a Printed Double in C++

I was wondering: what is the maximum length, in characters, of a double printed using fprintf? My guess is wrong.
Thanks in advance.
Twelve would be a bit of an underestimate. On my machine, the following results in a 317 character long string:
#include <limits>
#include <cstdio>
#include <cstring>
int main()
{
double d = -std::numeric_limits<double>::max();
char str[2048] = "";
std::sprintf(str, "%f", d);
std::size_t length = std::strlen(str);
}
Using %e results in a 14 character long string.
Who knows. The Standard doesn't say how many digits of precision a double provides other than saying it (3.9.1.8) "provides at least as much precision as float," so you don't really know how many characters you'll need to sprintf an arbitrary value. Even if you did know how many digits your implementation provided, there's still the question of exponential formatting, etc.
But there's a MUCH bigger question here. Why the heck would you care? I'm guessing it's because you're trying to write something like this:
double d = ...;
int MAGIC_NUMBER = ...;
char buffer[MAGIC_NUMBER];
sprintf(buffer, "%f", d);
This is a bad way to do this, precisely because you don't know how big MAGIC_NUMBER should be. You can pick something that should be big enough, like 14 or 128k, but then the number you picked is arbitrary, not based on anything but a guess that it will be big enough. Numbers like MAGIC_NUMBER are, not surprisingly, called Magic Numbers. Stay away from them. They will make you cry one day.
Instead, there are a lot of ways to do this string formatting without having to care about buffer sizes, digits of precision, etc., that let you just get on with the business of programming. Streams are one:
#include <iostream>
#include <sstream>
#include <string>
double d = ...;
std::stringstream ss;
ss << d;
std::string s = ss.str();
std::cout << s;
...Boost.Format is another:
#include <boost/format.hpp>
double d = ... ;
std::string s = (boost::format("%1%") % d).str();
std::cout << s;
It's defined in <limits>:
std::cout << std::numeric_limits<double>::digits << "\n";
std::cout << std::numeric_limits<double>::digits10 << "\n";
Definition:
digits: number of digits (in radix base) in the mantissa
Equivalent to FLT_MANT_DIG, DBL_MANT_DIG or LDBL_MANT_DIG.
digits10: Number of digits (in decimal base) that can be represented without change.
Equivalent to FLT_DIG, DBL_DIG or LDBL_DIG for floating types.
See: http://www.cplusplus.com/reference/std/limits/numeric_limits/
Of course when you print stuff to a stream you can use the stream manipulators to limit the size of the output.
You can decide it yourself by specifying the precision:
double a = 1.1111111111111111111111111111111111111111111111111;
printf("%1.15lf\n", a);
return 0;
./a.out
1.111111111111111
So you can print more than 12 characters.
If your machine uses IEEE754 doubles (which is fairly widespread now), then the binary precision is 53 bits; The decimal equivalent is approximately 15.95 (calculated via logarithmic conversion), so you can usually rely on 15 decimal digits of precision.
Consult Double precision floating-point format for a brief discussion.
For a much more in-depth study, the canonical paper is What Every Computer Scientist Should Know About Floating-Point Arithmetic. It gets cited here whenever binary floating point discussions pop up, and is worth a weekend of careful reading.