How to do 32 digit decimal floating point number multiplication in C++? - c++

I have two 32-digit decimal floating point numbers, like 1.2345678901234567890123456789012. I want to get their product, which should also be a 32-digit decimal floating point number. Is there any efficient way to do this?

Just use boost::multiprecision. You can use arbitrary precision, but there is a typedef cpp_bin_float_50, which is a float with 50 decimal digits of precision.
Example for multiplying two big decimal numbers:
#include <iomanip>
#include <iostream>
#include <limits>
#include <boost/multiprecision/cpp_bin_float.hpp>
int main(){
boost::multiprecision::cpp_bin_float_50 val1("1.2345678901234567890123456789012");
boost::multiprecision::cpp_bin_float_50 val2("2.2345678901234567890123456789012");
std::cout << std::setprecision(std::numeric_limits< boost::multiprecision::cpp_bin_float_50>::max_digits10);
std::cout << val1*val2 << std::endl;
}
Output:
2.7587257654473404640618808351577828416864868162811293

Use the usual grade school algorithm (long multiplication). If you used 3 ints (instead of 4):
A2A1A0 * B2B1B0 =   A2*B2  A2*B1  A2*B0
                           A1*B2  A1*B1  A1*B0
                                  A0*B2  A0*B1  A0*B0
Every multiplication will have a 2-int result. You have to sum every column on the right side, and propagate carry. In the end, you'll have a 6-int result (if inputs are 4-int, then the result is 8-int). You can then round this 8-int result. This is how you can handle the mantissa part. The exponents should just be added together.
I recommend dividing the problem into two parts:
1. multiplying a long number with an int
2. adding the result from 1. into the final result
You'll need something like this as a workhorse (note that this code assumes that int is 32-bit, long long is 64-bit):
void wideMul(unsigned int &hi, unsigned int &lo, unsigned int a, unsigned int b) {
unsigned long long int r = (unsigned long long int)a*b;
lo = (unsigned int)r;
hi = (unsigned int)(r>>32);
}
Note that if you had larger numbers, there are faster algorithms.
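The two parts above can be combined into one schoolbook loop. The following is only a sketch under assumptions: the mantissa is stored as four 32-bit limbs, least significant first, and the name mulMantissa is hypothetical. Rounding the 8-limb result and adding the exponents are left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Same workhorse as above, written with fixed-width types.
void wideMul(std::uint32_t &hi, std::uint32_t &lo, std::uint32_t a, std::uint32_t b) {
    std::uint64_t r = static_cast<std::uint64_t>(a) * b;
    lo = static_cast<std::uint32_t>(r);
    hi = static_cast<std::uint32_t>(r >> 32);
}

// Hypothetical helper: multiplies two 4-limb mantissas (least significant
// limb first) and returns the full 8-limb product.
std::array<std::uint32_t, 8> mulMantissa(const std::array<std::uint32_t, 4> &a,
                                         const std::array<std::uint32_t, 4> &b) {
    std::array<std::uint32_t, 8> result{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint32_t carry = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            // Part 1: multiply one limb of a by one limb of b.
            std::uint32_t hi, lo;
            wideMul(hi, lo, a[j], b[i]);
            // Part 2: add into the running result and propagate the carry.
            std::uint64_t sum = static_cast<std::uint64_t>(result[i + j]) + lo + carry;
            result[i + j] = static_cast<std::uint32_t>(sum);
            carry = hi + static_cast<std::uint32_t>(sum >> 32); // cannot overflow 32 bits
        }
        result[i + 4] = carry;
    }
    return result;
}
```

The carry never overflows because a limb product plus two 32-bit addends is at most 2^64 - 1.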

Related

Efficient way of checking the length of a double in C++

Say I have a number, 100000. I can use some simple maths to check its size, i.e. log(100000) -> 5 (base 10 logarithm). There's also another way of doing this, which is quite slow: std::string num = std::to_string(100000), num.size(). Is there a way to mathematically determine the length of a number? (Not just 100000, but things like 2313455, 123876132, etc.)
Why not use ceil? It rounds up to the nearest whole number - you can just wrap that around your log function, and add a check afterwards to catch the fact that a power of 10 would return 1 less than expected.
Here is a solution to the problem using single precision floating point numbers in O(1):
#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>
int main(){
float x = 500; // to be converted
uint32_t f;
std::memcpy(&f, &x, sizeof(uint32_t)); // Copy the float's bits into a manageable int
int exp = (f >> 23) & 0xff; // get the 8-bit exponent field
exp -= 127; // floating point bias
exp /= 3.32; // log2(10) is about 3.32; the result is truncated toward zero
std::cout << std::to_string(exp) << std::endl;
}
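For a number in scientific notation a*10^e this will return e (when 1<=a<10), so the length of the number (if it has an absolute value larger than 1) will be exp + 1. Because only the binary exponent is used, this is an estimate and can come out one too small: 1000 has binary exponent 9, and 9 / 3.32 truncates to 2.
For double precision this works, but you have to adapt it: the bias is 1023, and the exponent field is 11 bits wide, starting at bit 52.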
This only works for floating point numbers, though so probably not very useful in this case. The efficiency in this case relative to the logarithm will also be determined by the speed at which int -> float conversion can occur.
Edit:
I just realised the question was about double. The modified result is:
int16_t getLength(double a){
uint64_t bits;
std::memcpy(&bits, &a, sizeof(uint64_t));
int16_t exp = (bits >> 52) & 0b11111111111; // 11-bit exponent field; there is no 11-bit int type, so int16_t has to do
exp -= 1023;
exp /= 3.32;
return exp + 1;
}
There are some changes so that it behaves better (and also less shifting).
You can also use frexp() to get the exponent without bias.
If the number is whole, keep dividing by 10, until you're at 0. You'd have to divide 100000 6 times, for example. For the fractional part, you need to keep multiplying by 10 until trunc(f) == f.
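For the whole-number case, the repeated division can be sketched as follows. The function name countDigits is hypothetical, the input is assumed to be a non-negative integer, and 0 is treated as having one digit.

```cpp
#include <cstdint>

// Hypothetical helper: counts decimal digits by dividing by 10 until 0 is reached.
int countDigits(std::uint64_t n) {
    int digits = 0;
    do {
        ++digits;   // one division per digit
        n /= 10;
    } while (n != 0);
    return digits;
}
```

For 100000 the loop divides six times, matching the count given above.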

Different output when dividing same numbers by 2 but as int and float

Why do I get a slightly different answer when I divide 1234567890 by 2?
#include <iostream>
using namespace std;
int main()
{
float m;
int n;
cin>>n;
n/=2;
cout<<n<<endl;
cin>>m;
m/=2;
cout<<m;
}
float uses 24 bits for the mantissa (sort of: IEEE uses 23 plus one implicit bit) while int probably uses 31 (or 63; the other bit is effectively a sign bit). An exact representation of 1234567890 needs 30 significant bits, so the float value is rounded.
This is due to integer vs. regular division. When dividing two integers, the result is also an int, which is why 5/2 returns 2. Floats and doubles do not have this problem. To bypass the int problem, you can write (double)five/two, where five and two are ints.
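The rounding can be made visible with a small sketch. The helper names halveAsInt and halveAsFloat are hypothetical, and the values in the comments assume IEEE-754 single precision with round-to-nearest conversion, which is typical but not guaranteed by the standard.

```cpp
// Hypothetical helpers mirroring the two branches of the program above.
int halveAsInt(int n) {
    return n / 2;                     // exact: 1234567890 / 2 == 617283945
}

float halveAsFloat(int n) {
    float m = static_cast<float>(n);  // 1234567890 is rounded to 1234567936 here
    return m / 2;                     // halving is exact, so the error comes from the line above
}
```

Near 2^30 adjacent floats are 128 apart, so the closest float to 1234567890 is 1234567936, and half of that is 617283968.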

why double and long double are giving different answer in the following program

This code is calculating the Nth term of a series which is defined as
T(n+2) = T(n+1)^2 + T(n), where the 1st and 2nd terms are given as a and b in the code.
#include<iostream>
#include<cstdio>
using namespace std;
int main()
{
int a,b,n;
char ch[100];
cin>>a>>b>>n;
long double res[3];
res[0]=a,res[1]=b;
for(int i=n-2;i>0;i--)
{
res[2]=res[1]*res[1]+res[0];
res[0]=res[1];
res[1]=res[2];
}
sprintf(ch,"%.0Lf",res[2]);
cout<<ch;
return 0;
}
Input: 0 1 10
Output:
84266613096281242861568 // in case of double res[3];
84266613096281243385856 // in case of long double res[3];
correct output : 84266613096281243382112
Since it is going out of the range of integer, therefore I am using double/long double.
But the problem is I am getting different output for double and long double, while none of the intermediate values are having non zero digit after decimal point, so there should not be any rounding off, I guess.
"while none of the intermediate values are having non zero digit after decimal point, so there should not be any rounding off, I guess."
This assumption is just plain wrong. All floating point numbers like double etc. are stored like
mantissa * 2^exponent
with a finite number of bits for both the mantissa and the exponent. So floating point numbers can store a fixed number of significant digits (for a double converted to decimal representation, around 16 usually). If a number has more digits before the decimal point, rounding will happen and the total rounding error gets bigger the more digits you need to "forget".
If you want more details on this, the most common floating point implementations follow the IEEE floating point standard.
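A quick way to see this limit is to check whether an integer survives a round trip through double. This is a sketch only: the helper name isExactInDouble is hypothetical, it assumes an IEEE-754 double with a 53-bit mantissa, and it should only be called with values well below 2^64.

```cpp
#include <limits>

// Hypothetical helper: true if n is stored in a double without rounding.
// Assumes n is well below 2^64 so the conversion back is well defined.
bool isExactInDouble(unsigned long long n) {
    double d = static_cast<double>(n);
    return static_cast<unsigned long long>(d) == n;
}

// Number of mantissa bits in a double (53 on IEEE-754 platforms).
constexpr int kDoubleMantissaBits = std::numeric_limits<double>::digits;
```

With a 53-bit mantissa, 2^53 is stored exactly but 2^53 + 1 is not, even though neither has any digits after the decimal point. The same numeric_limits query on long double reports how many mantissa bits that type has on a given platform, which is why the two types start to disagree at different magnitudes.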

What if I try to assign values greater than pow(2,64)-1 to unsigned long long in c++?

If I have two unsigned long long values, say pow(10,18) and pow(10,19), and I multiply them and store the output in another variable of type unsigned long long, the value we get is obviously not the answer, but does it have any logic? We get a junk type of value each time we try to do this with arbitrarily large numbers, but do the outputs have any logic with the input values?
Unsigned integral types in C++ obey the rules of modular arithmetic, i.e. they represent the integers modulo 2^N, where N is the number of value bits of the integral type (possibly less than its sizeof times CHAR_BIT); specifically, the type holds the values [0, 2^N).
So when you multiply two numbers, the result is the remainder of the mathematical result divided by 2^N.
The number N is obtainable programmatically via std::numeric_limits<T>::digits.
Yes, there's a logic.
As KerreK wrote, unsigned integers "wrap around" modulo 2^N, where N is the number of bits in their datatype.
To make it easy, let's consider the following:
#include <iostream>
#include <cmath>
using namespace std;
int main() {
unsigned char tot;
unsigned char ca = 200;
unsigned char ca2 = 200;
tot = ca * ca2;
cout << (int)tot;
return 0;
}
(try it: http://ideone.com/nWDYjO)
In the above example an unsigned char is 1 byte wide (max 255 decimal value), and when multiplying 200 * 200 we get 40000. If we try to store it into the unsigned char, it obviously won't fit.
The value is then "wrapped around", that is, tot gets the result of the following
(ca*ca2) % 256
where 256 is 2^8, the number of distinct values an unsigned char of 1 byte (8 bits) can hold.
In your case you would get
(pow(10,18) * pow(10,19)) % 2^number_of_bits_of_unsigned_long_long (architecture dependent)
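The same rule can be checked on a type small enough that the exact product still fits in a wider one. This is a sketch: the helper names are hypothetical, and it assumes int is 32 bits wide so that std::uint32_t operands are not promoted to a signed type.

```cpp
#include <cstdint>

// Hypothetical helper: product as the 32-bit unsigned type computes it (wraps around).
std::uint32_t wrappedProduct(std::uint32_t a, std::uint32_t b) {
    return a * b;
}

// Hypothetical helper: exact product in 64 bits, then reduced modulo 2^32 by hand.
std::uint32_t exactProductMod2N(std::uint32_t a, std::uint32_t b) {
    std::uint64_t exact = static_cast<std::uint64_t>(a) * b;
    return static_cast<std::uint32_t>(exact % (1ULL << 32));
}
```

For 100000 * 100000 the exact product is 10^10, and 10^10 modulo 2^32 is 1410065408, which is what both functions return. The unsigned long long case behaves the same way with N = 64.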

How to write an std::floor function from scratch [duplicate]

I would like to know how to write my own floor function to round a float down.
Is it possible to do this by setting the bits of a float that represent the numbers after the comma to 0?
If yes, then how can I access and modify those bits?
Thanks.
You can do bit twiddling on floating point numbers, but getting it right depends on knowing exactly what the floating point binary representation is. For most machines these days it's IEEE-754, which is reasonably straightforward. For example, IEEE-754 32-bit floats have 1 sign bit, 8 exponent bits, and 23 mantissa bits, so you can use shifts and masks to extract those fields and do things with them. So doing trunc (round to integer towards 0) is pretty easy:
float trunc(float x) {
union {
float f;
uint32_t i;
} val;
val.f = x;
int exponent = (val.i >> 23) & 0xff; // extract the exponent field;
int fractional_bits = 127 + 23 - exponent;
if (fractional_bits > 23) // abs(x) < 1.0
return 0.0;
if (fractional_bits > 0)
val.i &= ~((1U << fractional_bits) - 1);
return val.f;
}
First, we extract the exponent field, and use that to calculate how many bits after the decimal point are present in the number. If there are more than the size of the mantissa, then we just return 0. Otherwise, if there's at least 1, we mask off (clear) that many low bits. Pretty simple. We're ignoring denormals, NaN, and infinity here, but that works out OK, as they have exponents of all 0s or all 1s, which means we end up converting denormals to 0 (they get caught in the first if, along with small normal numbers), and leaving NaN/Inf unchanged.
To do a floor, you'd also need to look at the sign, and round negative numbers 'up' towards negative infinity.
Note that this is almost certainly slower than using dedicated floating point instructions, so this sort of thing is really only useful if you need to use floating point numbers on hardware that has no native floating point support. Or if you just want to play around and learn how these things work at a low level.
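One way the floor step could be layered on top of the truncation is sketched below. It is not the answerer's code: the names truncBits and floorBits are hypothetical, std::memcpy replaces the union to read the bits, and IEEE-754 32-bit floats are assumed.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Hypothetical helper: truncates toward zero by clearing fractional mantissa bits.
float truncBits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    int exponent = (bits >> 23) & 0xff;          // biased exponent field
    int fractional_bits = 127 + 23 - exponent;   // mantissa bits below the binary point
    if (fractional_bits > 23) {
        bits &= 0x80000000u;                     // abs(x) < 1.0: keep only the sign (+0 or -0)
    } else if (fractional_bits > 0) {
        bits &= ~((1U << fractional_bits) - 1);  // clear the fractional bits
    }
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}

// Hypothetical helper: floor = trunc, minus 1 when truncation moved a negative value up.
float floorBits(float x) {
    float t = truncBits(x);
    return (t > x) ? t - 1.0f : t;
}
```

NaN and infinity pass through unchanged, because the comparison t > x is false for them.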
Define from scratch. And no, setting the bits of your floating point number representing the numbers after the comma to 0 will not work. If you look at IEEE-754, you will see that you basically have all your floating-point numbers in the form:
0.xyzxyzxyz * 2^(abc)
So to implement flooring, you can get the xyzxyzxyz and shift left by abc+1 times. Drop the rest. I suggest you read up on the binary representation of a floating point number; this should shed light on the solution I suggested.
NOTE: You also need to take care of the sign bit. And the stored exponent of your number is off by 127 (the bias).
Here is an example. Let's say you have the number pi, 3.14..., and you want to get 3.
Pi is represented in binary as
0 10000000 10010010000111111011011
This translates to
sign = 0 ; e = 1 ; s = 110010010000111111011011
The above I get directly from Wikipedia. Since e is 1, you will want to shift left s by 1 + 1 = 2, so you get 11 => 3.
#include <iostream>
#include <iomanip>
double round(double input, double roundto) {
return int(input / roundto) * roundto;
}
int main() {
double pi = 3.1415926353898;
double almostpi = round(pi, 0.0001);
std::cout << std::setprecision(14) << pi << '\n' << std::setprecision(14) << almostpi;
}
http://ideone.com/mdqFA
output:
3.1415926353898
3.1415
This will almost certainly be faster than any bit twiddling you can come up with. And it works on all computers (with floats) instead of just one type.
Casting to unsigned while returning as a double does what you are seeking, but under the hood. This simple piece of code works for any POSITIVE number that fits in an unsigned long long.
#include <iostream>
double floor(const double& num) {
return (unsigned long long) num;
}
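The cast approach can be extended to negative numbers with one comparison. This is a sketch under assumptions: the name floorByCast is hypothetical, and the input must fit in a long long, since the cast is undefined for values outside that range.

```cpp
// Hypothetical helper: floor via an integer cast, for values that fit in long long.
double floorByCast(double x) {
    long long truncated = static_cast<long long>(x); // the cast rounds toward zero
    double t = static_cast<double>(truncated);
    // For negative non-integers, truncation moved the value up, so step down by one.
    return (t > x) ? t - 1.0 : t;
}
```

For example, -7.9 truncates to -7, which is greater than the input, so the result is -8.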
This has been tested on tio.run (Try It Online) and onlinegdb.com. The function itself doesn't require any #include files, but to print out the answers, I have included stdio.h (in the tio.run and onlinegdb.com, not here). Here it is:
long double myFloor(long double x) /* Change this to your liking: long double might
be float in your situation. */
{
long double xcopy=x<0?x*-1:x;
unsigned int zeros=0;
long double n=1;
for(n=1;xcopy>n*10;n*=10,++zeros);
for(xcopy-=n;zeros!=-1;xcopy-=n)
if(xcopy<0)
{
xcopy+=n;
n/=10;
--zeros;
}
xcopy+=n;
return x<0?(xcopy==0?x:x-(1-xcopy)):(x-xcopy);
}
This function works everywhere (pretty sure) because it just removes all of the non-decimal parts instead of trying to work with the parts of floats.
The floor of a floating point number is the biggest integer less than or equal to it. Here are some examples:
floor(5.7) = 5
floor(3) = 3
floor(9.9) = 9
floor(7.0) = 7
floor(-7.9) = -8
floor(-5.0) = -5
floor(-3.3) = -4
floor(0) = 0
floor(-0.0) = -0
floor(-0) = -0
Note: this is almost an exact copy from my other answer which answered a question that was basically the same as this one.