Why does round() make my expression give the wrong answer? [duplicate] - c++

I'm facing a problem and I don't understand why the result is wrong.
// ll is long long
ll cnt = 24822089714520516;
cout << "xpow: " << xpow(10LL, 16) << endl;
cout << "cnt: " << cnt << endl;
ll a = xpow(10LL, 16) + cnt - 1;
ll b = round(xpow(10LL, 16)) + cnt - 1;
cout << "cur_num (without round): " << a << endl;
cout << "cur_num (with round): " << b << endl;
where xpow is a function I defined myself:
ll xpow(ll a, ll b) {
ll ans = 1;
while (b) {
if (b & 1)
ans *= a;
b >>= 1;
if (b)
a *= a;
};
return ans;
};
When I run my code, I get this log:
xpow: 10000000000000000
cnt: 24822089714520516
cur_num (without round): 34822089714520515
cur_num (with round): 34822089714520516
As you can see, the result with round differs from the result without round by 1 (the version without round is 1 smaller).
It may be a problem with my computer, but maybe not. Can anyone explain why?
Thanks so much!

In many C++ implementations, the long long type is 64 bits wide and the double type is also 64 bits wide, but a double typically has only a 53-bit significand. A double therefore cannot represent every long long value exactly; integers above 2^53 may be rounded to the nearest representable double.
Here, round converts your long long to a double, so the rest of the expression (+ cnt - 1) is evaluated in double as well, and the result is rounded before it is converted back to long long.

Related

Unusual behaviour of sum in C++?

I'm writing a method to check whether a number is a palindrome. For example, 12321 is a palindrome and 98765 is not.
In my program I use a recursive function to build the reverse of the given number (56789 for 98765) and then check whether the two numbers are equal. But instead of 56789, the exact reverse of 98765, I'm getting 56787.
Here's my code:
#include<iostream>
#include<bits/stdc++.h>
using namespace std;
long int oppositeNum(int n){
if(n<10 && n>=0) return n;
if(n<0) return 0;
static int m=0;
int x = n%10;
long int num = oppositeNum(n/10);
cout << num << "\n";
return (num+ (x*pow(10,++m)));
}
int main(){
int n = 98765;
int oppNum = oppositeNum(n);
cout << oppNum;
if(oppNum==n){
cout << "Number is palindrome";
}else{
cout << "Number is not palindrome";
}
return 0;
}
I'm not getting the exact reverse of my original number. From what I've observed, the last digit is decremented by 1 every time.
Can anyone help?
I cannot reproduce the result you are getting. Maybe it is a consequence of using the function pow.
In any case, your function cannot safely be called a second time for a different number, because the static variable m is not reset to 0; it keeps the value left over from the previous call.
You can write the function without using pow.
Take into account that a reversed number can be too large to be stored in an object of type long, because on some systems long has the same width as int. So I use long long instead of long as the return type.
Here you are.
#include <iostream>
long long int oppositeNum( int n )
{
static long long int multiplier = 1;
const int Base = 10;
int digit = n % Base;
return ( n /= Base ) == 0
? ( multiplier = 1, digit )
: ( n = oppositeNum( n ) , multiplier *= Base, digit * multiplier + n );
}
int main()
{
int n = 12321;
std::cout << n << " -> " << oppositeNum( n ) << '\n';
n = - 12321;
std::cout << n << " -> " << oppositeNum( n ) << '\n';
n = 98765;
std::cout << n << " -> " << oppositeNum( n ) << '\n';
n = -98765;
std::cout << n << " -> " << oppositeNum( n ) << '\n';
return 0;
}
The program output is
12321 -> 12321
-12321 -> -12321
98765 -> 56789
-98765 -> -56789
I changed only as much as seemed necessary. It could still use tail recursion, but that would probably require rewriting the whole algorithm.
long int oppositeNum(int n, int &m){
if(n<10) return n;
int x = n%10;
long int num = oppositeNum(n/10, m);
cout << num << "\n";
m *= 10;
return num + x * (long int)m;
}
long int oppositeNum(int n){
if(n<0) return 0;
int m = 1;
return oppositeNum(n, m);
}
What did I change:
Removed the static modifier from m and passed it by reference instead. This allows the function to be used more than once during program execution. (It would also allow the function to be used by multiple threads at once, but I guess that is not a concern here.)
Removed the floating-point function pow and instead multiply the variable m by ten at each step.
Added a wrapper for the recursive function so it can still be called with just one argument. This also allows negative numbers to be checked only once.
The main source of the problem was the function pow. Because it works on floating-point numbers, it may not give exact results. That depends on the compiler and the processor architecture, but in general you shouldn't expect an exact result. Converting the result to an integer then discards the fractional part, which can turn a tiny error into an off-by-one.
return (num + float(x*pow(10,++m)));
Just do this and it will work. The answer is wrong because pow is giving out values like 4999 and 5999, which is why the units place changes.
Try this:
return round(num+ (x*pow(10,++m)));
Sometimes the pow function returns an approximate result. For example, pow(10, 5) could be 99999.9999999; if you round it you get 100000, otherwise the conversion to an integer discards the fractional part and you get 99999.

Why doesn't the console output the right answer?

I have 2 numbers, n = 1000000000000, and j = 1. When I write
cout << n / j << endl;
The console outputs the right answer, 1000000000000.
However, when I do :
int d = n / j;
cout << d << endl;
The console outputs 3567587328.
Can someone please explain why this happens and what I should do?
The value you are using is greater than the maximum value that an integer variable can store.
If you need to perform arithmetic operation with such big numbers then you will have to use special classes that handle such numbers. Perhaps your implementation of C++ supports the long long data type?
With a 32-bit int, the largest value you can store is 2,147,483,647, which is far smaller than the 10^12 you have there, so the value does not fit. Instead of an int you could use long long.
Try using long long:
long long d = n / j;
cout << d << endl;

Am I doing double to float conversion here?

const double dBLEPTable_8_BLKHAR[4096] = {
0.00000000000000000000000000000000,
-0.00000000239150987901837200000000,
-0.00000000956897738824125100000000,
-0.00000002153888378764179400000000,
-0.00000003830892270073604800000000,
-0.00000005988800189093979000000000,
-0.00000008628624126316708500000000,
-0.00000011751498329992671000000000,
-0.00000015358678995269770000000000,
-0.00000019451544774895524000000000,
-0.00000024031597312124120000000000,
-0.00000029100459975062165000000000
};
If I change the double above to float, am I incurring conversion CPU cycles when I perform operations on the array contents? Or is the "conversion" sorted out at compile time?
Say, dBLEPTable_8_BLKHAR[1] + dBLEPTable_8_BLKHAR[2] , something simple like this?
On a related note, how many trailing decimal places should a float be able to store?
This is c++.
Any good compiler will convert the initializers during compile time. However, you also asked
am I incurring conversion cpu cycles when I perform operations on the array contents?
and that depends on the code performing the operations. If your expression combines array elements with variables of double type, then the operation will be performed at double precision, and the array elements will be promoted (converted) before the arithmetic takes place.
If you just combine array elements with variables of float type (including other array elements), then the operation is performed on floats and the language doesn't require any promotion. (If your hardware only implements double-precision operations, a conversion might still be done, but such hardware surely makes the conversions very cheap.)
Ben Voigt's answer addresses most of your question.
But you also ask:
On a related note, how many trailing decimal places should a float be able to store
It depends on the magnitude of the number you are trying to store. For large numbers there are no decimal places at all; in fact the format can't even represent every integer exactly. For instance:
float x = BIG_NUMBER;
float y = x + 1;
if (x == y)
{
// The code gets here if BIG_NUMBER is very large!
}
else
{
// The code gets here if BIG_NUMBER is not so large!
}
If BIG_NUMBER is 2^23 the next greater number would be (2^23 + 1).
If BIG_NUMBER is 2^24 the next greater number would be (2^24 + 2).
The value (2^24 + 1) can not be stored.
For very small numbers (i.e. close to zero), you will have a lot of decimal places.
Floating point must be used with care because its precision is limited: a float carries only about 6 to 7 significant decimal digits, whatever the magnitude of the number.
http://en.wikipedia.org/wiki/Single-precision_floating-point_format
For small numbers you can experiment with the program below.
Change the exp variable to set the starting point. The program will show you what the step size is for the range and the first four valid numbers.
#include <iostream>
using namespace std;
int main (int argc, char* argv[])
{
int exp = -27; // <--- !!!!!!!!!!!
// Change this to set starting point for the range
// Starting point will be 2 ^ exp
float f;
unsigned int *d = (unsigned int *)&f; // Brute force to set f in binary format
unsigned int e;
cout.precision(100);
// Calculate step size for this range
e = ((127-23) + exp) << 23;
*d = e;
cout << "Step size = " << fixed << f << endl;
cout << "First 4 numbers in range:" << endl;
// Calculate first four valid numbers in this range
e = (127 + exp) << 23;
*d = e | 0x00000000;
cout << hex << "0x" << *d << " = " << fixed << f << endl;
*d = e | 0x00000001;
cout << hex << "0x" << *d << " = " << fixed << f << endl;
*d = e | 0x00000002;
cout << hex << "0x" << *d << " = " << fixed << f << endl;
*d = e | 0x00000003;
cout << hex << "0x" << *d << " = " << fixed << f << endl;
return 0;
}
For exp = -27 the output will be:
Step size = 0.0000000000000008881784197001252323389053344726562500000000000000000000000000000000000000000000000000
First 4 numbers in range:
0x32000000 = 0.0000000074505805969238281250000000000000000000000000000000000000000000000000000000000000000000000000
0x32000001 = 0.0000000074505814851022478251252323389053344726562500000000000000000000000000000000000000000000000000
0x32000002 = 0.0000000074505823732806675252504646778106689453125000000000000000000000000000000000000000000000000000
0x32000003 = 0.0000000074505832614590872253756970167160034179687500000000000000000000000000000000000000000000000000
const double dBLEPTable_8_BLKHAR[4096] = {
If you change the double in that line to float, then one of two things will happen:
At compile time, the compiler will convert each number, such as -0.00000000239150987901837200000000, to the float that best represents it, and will then store that data directly in the array.
At runtime, during the program initialization (before main() is called!) the runtime that the compiler generated will fill that array with data of type float.
Either way, once you get to main() and to code that you've written, all of that data will be stored as float variables.

for loop help. Terminates when it isn't supposed to. c++

I'm new to Stack Overflow, but I did try to look for an answer and could not find one. I also can't seem to figure it out myself. For a school C++ project, we need to find the area under a curve. I have all the formulas hardcoded in, so don't worry about that. The program is supposed to give a more precise answer for a higher value of n. But when I enter a value for n that is higher than b, the program just prints 0 in a loop and does not terminate. Could you help me, please? Thank you. Here's the code:
/* David */
#include <iostream>
using namespace std;
int main()
{
cout << "Please Enter Lower Limit: " << endl;
int a;
cin >> a;
cout << "Please Enter Upper Limit: " << endl;
int b;
cin >> b;
cout << "Please Enter Sub Intervals: " << endl;
int n;
cin >> n;
double Dx = (b - a) / n;
double A = 0;
double X = a;
for (X = a; X <= (b - Dx); X += Dx)
{
A = A + (X*X*Dx);
X = X * Dx;
cout << A << endl;
}
cout << "The area under the curve is: " << A << endl;
return 0;
}
a, b, n are integers. So the following:
(b - a) / n
is probably 0. You can replace it with:
double(b - a) / n
Since all the variables in (b - a) / n are int, you're doing integer division, which discards fractions in the result. Assigning to a double doesn't change this.
You should convert at least one of the variables to double so that you'll get a floating point result with the fractions retained:
double Dx = (b - a) / (double)n;
The other answers are correct. Your problem is probably integer division. You have to cast one of the operands to double.
But you should use static_cast<> instead of C-style casts. Namely use
static_cast<double>(b - a) / n
instead of double(b - a) / n or ((double) (b - a)) / n.
You are performing integer division. Integer division only returns whole numbers; the fractional part is discarded:
3/2 == 1 //Because 1.5 will get cut to 1
3/3 == 1
3/4 == 0 //Because 0.5 will get cut to 0
At least one of the two operands of "/" needs to have a floating-point type.
3 / 2.0f == 1.5f
3.0f / 2 == 1.5f
3.0f / 2.0f == 1.5f

Bit shifts in c++

I don't understand why this gives me the same answer:
long long a = 3265917058 >> 24;
std::cout << a << std::endl; //194
long long ip = 3265917058;
long long b = ip >> 24;
std::cout << b << std::endl; //194
but this doesn't:
long long a = (3265917058 << 16) >> 24;
std::cout << a << std::endl; //240
long long ip = 3265917058;
long long b = (ip << 16) >> 24;
std::cout << b << std::endl; //12757488 - **i want this to be 240 too!**
Update: I want a 32-bit shift, but how can I do a 32-bit shift on a number that is too large for an int variable?
Update 2: My solution is to make ip an unsigned int. Then everything works.
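Your literal constant 3265917058 is too large for a 32-bit int, and your compiler evidently evaluates the shift on it in 32 bits. Add an LL suffix to force 64-bit arithmetic, which gives the same result as the long long variable: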
long long a = (3265917058LL << 16) >> 24;
In 3265917058<<16 your compiler treats both operands as 32-bit values, so the operation is done in 32 bits.
You need 3265917058LL<<16; then the left-hand side is a long long and the operation is done at that width, i.e. 64 bits.
To get what you ask for:
long long ip=3265917058;
long long b= (static_cast<unsigned int>(ip)<<16)>> 24;
std::cout<<b<<std::endl; // 240
Note that the result you will get (240) is not portable. Mathematically, the result should be 12757488. The value 240 is due to truncation, and this is not guaranteed to happen. For instance, it doesn't happen on systems where int is 64 bits.