Calculating positive non-integer power of negative base - c++

To my knowledge
(-1)^1.8 = [(-1)^18]^0.1 = [1]^0.1 = 1
Hope I am not making a silly mistake.
std::pow(-1, 1.8) results in nan. Also, according to this link:
If base is finite and negative and exp is finite and non-integer, a domain error occurs and a range error may occur.
Is there a workaround to calculate the above operation with C++?

std::pow from <cmath> is for real numbers. The exponentiation (power) function of real numbers is not defined for a negative base with a non-integer exponent.
Wikipedia says:
Real exponents with negative bases
Neither the logarithm method nor the rational exponent method can be used to define b^r as a real number for a negative real number b and an arbitrary real number r. Indeed, e^r is positive for every real number r, so ln(b) is not defined as a real number for b ≤ 0.
The rational exponent method cannot be used for negative values of b because it relies on continuity. The function f(r) = b^r has a unique continuous extension from the rational numbers to the real numbers for each b > 0. But when b < 0, the function f is not even continuous on the set of rational numbers r for which it is defined.
For example, consider b = −1. The nth root of −1 is −1 for every odd natural number n. So if n is an odd positive integer, (−1)^(m/n) = −1 if m is odd, and (−1)^(m/n) = 1 if m is even. Thus the set of rational numbers q for which (−1)^q = 1 is dense in the rational numbers, as is the set of q for which (−1)^q = −1. This means that the function (−1)^q is not continuous at any rational number q where it is defined.
On the other hand, arbitrary complex powers of negative numbers b can
be defined by choosing a complex logarithm of b.
Powers of complex numbers
Complex powers of positive reals are defined via e^x as in the section Complex exponents with positive real bases above [omitted from this quote]. These are continuous functions.
Trying to extend these functions to the general case of noninteger
powers of complex numbers that are not positive reals leads to
difficulties. Either we define discontinuous functions or multivalued
functions. Neither of these options is entirely satisfactory.
The rational power of a complex number must be the solution to an algebraic equation. Therefore, it always has a finite number of possible values. For example, w = z^(1/2) must be a solution to the equation w^2 = z. But if w is a solution, then so is −w, because (−1)^2 = 1. A unique but somewhat arbitrary solution called the principal value can be chosen using a general rule which also applies for nonrational powers.
Complex powers and logarithms are more naturally handled as single
valued functions on a Riemann surface. Single valued versions are
defined by choosing a sheet. The value has a discontinuity along a
branch cut. Choosing one out of many solutions as the principal value
leaves us with functions that are not continuous, and the usual rules
for manipulating powers can lead us astray.
So, before calculating the result, you must first choose what you are calculating. The C++ standard library has in <complex> a function template std::complex<T> pow(const complex<T>& x, const T& y), which is specified to calculate (through the definition of cpow in the C standard):
The cpow functions compute the complex power function x^y, with a branch cut for the first parameter along the negative real axis.
For (−1)^1.8, the result would be e^(−iπ/5) ≈ 0.809017 − 0.587785i.
This is not what you expected as result. There is no exponentiation function in the C++ standard library that would calculate the result that you want.
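To see both behaviors side by side, here is a minimal sketch (the complex value follows from the cpow specification quoted above; exact NaN formatting varies by implementation):

#include <cmath>
#include <complex>
#include <iostream>

int main()
{
    // Real pow: finite negative base, non-integer exponent -> domain error.
    std::cout << std::pow(-1.0, 1.8) << '\n';   // typically prints "nan"

    // Complex pow: principal branch, (-1)^1.8 = exp(1.8 * log(-1))
    //            = exp(1.8 * i*pi) = e^(-i*pi/5)
    std::complex<double> z = std::pow(std::complex<double>(-1.0), 1.8);
    std::cout << z << '\n';                     // about (0.809017,-0.587785)
}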


Do we need epsilon value for lesser or greater comparison for float value? [duplicate]

This question already has an answer here:
Floating point less-than-equal comparisons after addition and subtraction (1 answer)
Closed 9 months ago.
I have gone through various threads about comparing floats, but it is not clear to me whether we also need epsilon-based logic for less-than and greater-than comparisons, not just for equality.
e.g.:
float a, b;
if (a < b) // is this the correct way to compare two float values, or do we need an epsilon for the less-than comparison?
{
}
if (a > b) // is this the correct way to compare two float values for the greater-than comparison?
{
}
I know that for comparing floats for equality we need some epsilon value:
bool AreSame(double a, double b)
{
    return fabs(a - b) < EPSILON;
}
It really depends on what should happen when both values are close enough to be seen as equal, meaning fabs(a - b) < EPSILON. In some use cases (for example, computing statistics), it is not very important whether the comparison of two close values yields equality or not.
If it matters, you should first determine the uncertainty of the values. It really depends on the use case (where the input values come from and how they are processed); two values differing by less than that uncertainty should then be considered equal. But that equality is no longer a true mathematical equivalence relation: you can easily imagine how to build a chain of close values between two truly different values. In mathematical terms, the relation is not transitive (or is only "almost transitive", in everyday language).
I am sorry, but as soon as you have to process approximations there cannot be any precise and consistent way: you have to think about the real-world use case to determine how you should handle the approximation.
When you are working with floats, it's inevitable that you will run into precision errors.
In order to mitigate this, when checking two floats for equality we often check whether their difference is small enough.
For less-than and greater-than, however, there is no way to tell with full certainty which float is truly larger. The best approach (presumably for your intentions) is to first check whether the two floats are the same, using the AreSame function. If so, return false (as a = b implies that a < b and a > b are both false).
Otherwise, return the value of a < b (or a > b).
The answer is application dependent.
If you are sure that a and b are sufficiently different that numerical errors will not reverse the order, then a < b is good enough.
But if a and b are dangerously close, you might require a < b + EPSILON. In such a case, it should be clear to you that < and ≤ are not distinguishable.
Needless to say, EPSILON should be chosen with the greatest care (which is often pretty difficult).
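To make that concrete, a small sketch of the two variants (the EPSILON value here is an assumption; pick it per application, as the answer stresses):

// Assumed tolerance; choosing it well is often the hard part.
constexpr double EPSILON = 1e-9;

// Strict variant: a is below b by more than the tolerance.
// Inside the tolerance band, < and <= are deliberately indistinguishable.
bool definitelyLess(double a, double b)
{
    return a < b - EPSILON;
}

// Permissive variant (the a < b + EPSILON form mentioned above):
// true unless a is definitely greater than b.
bool lessOrClose(double a, double b)
{
    return a < b + EPSILON;
}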
It ultimately depends on your application, but I would say generally no.
The problem, very simplified, is that if you calculate (1/3) * 3 and get the answer 0.999999, then you want that to compare equal to 1. This is why we use epsilon values for equality comparisons (and the epsilon should be chosen according to the application and the expected precision).
On the other hand, if you want to sort a list of floats, then by default the 0.999999 value will sort before 1. But then again, what would the correct behavior be? If both are treated as 1, it will be somewhat random which one actually sorts first (depending on the initial order of the list and the sorting algorithm you use).
The problem with floating-point numbers is not that they are "random" and that it is impossible to predict their exact values. The problem is that base-10 fractions don't translate cleanly into base-2 fractions, and non-repeating fractions in one base can become repeating ones in the other, which then results in rounding errors when they are truncated to a finite number of digits. We use epsilon values for equality comparisons to handle the rounding errors that arise from these back-and-forth conversions.
But do be aware that the nice relations that ==, < and <= have for integers don't always carry over to floating point, precisely because of the epsilons involved. Example:
a = x
b = a + epsilon/2
c = b + epsilon/2
d = c + epsilon/2
Now: a == b, b == c, c == d, BUT a != d and a < d. In fact, you can continue the sequence, keeping num(n) == num(n+1), and at the same time get an arbitrarily large difference between a and the last number in the sequence.
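That chain can be checked directly; a minimal sketch (AreSame as in the question, with an assumed EPSILON):

#include <cmath>
#include <iostream>

constexpr double EPSILON = 1e-9;

bool AreSame(double a, double b) { return std::fabs(a - b) < EPSILON; }

int main()
{
    double a = 1.0;
    double b = a + EPSILON / 2;
    double c = b + EPSILON / 2;
    double d = c + EPSILON / 2;

    // Every neighboring pair compares "equal", yet the endpoints differ:
    // the epsilon relation is not transitive.
    std::cout << AreSame(a, b) << AreSame(b, c) << AreSame(c, d)   // 111
              << AreSame(a, d) << '\n';                            // 0
}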
As others have stated, there will always be precision errors when dealing with floats.
Thus, you should have an epsilon value even for comparing less than / greater than.
We know that in order for a to be less than b, firstly, a must be different from b. Checking this is a simple NOT equals, which uses the epsilon.
Then, once you already know a != b, the operator < is sufficient.
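As a sketch, that recipe in code (the function names and EPSILON are assumptions):

#include <cmath>

constexpr double EPSILON = 1e-9;

bool areSame(double a, double b) { return std::fabs(a - b) < EPSILON; }

// a is "really" less than b: rule out near-equality first, then use <.
bool isLess(double a, double b)
{
    if (areSame(a, b))
        return false;   // equal within epsilon: neither < nor > holds
    return a < b;
}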

Saving a complex number in a variable

I am trying to perform this computation which results in a complex number. However, C++ gives me "NaN".
double Q, r, Theta;
Q = -0.043543950754930;
r = 0.009124131609174;
Theta = acos(r / sqrt(pow(-Q, 3)));
// result must be (0.00000000000000 + 0.0911033580003565i)
Yes, by using the std::complex type:
#include <complex>
#include <iostream>

int main()
{
    std::complex<double> Q = -0.043543950754930;
    std::complex<double> r = 0.009124131609174;
    std::complex<double> Theta = std::acos(r / std::sqrt(std::pow(-Q, 3)));
    std::cout << Theta << '\n';
}
Note that the complex functions return values in specific ranges. You may have to adjust for this if you are looking for a specific answer.
I am trying to perform this computation which results in a complex number.
All the variables in the posted snippet are of type double, so the compiler has to use the overloads of std::acos, std::sqrt and std::pow that accept parameters of type double and return double values.
In particular, the function double std::acos(double arg)[1]:
If a domain error occurs, an implementation-defined value is returned (NaN where supported).
[...]
Domain error occurs if arg is outside the range [-1.0, 1.0].
Given the values of r and Q in the posted example, the value of arg is greater than 1, causing a domain error.
To obtain a complex value, the OP should use (or cast to) variables of type std::complex<double>, so that the "correct" overloads of the mathematical functions are chosen, as well as the expected return type.
They could also implement different numerical algorithms (one for real, one for complex values) and let the program choose the right path based upon the value of some "discriminant" variable. E.g. a cubic equation has three complex solutions in general, but those can either be three distinct real values, or three real values (two of them coincident), or one real value and two complex conjugate ones. A program might use different methods instead of a single general (all-complex) one.
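For the computation in this question, such a branch could look like the following sketch (the structure is illustrative, not from the original post):

#include <cmath>
#include <complex>
#include <iostream>

int main()
{
    double Q = -0.043543950754930;
    double r = 0.009124131609174;
    double arg = r / std::sqrt(std::pow(-Q, 3));

    if (std::fabs(arg) <= 1.0) {
        // Inside the domain of the real acos: stay in double.
        std::cout << std::acos(arg) << '\n';
    } else {
        // Outside the real domain: switch to the complex overload.
        // Here the magnitude is about 0.0911; the sign of the imaginary
        // part depends on which side of the branch cut is chosen.
        std::cout << std::acos(std::complex<double>(arg)) << '\n';
    }
}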
[1] Quotes from https://en.cppreference.com/w/cpp/numeric/math/acos, emphasis mine.

Can a floating-point value be used as a key in a hashtable?

I know that floating-point values cannot safely be compared with ==. I have made a custom comparison function like this:
auto isEqual = [](const double& a, const double& b) {
    return fabs(a - b) <= numeric_limits<double>::epsilon();
};
I would like to know how to modify the unordered_map so that it works as I expect:
auto isEqual = [](const double& a, const double& b) {
    return fabs(a - b) <= numeric_limits<double>::epsilon();
};
unordered_map<double, int, hash<double>, decltype(isEqual)> m(0, hash<double>(), isEqual);
m[1 / (double)3]++;
cout << m[1 - 2 / (double)3] << endl; // expected 1, but got 0
// -----------------------------
auto comp = [&](const double& a, const double& b) {
    if (isEqual(a, b)) return false;
    return a < b;
};
map<double, int, decltype(comp)> m2(comp);
m2[1 / (double)3]++;
cout << m2[1 - 2 / (double)3] << endl; // expected and got 1
It is essentially impossible to modify an unordered_map to use floating-point values that contain different rounding errors as keys.
Customizing the equality comparison alone is insufficient because it is merely used to distinguish values that are mapped to the same bucket by the hash function. Different floating-point values generally hash to different buckets, even if they differ only by the tiniest of rounding errors. Therefore, one must also customize the hash function.
However, then the requirement for the hash function would be that it map floating-point values that are different but that you would like to consider as equal to the same bucket. In general, this is impossible because, if you want to consider any two very close numbers as equal, say numbers that are so close that they are adjacent in the floating-point format, then transitivity requires the hash function to map all numbers to one bucket. That is, since zero and the smallest positive representable number must map to the same bucket, and the smallest and second smallest positive representable numbers must map to the same bucket, then zero and the second smallest positive representable number must map to the same bucket. Similarly, the third smallest number must map to the same bucket as the second smallest and therefore to the same bucket as zero. And so on for the fourth and fifth. This creates a chain that continues for all numbers: They must all map to the same bucket.
Therefore, no hash function can serve to implement a non-degenerate map for floating-point numbers that considers close numbers as equal.
In special situations, it is possible to implement a reasonable hash for certain sets of numbers. For example, if it is known that all the floating-point values represent a number of cents, and that the floating-point numbers never contain accumulated rounding errors that reach or exceed half a cent, then each value can be rounded to the nearest cent (or the representable value nearest that) before hashing. Note that the domain of values in this case is really a discrete set, such as a set of fixed-point numbers, not a continuous set such as the set of real numbers that floating-point arithmetic is intended to approximate. In this case, the only modification that is needed is to quantize the floating-point value (round it to the nearest member of the set) before inserting it into the map. No custom hash function or equality comparison is needed.
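A sketch of that cents example (toCents is an illustrative helper; it assumes accumulated errors stay below half a cent, as stated above):

#include <cmath>
#include <iostream>
#include <unordered_map>

// Quantize a dollar amount to a whole number of cents before keying.
long long toCents(double dollars)
{
    return std::llround(dollars * 100.0);
}

int main()
{
    // Keyed on a discrete set, so the default hash and == are fine.
    std::unordered_map<long long, int> counts;
    counts[toCents(0.1 + 0.2)]++;   // quantizes to 30 cents...
    counts[toCents(0.3)]++;         // ...as does 0.3, although 0.1 + 0.2 != 0.3
    std::cout << counts[30] << '\n';   // prints 2
}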

Given (a, b) compute the maximum value of k such that a^{1/k} and b^{1/k} are whole numbers

I'm writing a program that tries to find the minimum value of k > 1 such that the kth root of a and b (which are both given) equals a whole number.
Here's a snippet of my code, which I've commented for clarification.
int main()
{
    // Declare the variables a and b.
    double a;
    double b;
    // Read in variables a and b.
    while (cin >> a >> b) {
        int k = 2;
        // We require the kth root of a and b to both be whole numbers.
        // "while a^{1/k} and b^{1/k} are not both whole numbers..."
        while ((fmod(pow(a, 1.0/k), 1) != 1.0) || (fmod(pow(b, 1.0/k), 1) != 0)) {
            k++;
        }
Pretty much, I read in (a, b), and I start from k = 2 and increment k until the kth roots of a and b are both congruent to 0 mod 1 (meaning that they are divisible by 1 and thus whole numbers).
But, the loop runs infinitely. I've tried researching, and I think it might have to do with precision error; however, I'm not too sure.
Another approach I've tried is changing the loop condition to check whether the floor of a^{1/k} equals a^{1/k} itself. But again, this runs infinitely, likely due to precision error.
Does anyone know how I can fix this issue?
EDIT: for example, when (a, b) = (216, 125), I want to have k = 3 because 216^(1/3) and 125^(1/3) are both integers (namely, 6 and 5).
That is not a programming problem but a mathematical one:
If a is a real number, k a positive integer, and a^(1/k) an integer, then a is an integer (otherwise the aim is just to toy with approximation errors).
So the fastest approach may be to first check that a and b are integers, then do a prime decomposition such that a = p0^e0 * p1^e1 * ..., where the pi are distinct primes.
Notice that, for a^(1/k) to be an integer, each ei must be divisible by k. In other words, k must be a common divisor of the ei. The same must be true of the exponents of b if b^(1/k) is to be an integer.
Thus the largest k is the greatest common divisor of all the ei of both a and b.
With your approach you will have problems with large numbers. All IEEE 754 binary64 floating-point numbers (the case of double on x86) have 53 significant bits. That means that every double larger than 2^53 is an integer.
The function pow(x, 1./k) will return the same value for two different x, so with your approach you will necessarily get false answers. For example, the numbers 5^5 * 2^90 and 3^5 * 2^120 are exactly representable as doubles, and the result of the algorithm is k = 5. You will find this value of k with these numbers, but you will also find k = 5 for 5^5 * 2^90 − 2^49 and 3^5 * 2^120, because pow(5^5 * 2^90 − 2^49, 1./5) == pow(5^5 * 2^90, 1./5).
On the other hand, as there are only 53 significant bits, the prime decomposition of a double is trivial.
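A sketch of that integer algorithm (trial-division factoring, which is cheap for values this small; the names are illustrative):

#include <cstdint>
#include <iostream>
#include <numeric>

// Fold the gcd of all prime-factor exponents of n into g.
// Assumes n is an exact integer (e.g. a double below 2^53, cast).
void gcdOfExponents(std::int64_t n, std::int64_t& g)
{
    for (std::int64_t p = 2; p * p <= n; ++p) {
        std::int64_t e = 0;
        while (n % p == 0) { n /= p; ++e; }
        if (e > 0) g = std::gcd(g, e);
    }
    if (n > 1) g = std::gcd(g, std::int64_t{1});   // leftover prime, exponent 1
}

int main()
{
    std::int64_t g = 0;        // gcd(0, x) == x, a convenient starting value
    gcdOfExponents(216, g);    // 216 = 2^3 * 3^3
    gcdOfExponents(125, g);    // 125 = 5^3
    std::cout << g << '\n';    // largest k; prints 3
}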
Floating-point numbers are not mathematical real numbers. Their computations are approximate. See http://floating-point-gui.de/
You could replace the test fmod(pow(a, 1.0/k), 1) != 1.0 with something like fabs(fmod(pow(a, 1.0/k), 1) - 1.0) > 0.0000001 (and experiment with various such ε instead of 0.0000001; see also std::numeric_limits::epsilon, but use it carefully, since pow may introduce some error in its computation, and 1.0/k also injects imprecision; the details are very complex, dive into the IEEE 754 specification).
Of course, you could (and probably should) define your own bool almost_equal(double x, double y) function (and use it instead of ==, and use its negation instead of !=).
As a rule of thumb, never test floating-point numbers for equality (i.e. ==); consider instead some small enough distance between them. That is, replace a test like x == y (respectively x != y) with something like fabs(x - y) < EPSILON (respectively fabs(x - y) > EPSILON), where EPSILON is a small positive number, hence testing for a small L1 distance (for equality; a large enough distance for inequality).
And avoid floating point in integer problems.
Actually, predicting or estimating floating-point accuracy is very difficult. You might want to consider tools like CADNA. My colleague Franck Védrine is an expert on static program analyzers for estimating numerical errors (see e.g. his TERATEC 2017 presentation on Fluctuat). It is a difficult research topic; see also D. Monniaux's paper The pitfalls of verifying floating-point computations, etc.
And floating-point errors have in some cases cost human lives (or billions of dollars). Search the web for details. There are cases where all the digits of a computed number are wrong (because errors accumulate, and the final result is obtained by combining thousands of operations)! There is some indirect relationship with chaos theory, because many programs can exhibit some numerical instability.
As others have mentioned, comparing floating-point values for equality is problematic. If you find a way to work directly with integers, you can avoid this problem. One way to do so is to raise integers to the kth power instead of taking the kth root. The details are left as an exercise for the reader.
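One way to take up that exercise, sketched under the assumption that a and b fit in 64-bit integers (overflow checks omitted):

#include <cmath>
#include <cstdint>
#include <iostream>

// Does n have an exact integer kth root? Use the floating-point root only
// as a guess, then confirm nearby candidates by exact integer multiplication.
bool hasExactRoot(std::int64_t n, int k)
{
    std::int64_t guess = std::llround(std::pow(double(n), 1.0 / k));
    for (std::int64_t r = guess - 1; r <= guess + 1; ++r) {
        if (r < 1) continue;
        std::int64_t p = 1;
        for (int i = 0; i < k; ++i) p *= r;   // exact integer power
        if (p == n) return true;
    }
    return false;
}

int main()
{
    std::int64_t a = 216, b = 125;
    for (int k = 60; k >= 2; --k) {           // scan downward for the largest k
        if (hasExactRoot(a, k) && hasExactRoot(b, k)) {
            std::cout << k << '\n';           // prints 3 for (216, 125)
            break;
        }
    }
}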

std::uniform_real_distribution - get all possible numbers

I would like to create a std::uniform_real_distribution able to generate a random number in the range [MIN_FLOAT, MAX_FLOAT]. Following is my code:
#include <random>
#include <limits>

using namespace std;

int main()
{
    const auto a = numeric_limits<float>::lowest();
    const auto b = numeric_limits<float>::max();
    uniform_real_distribution<float> dist(a, b);
    return 0;
}
The problem is that when I execute the program, it is aborted because a and b seem to be invalid arguments. How should I fix it?
uniform_real_distribution's constructor requires:
a ≤ b and b − a ≤ numeric_limits<RealType>::max().
That last one is not possible for you, since the difference between lowest and max, by definition, must be larger than max (and will almost certainly be INF).
There are several ways to resolve this. The simplest, as Nathan pointed out, is to just use a uniform_real_distribution<double>. Unless double on your implementation cannot store the range of a float (and an IEEE-754 binary64 can store the range of a binary32), this ought to work. You would still pass the numeric_limits for float, but since the distribution does its math in double, it can handle the increased range.
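A sketch of that first option (the generator choice here is arbitrary):

#include <iostream>
#include <limits>
#include <random>

int main()
{
    std::mt19937 gen{std::random_device{}()};

    // Do the math in double: float's span (about 6.8e38) is far below
    // double's max, so the b - a requirement is satisfied.
    std::uniform_real_distribution<double> dist(
        std::numeric_limits<float>::lowest(),
        std::numeric_limits<float>::max());

    float value = static_cast<float>(dist(gen));
    std::cout << value << '\n';
}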
Alternatively, you could combine a uniform_real_distribution<float> with a boolean uniform_int_distribution (that is, one that selects between 0 and 1). Your real distribution should be over the positive numbers, up to max. Every time you get a number from the real distribution, get one from the int distribution too. If the integer is 1, then negate the real value.
This has the downside of making the probability of zero slightly higher than the probability of other numbers, since positive and negative zero are the same thing.
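And a sketch of the second option (positive magnitude plus a coin flip for the sign; note the slight bias toward zero described above):

#include <iostream>
#include <limits>
#include <random>

int main()
{
    std::mt19937 gen{std::random_device{}()};

    // The real distribution covers only the non-negative range...
    std::uniform_real_distribution<float> magnitude(
        0.0f, std::numeric_limits<float>::max());
    // ...and an integer distribution over {0, 1} picks the sign.
    std::uniform_int_distribution<int> sign(0, 1);

    float value = magnitude(gen);
    if (sign(gen) == 1)
        value = -value;   // +0.0f and -0.0f are the same value, so zero
                          // ends up slightly more likely than other numbers
    std::cout << value << '\n';
}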