C++ checking floats - c++

Right, I have an array of floats which stores only 1s and 0s. I'm trying to do a simple test: if the current slot in the array is 1, print a little message saying it is 1; otherwise, say it is 0. Here's my code:
if(myArray[i] == 1)
{
cout << "this is 1 !!!!!" << endl;
}
else
{
cout << "this is 0 ";
}
But this just keeps entering the "else" section, i.e. only printing "this is 0". What's wrong with it (or what's wrong with me?? :P)??

Great link: What Every Computer Scientist Should Know About Floating-Point Arithmetic
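After reading that you'll realise that a float which is supposed to be 1 isn't necessarily exactly the value 1. The literal 1.0 itself is exactly representable, but a value produced by arithmetic often ends up close to 1 without being exactly 1, and that's why your condition keeps coming out false.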
Why would you use floats to store boolean data? Use an array of bools or a bitvector.
EDIT: I can't actually think of any situation where (or why) you'd compare floats to a literal; does anyone know of any?
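A minimal sketch of the bitvector suggestion, using std::bitset. The size of 8 is an arbitrary assumption (the question doesn't say how many slots there are), and describe is a hypothetical helper name; std::vector<bool> or a plain bool array would work the same way for this check.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Reads a flag from a fixed-size bitvector. No rounding is involved,
// so the "is it 1?" test is exact. test() throws std::out_of_range if i >= 8.
std::string describe(const std::bitset<8>& flags, std::size_t i) {
    return flags.test(i) ? "this is 1" : "this is 0";
}
```

Setting a slot is just flags.set(i) (or flags[i] = true), so there is never a float to compare.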

When operating with floating-point numbers you should program a little more defensively:
if (myArray[i] == 1) {
cout << "this is 1\n";
} else if (myArray[i] == 0) {
cout << "this is 0\n";
} else {
cout << "this is something else, in particular " << myArray[i] << "\n";
}
This should give you an insight about what happens.
By the way, if you only ever store the values 1.0f and 0.0f in the array, it is perfectly ok to use the == operator to compare floats. You just have to be sure that what you think is 1.0 is really really really 1.0.
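As an illustration of how a value you think is 1.0 can miss, here is a small sketch assuming IEEE 754 single precision with default (non-fast-math) compiler settings. sumTenths is a hypothetical name; it adds 0.1f ten times, which is 1 mathematically, but 0.1 has no exact binary representation and the rounding error of each addition accumulates.

```cpp
// Adds 0.1f ten times. Each partial sum is rounded to the nearest float,
// so the final result lands one float step above 1.0f instead of on it.
float sumTenths() {
    float sum = 0.0f;
    for (int i = 0; i < 10; ++i) {
        sum += 0.1f;
    }
    return sum;
}
```

Printing the result with std::setprecision(9) should show something like 1.00000012, which is exactly the kind of value the defensive "something else" branch above would reveal.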

First, don't use floats to store a 1 or a 0. There is no reason to use a float to store small integers.
Second, you need to read What Every Computer Scientist Should Know About Floating-Point Arithmetic. Though exact comparison will often work, you should compare floating-point values by taking the absolute value of the difference between them and comparing that with some small sigma (where sigma is a value that makes sense for your application, within the valid range of precision).
If abs( x - y ) < sigma you can consider them equal.
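A minimal sketch of that absolute-tolerance comparison. The helper name approxEqual and the default sigma of 1e-6f are assumptions for illustration only; pick a sigma that fits your data.

```cpp
#include <cmath>

// Treat x and y as equal when |x - y| < sigma.
// std::fabs is used so the float overload is chosen, not the integer abs.
bool approxEqual(float x, float y, float sigma = 1e-6f) {
    return std::fabs(x - y) < sigma;
}
```

In the question's loop this would read if (approxEqual(myArray[i], 1.0f)). Note that a fixed absolute sigma only makes sense when the values have a known scale, as 0s and 1s do here.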

Depending on how you arrived at the values in your array (i.e. the result of computations), it's highly unlikely that you'll get an exact 0 or 1 as a floating point result. Checking if a float == 1 exactly will almost certainly be false in that case.
On the other hand, in IEEE floating point, an exact 0 is stored as 0x00000000. If it's not the result of a computation, sticking a 0 in your array can be useful as a flag instead of storing a separate array.
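To check the bit pattern claim, here is a small sketch assuming a 32-bit IEEE 754 float; floatBits is a hypothetical helper name. std::memcpy is the well-defined way to look at an object's bytes.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw bit pattern of a 32-bit float.
std::uint32_t floatBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the bytes instead of casting pointers
    return bits;
}
```

An exact 0.0f comes out as 0x00000000, while -0.0f is 0x80000000 (only the sign bit set), even though the two compare equal with ==.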

There could well be a good reason why the value in your floating-point array is off from 1 by a very slight fraction.
Though this is not a clean solution, this should be reason enough to use either enum or a bool array.
if(myArray[i] > 0.9f && myArray[i] < 1.00001f)
...
...

I've answered something about this here. You can't rely on "==" to compare a float with an int.
In your case, maybe the "1" in your array is actually stored as 0.9999999, which is not equal to 1, so the output of your program is already correct.

Either cast the array element to an int:
if ((int)myArray[i] == 1)
Or cast the 1 to a float:
if (myArray[i] == 1.0f)
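Be aware that the (int) cast truncates toward zero, so a value stored as 0.99999 becomes 0 rather than 1. A small sketch of the difference; std::lround (which rounds to the nearest integer) is a swapped-in alternative not mentioned in the answer, and the helper names are hypothetical.

```cpp
#include <cmath>

// (int) drops the fractional part: 0.99999f -> 0.
int truncated(float f) { return static_cast<int>(f); }

// std::lround rounds to the nearest integer: 0.99999f -> 1.
long rounded(float f) { return std::lround(f); }
```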

Related

Float to Double adding many 0s at the end of the new double number during conversion

I'm facing a little problem on a personal project:
When I'm converting a float number to a double to make operations (+-*/) easy, it adds a lot of 0s behind the default float number.
For example: float number = -4.1112 -> double number = -4.1112000000000002
I convert the float to a double with the standard function std::stod().
This issue is a big problem for me because I'm checking for overflow in my project, and this issue makes it throw an exception.
Here is the checkOverflow function that throws an exception:
{
    if (type == eOperandType::Int8) {
        if (value > std::numeric_limits<int8_t>::max() || value < std::numeric_limits<int8_t>::min())
            throw VMException("Overflow");
    } else if (type == eOperandType::Int16) {
        if (value > std::numeric_limits<int16_t>::max() || value < std::numeric_limits<int16_t>::min())
            throw VMException("Overflow");
    } else if (type == eOperandType::Int32) {
        if (value > std::numeric_limits<int32_t>::max() || value < std::numeric_limits<int32_t>::min())
            throw VMException("Overflow");
    } else if (type == eOperandType::Float) {
        if (value > std::numeric_limits<float>::max() || value < std::numeric_limits<float>::min())
            throw VMException("Overflow");
    } else if (type == eOperandType::Double) {
        if (value > std::numeric_limits<double>::max() || value < std::numeric_limits<double>::min())
            throw VMException("Overflow");
    }
}
The problem you are having is completely different.
All your checks are wrong. Think about it: if a variable is of type, say, int32_t, its value is necessarily between the minimum and maximum possible values that can be represented by an int32_t, by definition. Let's simplify: it's like having a single-digit number, and testing that it is between 0 and 9 (if it is unsigned), or between -9 and +9 (if it is signed): how could such a test fail? Your checks should never raise an exception. But, as you say, they do. How is it even possible? And anyway, why would it happen for the long series of zeros that derive from representing -4.1112 as a floating point number, turning it into -4.1112000000000002? That isn't an overflow! This is a strong hint that your problem is elsewhere.
The solution is that std::numeric_limits<T>::min doesn't do what you think. As cppreference explains, for floating-point types it gives you the smallest positive value:
For floating-point types with denormalization, min returns the minimum positive normalized value. Note that this behavior may be unexpected, especially when compared to the behavior of min for integral types. To find the value that has no values less than it, use numeric_limits::lowest.
And the page about lowest also provides an example, comparing the output of min, lowest and max:
std::numeric_limits<T>::min():
float: 1.17549e-38 or 0x1p-126
double: 2.22507e-308 or 0x1p-1022
std::numeric_limits<T>::lowest():
float: -3.40282e+38 or -0x1.fffffep+127
double: -1.79769e+308 or -0x1.fffffffffffffp+1023
std::numeric_limits<T>::max():
float: 3.40282e+38 or 0x1.fffffep+127
double: 1.79769e+308 or 0x1.fffffffffffffp+1023
And as you can see, min is positive. So the opposite of max is lowest.
So you are getting exceptions because your negative values are smaller than the smallest positive value. Or, in other words: because -4 is less than a tiny positive number such as 2.22507e-308. Which is correct. It's the test that is wrong!
You could fix that by using lowest... But then, what would your checks tell you? If they ever raised an exception, it would mean that the compiler and/or library that you are using are seriously broken. If that is what you are testing, ok. But honestly I think it will never happen, and you could just delete these tests, as they provide no real value.
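If value is in fact a double that is later narrowed to the operand type (an assumption; the question never shows how value is declared), a range check using lowest() could look like this sketch. fitsInRange is a hypothetical name.

```cpp
#include <cstdint>
#include <limits>

// Does a double fit in the finite range of T?
// Uses lowest() for the lower bound: for floating-point T, min() is the
// smallest positive normalized value, not the most negative one.
// Not suitable as written for 64-bit integers, whose limits are not
// exactly representable as double.
template <typename T>
bool fitsInRange(double value) {
    return value >= static_cast<double>(std::numeric_limits<T>::lowest()) &&
           value <= static_cast<double>(std::numeric_limits<T>::max());
}
```

With this, fitsInRange<float>(-4.1112) is true, whereas the original comparison value < std::numeric_limits<float>::min() is true for every negative number.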
That's life I'm afraid. A floating point type can only represent a sparse subset of the real numbers.
Assuming IEEE754, the nearest float to -4.1112 is -4.111199855804443359375
The nearest double to -4.1112 is -4.111200000000000187583282240666449069976806640625
If you need perfect decimal precision then use a decimal type. There's one in the Boost library distribution. Boost also has a numeric_cast function, that does what your function is attempting to do; cleverly, with minimal run-time overhead.
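To see where those digits come from, here is a small sketch (assuming IEEE 754 float and double) contrasting a float widened to double with the decimal string parsed straight to double by std::stod. The function names are hypothetical.

```cpp
#include <string>

// Widening keeps the float's exact value, i.e. the nearest float to -4.1112.
double widenedFloat() { return static_cast<double>(-4.1112f); }

// Parsing the decimal text gives the nearest double to -4.1112 instead.
double parsedDouble() { return std::stod("-4.1112"); }
```

Neither result is exactly -4.1112; printing both with std::setprecision(17) makes the different rounding errors visible.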

How to convert binary to decimal? (In a very simple way)

I need to write a program that converts binary numbers to decimal.
I'm very new to C++, so I've tried looking at other people's examples, but they are way too advanced for me. I thought I had a clever idea on how to do it, but I'm not sure if my idea was way off or if I'm just missing something.
int main(void)
{
//variables
string binary;
int pow2 = binary.length() - 1;
int length = binary.length();
int index = 0;
int decimal = 0;
cout << "Enter a binary number ";
cin >> binary; cout << endl;
while(index < length)
{
if (binary.substr(index, 1) == "1")
{
decimal = decimal + pow(2, pow2);
}
index++;
pow2--;
}
cout << binary << " converted to decimal is " << decimal;
}
Your computer is a logical beast. Your computer executes your program, one line at a time. From start to finish. So, let's take a trip, together, with your computer, and see what it ends up doing, starting at the very beginning of your main:
string binary;
Your computer begins by creating a new std::string object. Which is, of course, empty. There's nothing in it.
int pow2 = binary.length() - 1;
This is the very second thing that your computer does.
And because we've just discovered that binary is empty, binary.length() is obviously 0, so this sets pow2 to -1. When you take 0, and subtract 1 from it, that's what you get.
int length = binary.length();
Since binary is still empty, its length() is still 0, and this simply creates a new variable called length, whose value is 0.
int index = 0;
int decimal = 0;
This creates a bunch more variables, and sets them to 0. That's the next thing your computer does.
cout << "Enter a binary number ";
cin >> binary; cout << endl;
Here you print some stuff, and read some stuff.
while(index < length)
Now, we get into the thick of things. So, let's see what your computer did, before you got to this point. It set index to 0, and length to 0 also. So, both of these variables are 0, and, therefore, this condition is false, 0 is not less than 0. So nothing inside the while loop ever executes. We can skip to the end of your program, after the while loop.
cout << binary << " converted to decimal is " << decimal;
And that's why your computer always gives you the wrong result. Your actual question was:
I'm not sure if my idea was way off or if I'm just missing something.
Well, there are other problems with your idea, too. It was slightly off. For starters, nothing here really requires the use of the pow function. Using pow is like trying to kill a fly with a hammer. pow is a general floating-point function: your integer arguments are converted to double, and conceptually it computes e raised to the power of (the second number times the natural logarithm of the first). Then your code (which never runs) converts the result from floating point back to an integer, truncating any fractional part. Nothing of this sort is ever needed in order to simply convert binary to decimal. This never requires employing the services of natural logarithms. This is not what pow is for.
This task can be easily accomplished with just multiplication and addition. For example, if you already have the number 3, and your next digit is 7, you end up with 37 by multiplying 3 by 10 and then adding 7. You do the same exact thing with binary, base 2, with the only difference being that you multiply your number by 2, instead of 10.
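A minimal sketch of that multiply-and-add idea (Horner's method in base 2). binaryToDecimal is a hypothetical name, and it assumes the input contains only '0' and '1' and fits in an unsigned long long.

```cpp
#include <string>

// Shift the accumulated value one binary place left (multiply by 2)
// and add the next digit, exactly like building 37 from 3 and 7 in base 10.
unsigned long long binaryToDecimal(const std::string& binary) {
    unsigned long long value = 0;
    for (char digit : binary) {
        value = value * 2 + (digit == '1' ? 1 : 0);
    }
    return value;
}
```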
But what you're really missing the most, is the Golden Rule Of Computer Programming:
A computer always does exactly what you tell it to do, instead of what you want it to do.
You need to tell your computer exactly what your computer needs to do. One step at a time. And in the right order. Telling your computer to compute the length of the string before it's even read from std::cin does not accomplish anything useful. It does not automatically recompute its length after it's actually read. Therefore, if you need the length of an entered string, compute it after the string has been read in, not before. And so on.
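Here is a sketch of the original loop with the steps put in the right order: read first, then measure, then convert, with pow replaced by an integer shift. readBinary is a hypothetical name; taking a std::istream& lets it read from std::cin or from a string stream, and it assumes fewer than 64 binary digits.

```cpp
#include <istream>
#include <sstream>  // only for the std::istringstream usage example
#include <string>

long long readBinary(std::istream& in) {
    std::string binary;
    in >> binary;                                     // 1. read the number first
    int length = static_cast<int>(binary.length());   // 2. now the length is meaningful
    int pow2 = length - 1;                            // weight of the leftmost digit
    long long decimal = 0;
    for (int index = 0; index < length; ++index, --pow2) {
        if (binary[index] == '1') {
            decimal += 1LL << pow2;                   // 2^pow2 in integer arithmetic, no pow()
        }
    }
    return decimal;
}
```

In the original program this becomes long long decimal = readBinary(std::cin); after printing the prompt.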

How to avoid printing -0.00 in c++ when using iomanip [duplicate]

As this question says, there are some differences between negative and positive zero in floating-point numbers. I know there are important reasons for that. What I want to know is a short piece of code that avoids negative zero in the output.
For example, in the following code:
cout << fixed << setprecision(3);
cout << (-0.0001) << endl;
"-0.000" is printed, but I want "0.000".
Note all other negative numbers (e.g. -0.001) should still be printed with the minus sign preceding them, so simply * -1 will not work.
Try this, adjusting the threshold depending on your precision:
cout << ((abs(ans) < 0.0005)? 0.000: ans) << endl;
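The 0.0005 threshold is half a unit in the third decimal place. A sketch generalising it to any precision, assuming std::fixed output; formatFixed is a hypothetical name.

```cpp
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// Formats value with `precision` digits after the point, but prints any value
// that would round to zero as "0.000..." rather than "-0.000...".
std::string formatFixed(double value, int precision) {
    if (std::fabs(value) < 0.5 * std::pow(10.0, -precision)) {
        value = 0.0;  // replaces tiny negatives and an exact -0.0 alike
    }
    std::ostringstream out;
    out << std::fixed << std::setprecision(precision) << value;
    return out.str();
}
```

Values that survive rounding, such as -0.001 at precision 3, keep their minus sign.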
How about:
cout << (value == 0.0 ? abs(value) : value) << endl;
If you care about arbitrary precision, as opposed to just a fixed one at 3, you'll need a small bit of work. Basically, you'll have to do a pre-check before the cout to see if the number will get formatted in a way you don't like.
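You need to find the order of magnitude of the number to see whether its significant digits will be lost, leaving only the minus sign.
You can do this using the base-10 logarithm of the absolute value of the number. If the negative of the result is greater than the precision you have set, the number will show in a way you don't want. (Strictly, the cutoff is precision + log10(2), about precision + 0.3, because a value of at least half a unit in the last printed place still rounds away from zero.)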
log10 of 0.0001 is -4.
negative of (-4) is 4.
4 > 3 (the arbitrary precision)
Thus the value will show up unhappily.
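As rough C++ (needs <cmath>):
float iHateNegativeZeros(float theFloat, int precision)
{
    // A negative value prints as "-0.000..." when its magnitude is below half a unit
    // in the last printed place, i.e. when -log10(|x|) > precision + log10(2).
    // std::signbit also catches an exact -0.0f, which "theFloat < 0.0f" would miss.
    if (std::signbit(theFloat) &&
        (theFloat == 0.0f ||
         -std::log10(std::fabs(theFloat)) > precision + std::log10(2.0f)))
    {
        return std::fabs(theFloat); // same magnitude, positive sign
    }
    else
    {
        return theFloat;
    }
}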

Check if a passed double argument is "close enough" to be considered integral

I am working on a code where I need to check if a certain variable that can take a double value has actually taken on an integer value. I consider a double variable to have taken on an integer value if it is within a tolerance of an integer. This tolerance is 1e-5.
The following is my code:
#define SMALL 1e-5
//Double that attains this is considered non zero. Strictly Less than this is 0
int check_if_integer(double arg){
//returns 1 if arg is close enough to an integer
//returns 0 otherwise
if(arg - (int)arg >= SMALL){
if(arg + SMALL > (int)(arg+1.0)){
return(1);
//Code should have reached this point since
//arg + SMALL is 16.00001
//while (int)(arg+1.0) should be 16
//But the code seems to evaluate (int)(arg+1.0) to be 17
}
}
else{
return(1);
}
return(0);
}
int main(void){
int a = check_if_integer(15.999999999999998);
}
Unfortunately, on passing the argument 15.999999999999998, the function returns a 0. That is, it deems the argument to be fractional, while it should have returned a 1 indicating that the argument is "close enough" to 16.
I am using VS2010 professional.
Any pointers will be greatly appreciated!
Further to hvd's answer regarding types; it is also inadvisable to add/subtract small doubles to/from large doubles due to the way in which they are internally represented.
A simple work around which avoids both issues would be:
if (std::fabs(arg - std::round(arg)) <= SMALL) {
return (1);
} else {
return (0);
}
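A self-contained sketch of that workaround as a full function (needs <cmath>). The name checkIfInteger and the constant kSmall are hypothetical stand-ins for the question's check_if_integer and SMALL; std::fabs is used so the integer abs overload can't be picked by accident.

```cpp
#include <cmath>

// Tolerance from the question: within 1e-5 of an integer counts as integral.
constexpr double kSmall = 1e-5;

// Returns true if arg lies within tol of its nearest integer.
// std::round finds that integer directly, so nothing is added to or
// subtracted from arg and no extra rounding step is introduced.
bool checkIfInteger(double arg, double tol = kSmall) {
    return std::fabs(arg - std::round(arg)) <= tol;
}
```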
Yes, floating point is hard. Just because 15.999999999999998 < 16.0, that doesn't mean 15.999999999999998 + 1.0 < 17.0. Suppose you have a decimal floating-point type with three digits of precision. What result do you get for 9.99 + 1.0 in that type's precision? The mathematical result would be 10.99, and rounded to that type's precision gives 11.0. Binary floating-point has the same issue.
You can, in this particular case, change (int)(arg+1.0) to (int)arg+1. (int)arg is accurate, and so is integer addition.
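A tiny sketch of the two variants, assuming IEEE 754 doubles with the default round-to-nearest-even mode; the function names are hypothetical. For 15.999999999999998, arg + 1.0 sits exactly halfway between two doubles and rounds up to 17.0 before the cast, while truncating first keeps the arithmetic exact.

```cpp
// Adds in double precision, then truncates: the addition itself may round up.
int viaDoubleAdd(double arg) { return static_cast<int>(arg + 1.0); }

// Truncates first, then adds exactly in integer arithmetic.
int viaIntAdd(double arg) { return static_cast<int>(arg) + 1; }
```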

My for-loop does not start [duplicate]

This question already has answers here:
Compare double to zero using epsilon
I know there are loads of topics about this question, but none of those helped me. I am trying to find the root of a function by testing every number in a range of -10 to 10 with two decimal places. I know it maybe isn't the best way, but I am a beginner and just want to try this out. Somehow the loop does not work, as I am always getting -10 as an output.
Anyway, that is my code:
#include <iostream>
using namespace std;
double calc (double m,double n)
{
double x;
for (x=-10;x<10 && m*x+n==0; x+=0.01)
{
cout << x << endl;
}
return x;
}
int main()
{
double m, n, x;
cout << "......\n";
cin >> m; // gradient
cout << "........\n";
cin >> n; // y-intercept
x=calc(m,n); // using function to calculate
cout << ".......... " << x<< endl; //output solution
cout << "..............\n"; // Nothing of importance
return 0;
}
You are testing the conjunction of two conditions in your loop condition.
for (x=-10;x<10 && m*x+n==0; x+=0.01)
For many inputs, the second condition will not be true, so the loop will terminate before the first iteration, causing a return value of -10.
What you want is probably something closer to the following. We need to test whether the absolute value is smaller than some EPSILON for two reasons. One, double is not precise. Two, you are computing an approximate solution anyway, so you would not expect an exact answer unless you happened to get lucky.
#include <cmath>
#define EPSILON 1E-2
double calc (double m,double n)
{
double x;
for (x=-10;x<10; x+=0.001)
{
// std::fabs, so the double overload is used rather than the integer abs
if (std::fabs(m*x+n) < EPSILON) return x;
}
// return a value outside the range to indicate that we failed to find a
// solution within range.
return -20;
}
Update: At the request of the OP, I will be more specific about what problem EPSILON solves.
double is not precise. In a computer, floating-point numbers are usually represented by a fixed number of bits, with the bit representation usually specified by a standard such as IEEE 754. Because the number of bits is fixed and finite, you cannot represent numbers with arbitrary precision. Let us consider an example in base 10 for ease of understanding, although you should understand that computers experience a similar problem in base 2.
If m = 1/3, x = 3, and n = -1, we would expect that m*x + n == 0. However, because 1/3 is the repeating decimal 0.33333... and we can only store a fixed number of digits, the result of 3*0.33333 is actually 0.99999, which is not equal to 1. Therefore m*x + n != 0, and our check will fail. Thus, instead of checking for equality with zero, we must check whether the result is sufficiently close to zero, by comparing its absolute value with a small number we call EPSILON. As one of the comments pointed out, the correct value of EPSILON for this particular purpose is std::numeric_limits<double>::epsilon(), but the second issue requires a larger EPSILON.
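The same effect in binary doubles, as a small sketch. In double arithmetic (1.0/3)*3 typically happens to round back to exactly 1, so this uses m = 0.1, x = 3, n = -0.3 instead; the helper names residual and nearlyZero are hypothetical.

```cpp
#include <cmath>

// Value of the line m*x + n at x.
double residual(double m, double x, double n) { return m * x + n; }

// Treat v as zero when its magnitude is below epsilon.
bool nearlyZero(double v, double epsilon) { return std::fabs(v) < epsilon; }
```

Mathematically 0.1*3 - 0.3 is 0, but neither 0.1 nor 0.3 is exactly representable, so the residual comes out as a tiny non-zero number: == 0 fails while the EPSILON test succeeds.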
You are only computing an approximate solution anyway. Since you are checking the values of x at finite increments, there is a strong possibility that you will simply step over the root without ever landing on it exactly. Consider the equation 10000x + 1 = 0. The correct solution is -0.0001, but if you are taking steps of 0.001, you will never actually try the value x = -0.0001, so you could not possibly find the exact solution. For linear functions, we would expect that values of x close to -0.0001, such as x = 0, will get us reasonably close to the correct solution, so we use EPSILON as a fudge factor to work around the lack of precision in our method.
The m*x+n==0 condition is false, so the loop body never runs.
You should change it to m*x+n!=0.