I implemented my mathematical model using ILOG CPLEX with C++. Most of my decision variables have fractional values in the optimal solution. Some of them are so small that CPLEX outputs them as 0. Is there a way to increase the precision so that I can still see the values of such variables?
Also, when I use cplex.getBestObjValue(), it gives me "-Inf". (This is a maximization problem.)
Having values for integer variables that are close to (but not exactly) integer values is quite normal. CPLEX has an integrality tolerance so that these values are accepted as close enough to the correct integer values. Just use standard C++ output functions to output these values to whatever precision you want.
Mostly this is not a problem, but you can set the integrality tolerance to a smaller value if necessary. I normally round these values to the nearest integer value and use that as my solution. You can also try re-solving your model with those decision variables fixed to their rounded integer values to be sure the solution really is valid. If you are not sure that is sufficient, try Alex's suggestion for numerical precision emphasis too.
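As a rough sketch of both suggestions (the variable names are placeholders, and the parameter name is from the Concert API as I recall it, so double-check it against your CPLEX version):
#include <ilcplex/ilocplex.h>
#include <iomanip>
#include <iostream>

// Sketch only: 'cplex' is your IloCplex object, 'x' your decision variables.
void solve_and_report(IloCplex& cplex, IloNumVarArray& x)
{
    // Tighten the integrality tolerance before solving (the default is 1e-5).
    cplex.setParam(IloCplex::Param::MIP::Tolerances::Integrality, 1e-9);
    cplex.solve();

    // Small values are not really 0; they are just printed that way by the
    // default 6-digit formatting. Ask iostreams for more digits.
    std::cout << std::setprecision(17);
    for (IloInt i = 0; i < x.getSize(); ++i)
        std::cout << "x[" << i << "] = " << cplex.getValue(x[i]) << "\n";
}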
You could try the setting IloCplex::Param::Emphasis::Numerical:
Emphasizes precision in numerically unstable or difficult problems. This parameter lets you specify to CPLEX that it should emphasize precision in numerically difficult or unstable problems, with consequent performance trade-offs in time and memory.
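For reference, setting it from C++ would look roughly like this (a sketch, where cplex is your IloCplex object):
// Ask CPLEX to favour numerical precision over speed.
cplex.setParam(IloCplex::Param::Emphasis::Numerical, IloTrue);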
About your second question: is your model an LP?
regards
Is there some clever and reliable way to print series of bits as an IEEE-754 without actually using a float type?
I have found a way to print fractions, which allows me to represent the float as a fraction. However, I then came to realize that the exponent may range from -127 to 128 (after adjusting with the bias), which may result in the multiplication mantissa * 2^128. The fraction method relies on representing the numerator as an integer, and I would require a really large integer to do this multiplication. I mean, I could use a "custom" type to represent this large value (e.g. https://gmplib.org/), but I would prefer to avoid this. If we multiplied by 10^x, I could simply adjust the decimal point and add some zeros, but sadly that's not the case either.
I have not been able to find anything that mentions any solution for this. Probably due to the fact that googling stuff like "print from
Why am I actually trying to do this?
I'm only doing this to get a better understanding of how floats (IEEE-754 in particular) work, and I find that it always helps to do some practical example. So I thought "Hey, why not try to code it?". This has no practical application (that I know of)!
So, almost immediately after posting this, I finally succeeded in finding the resources I've been looking for.
https://www.ryanjuckett.com/printing-floating-point-numbers/ talks about it, and references other relevant sources.
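In case it helps anyone following along, the starting point of that approach is just pulling the sign, exponent and mantissa fields out of the raw bits; the hard part (producing the correct decimal digits) is what the article covers. A minimal sketch for single precision, with an example bit pattern I picked (0.5f):
#include <cstdint>
#include <cstdio>

int main()
{
    std::uint32_t bits = 0x3F000000u;              // example bit pattern (0.5f)

    std::uint32_t sign     = bits >> 31;            // 1 bit
    std::uint32_t exponent = (bits >> 23) & 0xFFu;  // 8 bits, biased by 127
    std::uint32_t mantissa = bits & 0x7FFFFFu;      // 23 bits

    // For normal numbers: value = (-1)^sign * (1 + mantissa/2^23) * 2^(exponent-127)
    std::printf("sign=%u  unbiased exponent=%d  mantissa=0x%06X\n",
                static_cast<unsigned>(sign),
                static_cast<int>(exponent) - 127,
                static_cast<unsigned>(mantissa));
    return 0;
}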
I am using a Firebird 1.5.x database and it has problems with variables that are stored in double precision or numeric(15,2) fields. E.g. I can issue an update (field_1 and field_2 are declared as numeric(15,2)):
update test_table set
field_1=0.34,
field_2=0.69;
But when field_1 and field_2 are read into variables var_1 and var_2 inside an SQL stored procedure, var_1 and var_2 assume the values 0.340000000000000024 and 0.689999999999999947 respectively. The multiplication var_1*var_2*25 gives 5.8649999999999999635, which rounds to 5.86. This is apparently wrong, because the correct final value is 5.87.
The rounding is done with a user defined function which comes from a C++ DLL which I develop. The idea is to detect situations with representation error, make a correction and apply the rounding procedure to the corrected values only.
So, the question is - how to detect and correct representation errors? E.g.
detect that 0.689999999999999947 should be corrected to 0.69, and that 0.340000000000000024 should be corrected to 0.34. Generally there can be situations where the number of significant digits after the decimal point is more or less than 2, e.g. 0.23459999999999999854 should be corrected to 0.2346.
Is it possible to do this in C++, and do solutions for this maybe already exist?
P.S. I tested this case in more recent Firebird versions (2.x) and there is no problem with reading database fields into variables in a stored procedure. But I would guess that representation errors can nevertheless arise in more recent Firebird versions too during lengthy calculations.
Thanks!
This problem exists in all programming languages where floating point variables are used.
You need to understand that there is no accurate way to display 1/3. Any display of that value is a compromise that must be agreed to between the concerned parties. The required precision is what must be agreed to. If you understand that, we can move on.
So how do we (for example) develop accounting systems with accuracy? One approach is to round(column_name, required_precision) all values as they are used in calculations. Also round the result of any division to the same precision. It is important to consistently use the agreed-to precision throughout the application.
Another alternative is to always multiply the floating-point values by 100 (if your values represent currency) and assign the resulting values to integer variables. Again, the result of a division must be rounded to 0 decimal places before it is assigned to its integer storage. Values are then divided by 100 for display purposes. This is as good as it gets.
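A minimal sketch of that second approach (the 2-decimal assumption, the function names and the example values are mine, taken from the question; positive amounts only, for brevity): convert once to integer cents, do the arithmetic and the final rounding in integers, and only format back to a decimal for display.
#include <cmath>
#include <cstdint>
#include <cstdio>

// Turn a double that is "supposed to be" a 2-decimal value into integer cents.
std::int64_t to_cents(double value)
{
    return std::llround(value * 100.0);
}

int main()
{
    std::int64_t f1 = to_cents(0.34);     // 34 cents
    std::int64_t f2 = to_cents(0.69);     // 69 cents

    // f1 * f2 * 25 in units of value*10000: 34 * 69 * 25 = 58650, i.e. 5.8650.
    std::int64_t scaled = f1 * f2 * 25;

    // Round half up back to cents entirely in integer arithmetic: 58650 -> 587.
    std::int64_t cents = (scaled + 50) / 100;

    std::printf("%lld.%02lld\n",
                static_cast<long long>(cents / 100),
                static_cast<long long>(cents % 100));   // prints 5.87
    return 0;
}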
I am porting some program from Matlab to C++ for efficiency. It is important for the output of both programs to be exactly the same (**).
I am facing different results for this operation:
std::sin(0.497418836818383950) = 0.477158760259608410 (C++)
sin(0.497418836818383950) = 0.47715876025960846000 (Matlab)
N[Sin[0.497418836818383950], 20] = 0.477158760259608433 (Mathematica)
So, as far as I know both C++ and Matlab use IEEE754-defined double arithmetic. I think I have read somewhere that IEEE754 allows different results in the last bit. Using Mathematica to decide, it seems like C++ is closer to the correct result. How can I force Matlab to compute the sin with precision down to the last bit, so that the results are the same?
In my program this behaviour leads to big errors, because the numerical differential equation solver keeps amplifying this last-bit error. However, I am not sure that the C++ ported version is correct. I am guessing that even if IEEE754 allows the last bit to be different, it somehow guarantees that this error does not get bigger when the result is used in further IEEE754-defined double operations (because otherwise, two different programs, both correct according to the IEEE754 standard, could produce completely different outputs). So the other question is: am I right about this?
I would like to get an answer to both bolded questions. Edit: The first question is being quite controversial, but it is the less important one; can someone comment on the second one?
Note: This is not an error in the printing, just in case you want to check, this is how I obtained these results:
http://i.imgur.com/cy5ToYy.png
Note (**): What I mean by this is that the final output, which is the result of some calculations showing some real numbers with 4 decimal places, needs to be exactly the same. The error I talk about in the question gets bigger (because of more operations, each of which is different in Matlab and in C++), so the final differences are huge. (If you are curious enough to see how the difference starts getting bigger, here is the full output [link soon], but this has nothing to do with the question.)
Firstly, if your numerical method depends on the accuracy of sin to the last bit, then you probably need to use an arbitrary precision library, such as MPFR.
The IEEE754 2008 standard doesn't require that the functions be correctly rounded (it does "recommend" it though). Some C libms do provide correctly rounded trigonometric functions: I believe that the glibc libm does (typically used on most linux distributions), as does CRlibm. Most other modern libms will provide trig functions that are within 1 ulp (i.e. one of the two floating point values either side of the true value), often termed faithfully rounded, which is much quicker to compute.
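For what it's worth, getting a correctly rounded reference value out of MPFR is only a few lines (a sketch; the working precision and formatting are arbitrary choices of mine, and you need to link with -lmpfr -lgmp):
#include <cstdio>
#include <mpfr.h>

int main()
{
    mpfr_t x;
    mpfr_init2(x, 200);                                   // 200 bits of working precision

    // sin of the decimal constant 0.497418836818383950, correctly rounded at 200 bits
    mpfr_set_str(x, "0.497418836818383950", 10, MPFR_RNDN);
    mpfr_sin(x, x, MPFR_RNDN);
    mpfr_printf("%.25Rf\n", x);

    mpfr_clear(x);
    return 0;
}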
None of those values you printed could actually arise as IEEE 64bit floating point values (even if rounded): the 3 nearest (printed to full precision) are:
0.477158760259608405451814405751065351068973541259765625
0.47715876025960846096296563700889237225055694580078125
0.477158760259608516474116868266719393432140350341796875
The possible values you could want are:
1. The exact sin of the decimal .497418836818383950, which is
0.477158760259608433132061388630377105954125778369485736356219...
(this appears to be what Mathematica gives).
2. The exact sin of the 64-bit float nearest .497418836818383950:
0.477158760259608430531153841011107415427334794384396325832953...
In both cases, the first value in the list above is the nearest (though only barely in case 1).
The sine of the double constant you wrote is about 0x1.e89c4e59427b173a8753edbcb95p-2, whose nearest double is 0x1.e89c4e59427b1p-2. To 20 decimal places, the two closest doubles are 0.47715876025960840545 and 0.47715876025960846096.
Perhaps Matlab is displaying a truncated value? (EDIT: I now see that the fourth-last digit is a 6, not a 0. Matlab is giving you a result that's still faithfully rounded, but it's the farther of the two closest doubles to the desired result. And it's still printing out the wrong number.)
I should also point out that Mathematica is probably trying to solve a different problem---compute the sine of the decimal number 0.497418836818383950 to 20 decimal places. You should not expect this to match either the C++ code's result or Matlab's result.
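If you want to check exactly which double each environment actually handed you (rather than trusting its default decimal display), print 17 significant digits and the hexfloat form; a small sketch:
#include <cmath>
#include <iomanip>
#include <iostream>

int main()
{
    double x = std::sin(0.497418836818383950);

    // 17 significant digits are enough to round-trip a double exactly;
    // hexfloat shows the underlying bits without any decimal rounding.
    std::cout << std::setprecision(17) << x << "\n";
    std::cout << std::hexfloat << x << "\n";
    return 0;
}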
I am writing a simulation program that proceeds in discrete steps. The simulation consists of many nodes, each of which has a floating-point value associated with it that is re-calculated on every step. The result can be positive, negative or zero.
In the case where the result is zero or less something happens. So far this seems straightforward - I can just do something like this for each node:
if (value <= 0.0f) something_happens();
A problem has arisen, however, after some recent changes I made to the program in which I re-arranged the order in which certain calculations are done. In a perfect world the values would still come out the same after this re-arrangement, but because of the imprecision of floating point representation they come out very slightly different. Since the calculations for each step depend on the results of the previous step, these slight variations in the results can accumulate into larger variations as the simulation proceeds.
Here's a simple example program that demonstrates the phenomenon I'm describing:
float f1 = 0.000001f, f2 = 0.000002f;
f1 += 0.000004f; // This part happens first here
f1 += (f2 * 0.000003f);
printf("%.16f\n", f1);
f1 = 0.000001f, f2 = 0.000002f;
f1 += (f2 * 0.000003f);
f1 += 0.000004f; // This time this happens second
printf("%.16f\n", f1);
The output of this program is
0.0000050000057854
0.0000050000062402
even though addition is commutative and associative, so mathematically both results should be the same. Note: I understand perfectly well why this is happening - that's not the issue. The problem is that these variations can mean that sometimes a value that used to come out negative on step N, triggering something_happens(), now may come out negative a step or two earlier or later, which can lead to very different overall simulation results because something_happens() has a large effect.
What I want to know is whether there is a good way to decide when something_happens() should be triggered that is not going to be affected by the tiny variations in calculation results that result from re-ordering operations so that the behavior of newer versions of my program will be consistent with the older versions.
The only solution I've so far been able to think of is to use some value epsilon like this:
if (value < epsilon) something_happens();
but because the tiny variations in the results accumulate over time I need to make epsilon quite large (relatively speaking) to ensure that the variations don't result in something_happens() being triggered on a different step. Is there a better way?
I've read this excellent article on floating point comparison, but I don't see how any of the comparison methods described could help me in this situation.
Note: Using integer values instead is not an option.
Edit: the possibility of using doubles instead of floats has been raised. This wouldn't solve my problem, since the variations would still be there; they'd just be of a smaller magnitude.
I've worked with simulation models for 2 years and the epsilon approach is the sanest way to compare your floats.
Generally, using suitable epsilon values is the way to go if you need to use floating point numbers. Here are a few things which may help:
If your values are in a known range and you don't need divisions, you may be able to scale the problem and use exact operations on integers. In general, though, these conditions don't apply.
A variation is to use rational numbers to do exact computations. This still has restrictions on the operations available and it typically has severe performance implications: you trade performance for accuracy.
The rounding mode can be changed. This can be used to compute an interval rather than an individual value (possibly with 3 values resulting from round up, round down, and round to nearest). Again, it won't work for everything, but you may get an error estimate out of this.
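A small sketch of that rounding-mode idea using <cfenv> (illustrative only: the example expression is mine, and compilers don't always honour #pragma STDC FENV_ACCESS, so the directed rounding may be optimised away):
#include <cfenv>
#include <cstdio>

#pragma STDC FENV_ACCESS ON

int main()
{
    // volatile discourages the compiler from folding the expression at compile time
    volatile double a = 0.1, b = 0.2, c = 0.3;

    std::fesetround(FE_DOWNWARD);
    double lo = a * b + c;                 // lower bound of the true result

    std::fesetround(FE_UPWARD);
    double hi = a * b + c;                 // upper bound of the true result

    std::fesetround(FE_TONEAREST);
    std::printf("result lies in [%.17g, %.17g]\n", lo, hi);
    return 0;
}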
Keeping track of the value and a number of operations (possibly multiple counters) may also be used to estimate the current size of the error.
To possibly experiment with different numeric representations (float, double, interval, etc.) you might want to implement your simulation as templates parameterized for the numeric type.
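Sketch of that last idea (names are placeholders): write the per-step update once, templated on the numeric type, and instantiate it with whatever representation you want to compare.
#include <iostream>

// The whole update step is parameterized on the numeric type, so the same
// model can be run with float, double, or a custom interval/rational type.
template <typename Real>
Real step(Real value)
{
    return value * Real(0.000003) + Real(0.000004);
}

int main()
{
    std::cout << step<float>(0.000002f) << "\n";
    std::cout << step<double>(0.000002) << "\n";
    return 0;
}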
There are many books written on estimating and minimizing errors when using floating point arithmetic. This is the topic of numerical mathematics.
In most cases I'm aware of, people experiment briefly with some of the methods mentioned above, conclude that the model is imprecise anyway, and don't bother with the effort. Also, doing something other than using float may yield better results but is just too slow; even using double hurts, due to the doubled memory footprint and the reduced opportunity to use SIMD operations.
I recommend that you single-step - preferably in assembly mode - through the calculations while doing the same arithmetic on a calculator. You should be able to determine which calculation orderings yield results of lesser quality than you expect and which work. You will learn from this and probably write better-ordered calculations in the future.
In the end - given the examples of numbers you use - you will probably need to accept the fact that you won't be able to do equality comparisons.
As to the epsilon approach, you usually need one epsilon for every possible exponent. For the single-precision floating point format you would need 256 single-precision floating point values, as the exponent is 8 bits wide. Some exponents are reserved for exceptional values, but for simplicity it is better to have a 256-member vector than to do a lot of testing as well.
One way to do this could be to determine your base epsilon for the case where the exponent is 0, i.e. the value to be compared against is in the range 1.0 <= x < 2.0. Preferably the epsilon should be base-2 adapted, i.e. a value that can be exactly represented in the single precision floating point format - that way you know exactly what you are testing against and won't have to think about rounding problems in the epsilon as well. For exponent -1 you would use your base epsilon divided by two, for -2 divided by four, and so on. As you approach the lowest and the highest parts of the exponent range you gradually run out of precision - bit by bit - so you need to be aware that extreme values can cause the epsilon method to fail.
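One way to get the same effect without a 256-entry table is to scale a single base epsilon by the exponent of the value being compared, e.g. with std::frexp/std::ldexp. A sketch of the idea (the base epsilon and the example values are arbitrary choices of mine):
#include <cmath>
#include <cstdio>

// Scale a base epsilon (chosen for values in [1.0, 2.0)) to the exponent of the inputs.
bool nearly_equal(float a, float b, float base_epsilon = 1.0f / (1 << 20))
{
    float reference = std::fmax(std::fabs(a), std::fabs(b));
    int exponent = 0;
    std::frexp(reference, &exponent);                   // reference = m * 2^exponent, m in [0.5, 1)
    float epsilon = std::ldexp(base_epsilon, exponent); // base_epsilon * 2^exponent
    return std::fabs(a - b) <= epsilon;
}

int main()
{
    std::printf("%d\n", nearly_equal(1000.0f, 1000.0001f)); // 1: tiny gap relative to the magnitude
    std::printf("%d\n", nearly_equal(1.0f, 1.001f));         // 0: a relatively much larger gap
    return 0;
}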
If it absolutely has to be floats then using an epsilon value may help but may not eliminate all problems. I would recommend using doubles for the spots in the code you know for sure will have variation.
Another way is to use floats to emulate doubles. There are many techniques out there, and the most basic one is to use 2 floats and do a little bit of math to save most of the number in one float and the remainder in the other (I saw a great guide on this; if I find it I'll link it).
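The basic building block of those two-float tricks is an error-free addition that captures the rounding error of a sum in a second float (the classic Fast2Sum step, not a full double-float library). A sketch, assuming float arithmetic isn't silently done in a wider format behind your back:
#include <cmath>
#include <cstdio>
#include <utility>

// Fast2Sum: returns s = fl(a + b) and the rounding error e, so a + b == s + e exactly.
// Chaining this is the basis of representing one "double-ish" value as a pair of floats.
void fast_two_sum(float a, float b, float& s, float& e)
{
    if (std::fabs(a) < std::fabs(b)) std::swap(a, b);   // the trick needs |a| >= |b|
    s = a + b;
    float z = s - a;
    e = b - z;
}

int main()
{
    float s, e;
    fast_two_sum(1.0f, 1e-8f, s, e);
    // s alone has rounded the small term away; s together with e still carries it.
    std::printf("s = %.9g, e = %.9g\n", s, e);
    return 0;
}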
Certainly you should be using doubles instead of floats. This will probably reduce the number of flipped nodes significantly.
Generally, using an epsilon threshold is only useful when you are comparing two floating-point numbers for equality, not when you are comparing them to see which is bigger. So (for most models, at least) using epsilon won't gain you anything at all -- it will just change the set of flipped nodes, it won't make that set smaller. If your model itself is chaotic, then it's chaotic.
Referenced here and here... Why would I use two's complement over an epsilon method? It seems like the epsilon method would be good enough for most cases.
Update: I'm purely looking for a theoretical reason why you'd use one over the other. I've always used the epsilon method.
Has anyone used the 2's complement comparison successfully? Why? Why Not?
The second link you reference mentions an article that has quite a long description of the issue:
http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
but unless you are tweaking performance I would stick with epsilon so people can debug your code
The bits method might be faster. I say might because on modern (multicore, highly pipelined) processors it is often impossible to guess what is really faster.
Code the simplest, most obviously correct implementation, then measure, then optimise.
In short, when comparing two floats with unknown origins, picking an epsilon that is valid is almost impossible.
For example:
What is a good epsilon when comparing distance in miles between Atlanta GA, Dallas TX and some place in Ohio?
What is a good epsilon when comparing distance in miles between my left foot, my right foot and the computer under my desk?
EDIT:
Ok, I'm getting a fair number of people not understanding why you wouldn't know what your epsilon is.
Back in the old days of lore, I wrote two programs that worked with Neverwinter Nights (a game made by BioWare). One of the programs took a binary model and converted it to ASCII. The other program took an ASCII model and compiled it into binary. One of the tests I wrote was to take all of BioWare's binary models, decompile them to ASCII and then back to binary. Then I compared my binary version with the original one from BioWare. One of the problems during the comparison was dealing with some of the slight variances in floating point values. So instead of coming up with a bunch of different EPSILONS for each type of floating point number (vertex, normal, etc), I wanted to use something such as this two's complement compare. Thus avoiding the whole multiple-EPSILON issue.
The same type of issue can apply to any type of software that processes 3rd party data and then needs to validate their results with the original. In these cases you might not even know what the floating point values represent, you just have to compare them. We ran into this issue with our industrial automation software.
EDIT:
LOL, this has been voted up and down by different people.
I'll boil the problem down to this: given two arbitrary floating point numbers, how do you decide what epsilon to use? You can't.
How can you compare 1e23 and 1.0001e23 with an epsilon and still compare 1e-23 and 5.2e-23 using the same epsilon? Sure, you can do some dynamic epsilon tricks, but that is the whole point of the integer compare (which does NOT require the integers to be exact).
The integer compare is able to compare two floats using an epsilon relative to the magnitude of the numbers.
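For concreteness, here is a sketch of the integer ("two's complement") comparison the Cygnus article describes, with my own variable names, using memcpy to get at the bits (NaNs and infinities aren't handled); the integer difference counts how many representable floats lie between the two values, which is exactly the magnitude-relative tolerance being argued for:
#include <cstdint>
#include <cstdio>
#include <cstring>

// Map a float's bit pattern to an integer that is ordered the same way as the float,
// so that adjacent representable floats map to adjacent integers.
static std::int32_t ordered_bits(float f)
{
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return (i < 0) ? static_cast<std::int32_t>(0x80000000u) - i : i;
}

// True if a and b are within maxUlps representable floats of each other.
bool almost_equal_ulps(float a, float b, std::int32_t maxUlps)
{
    std::int64_t diff = static_cast<std::int64_t>(ordered_bits(a)) - ordered_bits(b);
    if (diff < 0) diff = -diff;
    return diff <= maxUlps;
}

int main()
{
    float sum = 0.0f;
    for (int i = 0; i < 10; ++i) sum += 0.1f;               // accumulates rounding error

    std::printf("%d\n", sum == 1.0f);                        // very likely 0
    std::printf("%d\n", almost_equal_ulps(sum, 1.0f, 4));    // 1: only a few floats apart
    return 0;
}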
EDIT
Steve, let's look at what you said in the comments:
"But you know what equality means to you... Hence, you should be able to find an appropriate epsilon".
Turn this statement around to say:
"If you know what equality means to you, then you should be able to find an appropriate epsilon."
The whole point of what I am trying to say is that there are applications where we don't know what equality means in the absolute sense, so we have to resort to a relative compare, which is what the integer version is trying to do.
When it comes to speed, follow these rules:
If you're not a very experienced developer, don't optimize.
If you are an experienced developer, don't optimize yet.
Do the easiest method.
Alex
Oskar's right. Don't screw with this unless you really, really need that performance.
And you don't. If you were in the situation that did, you wouldn't have needed to ask the question -- you'd already know. If you think you do, then you don't. Your performance problems lie elsewhere. Just use the readable version.
Using any method that compares bitwise will result in trouble when fractions are represented by approximations. All floating point numbers whose fractional parts are not exact sums of powers of two (1/2, 1/4, 1/8, 1/65536, etc.) are approximated. So, of course, are all irrational numbers.
float third = 1.0f / 3.0f;        // note: plain 1/3 is integer division and would give 0
float two = 2.0f;
double another_two = third * 6.0;  // keep the product in a double so the 1/3 rounding error survives
if (two != another_two)
    printf("Approximation!\n");
The only time comparing bitwise would work is when you derive the floating point numbers in exactly the same way, or when they are exact representations (whole numbers, fractions whose denominators are powers of two). Even then, there can be multiple representations of some numbers, though I have never seen this in a working system.