I'm doing CPU profiling on my Mandelbrot Set explorer. For some reason, clojure.lang.PersistentHashMap$BitmapIndexedNode.find is using a fairly large percentage of the total CPU time. When I take a snapshot of the profiling results and get a backtrace of the method, I get this:
I see lots of references to BigDecimal operations. It seems as though BigDecimal operators at some point require calling find on a PersistentHashMap.
Is my interpretation of the backtrace correct? Are calls to find a result of BigDecimal operations, meaning there's nothing I can do about it? That seems like an odd thing for them to require. I'm having a hard time digging deeper than clojure.lang.Numbers$BigDecimalOps to verify this, though.
Your interpretation is correct. Addition, multiplication, negation, division and other BigDecimal operations end up doing hash map lookups. The lookups are part of dereferencing the *math-context* Var, which happens every time an arithmetic operation on two BigDecimal objects is performed in Clojure. There is nothing that can be done about it short of switching to other numerical types, such as double.
The clojure.core/*math-context* dynamic Var does not have a docstring. As far as I can tell, it is intended to hold a java.math.MathContext object. MathContext objects can be used to specify precision and rounding mode for BigDecimal operations. If *math-context* is bound, its value is passed to BigDecimal methods in the Java runtime as in BigDecimal.add(BigDecimal augend, MathContext mc). When *math-context* is not bound, BigDecimal methods are called without passing a context.
The relevant part of the stack trace from the question is:
...
clojure.lang.PersistentHashMap.entryAt(Object)
clojure.lang.Var.getThreadBinding()
clojure.lang.Var.deref()
clojure.lang.Numbers$BigDecimalOps.add/multiply/..
...
Some pointers to Clojure source code:
*math-context* Var defined
*math-context* dereferenced when BigDecimal addition is performed; the same happens for other operations too
Dereferencing a Var requires a call to getThreadBinding
A hash map is looked up inside getThreadBinding
In C and C++, floating point computations are non-deterministic by default, in the sense that not even the actual datatype is chosen by the user: for any intermediate computation of an FP subexpression, the compiler can choose to represent the value with higher precision (that is, as another real datatype).
[Some compilers (GCC) were known to do that for any automatic variable, not just the (anonymous) intermediate result of a subexpression.]
The compiler can do that for a few computations, in some functions; it can do it in some cases and not others for exactly the same subexpression.
It can even inline a function and use a different precision each time the function is called in the source code. That means any inlinable function can have call-dependent semantics; only separately compiled, ABI-called functions (functions called according to the conventions described by the ABI, which act essentially as black boxes) have the absolute guarantee of a single floating point behavior, fixed during separate compilation (that is, assuming no global optimization occurs).
[Note that this is similar to how string literals are defined: any two evaluations of the same string literal in the source code can refer to the same or different character arrays.]
That means that even for purely applicative functions, the fundamental equality f(x) == f(x) is only guaranteed if floating point operations (and string literals) are not used (or the address of a string literal is used only to access its elements).
So floating point operations have non-deterministic semantics, with an arbitrary choice made by the compiler for each and every FP operation (which seems a lot more perverse than the very small issue of letting the compiler choose which subexpression, A or B, to compute first in A+B).
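As an aside, an implementation documents its own choice of evaluation method through the FLT_EVAL_METHOD macro in <cfloat> (C++11 and later); a minimal probe:

#include <cfloat>
#include <iostream>
#include <limits>

int main() {
    // 0: evaluate in the declared type, 1: evaluate float/double in double,
    // 2: evaluate everything in long double, negative: indeterminable.
    std::cout << "FLT_EVAL_METHOD: " << FLT_EVAL_METHOD << '\n'
              << "double is IEEE 754: "
              << std::numeric_limits<double>::is_iec559 << '\n';
}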
It seems that a function that does any computation with intermediate floating point values cannot be used in any STL container or algorithm that expects a functor satisfying axioms, such as:
sorted containers: set, map, multiset, multimap
hashed containers
sorting algorithms: sort, stable_sort
algorithm operating on sorted ranges: lower_bound, set_union, set_intersection...
since all binary predicates and hash functions must be deterministic before the axioms can even be stated: they must be purely applicative, mathematical functions with a defined value for all possible inputs, which is never the case with C++'s non-deterministic floating point intermediate values.
In other words, are floating point operations by default almost unusable based on the standard alone, and only usable in real-world implementations that give some (vague) implicit guarantees of determinism?
A compiler is allowed to use higher precision operations and temporary values when evaluating floating-point arithmetic expressions. So if you do a1 = b+c; a2 = b+c;, then yes, it's possible for a1 and a2 to not be equal to each other (due to double rounding).
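For illustration, here is a small program where the two sides can disagree on implementations that evaluate in a wider type (e.g. classic x87 code generation, where FLT_EVAL_METHOD == 2); on a typical x86_64/SSE2 build it prints true:

#include <iostream>

int main() {
    volatile double b = 1.0 / 3.0;
    volatile double c = 10.0 / 3.0;
    volatile double a1 = b + c;   // rounded to double when stored
    // Under FLT_EVAL_METHOD == 2 the sum on the left below may be carried out
    // in 80-bit extended precision, so it can differ from the stored,
    // double-rounded a1. With SSE2 both sides are evaluated in double.
    std::cout << std::boolalpha << (b + c == a1) << '\n';
}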
That really doesn't matter to the algorithms and containers you mentioned, though; those don't do any arithmetic on their values. The "axioms" that those rely on are those of ordering relationships at most.
The worst you could say is that if you'd previously stored b+c in, say, a set, doing mySet.find(b+c) might fail. So yeah, don't do that. But that's something you don't generally do anyway; the rounding error produced by floating-point arithmetic already makes it rare to expect exact equality from derived quantities.
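A tiny illustration of that pitfall, using plain rounding error rather than extended precision:

#include <iostream>
#include <set>

int main() {
    std::set<double> mySet;
    double b = 0.1, c = 0.2;
    mySet.insert(b + c);
    // Recomputing the exact same expression normally finds the element,
    // but a value derived another way (0.3 here) differs by a rounding
    // error and the lookup misses.
    std::cout << std::boolalpha
              << (mySet.find(b + c) != mySet.end()) << '\n'   // true
              << (mySet.find(0.3)   != mySet.end()) << '\n';  // typically false
}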
About the only time that you do start having to worry about extended precision is when you're calculating theoretical FP error bounds on a sequence of operations. When you're doing that, you'll know what to look for. Until then, this problem is not a problem.
I'm writing a Mandelbrot Set explorer. I need as much precision as possible so I can zoom in as far as possible.
I noticed an unfortunate side-effect of mixing doubles and BigDecimals: they "contaminate" the type returned:
(type (* 1M 2))
=> java.math.BigDecimal
(type (* 1M 2.0))
=> java.lang.Double
I expected the opposite. BigDecimals, being potentially more precise, should contaminate the doubles.
Besides manually calling bigdec on every number that may come in contact with a BigDecimal, is there a way of preventing the auto-downgrade to double when doing math on doubles and BigDecimals?
Once you introduce a double into the equation, you limit the amount of precision you can possibly have. A BigDecimal accurate to within a million decimal places is no use to you, if the way you got it involved multiplying by something with just 15 or so significant digits. You could promote the result to a BigDecimal, but you've lost a ton of precision whether you like it or not. Therefore, Clojure's promotion rules make that obvious for you, by giving back a double instead of a high-precision BigDecimal.
See, for example, BigDecimal's JavaDoc for an explanation of why it is a bad idea to convert doubles to BigDecimals, implicitly or explicitly.
This isn't actually a bug, even though it at least looks wrong. To show more clearly how this leads to wrong-looking answers, compare these expressions:
user> (* 2.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001M 1.0)
2.0
user> (* 2.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001M 1.0M)
2.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010M
For the time being you will likely need to, as you suggest, make sure you only use big decimals in your program. The changes will likely be limited to the IO functions, and any constants you introduce will need the M on the end. Adding preconditions to functions will likely help catch some cases as well.
I implemented my mathematical model using ILOG CPLEX with C++. Most of my decision variables have fractional values in the optimal solutions. Some of them are so small that CPLEX outputs them as 0. Is there a way to increase the precision so that I can still see the values of such variables?
Also, when I use cplex.getBestObjValue(), it gives me "-Inf". (This is a maximization problem.)
Having values for integer variables that are close to (but not exactly) integer values is quite normal. CPLEX has an integrality tolerance so that these values are accepted as close enough to the correct integer values. Just use standard C++ output functions to output these values to whatever precision you want.
Mostly this is not a problem, but you can set the integrality tolerance to a smaller value if necessary. I normally round these values to the nearest integer value and use that as my solution. You can also try re-solving your model with those decision variables fixed to their rounded integer values to be sure the solution really is valid. If you are not sure that is sufficient, try Alex's suggestion for numerical precision emphasis too.
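For example, a sketch against the C++ Concert API (cplex and x stand for your IloCplex object and a decision variable; the integrality tolerance parameter name is from memory, so double-check it for your CPLEX version):

#include <ilcplex/ilocplex.h>
#include <cmath>
#include <iomanip>
#include <iostream>

// Print a solution value with more digits than the six std::cout shows by
// default, and round a near-integer value to the integer it stands for.
void reportValue(IloCplex& cplex, const IloNumVar& x) {
    std::cout << std::setprecision(15) << cplex.getValue(x) << '\n';
    std::cout << std::llround(cplex.getValue(x)) << '\n';
}

// Tighten the integrality tolerance (default 1e-5) if necessary.
void tightenIntegrality(IloCplex& cplex) {
    cplex.setParam(IloCplex::Param::MIP::Tolerances::Integrality, 1e-9);
}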
You could try to use the setting IloCplex::Param::Emphasis::Numerical:
Emphasizes precision in numerically unstable or difficult problems. This parameter lets you specify to CPLEX that it should emphasize precision in numerically difficult or unstable problems, with consequent performance trade-offs in time and memory.
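In Concert C++ that would look something like the sketch below (exact form may vary by CPLEX version):

#include <ilcplex/ilocplex.h>

// Ask CPLEX to favour numerical precision over speed and memory.
void emphasizeNumerics(IloCplex& cplex) {
    cplex.setParam(IloCplex::Param::Emphasis::Numerical, IloTrue);
}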
About your second question: is your model an LP?
regards
Without getting into unnecessary details, is it possible for operations on floating-point numbers (x86_64) to return variations, however small, in their results, given identical inputs? Even a single bit different?
I am simulating a basically chaotic system, and I expect small variations on the data to have visible effects. However I expected that, with the same data, the behavior of the program would be fixed. This is not the case. I get visible, but acceptable, differences with each run of the program.
I am thinking I have left some variable uninitialized somewhere...
The languages I am using are C++ and Python.
ANSWER
Russell's answer is correct. Floating point ops are deterministic. The non-determinism was caused by a dangling pointer.
Yes, this is possible. Quoting from the C++ FAQ:
Turns out that on some installations, cos(x) != cos(y) even though x == y. That's not a typo; read it again if you're not shocked: the cosine of something can be unequal to the cosine of the same thing. (Or the sine, or the tangent, or the log, or just about any other floating point computation.)
Why?
[F]loating point calculations and comparisons are often performed by special hardware that often contain special registers, and those registers often have more bits than a double. That means that intermediate floating point computations often have more bits than sizeof(double), and when a floating point value is written to RAM, it often gets truncated, often losing some bits of precision.
Contra Thomas's answer, floating point operations are not non-deterministic. They are fiendishly subtle, but a given program should give the same outputs for the same inputs, if it is not using uninitialized memory or deliberately randomized data.
My first question is, what do you mean by "the same data"? How is that data getting into your program?
Is there a general best practice strategy for dealing with floating point inaccuracy?
The project that I'm working on tried to solve them by wrapping everything in a Unit class which holds the floating point value and overloads the operators. Numbers are considered equal if they are "close enough"; comparisons like > or < are done by comparing with a slightly lower or higher value.
I understand the desire to encapsulate the logic of handling such floating point errors. But given that this project has had two different implementations (one based on the ratio of the numbers being compared and one based on the absolute difference), and that I've been asked to look at the code because it's not doing the right thing, the strategy seems to be a bad one.
So what is the best strategy for trying to make sure you handle all of the floating point inaccuracy in a program?
You want to keep data as dumb as possible, generally. Behavior and the data are two concerns that should be kept separate.
The best way is to not have unit classes at all, in my opinion. If you have to have them, then avoid overloading operators unless it has to work one way all the time. Usually it doesn't, even if you think it does. As mentioned in the comments, it breaks strict weak ordering for instance.
I believe the sane way to handle it is to create some concrete comparators that aren't tied to anything else.
struct RatioCompare {
    bool operator()(float lhs, float rhs) const;
};

struct EpsilonCompare {
    bool operator()(float lhs, float rhs) const;
};
People writing algorithms can then use these in their containers or algorithms. This allows code reuse without demanding that anyone uses a specific strategy.
std::sort(prices.begin(), prices.end(), EpsilonCompare());
std::sort(prices.begin(), prices.end(), RatioCompare());
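For illustration, here is one way the bodies might be filled in (the tolerance constants are made up, and a tolerance-based "less than" is not a true strict weak ordering, so treat this as a sketch rather than a drop-in):

#include <algorithm>
#include <cmath>
#include <vector>

struct EpsilonCompare {
    // lhs is "definitely less than" rhs by more than an absolute tolerance.
    bool operator()(float lhs, float rhs) const {
        return lhs < rhs - 1e-5f;
    }
};

struct RatioCompare {
    // Same idea, but the tolerance scales with the magnitude of the operands.
    bool operator()(float lhs, float rhs) const {
        const float scale = std::max(std::fabs(lhs), std::fabs(rhs));
        return lhs < rhs - 1e-5f * scale;
    }
};

int main() {
    std::vector<float> prices{3.0f, 1.0f, 2.0f};
    std::sort(prices.begin(), prices.end(), EpsilonCompare());
}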
Usually people trying to overload operators to avoid these things will offer complaints about "good defaults", etc. If the compiler tells you immediately that there isn't a default, it's easy to fix. If a customer tells you that something isn't right somewhere in your million lines of price calculations, that is a little harder to track down. This can be especially dangerous if someone changed the default behavior at some point.
Check comparing floating point numbers and this post on deniweb and this on SO.
Neither technique is good. See this article.
Google Test is a framework for writing C++ tests on a variety of platforms.
gtest.h contains the AlmostEquals function.
// Returns true iff this number is at most kMaxUlps ULP's away from
// rhs. In particular, this function:
//
// - returns false if either number is (or both are) NAN.
// - treats really large numbers as almost equal to infinity.
// - thinks +0.0 and -0.0 are 0 DLP's apart.
bool AlmostEquals(const FloatingPoint& rhs) const {
    // The IEEE standard says that any comparison operation involving
    // a NAN must return false.
    if (is_nan() || rhs.is_nan()) return false;

    return DistanceBetweenSignAndMagnitudeNumbers(u_.bits_, rhs.u_.bits_)
        <= kMaxUlps;
}
The Google implementation is good, fast and platform-independent.
Some brief documentation is here.
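The core trick is mapping the IEEE-754 bit pattern to an integer scale on which adjacent floating point values are adjacent integers; here is a standalone sketch of the same idea for double (my own rewrite, not Google Test's actual code):

#include <cstdint>
#include <cstring>
#include <iostream>

// Map the sign-and-magnitude bit pattern of a double onto an unsigned scale
// on which adjacent doubles differ by exactly 1 (and +0.0 == -0.0).
static std::uint64_t Biased(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);             // safe type-pun
    const std::uint64_t sign = std::uint64_t{1} << 63;
    return (bits & sign) ? ~bits + 1                 // negative values
                         : bits | sign;              // positive values
}

bool AlmostEqualUlps(double a, double b, std::uint64_t max_ulps = 4) {
    if (a != a || b != b) return false;              // NaN compares unequal
    const std::uint64_t x = Biased(a), y = Biased(b);
    return (x >= y ? x - y : y - x) <= max_ulps;
}

int main() {
    // 0.1 + 0.2 and 0.3 are typically 1 ULP apart, so this prints true.
    std::cout << std::boolalpha << AlmostEqualUlps(0.1 + 0.2, 0.3) << '\n';
}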
To me, floating point errors are essentially those which on an x86 would lead to a floating point exception (assuming the coprocessor has that interrupt enabled). A special case is the "inexact" exception, i.e. when the result was not exactly representable in the floating point format (such as when dividing 1 by 3). Newbies not yet at home in the floating-point world will expect exact results and will consider this case an error.
As I see it there are several strategies available.
Early data checking, such that bad values are identified and handled when they enter the software. This lessens the need for testing during the floating point operations themselves, which should improve performance.
Late data checking, such that bad values are identified immediately before they are used in actual floating point operations. Should lead to lower performance (a rough sketch of this approach follows the list).
Debugging with floating point exception interrupts enabled. This is probably the fastest way to gain a deeper understanding of floating point issues during the development process.
to name just a few.
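As a rough sketch of the late-checking idea using today's standard library (<cfenv>): strictly speaking the compiler should be told about FENV access, and enabling actual traps is platform-specific (e.g. feenableexcept() on glibc), so take this as an outline only.

#include <cfenv>
#include <iostream>

// Clear the FP status flags, do the work, then test whether anything
// questionable happened before trusting the result.
double safe_divide(double a, double b, bool& ok) {
    std::feclearexcept(FE_ALL_EXCEPT);
    const double q = a / b;
    ok = !std::fetestexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
    return ok ? q : 0.0;
}

int main() {
    bool ok;
    const double q = safe_divide(1.0, 0.0, ok);
    std::cout << (ok ? "ok: " : "bad input, returning ") << q << '\n';
}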
When I wrote a proprietary database engine over twenty years ago using an 80286 with an 80287 coprocessor I chose a form of late data checking and using x87 primitive operations. Since floating point operations were relatively slow I wanted to avoid doing floating point comparisons every time I loaded a value (some of which would cause exceptions). To achieve this my floating point (double precision) values were unions with unsigned integers such that I would test the floating point values using x86 operations before the x87 operations would be called upon. This was cumbersome but the integer operations were fast and when the floating point operations came into action the floating point value in question would be ready in the cache.
A typical C sequence (floating point division of two matrices) looked something like this:
// calculate source and destination pointers
type1 = npx_load(src1pointer);
if (type1 != UNKNOWN)   /* x87 stack contains negative, zero or positive value */
{
    type2 = npx_load(src2pointer);
    if (!(type2 == POSITIVE_NOT_0 || type2 == NEGATIVE))
    {
        if (type2 == ZERO) npx_pop();
        npx_pop();          /* remove src1 value from stack since there won't be a division */
        type1 = UNKNOWN;
    }
    else npx_divide();
}
if (type1 == UNKNOWN) npx_load_0();  /* x87 stack is empty so load zero */
npx_store(dstpointer);               /* store either zero (from prev statement) or quotient as result */
npx_load would load a value onto the top of the x87 stack, provided it was valid. Otherwise the top of the stack would be left empty. npx_pop simply removes the value currently at the top of the x87 stack. BTW "npx" is an abbreviation for "Numeric Processor eXtension", as it was sometimes called.
The method chosen was my way of handling floating-point issues stemming from my own frustrating experiences trying to get the coprocessor solution to behave in a predictable manner in an application.
For sure this solution led to overhead but a pure
*dstpointer = *src1pointer / *src2pointer;
was out of the question since it didn't contain any error handling. The extra cost of this error handling was more than made up for by how the pointers to the values were prepared. Also, the 99% case (both values valid) is quite fast so if the extra handling for the other cases is slower, so what?
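For comparison, a rough modern analogue of that pre-classification, using std::fpclassify instead of the union and integer tests (checked_divide is a hypothetical helper, not the original npx_* code):

#include <cmath>

// Classify the operands first, divide only when the divisor is a usable
// non-zero finite value, and fall back to zero otherwise.
double checked_divide(double src1, double src2) {
    const int c1 = std::fpclassify(src1);
    const int c2 = std::fpclassify(src2);
    const bool src1_ok = (c1 != FP_NAN && c1 != FP_INFINITE);
    const bool src2_ok = (c2 == FP_NORMAL || c2 == FP_SUBNORMAL);  // finite, non-zero
    return (src1_ok && src2_ok) ? src1 / src2 : 0.0;
}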