Physics engine: use double or single precision? - c++

I am making a rigid body physics engine from scratch (for educational purposes), and I'm wondering if I should choose single or double precision floats for it.
I will be using OpenGL to visualize it and the glm library to calculate stuff internally in the engine as well as for the visualization. The convention seems to be to use floats for OpenGL pretty much everywhere, and glm::vec3 and glm::vec4 seem to use float internally. I also noticed that glm::dvec3 and glm::dvec4 exist, though nobody seems to be using them. How do I decide which one to use? double seems to make sense as it has more precision and pretty much the same performance on today's hardware (as far as I know), but everything else seems to use float except for some of GLU's functions and some of GLFW's.

This is all going to depend on your application. You pretty much already understand the tradeoffs between the two:
Single-precision
Less accurate
Faster computations, even on today's hardware. Values take up less memory and operations are faster; you get more out of cache optimizations, etc.
Double-precision
More accurate
Slower computations.
Typically in graphics applications the precision of floats is plenty given the number of pixels on the screen and the scale of the scene. In scientific settings or smaller-scale simulations you may need the extra precision. It also may depend on your hardware. For instance, I coded a physically based simulation for rigid bodies on a netbook, and switching to float gained on average 10-15 FPS, which almost doubled the FPS at that point in my implementation.
My recommendation is that if this is an educational activity use floats and target the graphics application. If you find in your studies and timing and personal experience you need double-precision then head in that direction.

Surely the general rule is correctness first and performance second? That means using doubles unless you can convince yourself that you'll get fidelity required using floats.
The thing to look at is the effective size of one bit of the coordinate system relative to the smallest size you intend to model.
For example, if you use Earth coordinates, 100 degrees works out to around 1E7 metres.
An IEEE 754 float has only 23 bits of mantissa, which gives a relative precision of only about 1E-7.
Hence the coordinate is only accurate to around 1 metre. This may or may not be sufficient for the problem.
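To make the arithmetic concrete, here is a minimal sketch (not from the original answer) that prints the effective resolution of float and double near an Earth-scale coordinate of 1E7 metres, using std::nextafter:

#include <cmath>
#include <cstdio>

int main() {
    float  coord_f = 1.0e7f;   // Earth-scale coordinate in metres
    double coord_d = 1.0e7;
    // std::nextafter gives the next representable value; the difference is
    // the effective "size of one bit" at that magnitude.
    printf("float  resolution near 1e7 m: %g m\n",
           std::nextafter(coord_f, 2.0e7f) - coord_f);   // about 1 m
    printf("double resolution near 1e7 m: %g m\n",
           std::nextafter(coord_d, 2.0e7) - coord_d);    // about 2e-9 m
    return 0;
}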
I have learnt from experience to always use doubles for the physics and physical modelling calculations, but concede that cannot be a universal requirement.
It does not of course follow that the rendering should be using double; you may well want that as a float.

I used typedef in a common header and went with float as my default.
typedef float real_t;
I would not recommend using templates for this, because it causes huge design problems when you try to use polymorphic/virtual functions.
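As an illustration, a minimal sketch of such a header, assuming a hypothetical PHYSICS_USE_DOUBLE compile-time switch (the macro name is mine, not part of the original answer):

// real_t.h - build with or without -DPHYSICS_USE_DOUBLE to flip the
// engine's precision in one place.
#pragma once

#ifdef PHYSICS_USE_DOUBLE
typedef double real_t;
#else
typedef float real_t;
#endif

// Keep literals precision-agnostic as well, e.g.
//   real_t dt = static_cast<real_t>(0.01);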
Why floats do work
Floats worked fine for me, for three reasons:
First, almost every physical simulation involves adding some noise to the forces and torques to be realistic. This random noise is usually far larger in magnitude than the precision of floats.
Second, having limited precision is actually beneficial in many instances. Consider that almost none of classical rigid-body mechanics applies exactly in the real world, because there is no such thing as a perfect rigid body. So when you apply a force to a less-than-perfect rigid body, you don't get a perfect acceleration down to the 7th digit.
Third, many simulations run only for a short duration, so the accumulated errors remain small enough. Using double precision doesn't change this automatically; creating long-running simulations that match the real world is extremely difficult and would be a very specialized project.
When floats don't work
Here are the situations where I had to consider using double.
Latitude and longitude should be doubles. Floats simply don't have good enough resolution for most purposes for these quantities.
Computing the integral of very small quantities over time. For example, a Gauss-Markov process is a good way to represent random walk in sensor bias. However, the values will typically be very small and accumulate, and the errors in the calculation can be much bigger with floats than with doubles (a sketch of this effect follows this list).
Specialized simulations that go beyond the usual classical mechanics of linear and rotational motion of rigid bodies. For example, if you work with protein molecules, crystal growth, micro-gravity physics, etc., then you probably want to use double.
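A minimal sketch of the accumulation effect mentioned in the second item above (the increment value is arbitrary, chosen only to make the drift visible):

#include <cstdio>

int main() {
    const int steps = 10000000;          // 10 million small increments
    const float  inc_f = 1.0e-4f;
    const double inc_d = 1.0e-4;

    float  sum_f = 0.0f;
    double sum_d = 0.0;
    for (int i = 0; i < steps; ++i) {
        sum_f += inc_f;                  // rounding error accumulates once sum_f >> inc_f
        sum_d += inc_d;
    }
    printf("expected: %g\n", steps * 1.0e-4);  // 1000
    printf("float:    %g\n", sum_f);           // noticeably off
    printf("double:   %g\n", sum_d);           // essentially exact
    return 0;
}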
When doubles don't work
There are actually times when the higher precision of double hurts, although it's rare. An example from What Every Computer Scientist Should Know About Floating-Point Arithmetic: suppose you have some quantity that converges to 1 over time, you take its log, and you do something when the result is 0. With double, you might never get to exactly 1 because the rounding might not happen, but with floats it might.
Another example: you need to use special code to compare real values. Such code often has a default epsilon tolerance, which for float is a fairly reasonable 1E-6, but for double is more like 1E-15. If you are not careful, this can give you a lot of surprises.
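A small sketch of that pitfall; the nearly_equal helper below is hypothetical, not from the answer:

#include <cmath>
#include <cstdio>

// Relative comparison against the larger magnitude.
template <typename T>
bool nearly_equal(T a, T b, T eps) {
    return std::fabs(a - b) <= eps * std::fmax(std::fabs(a), std::fabs(b));
}

int main() {
    // Float machine epsilon is ~1.2E-7, so a relative tolerance of 1E-6 is sane
    // for float. Reusing that same 1E-6 for double masks differences in digits 7..15:
    double a = 1.000000001, b = 1.000000002;    // differ in the 9th digit
    printf("%d\n", nearly_equal(a, b, 1e-6));   // prints 1: treated as equal
    printf("%d\n", nearly_equal(a, b, 1e-12));  // prints 0: difference detected
    return 0;
}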
Performance
Here's another surprise: on modern x86 hardware there is little difference between the raw performance of float and double. Memory alignment, caching, etc. almost overwhelmingly dominate over the choice of floating point type. On my machine, a simple summation test of 100M random numbers took 22 sec with floats and 25 sec with doubles. So floats are indeed about 12% faster, but I still think that is too little to abandon double for performance alone. However, if you use SSE instructions, GPUs, or embedded/mobile hardware like an Arduino, then floats will be much faster, and that can most certainly be the driving factor.
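For reference, a rough sketch of that kind of summation test (not the original code; absolute timings will differ with machine, compiler, and flags):

#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

template <typename T>
double time_sum(std::size_t n) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<T> dist(0, 1);
    std::vector<T> data(n);
    for (auto& x : data) x = dist(rng);

    auto t0 = std::chrono::steady_clock::now();
    T sum = 0;
    for (T x : data) sum += x;
    auto t1 = std::chrono::steady_clock::now();
    printf("(checksum %g) ", static_cast<double>(sum));  // keeps the loop from being optimized away
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const std::size_t n = 100000000;  // 100M values; reduce if memory is tight
    printf("float:  %.2f s\n", time_sum<float>(n));
    printf("double: %.2f s\n", time_sum<double>(n));
    return 0;
}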
A physics engine that does nothing but linear and rotational motion of rigid bodies can run at 2000 Hz on today's desktop-grade hardware on a single thread, and you can trivially parallelize this across many cores. Lots of simple low-end simulations require just 50 Hz. At 100 Hz things start to get pretty smooth. If you have things like PID controllers, you might have to go up to 500 Hz. But even at that worst-case rate, you can still simulate plenty of objects on a good enough desktop.
In summary, don't let performance be your driving factor unless you actually measure it.
What to do
A rule of thumb is to use as much precision as you need to get your code to work. For a simple rigid-body physics engine, floats are often good enough. However, you want to be able to change your mind without revamping your code, so the best approach is to use the typedef mentioned at the start and make sure your code works with float as well as with double. Then measure often and choose the type as your project evolves.
Another vital thing in your case: keep the physics engine religiously separated from the rendering system. The output from the physics engine could be either double or float and should be cast to whatever the rendering system needs.
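A minimal sketch of that boundary, assuming glm on the rendering side as in the question (the type and function names are illustrative):

#include <glm/glm.hpp>

typedef double real_t;   // from the engine's common header; double here for illustration

struct BodyState {
    real_t position[3];
    real_t orientation[4];   // quaternion (w, x, y, z)
};

// The conversion happens once, at the physics/rendering boundary; nothing
// inside the engine depends on the renderer's float-based types.
inline glm::vec3 to_render_position(const BodyState& s) {
    return glm::vec3(static_cast<float>(s.position[0]),
                     static_cast<float>(s.position[1]),
                     static_cast<float>(s.position[2]));
}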

Here's the short answer.
Q. Why does OpenGL use float rather than double?
A. Because most of the time you don't need the precision, and doubles are twice the size.
Another thing to consider is that you shouldn't use doubles everywhere, just as some things may require a double as opposed to a float. For example, if you are drawing a circle out of squares by looping through the angles, there can only be so many squares shown on the screen; they will overlap, and in that case doubles would be pointless. However, if you're doing arbitrary floating point arithmetic, you may need the extra precision, for example if you're trying to accurately render the Mandelbrot set (although that depends entirely on your algorithm).
Either way, in the end, you will need to usually cast back to float if you intend to use those values in drawing.

Single-precision operations are faster, and the data uses less memory and less network bandwidth. So you only use double if you gain something in exchange for slower operations and more memory and bandwidth. There are certainly applications of rigid-body physics where the extra precision would be worth it, such as manipulating lat/lon, where single precision only gives you metre-level accuracy - but is that your case?
Since this is for educational purposes, maybe you want to educate yourself in the use of high-precision physics algorithms where the extra accuracy would matter. But a lot of rigid-body physics involves processes that can only be approximately quantified, such as friction between two solids or the collision response after detection, so the extra precision won't matter: you just get more precise approximate behavior :)


Built-in type efficiency

Under the "The most efficient types" section here
...and when defining an object to store a floating point number, use the double type, ... The double type is two to three times less efficient than the float type...
Seems like it's contradicting itself?
And I read elsewhere (I can't remember where) that computations involving ints are faster than ones involving shorts on many machines, because shorts are converted to ints to perform the operations. Is this true? Any links on this?
One can always argue about the quality of the contents on the site you link to. But the two quotes you refer to:
...and when defining an object to store a floating point number, use the double type, ...
and
... The double type is two to three times less efficient than the float type...
Refer to two different things: the first hints that using doubles will give far fewer problems due to the increased precision, while the other talks about performance. But honestly I wouldn't pay too much attention to that; chances are that if your code performs suboptimally, it is due to an incorrect choice of algorithm rather than the wrong choice of primitive data type.
Here is a quote about the performance of single versus double precision floats from one of my old teachers, Agner Fog, who has a lot of interesting reading on software optimization at his website: http://www.agner.org. If you are really interested in micro-optimizations, go take a look at it:
In most cases, double precision calculations take no more time than single precision. When the floating point registers are used, there is simply no difference in speed between single and double precision. Long double precision takes only slightly more time. Single precision division, square root and mathematical functions are calculated faster than double precision when the XMM registers are used, while the speed of addition, subtraction, multiplication, etc. is still the same regardless of precision on most processors (when vector operations are not used).
source: http://agner.org/optimize/optimizing_cpp.pdf
While there might be variations between compilers and processors, the lesson to take from it is that most likely you do not need to worry about optimization at this level; look at the choice of algorithm, or even the data container, not the primitive data type.
These optimizations are negligible unless you are writing software for space shuttle launches (which recently have not been doing too well). Correct code is far more important than fast code. If you require the precision, using doubles will barely affect the run time.
Things that affect execution time way more than type definitions:
Complexity - The more work there is to do, the more slowly the code will run. Reduce the amount of work needed, or break it up into smaller, faster tasks.
Repetition - Repetition can often be avoided and will inevitably ruin code performance. It comes in many guises-- for example, failing to cache the results of expensive calculations or of remote procedure calls. Every time you recompute, you waste efficiency. They also extend the executable size.
Bad Design - Self explanatory. Think before you code!
I/O - A program whose execution is blocked waiting for input or output (to and from the user, the disk, or a network connection) is bound to perform badly.
There are many more reasons, but these are the biggest. Personally, bad design is where I've seen most of it happen. State machines that could have been stateless, dynamic allocation where static would have been fine, etc. are the real problems.
Depending on the hardware, the actual CPU (or FPU, if you like) performance of double is somewhere between half the speed of float and the same speed, on modern CPUs [for example, add or subtract is probably the same speed; multiply or divide may differ for the larger type].
On top of that, there are fewer values per cache line, so when there is a large number of them it gets slower still, because memory speed is lower. Per cache line, there are half as many double values, which means roughly half the performance if the application is fully memory bound; it will be much less of a factor in a CPU-bound application.
Similarly, if you use SSE or similar SIMD technologies, a double takes up twice as much space, so the number of actual calculations per instruction is halved, and typically the CPU allows the same number of instructions per cycle for both float and double - except for some operations that take longer for double. Again, that leads to roughly half the performance.
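The "half as many" arithmetic spelled out, assuming 64-byte cache lines and 128-bit SSE registers (AVX doubles the register width):

#include <cstdio>

int main() {
    const int cache_line = 64;  // bytes, typical on x86
    const int sse_reg    = 16;  // bytes (128-bit XMM register)

    printf("floats  per cache line: %d\n", cache_line / (int)sizeof(float));   // 16
    printf("doubles per cache line: %d\n", cache_line / (int)sizeof(double));  //  8
    printf("floats  per SSE op:     %d\n", sse_reg / (int)sizeof(float));      //  4
    printf("doubles per SSE op:     %d\n", sse_reg / (int)sizeof(double));     //  2
    return 0;
}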
So, yes, I think the page in the link is confusing, and it mixes up the ideal performance comparison between double and float - that is, from a pure performance perspective. It is often much easier to get noticeable calculation errors when using float - which can be a pain to track down - so start with double and switch to float only if it's deemed necessary because you have identified it as a performance issue (either from experience or from measurements).
And yes, there are several architectures where only one size of integer exists - or only two sizes, such as 8-bit char and 32-bit int, where a 16-bit short would be simulated by performing the 32-bit math and then dropping the top part of the value. For example, MIPS only has 32-bit operations, but can store and load 16-bit values to memory. That doesn't necessarily make short slower, but it certainly means it's not faster.

Convert all doubles to integers for better performance, is it just a rumor?

I have a very complicated and sophisticated data fitting program which uses the Levenberg-Marquardt algorithm to do fitting in double precision (the fitting class is templatized, but I instantiate it with doubles). The fitting process involves:
Calculating an error function (chi-square)
Solving a system of linear equations (I use lapack for that)
Calculating the derivatives of the function with respect to the parameters I want to fit to the data (usually 20+ parameters)
Calculating the function value continuously: the function is a complicated combination of sinusoidal and exponential functions with a few harmonics.
A colleague of mine has suggested that I use integers instead, claiming this would make it at least 10 times faster. My questions are:
Is that true that I will get that kind of improvement?
Is it safe to convert everything to integers? And what are the drawbacks to this?
What advice would you have for this whole issue? What would you do?
The program is developed to calculate some parameters from the signal online, which means that it must be as fast as possible, but I'm wondering whether it's worth starting the project of converting everything to integers.
The amount of improvement depends on your platform. For example, if your platform has a fast floating point coprocessor, performing arithmetic in floating point may be faster than integral arithmetic.
You may be able to get more performance gain by optimizing your algorithms rather than switching to integer arithmetic.
Another method for boosting performance is to reduce data cache misses, and to reduce branching and looping.
I would measure the performance of the program to find out where the bottlenecks are and then review the sections where most of the time is spent. For example, in my embedded system, micro-optimizations like the one you are suggesting saved 3 microseconds. That gain is not worth the effort to retest the entire system. If it works, don't fix it. Concentrate on correctness and robustness first.
The bottom line here is that you have to test it and decide for yourself. Profile a release build using real data.
1- Is that true that I will get that kind of improvement?
Maybe yes, maybe no. It depends on a number of factors, such as
How long it takes to convert from double to int
How big a word is on your machine
What platform/toolset you're using and what optimizations you have enabled
(Maybe) how big a cache line is on your platform
How fast your memory is
How fast your platform computes floating-point versus integer.
And who knows what else. In short, too many complex variables for anyone to be able to say for sure if you will or will not improve performance.
But I would be highly skeptical about your colleague's claim of "at least 10 times faster".
2- Is it safe to convert everything to integers? And what are the drawbacks to this?
It depends on what you're converting and how. Obviously converting a value like 123.456 to an integer is decidedly unsafe.
Drawbacks include loss of precision, loss of accuracy, and the expense in terms of space and time to actually do the conversions. Another significant drawback is the fact that you have to write a substantial amount of code, and every line of code you write is a probable source of new bugs.
3- What advice would you have for this whole issue? What would you do?
I would step back & take a deep breath. Profile your code under real-world conditions. Identify the sources of the bottlenecks. Find out what the real problems are, and if there even are any.
Identify inefficiencies in your algorithms, and fix them.
Throw hardware at the problem.
Then you can endeavor to start micro-optimizing. This would be my last resort, especially if the optimization technique you are considering would require writing a lot of code.
First, this reeks of attempting to optimize unnecessarily.
Second, doubles are (in practice) 64 bits, while ints on most systems are 32 bits. So you have a couple of choices: truncate the double to 32 bits (which reduces your precision to roughly that of a single), store it across two integers, or store it as an unsigned long long (which is at least 64 bits as well). For the first two options you take a performance hit, because you must convert the numbers back and forth between the doubles you operate on and the integers you store. For the third option you gain nothing (in terms of memory usage), as the types are basically the same size - so you'd just be converting to integers for no reason.
So, to get to your questions:
1) Doubtful, but you can try it to see for yourself.
2) The problem isn't storage as the bits are just bits when they get into memory. The problem is the arithmetic. Since you stated you need double precision, attempting to do those operations on an integer type will not give you the results you are looking for.
3) Don't optimize until it has been proven something needs to have a performance improvement. And always remember Amdahl's Law: Make the common case fast and the rare case correct.
What I would do is:
First tune it in single-thread mode (by the random-pausing method) until you can't find any way to reduce cycles. The kinds of things I've found are:
a large fraction of time spent in library functions like sin, cos, exp, and log, where the arguments were often unchanged, so the answers would be the same. The solution for that is called "memoizing": you figure out a place to store old values of arguments and results, and check there first before calling the function (see the sketch after this list).
In calls to library functions like DGEMM (the LAPACK matrix multiply) that one would assume are optimized to the teeth, a large fraction of time was actually spent in a function that determines whether the matrices are upper or lower triangular, square, symmetric, or whatever, rather than in actually doing the multiplication. If so, the answer is obvious - write a special routine just for your situation.
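A minimal sketch of the memoizing idea from the first bullet (illustrative only; a real implementation might cache several argument/result pairs, and this version is not thread-safe):

#include <cmath>

double memo_sin(double x) {
    static double last_x = 0.0;
    static double last_result = 0.0;   // sin(0.0), so the cache starts out consistent
    if (x != last_x) {
        last_x = x;
        last_result = std::sin(x);     // only pay for the library call when the argument changes
    }
    return last_result;
}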
Don't say "but I don't have those problems". Of course - you probably have different problems - but the process of finding them is the same.
Once you've made it as fast as possible in single-thread, then figure out how to parallelize it. Multi-threading can have high overhead, so it's best not to tightly-couple the threads.
Regarding your question about converting from doubles to integers, the other answers are right on the money. It only makes sense in very particular situations.

Why would you use float over double, or double over long double?

I'm still a beginner at programming and I always have more questions than our book or internet searches can answer (unless I missed something). So I apologize in advance if this was answered but I couldn't find it.
I understand that float has a smaller range than double, making it less precise, and from what I understand, long double is even more precise(?). So my question is: why would you want to use a variable that is less precise in the first place? Does it have something to do with different platforms, different OS versions, or different compilers? Or are there specific moments in programming where it's strategically more advantageous to use a float over a double/long double?
Thanks everyone!
In nearly all processors, "smaller" floating point numbers take the same number of clock cycles to execute, or fewer. Sometimes the difference isn't very big (or there is none at all); other times it can be literally twice the number of cycles for double vs. float.
Of course, memory foot-print, which is affecting cache-usage, will also be a factor. float takes half the size of double, and long double is bigger yet.
Edit: Another side effect of the smaller size is that the processor's SIMD extensions (3DNow!, SSE, and AVX on x86, with similar extensions available on several other architectures) may either work only with float, or can process twice as many floats as doubles per instruction (and as far as I know, no SIMD instructions are available for long double on any processor). So using float instead of double may improve performance by processing twice as much data in one go. End edit.
So, assuming 6-7 digits of precision is good enough for what you need, and a range of roughly ±10^38 (down to about 10^-38) is sufficient, then float should be used. If you need either more digits in the number or a bigger range, move to double, and if that's not good enough, use long double. But for most things, double should be perfectly adequate.
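Those digit counts and ranges can be read straight from std::numeric_limits; a quick sketch (exact values depend on the platform):

#include <cstdio>
#include <limits>

int main() {
    printf("float:       %d decimal digits, max %g\n",
           std::numeric_limits<float>::digits10,
           std::numeric_limits<float>::max());         // 6 digits, ~3.4e38
    printf("double:      %d decimal digits, max %g\n",
           std::numeric_limits<double>::digits10,
           std::numeric_limits<double>::max());        // 15 digits, ~1.8e308
    printf("long double: %d decimal digits, max %Lg\n",
           std::numeric_limits<long double>::digits10,
           std::numeric_limits<long double>::max());   // e.g. 18 digits with 80-bit x87
    return 0;
}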
Obviously, the importance of using "the right size" grows when you have either lots of calculations or lots of data to work with - if there are 5 variables and you just use each a couple of times in a program that does a million other things, who cares? If you are doing fluid dynamics calculations for how well a Formula 1 car performs at 200 mph, then you probably have several tens of millions of data points, each of which needs to be recalculated dozens of times per second of the car's travel, and then using up just a few extra clock cycles in each calculation will make the whole simulation take noticeably longer.
There are two costs to using float, the obvious one of its limited range and precision, and, less obviously, the more difficult analysis those limitations impose.
It is often relatively easy to determine that double is sufficient, even in cases where it would take significant numerical analysis effort to show that float is sufficient. That saves development cost, and risk of incorrect results if the more difficult analysis is not done correctly.
Float's biggest advantage on many processors is its reduced memory footprint. That translates into more numbers per cache line, and more memory bandwidth in terms of numbers transferred per second. Any gain in compute performance is usually relatively slight - indeed, popular processors do all floating point arithmetic in one format that is wider than double.
It seems best to use double unless two conditions are met - there are enough numbers for their memory footprint to be a significant performance issue, and the developers can show that float is precise enough.
You might be interested in seeing the answer posted here Should I use double or float?
But it boils down to memory footprint vs the amount of precision you need for a given situation. In a physics engine, you might care more about precision, so it would make more sense to use a double or long double.
Bottom line:
You should only use as much precision as you need for a given algorithm
The basic principle here would be don't use more than you need.
The first consideration is memory use. You probably realized that already: if you are making only one double it's no big deal, but if you create a billion of them, you have just used twice as much memory space as you needed to.
Next is processor utilization. I believe that on many processors, if you use smaller data types the hardware can perform multiple operations at once (a form of vectorization).
An extension to this part of the answer is SSE instructions: basically these allow you to use packed data to do multiple floating point operations at once, which in an idealized case can double the speed of your program.
Lastly is readability: when someone is reading your code, using a float tells them immediately that you do not expect to need more than single-precision range or accuracy. IMO sometimes the right-precision number just flows better in the code.
A float uses less memory than a double, so if you don't need your number to be the size of a double, you might as well use a float since it will take up less memory.
Just like you wouldn't use a bus to drive yourself and a friend to the beach... you would be far better off going in a 2 seater car.
The same applies for a double over a long double... only reserve as much memory as you are going to need. Otherwise with more complex code you run the risk of using too much memory and having processes slow down or crash.

Double or float - optimization routines

I am reading through code for optimization routines (Nelder-Mead, SQP...). The languages are C++ and Python. I observe that a conversion from double to float is often performed, or that methods are duplicated with double resp. float arguments. Why is this profitable in optimization routine code, and is it significant? In my own code in C++, should I be careful about choosing between double and float, and why?
Kind regards.
Often the choice between double and float is made more on space demands than speed. Modern processors are capable of operating on double quite fast.
Floats may be faster than doubles when using SIMD instructions (such as SSE) which can operate on multiple values at a time. Also if the operations are faster than the memory pipeline, the smaller memory requirements of float will speed things overall.
Other times that I've come across the need to consider the choice between double and float types in terms of optimisation include:
Networking. Sending double-precision data across a socket connection will obviously require more time than sending half that amount of data.
Mobile and embedded processors may only be able to handle high-speed single-precision calculations efficiently on a coprocessor.
As mentioned in another answer, modern desktop processors can handle double-precision processing quite fast. However, you have to ask yourself whether the double-precision processing is really required. I work with audio, and the only time I can think of where I would need to process double-precision data is when using high-order filters, where numerical errors can accumulate. Most of the time this can be avoided by paying more careful attention to the algorithm design. There are, of course, other scientific or engineering applications where double-precision data is required in order to correctly represent a huge dynamic range.
Even so, the question of how much effort to spend on considering the data type to use really depends on your target platform. If the platform can crunch through doubles with negligible overhead and you have memory to spare then there is no need to concern yourself. Profile small sections of test code to find out.
In certain optimization algorithms, however, the choice between double and float is about correctness rather than space or speed. For example, with the penalty or barrier methods used for interior point methods in nonlinear optimization, a float has insufficient precision compared to a double, and using floats in the algorithm will yield garbage. For this reason, penalty and barrier methods were not used in the 1960s, but were rediscovered later with the advent of the double-precision data type. (For more on these methods, consult Nonlinear Programming: Sequential Unconstrained Minimization Techniques (Classics in Applied Mathematics) by Fiacco and McCormick.)
Another consideration is the conditioning of the underlying linear systems solved in many optimization algorithms. If the linear systems you're solving in something like a Newton iteration are sufficiently ill-conditioned, you will not be able to obtain an accurate solution to those systems.
Only if the loss in precision will not jeopardize your numerics should you consider replacing doubles with floats; even if space constraints force you to do so, you should make sure that the accuracy of your numerical results is not compromised. Once sufficient accuracy is assured for the problems you're working on, you can then worry about space and performance optimizations. You can use the CUTEr test set to validate your optimization routines.

benchmarking trig lookup tables performance gains vs cpp implementation

We are developing a real-time system that will be performing sin/cos calculations during a time-critical period of operation. We're considering using a lookup table to help with performance, and I'm trying to benchmark the benefit/cost of implementing a table. Unfortunately we don't yet know what degree of accuracy we will need, but probably around 5-6 decimal places.
I figure that a thorough comparison of the C++ trig functions to lookup approaches has already been done. I was hoping that someone could provide me with a link to a site documenting such benchmarking. If such results don't exist, I would appreciate any suggestions for how I can determine how much memory is required for a lookup table given a minimum accuracy, and how I can determine the potential speed benefits.
Thanks!
I can't answer all your questions, but instead of trying to determine theoretical speed benefits you would almost certainly be better off profiling it in your actual application. Then you get an accurate picture of what sort of improvement you stand to gain in your specific problem domain, which is the most useful information for your needs.
What accuracy is your degree input? (Let's use degrees rather than radians to keep the discussion "simpler".) Tenths of a degree? Hundredths of a degree? If your angle precision is not great, then your trig result cannot be any better.
I've seen this implemented as an array indexed by hundredths of a degree (keeping the angle as an integer with two implied decimal places also helps with the calculation - no need to use high-precision float/double radian angles).
Storing SIN values from 0.00 to 90.00 degrees would take 9001 32-bit float values.
SIN[0] = 0.0
...
SIN[4500] = 0.7071068
...
SIN[9000] = 1.0
If you have SIN, the trig property COS(a) = SIN(90 - a) just means you look up SIN[9000 - a] to get COS(a).
If you need more precision but don't have the memory for more table space, you could do linear interpolation between the two entries in the array, e.g. SIN of 45.00123 would be
SIN[4500] + 0.123 * (SIN[4501] - SIN[4500])
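Putting those pieces together, a minimal sketch of such a table with linear interpolation (first quadrant only; a real implementation would reduce arbitrary angles into 0..90 degrees first):

#include <cmath>
#include <vector>

class SinTable {
public:
    SinTable() : table_(9001) {
        const double pi = 3.14159265358979323846;
        for (int i = 0; i <= 9000; ++i)             // 0.00 .. 90.00 degrees, in hundredths
            table_[i] = static_cast<float>(std::sin(i * 0.01 * pi / 180.0));
    }

    // angle in degrees, 0 <= deg <= 90
    float sin_deg(double deg) const {
        double idx = deg * 100.0;                   // hundredths of a degree
        int    i   = static_cast<int>(idx);
        if (i >= 9000) return table_[9000];
        float  f   = static_cast<float>(idx - i);   // fractional part for interpolation
        return table_[i] + f * (table_[i + 1] - table_[i]);
    }

    float cos_deg(double deg) const { return sin_deg(90.0 - deg); }  // COS(a) = SIN(90 - a)

private:
    std::vector<float> table_;
};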
The only way to know the performance characteristics of the two approaches is to try them.
Yes, there are probably benchmarks of this made by others, but they didn't run in the context of your code, and they weren't running on your hardware, so they're not very applicable to your situation.
One thing you can do, however, is to look up the instruction latencies in the manuals for your CPU. (Intel and AMD have this information available in PDF form on their websites, and most other CPU manufacturers have similar documents)
Then you can at least find out how fast the actual trig instructions are, giving you a baseline that the lookup table will have to beat to be worthwhile.
But that only gives you a rough estimate of one side of the equation. You might be able to make a similar rough estimate of the cost of a lookup table as well, if you know the latencies of the CPU's caches, and you have a rough idea of the latency of memory accesses.
But the only way to get accurate information is to try it. Implement both, and see what happens in your application. Only then will you know which is better in your case.