Built-in type efficiency - c++

Under The most efficient types second here
...and when defining an object to store a floating point number, use the double type, ... The double type is two to three times less efficient than the float type...
Seems like it's contradicting itself?
And I read elsewhere (can't remember where) that computations involving ints are faster than shorts on many machines because they are converted to ints to perform the operations? Is this true? Any links on this?

One can always argue about the quality of the contents on the site you link to. But the two quotes you refer to:
...and when defining an object to store a floating point number, use the double type, ...
and
... The double type is two to three times less efficient than the float type...
Refer to two different things, the first hints that using doubles will give much less problems due to the increased precision, while the other talks about performance. But honestly I wouldn't pay too much attention to that, chance is that if your code performs suboptimal it is due to incorrect choice of algorithm rather than wrong choice of primitive data type.
Here is a quote about performance comparison of single and double precision floats from one of my old teachers: Agner Fog, who has a lot of interesting reads over at his website: http://www.agner.org about software optimizations, if you are really interested in micro optimizations go take a look at it:
In most cases, double precision calculations take no more time than single precision. When the floating point registers are used, there is simply no difference in speed between single and double precision. Long double precision takes only slightly more time. Single precision division, square root and mathematical functions are calculated faster than double precision when the XMM registers are used, while the speed of addition, subtraction, multiplication, etc. is still the same regardless of precision on most processors (when vector operations are not used).
source: http://agner.org/optimize/optimizing_cpp.pdf
While there might be different variations for different compilers, and different processors, the lesson one should learn from it, is that most likely you do not need to worry about optimizations at this level, look at choice of algorithm, even data container, not the primitive data type.

These optimizations are negligible unless you are writing software for space shuttle launches (which recently have not been doing too well). Correct code is far more important than fast code. If you require the precision, using doubles will barely affect the run time.
Things that affect execution time way more than type definitions:
Complexity - The more work there is to do, the more slowly the code will run. Reduce the amount of work needed, or break it up into smaller, faster tasks.
Repetition - Repetition can often be avoided and will inevitably ruin code performance. It comes in many guises-- for example, failing to cache the results of expensive calculations or of remote procedure calls. Every time you recompute, you waste efficiency. They also extend the executable size.
Bad Design - Self explanatory. Think before you code!
I/O - A program whose execution is blocked waiting for input or output (to and from the user, the disk, or a network connection) is bound to perform badly.
There are many more reasons, but these are the biggest. Personally, bad design is where I've seen most of it happen. State machines that could have been stateless, dynamic allocation where static would have been fine, etc. are the real problems.

Depending on the hardware, the actual CPU (or FPU if you like) performance of double is somewhere between half the speed and same speed on modern CPU's [for example add or subtract is probably same speed, multiply or divide may be different for larger type], when compared to float.
On top of that, there are "fewer per cache-line", so if when there is a large number of them, it gets slower still because memory speed is slower. Per cache-line, there are half as many double values -> about half the performance if the application is fully memory bound. It will be much less of a factor in a CPU-bound application.
Similarly, if you use SSE or similar SIMD technologies, the double will take up twice as much space, so the number of actual calculation with be half as many "per instruction", and typically, the CPU will allow the same number of instructions per cycle for both float and double - except for some operations that take longer for double. Again, leading to about half the performance.
So, yes, I think the page in the link is confusing and mixing up the ideal performance setup between double and float. That is, from a pure performance perspective. It is often much easier to get noticeable calculation errors when using float - which can be a pain to track down - so starting with double and switching to float if it's deemed necessary because you have identified it as a performance issue (either from experience or measurements).
And yes, there are several architectures where only one size integer exists - or only two sizes, such as 8-bit char and 32-bit int, and 16-bit short would be simulated by performing the 32-bit math, and then dropping the top part of the value. For example MIPS has only got 32-bit operations, but can store and load 16-bit values to memory. It doesn't necessarily make it slower, but it certainly means that it's "not faster".

Related

Convert all doubles to integers for better performance, is it just a rumor?

I have a very complicated and sophisticated data fitting program which uses the Levenverg-Marquardt algorithm to do fitting in double precision (basically the fitting class is templatized, but I use instantiate it to doubles). The fitting process involves:
Calculating an error function (chi-square)
Solving a system of linear equations (I use lapack for that)
calculating the derivatives of a function with respect to the parameters, which I want to fit to the data (usually 20+ parameters)
calculating the function value continuously: the function is a complicated combination of a sinusoidal and exponential functions with a few harmonics.
A colleague of mine has suggested that I use integers for at least 10 times faster at least. My questions are:
Is that true that I will get that kind of improvement?
Is it safe to convert everything to integers? And what are the drawbacks to this?
What advice would you have for this whole issue? What would you do?
The program is developed to calculate some parameters from the signal online, which means that the program must be as fast as possible, but I'm wondering whether it's worth it to start the project of converting everything to integers.
The amount of improvement depends on your platform. For example, if your platform has a fast floating point coprocessor, performing arithmetic in floating point may be faster than integral arithmetic.
You may be able to get more performance gain by optimizing your algorithms rather than switching to integer arithmetic.
Another method for boosting performance is to reduce data cache hits and also reducing branches and loops.
I would measure performance of the program to find out where the bottlenecks are and then review the sections that where most of the performance takes place. For example, in my embedded system, micro-optimizations like what you are suggesting, saved 3 microseconds. This gain is not worth the effort to retest the entire system. If it works, don't fix it. Concentrate on correctness and robustness first.
The bottom line here is that you have to test it and decide for yourself. Profile a release build using real data.
1- Is that true that I will get that kind of improvement?
Maybe yes, maybe no. It depends on a number of factors, such as
How long it takes to convert from double to int
How big a word is on your machine
What platform/toolset you're using and what optimizations you have enabled
(Maybe) how big a cache line is on your platform
How fast your memory is
How fast your platform computes floating-point versus integer.
And who knows what else. In short, too many complex variables for anyone to be able to say for sure if you will or will not improve performance.
But I would be highly skeptical about your friend's claim, "at least 10 times faster at least."
2- Is it safe to convert everything to integers? And what are the
drawbacks to this?
It depends on what you're converting and how. Obviously converting a value like 123.456 to an integer is decidedly unsafe.
Drawbacks include loss of precision, loss of accuracy, and the expense in terms of space and time to actually do the conversions. Another significant drawback is the fact that you have to write a substantial amount of code, and every line of code you write is a probable source of new bugs.
3- What advice would you have for this whole issue? What would you do?
I would step back & take a deep breath. Profile your code under real-world conditions. Identify the sources of the bottlenecks. Find out what the real problems are, and if there even are any.
Identify inefficiencies in your algorithms, and fix them.
Throw hardware at the problem.
Then you can endeavor to start micro-optimizing. This would be my last resort, especially if the optimization technique you are considering would require writing a lot of code.
First, this reeks of attempting to optimize unnecessarily.
Second, doubles are a minimum of 64-bits. ints on most systems are 32-bits. So you have a couple of choices: truncate the double (which reduces your precision to a single), or store it in the space of 2 integers, or store it as an unsigned long long (which is at least 64-bits as well). For the first 2 options, you are facing a performance hit as you must convert the numbers back and forth between the doubles you are operating on and the integers you are storing it as. For the third option, you are not gaining any performance increase (in terms of memory usage) as they are basically the same size - so you'd just be converting them to integers for no reason.
So, to get to your questions:
1) Doubtful, but you can try it to see for yourself.
2) The problem isn't storage as the bits are just bits when they get into memory. The problem is the arithmetic. Since you stated you need double precision, attempting to do those operations on an integer type will not give you the results you are looking for.
3) Don't optimize until it has been proven something needs to have a performance improvement. And always remember Amdahl's Law: Make the common case fast and the rare case correct.
What I would do is:
First tune it in single-thread mode (by the random-pausing method) until you can't find any way to reduce cycles. The kinds of things I've found are:
a large fraction of time spent in library functions like sin, cos, exp, and log where the arguments were often unchanged, so the answers would be the same. The solution for that is called "memoizing", where you figure out a place to store old values of arguments and results, and check there first before calling the function.
In calling library functions like DGEMM (lapack matrix-multiply) that one would assume are optimized to the teeth, they are actually spending a large fraction of time calling a function to determine if the matrices are upper or lower triangle, square, symmetric, or whatever, rather than actually doing the multiplication. If so, the answer is obvious - write a special routine just for your situation.
Don't say "but I don't have those problems". Of course - you probably have different problems - but the process of finding them is the same.
Once you've made it as fast as possible in single-thread, then figure out how to parallelize it. Multi-threading can have high overhead, so it's best not to tightly-couple the threads.
Regarding your question about converting from doubles to integers, the other answers are right on the money. It only makes sense in very particular situations.

Why would you use float over double, or double over long double?

I'm still a beginner at programming and I always have more questions than our book or internet searches can answer (unless I missed something). So I apologize in advance if this was answered but I couldn't find it.
I understand that float has a smaller range than double making it less precise, and from what I understand, long double is even more precise(?). So my question is why would you want to use a variable that is less precise in the first place? Does it have something to do with different platforms, different OS versions, different compilers? Or are there specific moments in programming where its strategically more advantageous to use a float over a double/long double?
Thanks everyone!
In nearly all processors, "smaller" floating point numbers take the same or less clock-cycles in execution. Sometimes the difference isn't very big (or nothing), other times it can be literally twice the number of cycles for double vs. float.
Of course, memory foot-print, which is affecting cache-usage, will also be a factor. float takes half the size of double, and long double is bigger yet.
Edit: Another side-effect of smaller size is that the processor's SIMD extensions (3DNow!, SSE, AVX in x86, and similar extensions are available in several other architectures) may either only work with float, or can take twice as many float vs. double (and as far as I know, no SIMD instructions are available for long double in any processor). So this may improve performance if float is used vs. double, by processing twice as much data in one go. End edit.
So, assuming 6-7 digits of precision is good enough for what you need, and the range of +/-10+/-38 is sufficient, then float should be used. If you need either more digits in the number, or a bigger range, move to double, and if that's not good enough, use long double. But for most things, double should be perfectly adequate.
Obviously, the importance of using "the right size" becomes more important when you have either lots of calculations, or lots of data to work with - if there are 5 variables, and you just use each a couple of times in a program that does a million other things, who cares? If you are doing fluid dynamics calculations for how well a Formula 1 car is doing at 200 mph, then you probably have several tens of million datapoints to calculate, and every data point needs to be calculated dozens of times per second of the cars travel, then using up just a few clockcycles extra in each calculation will make the whole simulation take noticeably longer.
There are two costs to using float, the obvious one of its limited range and precision, and, less obviously, the more difficult analysis those limitations impose.
It is often relatively easy to determine that double is sufficient, even in cases where it would take significant numerical analysis effort to show that float is sufficient. That saves development cost, and risk of incorrect results if the more difficult analysis is not done correctly.
Float's biggest advantage on many processors is its reduced memory footprint. That translates into more numbers per cache line, and more memory bandwidth in terms of numbers transferred per second. Any gain in compute performance is usually relatively slight - indeed, popular processors do all floating point arithmetic in one format that is wider than double.
It seems best to use double unless two conditions are met - there are enough numbers for their memory footprint to be a significant performance issue, and the developers can show that float is precise enough.
You might be interested in seeing the answer posted here Should I use double or float?
But it boils down to memory footprint vs the amount of precision you need for a given situation. In a physics engine, you might care more about precision, so it would make more sense to use a double or long double.
Bottom line:
You should only use as much precision as you need for a given algorithm
The basic principle here would be don't use more than you need.
The first consideration is memory use, you probably realized that already, if you are making only one double no big deal, but what if you create a billion than you just used twice as much memory space as you had too.
Next is processor utilization, I believe on many processors if you use smaller data types it can do a form of threading where it does multiple operations at once.
So an extension to this part of the answer is SSE instructions basically this allows you to used packed data to do multiple floating point operations at once, which in an idealized case can double the speed of your program.
Lastly is readability, when someone is reading your code if you use a float they will immediately realize that you are not going over a certain number. IMO sometimes the right precision number will just flow better in the code.
A float uses less memory than a double, so if you don't need your number to be the size of a double, you might as well use a float since it will take up less memory.
Just like you wouldn't use a bus to drive yourself and a friend to the beach... you would be far better off going in a 2 seater car.
The same applies for a double over a long double... only reserve as much memory as you are going to need. Otherwise with more complex code you run the risk of using too much memory and having processes slow down or crash.

Double or float - optimization routines

I am reading through code for optimization routines (Nelder Mead, SQP...). Languages are C++, Python. I observe that often conversion from double to float is performed, or methods are duplicated with double resp. float arguments. Why is it profitable in optimization routines code, and is it significant? In my own code in C++, should I be careful for types double and float and why?
Kind regards.
Often the choice between double and float is made more on space demands than speed. Modern processors are capable of operating on double quite fast.
Floats may be faster than doubles when using SIMD instructions (such as SSE) which can operate on multiple values at a time. Also if the operations are faster than the memory pipeline, the smaller memory requirements of float will speed things overall.
Other times that I've come across the need to consider the choice between double and float types in terms of optimisation include:
Networking. Sending double precision data across a socket connection
will obviously require more time than sending half that amount of
data.
Mobile and embedded processors may only be able to handle high
speed single precision calculations efficiently on a coprocessor.
As mentioned in another answer, modern desktop processors can handle double precision Processing quite fast. However, you have to ask yourself if the
double precision processing is really required. I work with audio,
and the only time that I can think of where I would need to process
double precision data would be when using high order filters where
numerical errors can accumulate. Most of the time this can be avoided
by paying more careful attention to the algorithm design. There are,
of course, other scientific or engineering applications where double
precision data is required in order to correctly represent a huge
dynamic range.
Even so, the question of how much effort to spend on considering the data type to use really depends on your target platform. If the platform can crunch through doubles with negligible overhead and you have memory to spare then there is no need to concern yourself. Profile small sections of test code to find out.
In certain optimization algorithms, the choice between double and float is not made more on space demands than speed. For example, with penalty or barrier methods that are used for interior point methods in nonlinear optimization, a float has insufficient precision compared to a double, and using floats in the algorithm will yield garbage. For this reason, penalty and barrier methods were not used in the 1960s, but were rediscovered later on with the advent of the double precision data type. (For more on these methods, consult Nonlinear Programming: Sequential Unconstrained Minimization Techniques (Classics in Applied Mathematics) by Fiacco and McCormick.)
Another consideration is the conditioning of the underlying linear systems solved in many optimization algorithms. If the linear systems you're solving in something like a Newton iteration are sufficiently ill-conditioned, you will not be able to obtain an accurate solution to those systems.
Only if the loss in precision will not jeopardize your numerics should you consider replacing doubles with floats; even if space constraints force you to do so, you should make sure that the accuracy of your numerical results is not compromised. Once sufficient accuracy is assured for the problems you're working on, you can then worry about space and performance optimizations. You can use the CUTEr test set to validate your optimization routines.

x86 4byte floats vs. 8byte doubles (vs. long long)?

We have a measurement data processing application and currently all data is held as C++ float which means 32bit/4byte on our x86/Windows platform. (32bit Windows Application).
Since precision is becoming an issue, there have been discussions to move to another datatype. The options currently discussed are switching to double (8byte) or implementing a fixed decimal type on top of __int64 (8byte).
The reason the fixed-decimal solution using __int64 as underlying type is even discussed is that someone claimed that double performance is (still) significantly worse than processing floats and that we might see significant performance benefits using a native integer type to store our numbers. (Note that we really would be fine with fixed decimal precision, although the code would obviously become more complex.)
Obviously we need to benchmark in the end, but I would like to ask whether the statement that doubles are worse holds any truth looking at modern processors? I guess for large arrays doubles may mess up cache hits more that floats, but otherwise I really fail to see how they could differ in performance?
It depends on what you do. Additions, subtractions and multiplies on double are just as fast as on float on current x86 and POWER architecture processors. Divisions, square roots and transcendental functions (exp, log, sin, cos, etc.) are usually notably slower with double arguments, since their runtime is dependent on the desired accuracy.
If you go fixed point, multiplies and divisions need to be implemented with long integer multiply / divide instructions which are usually slower than arithmetic on doubles (since processors aren't optimized as much for it). Even more so if you're running in 32 bit mode where a long 64 bit multiply with 128 bit results needs to be synthesized from several 32-bit long multiplies!
Cache utilization is a red herring here. 64-bit integers and doubles are the same size - if you need more than 32 bits, you're gonna eat that penalty no matter what.
Look it up. Both and Intel publish the instruction latencies for their CPUs in freely available PDF documents on their websites.
However, for the most part, performance won't be significantly different, or a couple of reasons:
when using the x87 FPU instead of SSE, all floating point operations are calculated at 80 bits precision internally, and then rounded off, which means that the actual computation is equally expensive for all floating-point types. The only cost is really memory-related then (in terms of CPU cache and memory bandwidth usage, and that's only an issue in float vs double, but irrelevant if you're comparing to int64)
with or without SSE, nearly all floating-point operations are pipelined. When using SSE, the double instructions may (I haven't looked this up) have a higher latency than their float equivalents, but the throughput is the same, so it should be possible to achieve similar performance with doubles.
It's also not a given that a fixed-point datatype would actually be faster either. It might, but the overhead of keeping this datatype consistent after some operations might outweigh the savings. Floating-point operations are fairly cheap on a modern CPU. They have a bit of latency, but as mentioned before, they're generally pipelined, potentially hiding this cost.
So my advice:
Write some quick tests. It shouldn't be that hard to write a program that performs a number of floating-point ops, and then measure how much slower the double version is relative to the float one.
Look it up in the manuals, and see for yourself if there's any significant performance difference between float and double computations
I've trouble the understand the rationale "as double as slower than float we'll use 64 bits int". Guessing performance has always been an black art needing much of experience, on today hardware it is even worse considering the number of factors to take into account. Even measuring is difficult. I know of several cases where micro-benchmarks lent to one solution but in context measurement showed that another was better.
First note that two of the factors which have been given to explain the claimed slower double performance than float are not pertinent here: bandwidth needed will the be same for double as for 64 bits int and SSE2 vectorization would give an advantage to double...
Then consider than using integer computation will increase the pressure on the integer registers and computation units when apparently the floating point one will stay still. (I've already seen cases where doing integer computation in double was a win attributed to the added computation units available)
So I doubt that rolling your own fixed point arithmetic would be advantageous over using double (but I could be showed wrong by measures).
Implementing 64 fixed points isn't really fun. Especially for more complex functions like Sqrt or logarithm. Integers will probably still a bit faster for simple operations like additions. And you'll need to deal with integer overflows. And you need to be careful when implementing rounding, else errors can easily accumulate.
We're implementing fixed points in a C# project because we need determinism which floatingpoint on .net doesn't guarantee. And it's relatively painful. Some formula contained x^3 bang int overflow. Unless you have really compelling reasons not to, use float or double instead of fixedpoint.
SIMD instructions from SSE2 complicate the comparison further, since they allow operation on several floating point numbers(4 floats or 2 doubles) at the same time. I'd use double and try to take advantage of these instructions. So double will probably be significantly slower than floats, but comparing with ints is difficult and I'd prefer float/double over fixedpoint is most scenarios.
It's always best to measure instead of guess. Yes, on many architectures, calculations on doubles process twice the data as calculations on floats (and long doubles are slower still). However, as other answers, and comments on this answer, have pointed out, the x86 architecture doesn't follow the same rules as, say, ARM processors, SPARC processors, etc. On x86 floats, doubles and long doubles are all converted to long doubles for computation. I should have known this, because the conversion causes x86 results to be more accurate than SPARC and Sun went through a lot of trouble to get the less accurate results for Java, sparking some debate (note, that page is from 1998, things have since changed).
Additionally, calculations on doubles are built in to the CPU where calculations on a fixed decimal datatype would be written in software and potentially slower.
You should be able to find a decent fixed sized decimal library and compare.
With various SIMD instruction sets you can perform 4 single precision floating point operations at the same cost as one, essentially you pack 4 floats into a single 128 bit register. When switching to doubles you can only pack 2 doubles into these registers and hence you can only do two operations at the same time.
As many people have said, a 64bit int is probably not worth it if double is an option. At least when SSE is available. This might be different on micro controllers of various kinds but I guess that is not your application. If you need additional precision in long sums of floats, you should keep in mind that this operation is sometimes problematic with floats and doubles and would be more exact on integers.

Should I use double or float?

What are the advantages and disadvantages of using one instead of the other in C++?
If you want to know the true answer, you should read What Every Computer Scientist Should Know About Floating-Point Arithmetic.
In short, although double allows for higher precision in its representation, for certain calculations it would produce larger errors. The "right" choice is: use as much precision as you need but not more and choose the right algorithm.
Many compilers do extended floating point math in "non-strict" mode anyway (i.e. use a wider floating point type available in hardware, e.g. 80-bits and 128-bits floating), this should be taken into account as well. In practice, you can hardly see any difference in speed -- they are natives to hardware anyway.
Unless you have some specific reason to do otherwise, use double.
Perhaps surprisingly, it is double and not float that is the "normal" floating-point type in C (and C++). The standard math functions such as sin and log take doubles as arguments, and return doubles. A normal floating-point literal, as when you write 3.14 in your program, has the type double. Not float.
On typical modern computers, doubles can be just as fast as floats, or even faster, so performance is usually not a factor to consider, even for large calculations. (And those would have to be large calculations, or performance shouldn't even enter your mind. My new i7 desktop computer can do six billion multiplications of doubles in one second.)
This question is impossible to answer since there is no context to the question. Here are some things that can affect the choice:
Compiler implementation of floats, doubles and long doubles. The C++ standard states:
There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double.
So, all three can be the same size in memory.
Presence of an FPU. Not all CPUs have FPUs and sometimes the floating point types are emulated and sometimes the floating point types are just not supported.
FPU Architecture. The IA32's FPU is 80bit internally - 32 bit and 64 bit floats are expanded to 80bit on load and reduced on store. There's also SIMD which can do four 32bit floats or two 64bit floats in parallel. Use of SIMD is not defined in the standard so it would require a compiler that does more complex analysis to determine if SIMD can be used, or requires the use of special functions (libraries or intrinsics). The upshot of the 80bit internal format is that you can get slightly different results depending on how often the data is saved to RAM (thus, losing precision). For this reason, compilers don't optimise floating point code particularly well.
Memory bandwidth. If a double requires more storage than a float, then it will take longer to read the data. That's the naive answer. On a modern IA32, it all depends on where the data is coming from. If it's in L1 cache, the load is negligible provided the data comes from a single cache line. If it spans more than one cache line there's a small overhead. If it's from L2, it takes a while longer, if it's in RAM then it's longer still and finally, if it's on disk it's a huge time. So the choice of float or double is less important than the way the data is used. If you want to do a small calculation on lots of sequential data, a small data type is preferable. Doing a lot of computation on a small data set would allow you to use bigger data types with any significant effect. If you're accessing the data very randomly, then the choice of data size is unimportant - data is loaded in pages / cache lines. So even if you only want a byte from RAM, you could get 32 bytes transferred (this is very dependant on the architecture of the system). On top of all of this, the CPU/FPU could be super-scalar (aka pipelined). So, even though a load may take several cycles, the CPU/FPU could be busy doing something else (a multiply for instance) that hides the load time to a degree.
The standard does not enforce any particular format for floating point values.
If you have a specification, then that will guide you to the optimal choice. Otherwise, it's down to experience as to what to use.
Double is more precise but is coded on 8 bytes. float is only 4 bytes, so less room and less precision.
You should be very careful if you have double and float in your application. I had a bug due to that in the past. One part of the code was using float while the rest of the code was using double. Copying double to float and then float to double can cause precision error that can have big impact. In my case, it was a chemical factory... hopefully it didn't have dramatic consequences :)
I think that it is because of this kind of bug that the Ariane 6 rocket has exploded a few years ago!!!
Think carefully about the type to be used for a variable
I personnaly go for double all the time until I see some bottlenecks. Then I consider moving to float or optimizing some other part
This depends on how the compiler implements double. It's legal for double and float to be the same type (and it is on some systems).
That being said, if they are indeed different, the main issue is precision. A double has a much higher precision due to it's difference in size. If the numbers you are using will commonly exceed the value of a float, then use a double.
Several other people have mentioned performance isssues. That would be exactly last on my list of considerations. Correctness should be your #1 consideration.
I think regardless of the differences (which as everyone points out, floats take up less space and are in general faster)... does anyone ever suffer performance issues using double? I say use double... and if later on you decide "wow, this is really slow"... find your performance bottleneck (which is probably not the fact you used double). THEN, if it's still too slow for you, see where you can sacrifice some precision and use float.
Use whichever precision is required to achieve the appropriate results. If you then find that your code isn't performing as well as you'd like (you used profiling correct?) take a look at:
Intel 64 and IA-32 Architectures Optimization Reference Manual
Software Optimization for the AMD64 Processor
It depends highly on the CPU the most obvious trade-offs are between precision and memory. With GBs of RAM, memory is not much of an issue, so it's generally better to use doubles.
As for performance, it depends highly on the CPU. floats will usually get better performance than doubles on a 32 bit machine. On 64 bit, doubles are sometimes faster, since it is (usually) the native size. Still, what will matter much more than your choice of data types is whether or not you can take advantage of SIMD instructions on your processor.
double has higher precision, whereas floats take up less memory and are faster. In general you should use float unless you have a case where it isn't accurate enough.
The main difference between float and double is precision. Wikipedia has more info about
Single precision (float) and Double precision.