I'm going to use big numbers in C++ code on an embedded system. Luckily the compiler recognizes long doubles.
I cannot use the standard libraries, Boost libraries, GNU math libraries, etc., and the system has no built-in floating-point unit.
Now how can I detect long double overflows?
You state that you need "big numbers", but this does not necessarily mean that the use of long double is indicated. In most embedded applications that I know of, long double is chosen for its enhanced precision, i.e. more bits of resolution for fractional numbers, rather than for its increased range.
You also state that your implementation offers little of the usual floating-point libraries and/or functionality. Based on these statements, I would question whether you need fully functional floating-point capabilities. If your concerns are limited to "big numbers", check to see if your compiler offers a long long datatype, which is a 64-bit integer.
If you do need some floating-point capability, you might consider a fixed-point implementation. Assuming a long long, you might choose to represent numbers in a 48.16 format, which permits magnitudes up to ~2.8x10^14 with 16 bits to the right of the radix point. (If you need an introduction to fixed-point computation, start here.)
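To make the idea concrete, here is a minimal sketch of what a 48.16 fixed-point layer might look like, assuming a 64-bit long long; the type and function names are illustrative, and a production version would widen the multiply to 128 bits and guard against intermediate overflow and negative-shift pitfalls:

#include <stdint.h>

// 48.16 fixed point stored in a 64-bit integer: upper 48 bits integer part, lower 16 bits fraction.
typedef int64_t fixed48_16;
#define FRAC_BITS 16

fixed48_16 to_fixed(int64_t x)                { return x << FRAC_BITS; }          /* integer -> fixed */
fixed48_16 fx_add(fixed48_16 a, fixed48_16 b) { return a + b; }                   /* add/sub are plain integer ops */
fixed48_16 fx_mul(fixed48_16 a, fixed48_16 b) { return (a * b) >> FRAC_BITS; }    /* raw product has 32 fraction bits, shift back */
fixed48_16 fx_div(fixed48_16 a, fixed48_16 b) { return (a << FRAC_BITS) / b; }    /* pre-shift the dividend to keep the fraction */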
Having addressed some of the background issues, let's look at the original question. If you wish to detect overflow in an unsigned int (which I commonly do in my embedded work), it's sufficient to compare your latest result with the previous one. For example, my application requires me to periodically inspect a 16-bit counter that is driven by an external clock. If my current observation is less than the last observation, then I can assume that the 16-bit counter overflowed, and I can take action accordingly. If you implement your big numbers using a long long integer datatype, you can apply a similar strategy to detect overflow.
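As a rough sketch of that strategy (the variable names are made up for illustration):

#include <stdint.h>

uint16_t last_count = 0;    /* previous observation of the 16-bit hardware counter */
uint32_t wrap_count = 0;    /* how many times it has overflowed */

void inspect_counter(uint16_t current_count)
{
    if (current_count < last_count)   /* the counter moved "backwards", so it wrapped past 0xFFFF */
        ++wrap_count;                 /* take whatever corrective action is appropriate */
    last_count = current_count;
}

The same comparison against the previous value works for a long long accumulator that only ever grows.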
As it's not standard C++, you will have to rely on methods provided by your specific environment. The manufacturer of the embedded system should have documented how it can be done. Ask him.
In the stdint.h (C99), boost/cstdint.hpp, and cstdint (C++0x) headers there is, among others, the type int32_t.
Are there similar fixed-size floating point types? Something like float32_t?
Nothing like this exists in the C or C++ standards at present. In fact, there isn't even a guarantee that float will be a binary floating-point format at all.
Some compilers guarantee that the float type will be the IEEE-754 32 bit binary format. Some do not. In practice, float is the IEEE-754 single type on most non-embedded platforms, though the usual caveats about some compilers evaluating expressions in a wider format apply.
There is a working group discussing adding C language bindings for the 2008 revision of IEEE-754, which could consider recommending that such a typedef be added. If this were added to C, I expect the C++ standard would follow suit... eventually.
If you want to know whether your float is the IEEE 32-bit type, check std::numeric_limits<float>::is_iec559. It's a compile-time constant, not a function.
If you want to be more bulletproof, also check std::numeric_limits<float>::digits to make sure they aren't sneakily using the IEEE standard double-precision for float. It should be 24.
When it comes to long double, it's more important to check digits because there are a couple of IEEE formats which it might reasonably be: 128 bits (digits = 113) or 80 bits (digits = 64).
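If you want to turn those checks into compile-time guarantees, something along these lines should work (a sketch, assuming C++11 static_assert is available):

#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "float is not an IEEE 754 type");
static_assert(std::numeric_limits<float>::digits == 24, "float is not the 32-bit single format");
// For long double: digits == 64 suggests the 80-bit extended format, 113 the 128-bit format,
// and 53 means it is just the same as double.
static_assert(std::numeric_limits<long double>::is_iec559, "long double is not an IEEE 754 type");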
It wouldn't be practical to have float32_t as such because you usually want to use floating-point hardware, if available, and not to fall back on a software implementation.
If you think having typedefs such as float32_t and float64_t is impractical for any reason, you must be so accustomed to your familiar OS and compiler that you are unable to look outside your little nest.
There exists hardware which natively runs 32-bit IEEE floating-point operations and other hardware that does 64-bit. Sometimes such systems even have to talk to each other, in which case it is extremely important to know whether a double is 32 bit or 64 bit on each platform. If the 32-bit platform were to do extensive calculations based on the 64-bit values from the other, we may want to cast down to the lower precision depending on timing and speed requirements.
I personally feel uncomfortable using floats and doubles unless I know exactly how many bits they are on my platform. Even more so if I am to transfer these to another platform over some communications channel.
There is currently a proposal to add the following types into the language:
decimal32
decimal64
decimal128
which may one day be accessible through #include <decimal>.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3871.html
In most of the code I see around, double is favoured over float, even when high precision is not needed.
Since there are performance penalties when using double types (CPU/GPU/memory/bus/cache/...), what is the reason for this overuse of double?
Example: in computational fluid dynamics all the software I have worked with uses doubles. In this case high precision is useless (because of the errors due to the approximations in the mathematical model), and there is a huge amount of data to be moved around, which could be cut in half using floats.
The fact that today's computers are powerful is meaningless, because they are used to solve more and more complex problems.
Among others:
The savings are hardly ever worth it (number-crunching is not typical).
Rounding errors accumulate, so better go to higher precision than needed from the start (experts may know it is precise enough anyway, and there are calculations which can be done exactly).
Common floating-point operations using the FPU internally often work at double or higher precision anyway.
C and C++ can implicitly convert from float to double, the other way needs an explicit cast.
Variadic and unprototyped functions always receive double, not float (the latter kind exists only in ancient C and is actively discouraged).
You may commonly do an operation with more than needed precision, but seldom with less, so libraries generally favor higher precision too.
But in the end, YMMV: Measure, test, and decide for yourself and your specific situation.
BTW: There's even more for performance fanatics: Use the IEEE half precision type. Little hardware or compiler support for it exists, but it cuts your bandwidth requirements in half yet again.
In my opinion the answers so far don't really get the right point across, so here's my crack at it.
The short answer is that C++ developers use doubles over floats:
To avoid premature optimization when they don't understand the performance trade-offs well ("they have higher precision, why not?" is the thought process)
Habit
Culture
To match library function signatures
To match simple-to-write floating point literals (you can write 0.0 instead of 0.0f)
It's true that a double may be as fast as a float for a single computation, because most FPUs have a wider internal representation than either the 32-bit float or the 64-bit double.
However, that's only a small piece of the picture. Nowadays, per-operation optimizations don't mean anything if you're bottlenecked on cache/memory bandwidth.
Here is why some developers seeking to optimize their code should look into using 32-bit floats over 64-bit doubles:
They fit in half the memory, which is like having all your caches be twice as large. (Big win!)
If you really care about performance you'll use SSE instructions. SSE instructions that operate on floating point values have different instructions for 32-bit and 64-bit floating point representations. The 32-bit versions can fit 4 values in the 128-bit register operands, but the 64-bit versions can only fit 2 values. In this scenario you can likely double your FLOPS by using floats over double because each instruction operates on twice as much data.
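As a rough illustration of that point with SSE intrinsics (x86-specific; the function and array names are made up, and the loops assume the length is a multiple of the vector width):

#include <immintrin.h>

void add_floats(const float* a, const float* b, float* out, int n)
{
    for (int i = 0; i < n; i += 4)    /* 4 floats per 128-bit register */
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
}

void add_doubles(const double* a, const double* b, double* out, int n)
{
    for (int i = 0; i < n; i += 2)    /* only 2 doubles per 128-bit register */
        _mm_storeu_pd(out + i, _mm_add_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
}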
In general, there is a real lack of knowledge of how floating point numbers really work in the majority of developers I've encountered. So I'm not really surprised most developers blindly use double.
double is, in some ways, the "natural" floating point type in the C language, which also influences C++. Consider that:
an unadorned, ordinary floating-point constant like 13.9 has type double. To make it float, we have to add an extra suffix f or F.
default argument promotion in C converts float function arguments* to double: this takes place when no declaration exists for an argument, such as when a function is declared as variadic (e.g. printf) or no declaration exists (old style C, not permitted in C++).
The %f conversion specifier of printf takes a double argument, not float. There is no dedicated way to print a float; a float argument default-promotes to double and so matches %f.
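A small example pulling these points together (valid as both C and C++):

#include <stdio.h>

int main(void)
{
    float f = 13.9;      /* 13.9 is a double constant, converted to float on assignment */
    float g = 13.9f;     /* the f suffix makes the constant a float */

    printf("%f %f\n", f, g);   /* both float arguments are default-promoted to double */
    return 0;
}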
On modern hardware, float and double are usually mapped, respectively, to 32 bit and 64 bit IEEE 754 types. The hardware works with the 64 bit values "natively": the floating-point registers are 64 bits wide, and the operations are built around the more precise type (or internally may be even more precise than that). Since double is mapped to that type, it is the "natural" floating-point type.
The precision of float is poor for any serious numerical work, and the reduced range could be a problem also. The IEEE 32 bit type has only 23 bits of mantissa (8 bits are consumed by the exponent field and one bit for the sign). The float type is useful for saving storage in large arrays of floating-point values provided that the loss of precision and range isn't a problem in the given application. For example, 32 bit floating-point values are sometimes used in audio for representing samples.
It is true that the use of a 64 bit type over a 32 bit type doubles the raw memory bandwidth requirement. However, that only affects programs which work with large arrays of data accessed in a pattern that shows poor locality. The superior precision of the 64 bit floating-point type trumps issues of optimization. Quality of numerical results is more important than shaving cycles off the running time, in accordance with the principle of "get it right first, then make it fast".
* Note, however, that there is no general automatic promotion from float expressions to double; the only promotion of that kind is integral promotion: char, short and bitfields going to int.
This is mostly hardware dependent, but consider that the most common CPUs (x86/x87 based) have an internal FPU that operates at 80-bit floating-point precision (which exceeds both float and double).
If you have to store some intermediate calculations in memory, double is a good compromise between the internal precision and the storage space used. Performance is more or less the same for single values; it may be affected by memory bandwidth on large numeric pipelines (since the data will be twice as long).
Consider that floats have a precision of approximately 6 decimal digits. In an N-cubed complexity problem (like a matrix inversion or transformation), you lose two or three more in multiplications and divisions, leaving just 3 meaningful digits. On a 1920-pixel-wide display they are simply not enough (you need at least 5 to address a pixel properly).
This roughly makes double preferable.
It is often relatively easy to determine that double is sufficient, even in cases where it would take significant numerical analysis effort to show that float is sufficient. That saves development cost, and the risk of incorrect results if the analysis is not done correctly.
Also, any performance gain from using float instead of double is usually relatively slight, because most of the popular processors do all floating-point arithmetic in one format that is even wider than double.
I think higher precision is the only reason. Actually, most people don't think a lot about it; they just use double.
I think if float precision is good enough for a particular task, there is no reason to use double.
I need to program a fixed-point processor that was used for a legacy application. New features are requested, and these features need a large dynamic range, most likely beyond the fixed-point range even after scaling. As the processor will not be changed for several reasons, I am planning to implement floating-point operations on top of the fixed-point arithmetic -- basically a software-based approach. I want to define a few data structures to represent floating-point numbers in C for the underlying fixed-point processor. Is this possible to do at all? I am planning to use the IEEE floating-point representation. What kind of data structures would be good for achieving basic operations like multiplication, division, addition and subtraction? Are there already some open source libraries available in C/C++?
Most C development tools for microcontrollers without native floating-point support provide software libraries for floating-point operations. Even if you do your programming in assembler, you could probably use those libraries. (Just curious, which processor and which development tools are you using?)
However, if you are serious about writing your own floating-point libraries, the most efficient way is to treat the routines as routines operating on integers. You can use unions, or code like the following, to convert between the floating-point and integer representations.
uint32_t integer_representation = *(uint32_t *)&fvalue;
Note that this cast is formally undefined behaviour (it violates the strict-aliasing rules), and the bit-level format of a floating-point number is not specified by the C standard in any case.
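If you can afford a couple of extra instructions, a memcpy-based version avoids the aliasing problem, while still assuming that float and uint32_t have the same size (a sketch, not part of the original answer):

#include <stdint.h>
#include <string.h>

uint32_t float_bits(float fvalue)
{
    uint32_t bits;
    memcpy(&bits, &fvalue, sizeof bits);    /* copy the raw object representation */
    return bits;
}

float bits_to_float(uint32_t bits)
{
    float fvalue;
    memcpy(&fvalue, &bits, sizeof fvalue);
    return fvalue;
}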
The problem is much easier if you stick to floating-point and integer types that match (typically 32 or 64 bit types), that way you can see the routines as plain integer routines, as, for example, addition takes two 32 bit integer representations of floating-point values and returns a 32 bit integer representation of the result.
Also, if you use the routines yourself, you could probably get away with leaving parts of the standard unimplemented, such as exception flags, subnormal numbers, NaN and Inf etc.
You don’t really need any data structures to do this. You simply use integers to represent floating-point encodings and integers to represent unpacked sign, exponent, and significand fields.
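For example, unpacking an IEEE 754 single-precision encoding into those fields only takes a few shifts and masks (assuming the standard 1-8-23 bit layout; the struct and names are purely illustrative):

#include <stdint.h>

struct unpacked_float {
    uint32_t sign;          /* 0 or 1 */
    uint32_t exponent;      /* biased exponent field, 0..255 */
    uint32_t significand;   /* the 23 stored fraction bits */
};

struct unpacked_float unpack(uint32_t encoding)
{
    struct unpacked_float u;
    u.sign        = encoding >> 31;
    u.exponent    = (encoding >> 23) & 0xFF;
    u.significand = encoding & 0x7FFFFF;
    return u;
}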
I've heard that there are many problems with floats/doubles on different CPUs.
If I want to make a game that uses floats for everything, how can I be sure the float calculations are exactly the same on every machine, so that my simulation will look exactly the same on every machine?
I am also concerned about writing/reading files or sending/receiving the float values to different computers. What conversions must be done, if any?
I need to be 100% sure that my float values are computed exactly the same, because even a slight difference in the calculations will result in a totally different future. Is this even possible?
Standard C++ does not prescribe any details about floating point types other than range constraints, and possibly that some of the maths functions (like sine and exponential) have to be correct up to a certain level of accuracy.
Other than that, at that level of generality, there's really nothing else you can rely on!
That said, it is quite possible that you will not actually require binarily identical computations on every platform, and that the precision and accuracy guarantees of the float or double types will in fact be sufficient for simulation purposes.
Note that you cannot even produce a reliable result of an algebraic expression inside your own program when you modify the order of evaluation of subexpressions, so asking for the sort of reproducibility that you want may be a bit unrealistic anyway. If you need real floating point precision and accuracy guarantees, you might be better off with an arbitrary precision library with correct rounding, like MPFR - but that seems unrealistic for a game.
Serializing floats is an entirely different story, and you'll have to have some idea of the representations used by your target platforms. If all platforms were in fact to use IEEE 754 floats of 32 or 64 bit size, you could probably just exchange the binary representation directly (modulo endianness). If you have other platforms, you'll have to think up your own serialization scheme.
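For instance, a byte-level sketch for the IEEE 754 32-bit case might look like this (the helper names are made up; both ends must agree on the byte order, here little-endian):

#include <stdint.h>
#include <string.h>

void write_float_le(float value, unsigned char out[4])
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    out[0] = (unsigned char)(bits & 0xFF);          /* least significant byte first */
    out[1] = (unsigned char)((bits >> 8) & 0xFF);
    out[2] = (unsigned char)((bits >> 16) & 0xFF);
    out[3] = (unsigned char)((bits >> 24) & 0xFF);
}

float read_float_le(const unsigned char in[4])
{
    uint32_t bits = (uint32_t)in[0] | ((uint32_t)in[1] << 8)
                  | ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}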
What every programmer should know: http://docs.sun.com/source/806-3568/ncg_goldberg.html
Say you're writing a C++ application doing lots of floating point arithmetic. Say this application needs to be portable across a reasonable range of hardware and OS platforms (say 32 and 64 bit hardware, Windows and Linux both in 32 and 64 bit flavors...).
How would you make sure that your floating point arithmetic is the same on all platforms? For instance, how do you make sure that a 32-bit floating point value will really be 32 bits on all platforms?
For integers we have stdint.h but there doesn't seem to exist a floating point equivalent.
[EDIT]
I got very interesting answers but I'd like to add some precision to the question.
For integers, I can write:
#include <stdint.h>
[...]
int32_t myInt;
and be sure that whatever the (C99 compatible) platform I'm on, myInt is a 32-bit integer.
If I write:
double myDouble;
float myFloat;
am I certain that these will compile to, respectively, 64-bit and 32-bit floating point numbers on all platforms?
Non-IEEE 754
Generally, you cannot. There's always a trade-off between consistency and performance, and C++ hands that to you.
For platforms that don't have floating point operations (like embedded and signal processing processors), you cannot use C++ "native" floating point operations, at least not portably so. While a software layer would be possible, it is certainly not feasible for this type of device.
For these, you could use 16-bit or 32-bit fixed-point arithmetic (but you might even discover that long is supported only rudimentarily - and frequently, division is very expensive). However, this will be much slower than built-in fixed-point arithmetic, and becomes painful beyond the basic four operations.
I haven't come across devices that support floating point in a different format than IEEE 754. From my experience, your best bet is to hope for the standard, because otherwise you usually end up building algorithms and code around the capabilities of the device. When sin(x) suddenly costs 1000 times as much, you better pick an algorithm that doesn't need it.
IEEE 754 - Consistency
The only non-portability I found here is when you expect bit-identical results across platforms. The biggest influence is the optimizer. Again, you can trade accuracy and speed for consistency. Most compilers have an option for that - e.g. "floating point consistency" in Visual C++. But note that this is always accuracy beyond the guarantees of the standard.
Why results become inconsistent?
First, FPU registers often have higher resolution than double (e.g. 80 bits), so as long as the code generator doesn't store the value back, intermediate values are held with higher accuracy.
Second, the equivalences like a*(b+c) = a*b + a*c are not exact due to the limited precision. Nonetheless the optimizer, if allowed, may make use of them.
Also - what I learned the hard way - printing and parsing functions are not necessarily consistent across platforms, probably due to numeric inaccuracies, too.
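A tiny demonstration of the reassociation issue (the values are chosen so the effect shows up in 64-bit doubles; on an x87 FPU that keeps intermediates in 80-bit registers the second result can come out as 1.0 as well, which is exactly the kind of cross-platform inconsistency described above):

#include <stdio.h>

int main(void)
{
    double a = 1e17, b = -1e17, c = 1.0;

    printf("%.1f\n", (a + b) + c);   /* 1.0 */
    printf("%.1f\n", a + (b + c));   /* typically 0.0: c is absorbed next to 1e17 */
    return 0;
}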
float
It is a common misconception that float operations are intrinsically faster than double operations. Working on large float arrays is usually faster, mostly through fewer cache misses alone.
Be careful with float accuracy. It can be "good enough" for a long time, but I've often seen it fail sooner than expected. Float-based FFTs can be much faster due to SIMD support, but generate noticeable artefacts quite early for audio processing.
Use fixed point.
However, if you want to approach the realm of possibly making portable floating point operations, you at least need to use controlfp to ensure consistent FPU behavior as well as ensuring that the compiler enforces ANSI conformance with respect to floating point operations. Why ANSI? Because it's a standard.
And even then you aren't guaranteeing that you can generate identical floating point behavior; that also depends on the CPU/FPU you are running on.
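On MSVC/Windows, for example, the _controlfp call from <float.h> can be used to pin down x87 precision and rounding; this is a non-portable sketch, and you should check your compiler's documentation for the flags it actually supports:

#include <float.h>

void configure_fpu(void)
{
    _controlfp(_PC_53, _MCW_PC);     /* force 53-bit (double) precision on the x87 FPU */
    _controlfp(_RC_NEAR, _MCW_RC);   /* round to nearest */
}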
It shouldn't be an issue; IEEE 754 already defines all the details of the layout of floats.
The maximum and minimum storable values are defined in float.h (FLT_MAX, DBL_MAX and friends), or via std::numeric_limits in C++.
Portable is one thing, generating consistent results on different platforms is another. Depending on what you are trying to do then writing portable code shouldn't be too difficult, but getting consistent results on ANY platform is practically impossible.
I believe "limits.h" will include the C library constants INT_MAX and its brethren. However, it is preferable to use "limits" and the classes it defines:
std::numeric_limits<float>, std::numeric_limits<double>, std::numeric_limits<int>, etc.
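For example, querying a few of those limits through <limits>:

#include <iostream>
#include <limits>

int main()
{
    std::cout << "float max:    " << std::numeric_limits<float>::max()  << '\n'
              << "double max:   " << std::numeric_limits<double>::max() << '\n'
              << "int max:      " << std::numeric_limits<int>::max()    << '\n'
              << "float digits: " << std::numeric_limits<float>::digits << '\n';
    return 0;
}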
If you're assuming that you will get the same results on another system, read What could cause a deterministic process to generate floating point errors first. You might be surprised to learn that your floating point arithmetic isn't even the same across different runs on the very same machine!