I have been writing a Fortran code for numerical simulations of an applied physics problem for more than two years, and I have tried to follow the conventions described in Fortran Best Practices.
More specifically, I defined a parameter as
integer, parameter :: dp = kind(0.d0)
and then used it for all doubles in my code.
However, I found out (on this forum) that using kind parameters this way does not necessarily give you the same precision if you compile your code with other compilers. In this question, I read that a possible solution is to use SELECTED_REAL_KIND and SELECTED_INT_KIND, which follow some convention, as far as I understand.
Later on, though, I found out about the ISO_FORTRAN_ENV module, which defines the REAL32, REAL64 and REAL128 kind parameters.
I guess that these are indeed portable and, since they belong to the Fortran 2008 standard (and are supported by gfortran), that I should use them?
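For concreteness, the three alternatives look like this (a minimal sketch; the precision and range arguments to selected_real_kind are just the usual requests for a 64-bit real):

program kind_alternatives
  use iso_fortran_env, only: real64
  implicit none
  ! three ways of defining a "double" kind constant
  integer, parameter :: dp_legacy   = kind(0.d0)                  ! what I use now
  integer, parameter :: dp_selected = selected_real_kind(15, 307) ! >= 15 digits, decimal range >= 307
  integer, parameter :: dp_f2008    = real64                      ! Fortran 2008, iso_fortran_env
  print *, dp_legacy, dp_selected, dp_f2008   ! typically all print the same number
end program kind_alternatives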
Therefore, I would greatly appreciate it if someone with more knowledge and experience could clear up the confusion.
Also, I have a follow-up question about using these kinds in HDF5. I was using H5T_NATIVE_DOUBLE and it was indeed working fine (as far as I know). However, in this document it is stated that this is now an obsolete feature and should not be used. Instead, they provide a function
INTEGER(HID_T) FUNCTION h5kind_to_type(kind, flag) RESULT(h5_type)
When I use it and print out the exact numerical value of the HID_T integer, the type corresponding to REAL64 gives me 50331972, whereas H5T_NATIVE_DOUBLE gives me 50331963, which is different.
If I then use the value calculated by h5kind_to_type, the HDF5 library runs just fine and, using XDMF, I can plot the output in VisIt or ParaView without modifying the accompanying .xmf file.
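For reference, the comparison looks roughly like this (a minimal sketch assuming the Fortran 2003 HDF5 interface, use hdf5, and the H5_REAL_KIND flag; h5open_f must be called before H5T_NATIVE_DOUBLE has a defined value):

program h5kind_check
  use iso_fortran_env, only: real64
  use hdf5
  implicit none
  integer :: error
  integer(hid_t) :: dbl_id
  call h5open_f(error)                            ! initialise the HDF5 Fortran interface
  dbl_id = h5kind_to_type(real64, H5_REAL_KIND)   ! map a Fortran kind to an HDF5 datatype id
  print *, 'h5kind_to_type(real64):', dbl_id            ! 50331972 here
  print *, 'H5T_NATIVE_DOUBLE     :', H5T_NATIVE_DOUBLE ! 50331963 here
  call h5close_f(error)
end program h5kind_check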
So my second question would be (again): Is this correct usage?
The type double precision and the corresponding kind kind(1.d0) are perfectly well defined by the standard. But they are also not exactly fixed. Many computers throughout history have used different native formats for their floating-point numbers, and the standard must allow for this!
So, double precision is a real kind with higher precision than the default real. The default real is also not fixed; it must correspond to what the computer can use.
Today we have the IEEE 754 standard for floating-point numbers, which defines the IEEE single (binary32) and IEEE double (binary64) types, among others. If the computer hardware implements this standard, as almost all computers younger than 20 years do, it is very likely that the compiler chooses these two as default real and double precision.
The Fortran 2008 standard brings the two kind constants real32 and real64 (and others). They enable you to request the real kinds that have a storage size of 32 and 64 bits. It is not guaranteed that these will be the IEEE types, but on modern computers it is almost certain.
To request the IEEE types (if they are available), use the function ieee_selected_real_kind() from the intrinsic module ieee_arithmetic.
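A minimal sketch of how that request might look (15 and 307 are just the decimal precision and range of IEEE binary64; a negative result would mean that no suitable IEEE kind exists):

program ieee_kind_demo
  use ieee_arithmetic, only: ieee_selected_real_kind
  implicit none
  ! request an IEEE kind with at least 15 decimal digits and a decimal range of at least 307
  integer, parameter :: wp = ieee_selected_real_kind(15, 307)
  real(wp) :: x
  x = 1.0_wp / 3.0_wp
  print *, 'kind:', wp, ' precision:', precision(x), ' range:', range(x)
end program ieee_kind_demo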
The IEEE types are the same on all computers (excluding endianness!), but the compiler is not required to support them, because you may have a computer that does not support them in hardware. This is only a theoretical possibility; all modern computers support them.
Now to your HDF5 constants: these are apparently just indexes into some table. It does not matter whether the numbers differ; what matters is whether they mean the same thing, and in your case they do.
As I wrote above, it is extremely likely that on a computer which supports IEEE 754 the double precision kind will be identical to IEEE double. It may not be, if you use compiler options which change this behaviour. There are compiler options which promote the default real to double, and they may also promote double precision to quad precision (128 bit) to preserve the standard semantics, which require double precision to have more precision and a larger storage size (gfortran's -fdefault-real-8, for example, does both).
Conclusion: You can use either, or any other way of choosing your kind constants (you can also use iso_c_binding's c_float and c_double), but you should be aware of why those ways differ and what they actually mean.
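A small sketch that puts the options next to each other; on an ordinary IEEE 754 machine all of these typically resolve to the same two kind numbers, but none of that is guaranteed by the standard:

program compare_kinds
  use iso_fortran_env, only: real32, real64
  use iso_c_binding,   only: c_float, c_double
  implicit none
  print *, 'default real / double precision :', kind(1.0), kind(1.d0)
  print *, 'selected_real_kind(6), (15, 307):', selected_real_kind(6), selected_real_kind(15, 307)
  print *, 'iso_fortran_env real32, real64  :', real32, real64
  print *, 'iso_c_binding c_float, c_double :', c_float, c_double
end program compare_kinds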
Related
In the stdint.h (C99), boost/cstdint.hpp, and cstdint (C++0x) headers there is, among others, the type int32_t.
Are there similar fixed-size floating point types? Something like float32_t?
Nothing like this exists in the C or C++ standards at present. In fact, there isn't even a guarantee that float will be a binary floating-point format at all.
Some compilers guarantee that the float type will be the IEEE-754 32 bit binary format. Some do not. In reality, float is in fact the IEEE-754 single type on most non-embedded platforms, though the usual caveats about some compilers evaluating expressions in a wider format apply.
There is a working group discussing adding C language bindings for the 2008 revision of IEEE-754, which could consider recommending that such a typedef be added. If this were added to C, I expect the C++ standard would follow suit... eventually.
If you want to know whether your float is the IEEE 32-bit type, check std::numeric_limits<float>::is_iec559. It's a compile-time constant, not a function.
If you want to be more bulletproof, also check std::numeric_limits<float>::digits to make sure they aren't sneakily using the IEEE standard double-precision for float. It should be 24.
When it comes to long double, it's more important to check digits, because there are a couple of IEEE formats it might reasonably be: 128 bits (digits = 113) or 80 bits (digits = 64).
It wouldn't be practical to have float32_t as such because you usually want to use floating-point hardware, if available, and not to fall back on a software implementation.
If you think having typedefs such as float32_t and float64_t is impractical for any reason, you must be so accustomed to your familiar OS and compiler that you are unable to look outside your little nest.
There exists hardware that natively runs 32-bit IEEE floating-point operations and other hardware that runs 64-bit operations. Sometimes such systems even have to talk to each other, in which case it is extremely important to know whether a double is 32 bits or 64 bits on each platform. If the 32-bit platform had to do extensive calculations based on the 64-bit values from the other, we might want to cast down to the lower precision depending on timing and speed requirements.
I personally feel uncomfortable using floats and doubles unless I know exactly how many bits they are on my platform, and even more so if I am to transfer them to another platform over some communications channel.
There is currently a proposal to add the following types into the language:
decimal32
decimal64
decimal128
which may one day be accessible through #include <decimal>.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3871.html
I am writing a marshaling layer to automatically convert values between different domains. When it comes to floating point values this potentially means converting values from one floating point format to another. However, it seems that almost every modern system is using IEEE754, so I'm wondering whether it's actually worth generalising to allow other formats, or just manage marshaling between different IEEE754 formats.
Does anyone know of any commonly used floating point formats other than IEEE754 that I should consider (perhaps on ARM processors or mainframes)? If so, a reference to the format specification would be extremely helpful.
Virtually all relatively modern (within the last 15 years) general-purpose computers use IEEE 754. In the very unlikely event that you find a system you need to support that uses a non-IEEE 754 floating-point format, there will probably be a library available to convert to/from IEEE 754.
Some non-ancient systems which did not natively use IEEE 754 were the Cray SV1 (1998-2003) and IBM System 360, 370, and 390 prior to Generation 5 (ended 2002). IBM implemented IEEE 754 emulation around 2001 in a software release for prior S/390 hardware.
As of now, what systems do you actually want this to work on? If you come across one down the line that doesn't use IEEE754 (which as #JohnZwinick says, is vanishingly unlikely) then you should be able to code for that then.
To put it another way, what you are designing here is, in effect, a communications protocol and you obviously seek to make a sensible choice for how you will represent a floating point number (both single precision and double precision, I guess) in the bytes that travel between domains.
I think #SomeProgrammerDude was trying to imply that representing these as text strings (while they are in transit) might offer the most portability, and if so I would agree, but it's obviously not the most efficient way to do it.
So, if you do decide to plump for IEEE 754 as your interchange format (as I would), then the worst that can happen is that you might need to find a way to convert it to and from the native format used on some antique architecture that you are almost certainly never going to encounter, and if that does happen then the problem would not be difficult to solve.
Also, floats and doubles can be big-endian or little-endian, so you need to decide what you're going to use in your byte stream and convert when marshalling if necessary. Little-endian is much more common these days so I'd go with that.
Does anyone know of any commonly used floating point formats other than IEEE754 that I should consider ...?
CCSI uses a variation on binary32 for select processors.
it seems that almost every modern system is using IEEE754,
Yes, but... various implementations fudge on the particulars with edge values like subnormals, negative zero (in Visual Studio, for example), infinity and not-a-number.
It is this second issue that is more lethal, and it is harder to discern whether a given implementation has implemented IEEE 754 completely. See __STDC_IEC_559__.
The OP says, "I am writing a marshaling layer". It is in that code that troubles likely remain for the edge cases. Also, IEEE 754 does not specify endianness, so that marshaling issue remains. Recall that integer endianness may not match floating-point endianness.
My program has some problems with precision when using REAL(KIND=16) or REAL*16. Is there a way to go higher than that with precision?
REAL*32 (kind values are not directly portable) would be a 256-bit real. There is no such IEEE floating-point type. See http://en.wikipedia.org/wiki/IEEE_floating-point_standard
I don't know of any processor (compiler) that supports such a kind as an extension. Also, no hardware known to me handles this natively.
At such high precisions I would already reconsider the algorithm and its stability. It is not usual for a program to need more than quad (your 16 bytes) precision. Even double is normally enough, and I do many of my computations with single precision.
Finally, there are some libraries that support more precision, but their use is more complicated than just recompiling with a different kind parameter. See
http://crd-legacy.lbl.gov/~dhbailey/mpdist/
Is there an arbitrary precision floating point library for C/C++ which allows arbitrary precision exponents?
At special request: the kind numbers are implementation dependent. Kind 16 may not exist, or it may not denote the IEEE 128-bit float (a portable way to request such a kind is sketched after the links below). See, among many questions here,
Fortran: integer*4 vs integer(4) vs integer(kind=4)
Fortran 90 kind parameter
What does `real*8` mean? and so on.
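A minimal sketch of that portable request (a negative value from selected_real_kind, or a negative real128 constant, means the compiler offers no such kind):

program quad_availability
  use iso_fortran_env, only: real128
  implicit none
  ! ask for at least 30 decimal digits; on most compilers this is IEEE binary128
  integer, parameter :: qp = selected_real_kind(p=30)
  print *, 'selected_real_kind(p=30)     :', qp       ! negative if unsupported
  print *, 'real128 from iso_fortran_env :', real128  ! negative if no 128-bit real kind exists
end program quad_availability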
I am trying to implement a program with floating-point numbers, using two or more programming languages. The program runs, say, 50k iterations to finally bring the error down to a very small value.
To ensure that my results are comparable, I want to make sure I use data types of the same precision in the different languages. Could you please tell me whether there is a correspondence between float/double in C/C++ and the types in D and Go? I expect C/C++ and D to be quite close in this regard, but I am not sure. Thanks a lot.
Generally, for compiled languages, floating-point format and precision come down to two things:
The library used to implement the floating point functions that aren't directly supported in hardware.
The hardware the system is running on.
It may also depend on what compiler options you give (and how sophisticated the compiler is in general) - many modern processors have vector instructions, and the result may be subtly different than if you use "regular" floating-point instructions (e.g. FPU vs. SSE on x86 processors). You may also sometimes see differences because the internal calculations on an x86 FPU are 80 bits wide and are stored as 64 bits when the computation is completed.
But generally, given the same hardware, and similar type of compilers, I'd expect to get the same result [and roughly the same performance] from two different [sufficiently similar] languages.
Most languages have either only "double" (typically 64-bit) or "single and double" (e.g. float - typically 32-bit and double - typically 64-bit in C/C++ - and probably D as well, but I'm not that into D).
In Go, floating point types follow the IEEE-754 standard.
Straight from the spec (http://golang.org/ref/spec#Numeric_types)
float32 the set of all IEEE-754 32-bit floating-point numbers
float64 the set of all IEEE-754 64-bit floating-point numbers
I'm not familiar with D, but this page might be of interest: http://dlang.org/float.html.
For C/C++, the standard doesn't require IEEE-754, but in C++ you can check std::numeric_limits<T>::is_iec559 to see whether your compiler is using IEEE-754. See this question: How to check if C++ compiler uses IEEE 754 floating point standard
I've heard that there are many problems with floats/doubles on different CPUs.
If I want to make a game that uses floats for everything, how can I be sure the float calculations are exactly the same on every machine, so that my simulation will look exactly the same on every machine?
I am also concerned about writing/reading files or sending/receiving the float values to different computers. What conversions must be done, if any?
I need to be 100% sure that my float values are computed exactly the same, because even a slight difference in the calculations will result in a totally different future. Is this even possible?
Standard C++ does not prescribe any details about floating point types other than range constraints, and possibly that some of the maths functions (like sine and exponential) have to be correct up to a certain level of accuracy.
Other than that, at that level of generality, there's really nothing else you can rely on!
That said, it is quite possible that you will not actually require binarily identical computations on every platform, and that the precision and accuracy guarantees of the float or double types will in fact be sufficient for simulation purposes.
Note that you cannot even produce a reliable result for an algebraic expression inside your own program when you modify the order of evaluation of subexpressions (for example, in binary floating point (1.0e20 + -1.0e20) + 1.0 gives 1.0 while 1.0e20 + (-1.0e20 + 1.0) gives 0.0), so asking for the sort of reproducibility that you want may be a bit unrealistic anyway. If you need real floating-point precision and accuracy guarantees, you might be better off with an arbitrary precision library with correct rounding, like MPFR - but that seems unrealistic for a game.
Serializing floats is an entirely different story, and you'll have to have some idea of the representations used by your target platforms. If all platforms were in fact to use IEEE 754 floats of 32 or 64 bit size, you could probably just exchange the binary representation directly (modulo endianness). If you have other platforms, you'll have to think up your own serialization scheme.
What every programmer should know: http://docs.sun.com/source/806-3568/ncg_goldberg.html