comparing two doubles in the latest visual studio [duplicate] - c++

After upgrading a C++ project to Visual Studio 2013, the result of the program changed because of the different floating-point behavior of the new VC compiler. The floating-point model is set to /fp:precise.
In Visual Studio 2008 (v9.0)
float f = 0.4f; //it produces f = 0.400000001
float f6 = 0.400000006f; //it produces f6 = 0.400000001
In Visual Studio 2013 (v12.0)
float f = 0.4f; //it produces f = 0.400000006
float f1 = 0.40000001f; //it produces f1 = 0.400000006
The setting for the project is identical (converted).
I understand that there is a certain amount of liberty in the floating-point model, but I don't like that certain things have changed and that people working with the old and new versions of Visual Studio can't reproduce some bugs reported by other developers. Is there any setting that can be changed to enforce the same behavior across different versions of Visual Studio?
I tried setting the platform toolset to vs90 and it still produces 0.400000006 in VS2013.
UPDATE:
I tracked the hexadecimal values in the Memory window. The hexadecimal values of f, f1 and f6 are all the same; the difference is only in how the Watch window displays these float values. Furthermore, the problem also shows up with the float 0.3f: multiplying the same decimal values gives different results.
In Visual Studio 2008 (v9.0)
float a = 0.3f; //memory b8 1e 85 3e 00 00 40 40, watch 0.25999999
float b = 19400;
unsigned long c = (unsigned long)((float)b * a); //5043
In Visual Studio 2013 (v12.0)
float a = 0.3f; //memory b8 1e 85 3e 00 00 40 40, watch 0.259999990
float b = 19400;
unsigned long c = (unsigned long)((float)b * a); //5044

The behavior is correct: the float type can store only about 7 significant digits; the rest are just random noise. You need to fix the bug in your code: you are either displaying too many digits, thus revealing the random noise to a human, or your math model is losing too many significant digits and you should be using double instead.
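For illustration (a small sketch, not part of the original answer), printing the same float with 6 and 9 significant digits shows where the meaningful digits end and the noise begins:
#include <stdio.h>

int main(void)
{
    float f = 0.4f;
    printf("%.6g\n", f);   /* 0.4         - within float's precision    */
    printf("%.9g\n", f);   /* 0.400000006 - trailing digits are noise   */
    return 0;
}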
There was a significant change in VS2012 that affects the appearance of these noise digits, part of the code-generator changes that implement auto-vectorization. The 32-bit compiler traditionally used the x87 FPU for calculations, which is notorious for producing different random noise: calculations are performed with an 80-bit intermediate format and get truncated when stored back to memory. The exact moment when this truncation occurs can be unpredictable due to optimizer choices, thus producing different random noise.
The code generator now uses SSE2 instructions instead of FPU instructions, as it already did for 64-bit code. These produce more consistent results that are not affected by code-optimizer choices, because the 80-bit intermediate format is no longer used. A backgrounder on the trouble with the FPU is available in this answer.
This will be the behavior going forward, the FPU will never come back again. Adjust your expectations accordingly, there's a "new normal". And fix the bugs.
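To make the 5043/5044 difference in the update concrete, here is a minimal sketch (not from the original answer). The memory bytes b8 1e 85 3e in the update decode to 0.26f, so that value is used here; the names wide and narrow are purely illustrative:
#include <stdio.h>

int main(void)
{
    float a = 0.26f;      /* stored as 0.2599999904..., per the memory dump */
    float b = 19400.0f;

    /* Wide intermediate product, as the x87 FPU effectively keeps it (VS2008-style): */
    double wide = (double)a * (double)b;   /* 5043.99981... */

    /* Product rounded to single precision, as SSE2 single-precision multiply does (VS2013-style): */
    float narrow = a * b;                  /* rounds up to 5044.0f */

    printf("%lu %lu\n", (unsigned long)wide, (unsigned long)narrow);   /* prints: 5043 5044 */
    return 0;
}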

Related

Different optimization in VS2015 vs VS2013 causes floating point exception

I have a small example of an issue which came up during the transition from VS2013 to VS2015. In VS2015 the code example below causes a floating-point invalid operation.
#include <float.h>   // _clearfp, _controlfp_s, _EM_* masks
#include <math.h>    // pow

int main()
{
    unsigned int enableBits = _EM_OVERFLOW | _EM_ZERODIVIDE | _EM_INVALID;
    _clearfp();
    _controlfp_s(0, ~enableBits, enableBits);

    int count = 100;
    float array[100];

    for (int i = 0; i < count; ++i)
    {
        array[i] = (float)pow((float)(count - 1 - i) / count, 4); //this causes exception in VS2015
    }
    return 0;
}
This happens only in release mode, so it's probably caused by a different optimization. Is there something wrong with this code, or is this a bug in VS 2015?
It's hard to find issues like these across the whole code base, so I am looking for a systematic fix, not a workaround (e.g. using a different variable instead of i, which works).
I also checked the generated assembly code, and it seems that VS2013 uses the whole 128-bit register to perform 4 float operations in one division. VS2015 seems to do only 2 float operations while the rest of the register is zero (or some garbage), which probably introduces this exception.
The instruction which causes the exception is marked in the attached disassembly screenshots (VS2013 and VS2015).
Any help will be appreciated.
Thanks.
This looks to be an interaction between your use of floating-point exceptions and the floating-point optimizations you have enabled.
What the code is doing is performing 2 iterations at once (loop unrolling), but using divps, which does 4 divides at once (on the 4 floats in an XMM register). The upper 2 floats in the XMM register are not used and are zero. As the results of the divides in those slots aren't used, it doesn't normally matter. However, as you set custom exception handling, this raises the invalid-operation exception that you see, even though it is generating values which won't be used.
Your choices, as I see them, are to set /fp:strict, which will disable these optimisations and make this work (but it will obviously make the code slower), or to remove the _controlfp_s call.
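If the goal is only to keep this particular loop away from the auto-vectorizer rather than change the whole project's floating-point mode, MSVC also has #pragma loop(no_vector). A hedged sketch based on the question's code (treat it as a per-loop workaround, not the systematic fix asked for):
#include <float.h>   // _clearfp, _controlfp_s, _EM_* masks
#include <math.h>    // pow

int main()
{
    unsigned int enableBits = _EM_OVERFLOW | _EM_ZERODIVIDE | _EM_INVALID;
    _clearfp();
    _controlfp_s(0, ~enableBits, enableBits);

    const int count = 100;
    float array[100];

#pragma loop(no_vector)   // MSVC: do not vectorize the next loop
    for (int i = 0; i < count; ++i)
    {
        array[i] = (float)pow((float)(count - 1 - i) / count, 4);
    }

    return (int)array[0];  // keep the array alive so the loop is not removed
}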

Does integer overflow cause undefined behavior because of memory corruption?

I recently read that signed integer overflow in C and C++ causes undefined behavior:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
I am currently trying to understand the reason for the undefined behavior here. I thought undefined behavior occurs because the integer starts manipulating the memory around itself when it gets too big to fit in the underlying type.
So I decided to write a little test program in Visual Studio 2015 to test that theory with the following code:
#include <stdio.h>
#include <string.h>
#include <limits.h>

struct TestStruct
{
    char pad1[50];
    int testVal;
    char pad2[50];
};

int main()
{
    TestStruct test;
    memset(&test, 0, sizeof(test));

    for (test.testVal = 0; ; test.testVal++)
    {
        if (test.testVal == INT_MAX)
            printf("Overflowing\r\n");
    }

    return 0;
}
I used a structure here to prevent any protective measures of Visual Studio in debugging mode, like the temporary padding of stack variables and so on.
The endless loop should cause several overflows of test.testVal, and it does indeed, though without any consequences other than the overflow itself.
I took a look at the memory dump while running the overflow tests with the following result (test.testVal had a memory address of 0x001CFAFC):
0x001CFAE5 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x001CFAFC 94 53 ca d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
As you see, the memory around the int that is continuously overflowing remained "undamaged". I tested this several times with similar output. Never was any memory around the overflowing int damaged.
What happens here? Why is there no damage done to the memory around the variable test.testVal? How can this cause undefined behavior?
I am trying to understand my mistake and why there is no memory corruption done during an integer overflow.
You misunderstand the reason for undefined behavior. The reason is not memory corruption around the integer - it will always occupy the same size that integers occupy - but the underlying arithmetic.
Since signed integers are not required to be encoded in 2's complement, there cannot be specific guidance on what is going to happen when they overflow. Different encodings or CPU behavior can cause different outcomes of overflow, including, for example, the program being killed due to a trap.
And as with all undefined behavior, even if your hardware uses 2's complement for its arithmetic and has defined rules for overflow, compilers are not bound by them. For example, for a long time GCC has optimized away any checks which would only come true in a 2's-complement environment. For instance, if (x > x + 1) f() is going to be removed from optimized code, as signed overflow is undefined behavior, meaning it never happens (from the compiler's view, programs never contain code producing undefined behavior), meaning x can never be greater than x + 1.
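By way of contrast, a minimal sketch (not part of the original answer) of an overflow check written so that it never performs the overflowing addition, and therefore cannot be optimized away on those grounds:
/* Tests whether x + y would overflow, without ever computing x + y. */
#include <limits.h>
#include <stdio.h>

int add_would_overflow(int x, int y)
{
    if (y > 0) return x > INT_MAX - y;   /* x + y would exceed INT_MAX */
    if (y < 0) return x < INT_MIN - y;   /* x + y would fall below INT_MIN */
    return 0;
}

int main(void)
{
    printf("%d\n", add_would_overflow(INT_MAX, 1));  /* 1 */
    printf("%d\n", add_would_overflow(42, 1));       /* 0 */
    return 0;
}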
The authors of the Standard left integer overflow undefined because some hardware platforms might trap in ways whose consequences could be unpredictable (possibly including random code execution and consequent memory corruption). Although two's-complement hardware with predictable silent-wraparound overflow handling was pretty much established as a standard by the time the C89 Standard was published (of the many reprogrammable-microcomputer architectures I've examined, zero use anything else) the authors of the Standard didn't want to prevent anyone from producing C implementations on older machines.
On implementations which implemented commonplace two's-complement silent-wraparound semantics, code like
int test(int x)
{
    int temp = (x==INT_MAX);
    if (x+1 <= 23) temp+=2;
    return temp;
}
would, 100% reliably, return 3 when passed a value of INT_MAX, since adding 1 to INT_MAX would yield INT_MIN, which is of course less than 23.
In the 1990s, compilers used the fact that integer overflow was undefined behavior, rather than being defined as two's-complement wrapping, to enable various optimizations which meant that the exact results of computations that overflowed would not be predictable, but aspects of behavior that didn't depend upon the exact results would stay on the rails. A 1990s compiler given the above code might likely treat it as though adding 1 to INT_MAX yielded a value numerically one larger than INT_MAX, thus causing the function to return 1 rather than 3, or it might behave like the older compilers, yielding 3. Note that in the above code, such treatment could save an instruction on many platforms, since (x+1 <= 23) would be equivalent to (x <= 22). A compiler may not be consistent in its choice of 1 or 3, but the generated code would not do anything other than yield one of those values.
Since then, however, it has become more fashionable for compilers to use the Standard's failure to impose any requirements on program behavior in case of integer overflow (a failure motivated by the existence of hardware where the consequences might be genuinely unpredictable) to justify having compilers launch code completely off the rails in case of overflow. A modern compiler could notice that the program will invoke Undefined Behavior if x==INT_MAX, and thus conclude that the function will never be passed that value. If the function is never passed that value, the comparison with INT_MAX can be omitted. If the above function were called from another translation unit with x==INT_MAX, it might thus return 0 or 2; if called from within the same translation unit, the effect might be even more bizarre since a compiler would extend its inferences about x back to the caller.
With regard to whether overflow would cause memory corruption, on some old hardware it might have. On older compilers running on modern hardware, it won't. On hyper-modern compilers, overflow negates the fabric of time and causality, so all bets are off. The overflow in the evaluation of x+1 could effectively corrupt the value of x that had been seen by the earlier comparison against INT_MAX, making it behave as though the value of x in memory had been corrupted. Further, such compiler behavior will often remove conditional logic that would have prevented other kinds of memory corruption, thus allowing arbitrary memory corruption to occur.
Undefined behaviour is undefined. It may crash your program. It may do nothing at all. It may do exactly what you expected. It may summon nasal demons. It may delete all your files. The compiler is free to emit whatever code it pleases (or none at all) when it encounters undefined behaviour.
Any instance of undefined behaviour causes the entire program to be undefined - not just the operation that is undefined, so the compiler may do whatever it wants to any part of your program. Including time travel: Undefined behavior can result in time travel (among other things, but time travel is the funkiest).
There are many answers and blog posts about undefined behaviour, but the following are my favorites. I suggest reading them if you want to learn more about the topic.
A Guide to Undefined Behavior in C and C++, Part 1
What Every C Programmer Should Know About Undefined Behavior #1/3
In addition to the esoteric optimization consequences, you've got to consider other issues even with the code you naively expect a non-optimizing compiler to generate.
Even if you know the architecture to be two's complement (or whatever), an overflowed operation might not set the flags as expected, so a statement like if (a + b < 0) might take the wrong branch: given two large positive numbers, adding them together overflows and the result, so the two's-complement purists claim, is negative, but the addition instruction may not actually set the negative flag.
A multi-step operation may have taken place in a wider register than sizeof(int), without being truncated at each step, and so an expression like (x << 5) >> 5 may not cut off the left five bits as you assume they would.
Multiply and divide operations may use a secondary register for extra bits in the product and dividend. If multiply "can't" overflow, the compiler is free to assume that the secondary register is zero (or -1 for negative products) and not reset it before dividing. So an expression like x * y / z may use a wider intermediate product than expected.
Some of these sound like extra accuracy, but it's extra accuracy that isn't expected, can't be predicted or relied upon, and violates your mental model of "each operation accepts N-bit two's-complement operands and returns the least significant N bits of the result for the next operation".
Integer overflow behaviour is not defined by the C++ standard. This means that any implementation of C++ is free to do whatever it likes.
In practice this means: whatever is most convenient for the implementor. And since most implementors treat int as a two's-complement value, the most common implementation nowadays is to say that an overflowed sum of two positive numbers is a negative number which bears some relation to the true result. This is a wrong answer, and it is allowed by the standard, because the standard allows anything.
There is an argument that integer overflow ought to be treated as an error, just like integer division by zero. The x86 architecture even has the INTO instruction to raise an exception on overflow. At some point that argument may gain enough weight to make it into mainstream compilers, at which point an integer overflow may cause a crash. This also conforms with the C++ standard, which allows an implementation to do anything.
You could imagine an architecture in which numbers were represented as null-terminated strings in little-endian fashion, with a zero byte saying "end of number". Addition could be done by adding byte by byte until a zero byte was reached. In such an architecture an integer overflow might overwrite a trailing zero with a one, making the result look far, far longer and potentially corrupting data in future. This also conforms with the C++ standard.
Finally, as pointed out in some other replies, a great deal of code generation and optimization depends on the compiler reasoning about the code it generates and how it would execute. In the case of an integer overflow, it is entirely licit for the compiler (a) to generate code for addition which gives negative results when adding large positive numbers and (b) to inform its code generation with the knowledge that addition of large positive numbers gives a positive result. Thus for example
if (a+b>0) x=a+b;
might, if the compiler knows that both a and b are positive, not bother to perform a test, but unconditionally to add a to b and put the result into x. On a twos-complement machine, that could lead to a negative value being put into x, in apparent violation of the intent of the code. This would be entirely in conformity with the standard.
It is undefined what value is represented by the int. There's no 'overflow' in memory like you thought.

Floating point mismatch between compilers (Visual Studio 2010 and GCC)

I'm trying to solve a cross-platform issue that's cropping up and I'm not sure quite how to go about it. Here's a demonstration program:
#include <cmath>
#include <cstdio>
int main()
{
    int xm = 0x3f18492a;
    float x = *(float*)&xm;
    x = (sqrt(x) + 1) / 2.0f;
    printf("%f %x\n", x, *(int*)&x);
}
The output on Windows when compiled in VS2010 is:
0.885638 3f62b92a
The output when compiled with GCC 4.8.1 (ideone.com sample) is:
0.885638 3f62b92b
These small mismatches end up ballooning into a serious problem over the course of a program that needs to run identically on multiple platforms. I'm not concerned so much about "accuracy" as that the results match each other. I tried switching the /fp mode in VS to strict from precise, but that doesn't seem to fix it.
What other avenues should I look at to make this calculation have the same result on both platforms?
UPDATE: Interestingly, if I change the code like this, it matches across the platforms:
#include <cmath>
#include <cstdio>
int main()
{
    int xm = 0x3f18492a;
    float x = *(float*)&xm;
    //x = (sqrt(x) + 1) / 2.0f;
    float y = sqrt(x);
    float z = y + 1;
    float w = z / 2.0f;
    printf("%f %x %f %x %f %x %f %x\n", x, *(int*)&x, y, *(int*)&y, z, *(int*)&z, w, *(int*)&w);
}
I'm not sure it's realistic, however, to be walking through the code and changing all floating point operations like this!
Summary: This is generally not supported by compilers, you will have a tough time doing it in a higher-level language, and you will need to use one math library common to all your target platforms.
The C and C++ language standards allow implementations a considerable amount (too much) of flexibility in floating-point operations. Many C and C++ floating-point operations are not required to adhere to the IEEE 754-2008 standard in the way that might be intuitive to many programmers.
Many C and C++ implementations do not even provide good support for adhering to the IEEE 754-2008 standard.
Math library implementations are a particular problem. There does not exist any normal library (commercially available or widely-used open source with a known-bounded run-time) that provides correctly rounded results for all standard math functions. (Getting the mathematics right on some of the functions is a very difficult problem.)
sqrt, however, is relatively simple and should return correctly rounded results in a library of reasonable quality. (I am unable to vouch for the Microsoft implementation.) It is more likely that the particular problem in the code you show is the compiler's choice to use varying precisions of floating point while evaluating expressions.
There may be various switches you can use with various compilers to ask them to conform to certain rules about floating-point behavior. Those may be sufficient for getting elementary operations to perform as expected. If not, assembly language is a way to access well-defined floating-point operations. However, the behavior of library routines will be different between platforms unless you supply a common library. This includes both math library routines (such as pow) and conversions found in routines such as fprintf, fscanf, strtof. You must therefore find one well-designed implementation of each routine you rely on that is supported on all of the platforms you target. (It must be well-designed in the sense that it provides identical behavior on all platforms. Mathematically, it could be somewhat inaccurate, as long as it is within bounds tolerable for your application.)
The Visual Studio compiler tends to generate instructions that use the old x87 FPU(*), but it generates code at the beginning of the executable to set the FPU to the precision of the double format.
GCC can also generate instructions that use the old x87 FPU, but when generating x86-64 code, the default is to use SSE2. On Mac OS X, the default is to use SSE2 even in 32-bit since all Intel Macs have SSE2. When it generates instruction for the 387, GCC does not set the precision of the FPU to the double format, so that computations are made in the 80-bit double-extended format, and then rounded to double when assigned.
As a consequence:
If you use only double computations, Visual Studio should generate a program that computes exactly at the precision of the type, because it is always double(**). And if on the GCC side you use -msse2 -mfpmath=sse, you can expect GCC to also generate code that computes at the precision of doubles, this time by using SSE2 instructions. The computations should match.
Or if you make both GCC and Visual Studio emit SSE2 instructions, again, the computations should match. I am not familiar with Visual Studio but the switch may be /arch:SSE2.
This does not solve the problem with math libraries, which is indeed an unsolved problem. If your computations involve trigonometric or other functions, you must use the same library as part of your project on both sides. I would recommend CRlibm. Less accurate libraries are fine too as long as it's the same library, and it respects the above constraints (using only double or compiled with SSE2 on both sides).
(*) There may be a way to instruct it to generate SSE2 instructions. If you find it, use it: it will solve your particular problem.
(**) modulo exceptions for infinities and subnormals.
C allows intermediate calculations to occur at the floating-point type's precision or at a higher one.
The Windows result matches the GCC one when all calculations occur using only float.
The GCC calculation obtains a different (and more accurate) result when the calculations are coded as float but are allowed to go to double or long double for intermediate results.
So even if everything is IEEE 754 compliant, controlling the allowed intermediate calculations has an effect.
[Edit] I do not think the above really answers the OP's stated issue, though it is a concern for FP issues in general. It is the following that I think best explains the difference.
MS dev network sqrt
I suspect the difference is that the Windows compilation was in C++ mode, so sqrt(x) called float sqrt(float). In gcc it was in C mode and sqrt(x) called double sqrt(double).
If this is the case, ensure the C code on Windows is compiled in C mode and not C++.
#include <math.h>
#include <stdio.h>

int main() {
    volatile float f1;
    float f2;
    double d1;
    int xm = 0x3f18492a;

    f1 = *(float*) &xm;
    f2 = *(float*) &xm;
    d1 = *(float*) &xm;

    f1 = sqrtf(f1);
    f1 = f1 + 1.0f;
    f1 = f1 / 2.0f;
    printf("f1 %0.17e %a %08X\n", f1, f1, *(int*)&f1);

    f2 = (sqrt(f2) + 1) / 2.0;
    printf("f2 %0.17e %a %08X\n", f2, f2, *(int*)&f2);

    d1 = (sqrt(d1) + 1) / 2.0;
    printf("d1 %0.17e %a\n", d1, d1);

    return 0;
}
Output:
f1 8.85637879371643066e-01 0x1.c57254p-1 3F62B92A
f2 8.85637938976287842e-01 0x1.c57256p-1 3F62B92B
d1 8.85637911452129889e-01 0x1.c572551391bc9p-1
IEEE 754 specifies that computations can be processed at a higher precision than what is stored in memory, then rounded when written back to memory. This causes many issues such as the one you see. In short, the standard does not promise that the same computation carried out on all hardware will return the same answer.
If the value being computed on is placed in a larger register, one computation is done, and then the value is moved off the register back to memory, the result is truncated there. It could then be moved back onto the larger register for another computation.
On the other hand, if all the computations are done in the larger register before the value is moved back to memory, you will get a different result. You may have to disassemble the code to see what's happening in your case.
With floating-point work it's important to understand how much precision you need in the final answer and how much precision you are guaranteed to have by the precision (note the two uses of the word) of the variables you choose, and never expect any more precision than you are guaranteed.
In the end, when you are comparing results (this is true of any floating-point work) you cannot look for exact matches: you determine the precision that you require and you check that the difference between two values is less than the precision you require.
Getting back to practicalities, Intel processors have 80-bit registers for floating-point computations that may be used even though you've specified float, which is typically 32-bit (but not always).
If you want to have fun, try turning on various optimizations and processor options like SSE in your compiler and see what results you get (as well as what comes out of the disassembler).
With my 4.6.3 compiler it generates this code:
main:
.LFB104:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $1063434539, %esi
movl $.LC1, %edi
movsd .LC0(%rip), %xmm0
movl $1, %eax
call printf
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LC0:
.long 1610612736
.long 1072453413
Note that there are ZERO calculations performed in this code, just the storing of various constants in registers.
I haven't got a Visual Studio compiler, so I don't know what that produces.
The GCC compiler implements so-called strict-aliasing semantics, which rely on the fact that in both C and C++ it is generally illegal to perform type punning through pointer conversions (with a few exceptions). Your code contains multiple violations of the requirements of strict-aliasing semantics, so it is perfectly logical to expect that the combination of strict-aliasing semantics and optimizations might produce completely unexpected and seemingly illogical results in GCC (or any other compiler, for that matter).
On top of that, there would be nothing unusual in sqrt producing slightly different results in different implementations.
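A minimal sketch (not part of the original answer) of the same computation with the type punning done through memcpy, which avoids the strict-aliasing violation; the variable names are just illustrative:
#include <cmath>
#include <cstdio>
#include <cstring>

int main()
{
    int xm = 0x3f18492a;
    float x;
    std::memcpy(&x, &xm, sizeof x);    // well-defined way to reinterpret the bits

    x = (std::sqrt(x) + 1) / 2.0f;

    int bits;
    std::memcpy(&bits, &x, sizeof bits);
    std::printf("%f %x\n", x, bits);
}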
If you have freedom to change languages, consider using Java with "strictfp". The Java language specification gives very precise rules for order of operations, rounding etc. in strictfp mode.
Exactly matching results across implementations is an objective of the Java standard for strictfp mode. It is not an objective for the C++ standards.
You want them both to be using the IEEE 754 standard.

Wrong conversion from double to QString in Qt on ARM

I have Qt 4.4.3 built for ARMv5TE. I try to convert a double to a QString:
#include <QtCore/QtCore>
#include <cmath>
int main(int argc, char** argv)
{
    const double pi = M_PI;
    qDebug() << "Pi is : " << pi << "\n but pi is : " << QString::number(pi, 'f', 6);
    printf("printf: %f\n", pi);
    return 0;
}
but get strange output:
Pi is : 8.6192e+97
but pi is : "86191995128153827662389718947289094511677209256133209964237318700300913082475855805240843511529472.0000000000000000"
printf: 3.141593
How do I get the proper string?
This looks to be a sort of endianness issue, but not your plain-vanilla big-endian vs little-endian problem. ARM sometimes uses an unusual byte ordering for double. From "Handbook of Floating-Point Arithmetic" by Jean-Michel Muller, et al.:
... the double-precision number that is closest to -7.0868766365730135 x 10^-268 is encoded by the sequence of bytes 11 22 33 44 55 66 77 88 in memory (from the lowest to the highest one) on x86 and Linux/IA-64 platforms (they are said to be little-endian) and by 88 77 66 55 44 33 22 11 on most PowerPC platforms (they are said to be big-endian). Some architectures, such as IA-64, ARM, and PowerPC are said to be bi-endian, i.e., they may be either little-endian or big-endian depending on their configuration.
There exists an exception: some ARM-based platforms. ARM processors have traditionally used the floating-point accelerator (FPA) architecture, where the double-precision numbers are decomposed into two 32-bit words in the big-endian order and stored according to the endianness of the machine, i.e., little-endian in general, which means that the above number is encoded by the sequence 55 66 77 88 11 22 33 44. ARM has recently introduced a new architecture for floating-point arithmetic: vector floating-point (VFP), where the words are stored in the processor's native byte order.
When looked at in a big-endian byte order, M_PI will have a representation that looks like:
0x400921fb54442d18
The large number approximated by 8.6192e+97 will have a representation that looks like:
0x54442d18400921fb
If you look closely, the two 32-bit words are swapped, but the byte order within the 32-bit words is the same. So apparently, the ARM 'traditional' double format seems to be confusing the Qt library (or the Qt library is misconfigured).
I'm not sure if the processor is using the traditional format and Qt expects it to be in VFP format, or if things are the other way around. But it seems to be one of those two situations.
I'm also not sure exactly how to fix the problem - I'd guess there's some option for building Qt to handle this correctly.
The following snippet will at least tell you what format the compiler is using for double, which may help you narrow down what needs to change in Qt:
#include <stdio.h>

int main(void)
{
    unsigned char* b;
    unsigned char* e;
    double x = -7.0868766365730135e-268;

    b = (unsigned char*) &x;
    e = b + sizeof(x);
    for (; b != e; ++b) {
        printf("%02x ", *b);
    }
    puts("");
    return 0;
}
A plain little-endian machine will display:
11 22 33 44 55 66 77 88
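Based on the quoted passage above, an ARM build using the traditional FPA layout would instead display 55 66 77 88 11 22 33 44, while a little-endian VFP build would match the plain little-endian output.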
Update with a bit more analysis:
At the moment, I'm unable to perform any real debugging of this (I don't even have access to my workstation at the moment), but by looking at the Qt source available on http://qt.gitorious.org here's additional analysis:
It looks like Qt calls in to the QLocalePrivate::doubleToString() function in qlocale.cpp to convert a double to an alphanumeric form.
If Qt is compiled with QT_QLOCALE_USES_FCVT defined, then QLocalePrivate::doubleToString() will use the platform's fcvt() function to perform the conversion. If QT_QLOCALE_USES_FCVT is not defined, then QLocalePrivate::doubleToString() ends up calling _qdtoa() to perform the conversion. That function examines the various fields of the double directly and appears to assume that the double is in a strict big-endian or little-endian form (for example, using the getWord0() and getWord1() functions to get the low and high word of the double respectively).
See http://qt.gitorious.org/qt/qt/blobs/HEAD/src/corelib/tools/qlocale.cpp and http://qt.gitorious.org/qt/qt/blobs/HEAD/src/corelib/tools/qlocale_tools.cpp or your own copy of the files for details.
Assuming that your platform is using the traditional ARM FPA representation for double (where the 32-bit halves of the double are stored in big-endian order regardless of whether the overall system is little-endian), I think you'll need to build Qt with QT_QLOCALE_USES_FCVT defined. I believe that all you'll need to do is pass the -DQT_QLOCALE_USES_FCVT option to the configure script when building Qt.
The same code produces proper output on an x86 machine (running Windows XP) with Qt 4.7.0.
I see the following possibilities for the source of the problem:
Some bug which is maybe fixed in a newer version of Qt
Something went wrong when compiling for ARM
I found this forum post on a similar problem which supposes it could be a big/little-endian conversion problem.
I can't tell how to fix this as I am not experienced with ARM at all but maybe this information helps you anyway.

Why are doubles added incorrectly in a specific Visual Studio 2008 project?

Trying to port Java code to C++, I've stumbled over some weird behaviour. I can't get double addition to work (even though the compiler option /fp:strict, which means "correct" floating-point math, is set in Visual Studio 2008).
double a = 0.4;
/* a: 0.40000000000000002, correct */
double b = 0.0 + 0.4;
/* b: 0.40000000596046448, incorrect
(0 + 0.4 is the same). It's not even close to correct. */
double c = 0;
float f = 0.4f;
c += f;
/* c: 0.40000000596046448 too */
In a different test project I set up, it works fine (/fp:strict behaves according to IEEE 754).
I am using Visual Studio 2008 (Standard) with no optimization and /fp:strict.
Any ideas? Is it really truncating to floats? This project really needs the same behaviour on both the Java and C++ sides. I got all the values by reading from the debug window in VC++.
Solution: _fpreset(); // Barry Kelly's idea solved it. A library was setting the FP precision too low.
The only thing I can think of is perhaps you are linking against a library or DLL which has modified the CPU precision via the control word.
Have you tried calling _fpreset() from float.h before the problematic computation?
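A minimal sketch (assuming the MSVC CRT, as in the question) of resetting and then inspecting the floating-point control word before the problematic computation:
#include <float.h>   // _fpreset, _controlfp_s
#include <stdio.h>

int main()
{
    _fpreset();                  // restore the default precision and rounding settings

    unsigned int cw = 0;
    _controlfp_s(&cw, 0, 0);     // mask 0: read-only query of the current control word
    printf("control word: %#x\n", cw);

    double b = 0.0 + 0.4;
    printf("%.17g\n", b);        // expect 0.40000000000000002 with default precision
    return 0;
}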
Yes, it's certainly truncating to floats. I get the same value printing float f = 0.4 as you do in the "inaccurate" case. Try:
double b = 0.0 + (double) 0.4;
The question then is why it's truncating to floats. There's no excuse in the standard for treating 0.0 + 0.4 as a single-precision expression, since floating point literals are double-precision unless they have a suffix to say otherwise.
So something must be interfering with your settings, but I have no idea what.