Wrong conversion from double to QString in Qt on ARM - c++

I have Qt 4.4.3 built for ARMv5TE. I'm trying to convert a double to a QString:
#include <QtCore/QtCore>
#include <cmath>

int main(int argc, char** argv)
{
    const double pi = M_PI;
    qDebug() << "Pi is : " << pi << "\n but pi is : " << QString::number(pi, 'f', 6);
    printf("printf: %f\n", pi);
    return 0;
}
but I get strange output:
Pi is : 8.6192e+97
but pi is : "86191995128153827662389718947289094511677209256133209964237318700300913082475855805240843511529472.0000000000000000"
printf: 3.141593
How do I get the proper string?

This looks like a sort of endianness issue, but not your plain-vanilla big-endian vs. little-endian problem. ARM sometimes uses an unusual byte ordering for double. From "Handbook of Floating-Point Arithmetic" by Jean-Michel Muller, et al.:
... the double-precision number that is closest to -7.0868766365730135 x 10^-268 is encoded by the sequence of bytes 11 22 33 44 55 66 77 88 in memory (from the lowest to the highest one) on x86 and Linux/IA-64 platforms (they are said to be little-endian) and by 88 77 66 55 44 33 22 11 on most PowerPC platforms (they are said to be big-endian). Some architectures, such as IA-64, ARM, and PowerPC, are said to be bi-endian, i.e., they may be either little-endian or big-endian depending on their configuration.
There exists an exception: some ARM-based platforms. ARM processors have traditionally used the floating-point accelerator (FPA) architecture, where the double-precision numbers are decomposed into two 32-bit words in the big-endian order and stored according to the endianness of the machine, i.e., little-endian in general, which means that the above number is encoded by the sequence 55 66 77 88 11 22 33 44. ARM has recently introduced a new architecture for floating-point arithmetic: vector floating-point (VFP), where the words are stored in the processor's native byte order.
When looked at in a big-endian byte order, M_PI will have a representation that looks like:
0x400921fb54442d18
The large number approximated by 8.6192e+97 will have a representation that looks like:
0x54442d18400921fb
If you look closely, the two 32-bit words are swapped, but the byte order within the 32-bit words is the same. So apparently the ARM 'traditional' double-precision format seems to be confusing the Qt library (or the Qt library is misconfigured).
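For what it's worth, you can reproduce the bogus value on an ordinary little-endian IEEE754 machine just by swapping the two 32-bit halves of M_PI. A minimal sketch (this only illustrates the word swap, not Qt's actual code path):
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    double pi = M_PI;

    // Split the double into its two 32-bit halves and swap them, which is
    // effectively what an FPA vs. IEEE754 little-endian mismatch does.
    uint32_t w[2];
    std::memcpy(w, &pi, sizeof pi);
    uint32_t swapped[2] = { w[1], w[0] };

    double garbled;
    std::memcpy(&garbled, swapped, sizeof garbled);

    std::printf("%g\n", pi);       // 3.14159
    std::printf("%g\n", garbled);  // ~8.6192e+97, matching the question
    return 0;
}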
I'm not sure if the processor is using the traditional format and Qt expects it to be in VFP format, or if things are the other way around. But it seems to be one of those two situations.
I'm also not sure exactly how to fix the problem - I'd guess there's some option for building Qt to handle this correctly.
The following snippet will at least tell you what format the compiler is using for double, which may help you narrow down what needs to change in Qt:
#include <cstdio>

int main()
{
    // Dump the in-memory byte layout of a known double so you can compare it
    // against the plain little-endian and FPA orderings described above.
    double x = -7.0868766365730135e-268;
    const unsigned char* b = reinterpret_cast<const unsigned char*>(&x);
    const unsigned char* e = b + sizeof(x);
    for (; b != e; ++b)
        printf("%02x ", *b);
    puts("");
    return 0;
}
A plain little-endian machine will display:
11 22 33 44 55 66 77 88
Update with a bit more analysis:
At the moment I'm unable to do any real debugging of this (I don't even have access to my workstation right now), but here's some additional analysis based on the Qt source available at http://qt.gitorious.org:
It looks like Qt calls into the QLocalePrivate::doubleToString() function in qlocale.cpp to convert a double to an alphanumeric form.
If Qt is compiled with QT_QLOCALE_USES_FCVT defined, then QLocalePrivate::doubleToString() will use the platform's fcvt() function to perform the conversion. If QT_QLOCALE_USES_FCVT is not defined, then QLocalePrivate::doubleToString() ends up calling _qdtoa() to perform the conversion. That function examines the various fields of the double directly and appears to assume that the double is in a strict big-endian or little-endian form (for example, using the getWord0() and getWord1() functions to get the low and high word of the double respectively).
See http://qt.gitorious.org/qt/qt/blobs/HEAD/src/corelib/tools/qlocale.cpp and http://qt.gitorious.org/qt/qt/blobs/HEAD/src/corelib/tools/qlocale_tools.cpp or your own copy of the files for details.
Assuming that your platform is using the traditional ARM FPA representation for double (where the 32-bit halves of the double are stored in big-endian order even though the overall system is little-endian), I think you'll need to build Qt with QT_QLOCALE_USES_FCVT defined. I believe all you need to do is pass the -DQT_QLOCALE_USES_FCVT option to the configure script when building Qt.
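As a quick sanity check before rebuilding Qt, you could verify on the device that the platform's own fcvt() produces sensible digits, since that is the code path the define switches Qt to. A small sketch:
#include <cstdio>
#include <cstdlib>   // fcvt() - legacy POSIX conversion function
#include <cmath>

int main()
{
    int decpt = 0, sign = 0;
    // If this prints digits=3141593 decpt=1 sign=0 on the device, the
    // QT_QLOCALE_USES_FCVT route should give correct QString output too.
    const char* digits = fcvt(M_PI, 6, &decpt, &sign);
    std::printf("digits=%s decpt=%d sign=%d\n", digits, decpt, sign);
    return 0;
}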

The same code produces proper output on an x86 machine (running Windows XP) with Qt 4.7.0.
I see the following possibilities for the source of the problem:
A bug which may be fixed in a newer version of Qt
Something that went wrong when compiling for ARM
I found this forum post on a similar problem, which suggests it could be a big/little-endian conversion problem.
I can't tell you how to fix this, as I have no experience with ARM at all, but maybe this information helps you anyway.

Related

_ExtInt(256) doesn't work in Clang 14, but works in Clang 13

When Clang 14 was released I noticed that the following code doesn't compile anymore (but works in Clang 13):
int main() { using T = _ExtInt(256); }
This code was used to create fixed size integers of arbitrary bit length, even of unusual bit sizes like _ExtInt(12345). And everything worked for any bit size.
In Clang 14 you can use only 128 bits:
int main() { using T = _ExtInt(128); }
The above code works in both Clang 13 and 14, but warns that _ExtInt() is deprecated and suggests using _BitInt(128) instead.
Of course there is unsigned __int128, but _ExtInt()/_BitInt() can be used to create any bit size, including prime sizes like _ExtInt(97)/_BitInt(97) (online).
I have two questions:
Is there any chance in Clang 14 to have (natively) integers bigger than 128 bits? Of course external libraries like Boost can provide them, but what about native support? After all, Clang 13 natively supported any odd bit size like _ExtInt(12345).
Is _BitInt(x) exactly the same as _ExtInt(x) in Clang 14 for any x?
From https://releases.llvm.org/14.0.0/tools/clang/docs/ReleaseNotes.html#c-language-changes-in-clang:
Currently, Clang supports bit widths <= 128 because backends are not yet able to cope with some math operations (like division) on wider integer types. See PR44994 for more information.
And
_BitInt(N) and _ExtInt(N) are the same types in all respects beyond spelling and the deprecation warning.
Note that while Clang 13 allows _ExtInt(256), it crashes as soon as an _ExtInt(256) value is divided by 3 (or basically any other divisor that isn't statically a power of 2). (live)
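To illustrate the kind of crash being referred to, here's a minimal sketch (behaviour depends on the exact Clang build, so treat it as a hypothetical reproduction):
// Clang 13: compiles, but the backend crashes on the division by 3.
// Clang 14: the declaration itself is rejected because N > 128.
int main()
{
    using T = _ExtInt(256);
    T a = 100;
    T b = a / 3;                  // non-power-of-two divisor triggers the backend issue
    return static_cast<int>(b);
}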

String to Unicode, and Unicode to decimal code point (C++)

Despite seeing a lot of questions on the forum about Unicode and string conversion (in C/C++) and Googling the topic for hours, I still can't find a straight explanation of what seems to me like a very basic process. Here is what I want to do:
I have a string which potentially uses any characters of any possible language. Let's take Cyrillic, for example. So say I have:
std::string str = "сапоги";
I want to loop over each character making up that string and:
Know/print the character's Unicode value
Convert that Unicode value to a decimal value
I really Googled that for hours and couldn't find a straight answer. If someone could show me how this could be done, it would be great.
EDIT
So I managed to get that far:
#include <cstdlib>
#include <cstdio>
#include <iostream>
#include <locale>
#include <codecvt>
#include <iomanip>

// utility function for output
void hex_print(const std::string& s)
{
    std::cout << std::hex << std::setfill('0');
    for (unsigned char c : s)
        std::cout << std::setw(2) << static_cast<int>(c) << ' ';
    std::cout << std::dec << '\n';
}

int main()
{
    std::wstring test = L"сапоги";
    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv1;
    std::string u8str = conv1.to_bytes(test);
    hex_print(u8str);
    return 1;
}
Result:
04 41 04 30 04 3f 04 3e 04 33 04 38
Which is correct (it maps to the Unicode code points). The problem is that I don't know whether I should use UTF-8, UTF-16 or something else (as pointed out by Chris in the comments). Is there a way I can find out about that (whatever encoding the string uses originally, or whatever encoding needs to be used)?
EDIT 2
I thought I would address some of the comments with a second edit:
"Convert that Unicode value to a decimal value" Why?
I will explain why, but I also want to comment in a friendly way that my problem was not 'why' but 'how' ;-). You can assume the OP has a reason for asking this question, yet of course I understand people are curious as to why... so let me explain. The reason I need all this is that I ultimately need to read the glyphs from a font file (TrueType, OpenType, it doesn't matter). These files have a table called cmap, which is some sort of associative array that maps the value of a character (in the form of a code point) to the index of the glyph in the font file. The code points in the table are not written using the U+XXXX notation but directly as the decimal counterpart of that number (assuming the U+XXXX notation is the hexadecimal representation of a uint16 number [or U+XXXXXX if greater than uint16, but more on that later]). So in summary, the letter г in Cyrillic has code point value U+0433, which in decimal form is 1075. I need the value 1075 to do a lookup in the cmap table.
// utility function for output
void hex_print(const std::string& s)
{
    std::cout << std::hex << std::setfill('0');
    uint16_t i = 0, codePoint = 0;
    for (unsigned char c : s) {
        std::cout << std::setw(2) << static_cast<int>(c) << ' ';
        if (i++ % 2 == 0) {
            codePoint = static_cast<uint16_t>(c << 8);  // high byte of the UTF-16BE code unit
        } else {
            codePoint |= c;                             // low byte completes the code unit
            // Assumes BMP-only input (no surrogate pairs).
            printf("Unicode value: U+%04X  Decimal value of code point: %u\n",
                   static_cast<unsigned>(codePoint), static_cast<unsigned>(codePoint));
        }
    }
    std::cout << std::dec << '\n';
}
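Once you have the decimal code point, the cmap lookup itself is conceptually just an associative lookup. A hypothetical sketch, assuming the relevant cmap subtable has already been parsed into a std::map (the parsing is a separate job):
#include <cstdint>
#include <map>

// 'cmap' maps code point -> glyph index, as parsed from the font file.
// Glyph 0 is conventionally ".notdef" in TrueType/OpenType.
uint16_t glyphIndexFor(const std::map<uint32_t, uint16_t>& cmap, uint32_t codePoint)
{
    auto it = cmap.find(codePoint);   // e.g. codePoint == 1075 for Cyrillic 'г' (U+0433)
    return it != cmap.end() ? it->second : 0;
}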
std::string is encoding-agnostic. It essentially stores bytes. std::wstring is weird, though also not defined to hold any specific encoding. In Windows, wchar_t is used for UTF-16
Yes, exactly. For a while I thought (wrongly) that strings were just storing "ASCII" characters; in fact, as the comment suggests, std::string only stores 'bytes'. Clearly, if you look at the bytes of the string "english" you get:
std::string eng = "english";
hex_print(eng);
65 6e 67 6c 69 73 68
and if you do the same thing with "сапоги" you get:
std::string cyrillic = "сапоги";
hex_print(cyrillic);
d1 81 d0 b0 d0 bf d0 be d0 b3 d0 b8
What I'd really like to know/understand is how this conversion is done implicitly. Why UTF-8 encoding here rather than UTF-16, and is there a possibility of changing that (or is that defined by my IDE or OS)? Clearly, when I copy-paste the string сапоги into my text editor, it already copies an array of 12 bytes (and these 12 bytes could be UTF-8 or UTF-16).
I think there is some confusion between Unicode and encoding. A code point (AFAIK) is just a character code. UTF-16 gives you the code, so you can say your 0x0441 is the code point of с, Cyrillic small letter es. To my understanding, UTF-16 maps one-to-one with Unicode code points, which have a range of a bit over a million characters. However, other encodings, for example UTF-8, do not map directly to Unicode code points. So, I guess, you'd better stick to UTF-16.
Exactly! I found this comment very useful. Because yes, there is confusion (and I was confused) about the fact that the way you encode a Unicode code point has nothing to do with the code point value itself. Well, sort of, because things can be misleading, as I will show now. You can indeed encode the string сапоги using UTF-8 and you will get:
d1 81 d0 b0 d0 bf d0 be d0 b3 d0 b8
So clearly that has nothing to do with the Unicode values of the characters. Now if you encode the same string using UTF-16 you get:
04 41 04 30 04 3f 04 3e 04 33 04 38
Here 04 and 41 are indeed the two bytes (in hexadecimal) of the letter с ([se] in Cyrillic). In this case at least, there is a direct mapping between the Unicode value and its uint16 representation. And this is why (per Wikipedia's explanation [source]):
Both UTF-16 and UCS-2 encode code points in this range as single 16-bit code units that are numerically equal to the corresponding code points.
But as someone suggested in the comments, some code point values go beyond what you can represent with 2 bytes. For example:
U+1D307 𝌇 TETRAGRAM FOR FULL CIRCLE (Tai Xuan Jing Symbols)
which is what this comment was suggesting:
To my knowledge, UTF-16 doesn't cover all characters unless you use surrogate pairs. It was meant to originally, when 65k was more than enough, but that went out the window, making it an extremely awkward choice now
Though to be perfectly exact, UTF-16, like UTF-8, CAN encode ALL characters, though it may use up to 4 bytes to do so (as you suggested, it uses surrogate pairs when more than 2 bytes are needed).
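For illustration, here's a small sketch of how UTF-16 builds a surrogate pair for a code point above U+FFFF, using U+1D307 as the example:
#include <cstdint>
#include <cstdio>

int main()
{
    uint32_t cp = 0x1D307;                 // TETRAGRAM FOR FULL CIRCLE
    uint32_t v  = cp - 0x10000;            // 20 bits left to encode
    uint16_t high = 0xD800 + (v >> 10);    // top 10 bits    -> 0xD834
    uint16_t low  = 0xDC00 + (v & 0x3FF);  // bottom 10 bits -> 0xDF07
    std::printf("U+%X -> surrogate pair %04X %04X\n",
                (unsigned)cp, (unsigned)high, (unsigned)low);
    return 0;
}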
I tried to do a conversion to UTF-32 using mbrtoc32, but <cuchar> is strangely missing on Mac.
BTW, if you don't know what a surrogate pair is (I didn't) there's a nice post about this on the forum.
For your purposes, finding and printing the value of each character, you probably want to use char32_t, because that has no multi-byte strings or surrogate pairs and can be converted to decimal values just by casting to unsigned long. I would link to an example I wrote, but it sounds as if you want to solve this problem yourself.
C++11 directly supports the types char16_t and char32_t (and C++20 adds char8_t), in addition to the legacy wchar_t that sometimes means UCS-4, sometimes UTF-16LE, sometimes UTF-16BE, sometimes something different. It also lets you store strings at runtime, no matter what character set you saved your source file in, in any of these formats with the u8"", u"" and U"" prefixes, and the \uXXXX Unicode escape as a fallback. For backward compatibility, you can encode UTF-8 with hex escape codes in an array of unsigned char.
Therefore, you can store the data in any format you want. You could also use the facet codecvt<wchar_t,char,mbstate_t>, which all locales are required to support. There are also the multi-byte string functions in <wchar.h> and <uchar.h>.
I highly recommend you store all new external data in UTF-8. This includes your source files! (Annoyingly, some older software still doesn’t support it.) It may also be convenient to use the same character set internally as your libraries, which will be UTF-16 (wchar_t) on Windows. If you need fixed-length characters that can hold any codepoint with no special cases, char32_t will be handy.
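For the concrete task in the question, a minimal sketch using char32_t (std::wstring_convert and std::codecvt_utf8 are deprecated in C++17 but still available on most implementations):
#include <codecvt>
#include <iomanip>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    std::string utf8 = "сапоги";  // assumes the source file is saved as UTF-8

    // Convert the UTF-8 bytes into 32-bit code points.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    std::u32string codePoints = conv.from_bytes(utf8);

    for (char32_t cp : codePoints)
        std::cout << "U+" << std::hex << std::setw(4) << std::setfill('0')
                  << static_cast<unsigned long>(cp)
                  << "  decimal: " << std::dec << static_cast<unsigned long>(cp) << '\n';
    return 0;
}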
Originally computers were designed for the American market and used ASCII, the American Standard Code for Information Interchange. This had 7-bit codes: just the basic English letters and a few punctuation marks, plus codes at the lower end designed to drive paper-and-ink printer terminals.
This became inadequate as computers developed and started to be used for language processing as much as for numerical work. The first thing that happened was that various expansions to 8 bits were proposed. This could either cover most of the decorated European characters (accents, etc) or it could give a series of basic graphics good for creating menus and panels, but you couldn't achieve both. There was still no way of representing non-Latin character sets like Greek.
So a 16-bit code was proposed and called Unicode. Microsoft adopted it very early and invented wchar_t / WCHAR (it has various identifiers) to hold international characters. However, it emerged that 16 bits wasn't enough to hold all glyphs in common use, and the Unicode consortium also introduced some minor incompatibilities with Microsoft's 16-bit code set.
So Unicode text can be a series of 16-bit integers; that's a wchar_t string. ASCII text now has zero bytes in the high positions, so you can't pass a wide string to a function expecting ASCII. Since 16 bits was nearly but not quite enough, a 32-bit Unicode encoding was also produced.
However, saving Unicode to a file created problems: was it 16-bit or 32-bit? And was it big-endian or little-endian? So a flag at the start of the data (a byte order mark) was proposed to remedy this. The problem was that the file contents, byte for byte, no longer matched the in-memory string contents.
C++'s std::string was templated (as std::basic_string) so it could use plain char or one of the wide types, in practice almost always Microsoft's 16-bit near-Unicode encoding.
Then UTF-8 was invented to come to the rescue. It is a multi-byte, variable-length encoding, which uses the fact that ASCII is only 7 bits. If the high bit is set, it means the character occupies two, three, or four bytes. Now a very large number of strings are English text or mainly human-readable numbers, so essentially ASCII. These strings are the same in ASCII as in UTF-8, which makes life a whole lot easier. You have no byte-order convention problems. You do have the problem that you must decode the UTF-8 into code points with a not entirely trivial function, and remember to advance your read position by the correct number of bytes.
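A rough sketch of that decode step (no validation of malformed input, just the basic bit-twiddling):
#include <cstdint>
#include <string>
#include <vector>

// Decode a UTF-8 string into code points. The lead byte tells us how many
// continuation bytes follow; each continuation byte carries 6 payload bits.
std::vector<uint32_t> decode_utf8(const std::string& s)
{
    std::vector<uint32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = s[i];
        uint32_t cp;
        int extra;
        if      (c < 0x80) { cp = c;        extra = 0; }  // 1-byte (ASCII)
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }  // 2-byte sequence
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }  // 3-byte sequence
        else               { cp = c & 0x07; extra = 3; }  // 4-byte sequence
        ++i;
        for (int k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (s[i] & 0x3F);
        out.push_back(cp);
    }
    return out;
}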
UTF-8 is really the answer, but the other encodings are still in use and you will come across them.

comparing two doubles in the latest visual studio [duplicate]

After upgrading a C++ project to Visual Studio 2013, the result of the program changed because of the different floating-point behavior of the new VC compiler. The floating-point model is set to /fp:precise.
In Visual Studio 2008 (v9.0)
float f = 0.4f;            // produces f = 0.400000001
float f6 = 0.400000006f;   // produces f6 = 0.400000001
In Visual Studio 2013 (v12.0)
float f = 0.4f;            // produces f = 0.400000006
float f1 = 0.40000001f;    // produces f1 = 0.400000006
The settings for the project are identical (converted).
I understand that there is a certain liberty in the floating-point model, but I don't like that things have changed and that people working with old/new versions of Visual Studio can't reproduce bugs reported by other developers. Is there any setting which can be changed to enforce the same behavior across different versions of Visual Studio?
I tried setting the platform toolset to vs90 and it still produces 0.400000006 in VS2013.
UPDATE:
I tracked the hexadecimal values in the Memory window. The hexadecimal values of f, f1 and f6 are all the same; the difference is only in how these float values are displayed in the Watch window. Furthermore, the problem also shows up with the float 0.3f: multiplying the same values gives different results.
In Visual Studio 2008(v9.0)
float a = 0.3f; //memory b8 1e 85 3e 00 00 40 40, watch 0.25999999
float b = 19400;
unsigned long c = (unsigned long)((float)b * a); //5043
In Visual Studio 2013 (v12.0)
float a = 0.3f; //memory b8 1e 85 3e 00 00 40 40, watch 0.259999990
float b = 19400;
unsigned long c = (unsigned long)((float)b * a); //5044
The behavior is correct: the float type can store only about 7 significant digits. The rest are just random noise. You need to fix the bug in your code: you are either displaying too many digits, thus revealing the random noise to a human, or your math model is losing too many significant digits and you should be using double instead.
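A quick way to convince yourself of this (a sketch; the exact text formatting differs between runtimes):
#include <cstdio>

int main()
{
    float f  = 0.4f;
    float f1 = 0.400000001f;
    float f6 = 0.400000006f;

    std::printf("%a %a %a\n", f, f1, f6);  // all three print the same hex float, i.e. identical bits
    std::printf("%.9g\n", f);              // 0.400000006 - the noise digits exposed
    std::printf("%.6g\n", f);              // 0.4         - within float's ~7 significant digits
    return 0;
}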
There was a significant change in VS2012 that affects the appearance of the noise digits, part of the code generator changes that implement auto-vectorization. The 32-bit compiler traditionally used the x87 FPU for calculations, which is notorious for producing different random noise: calculations are performed with an 80-bit intermediate format and get truncated when stored back to memory. The exact moment when this truncation occurs can be unpredictable due to optimizer choices, thus producing different random noise.
The code generator, as it already did for 64-bit code, now uses SSE2 instructions instead of FPU instructions. These produce more consistent results that are not affected by optimizer choices, since the 80-bit intermediate format is no longer used. A backgrounder on the trouble with the FPU is available in this answer.
This will be the behavior going forward; the FPU will never come back. Adjust your expectations accordingly, there's a "new normal". And fix the bugs.

How to isolate a fault on arm device?

I am using RapidJSON on an ARM device and get strange behaviour when running this code.
#include <document.h>
#include <iostream>

using namespace std;

int main()
{
    const char json[] = "[{\"Type\":\"float\",\"val_param\" : 12.025 }]";
    rapidjson::Document d;
    if (d.Parse<0>(json).HasParseError()) {
        // ErrorCase
    } else {
        rapidjson::Value& val_param = d[0]["val_param"];
        double tmp_double1 = val_param.GetDouble();
        cout << tmp_double1 << endl; // -9.2559641157289301e+61 instead of 12.025
    }
    return 0;
}
Before downvoting this question: what other information do you need? I really don't know how to isolate this fault, whether it occurs because of the embedded device or because of RapidJSON, or how to solve it.
========================== UPDATE ========================================
What is the device? http://www.keith-koep.com/de/produkte/produkte-trizeps/trizeps-iv-m-eigenschaften/
Does it have a hardware FPU? It is ARMv5, so I don't think so.
What compiler and libraries are you using (version numbers/specific builds)?
What options are you passing to the compiler and linker?
arm-linux-gnueabi-g++ -march=armv5te -marm -mthumb-interwork --sysroot=/usr/local/oecore-x86_64/sysroots/armv5te-linux-gnueabi
This looks like it might be an undefined-behaviour-type bug in RapidJSON.
Since you're targeting ARMv5, you're probably using a software floating-point library using the legacy ARM FPA format (as opposed to the later VFP, which uses IEEE754 format). Crucially, the FPA stores things in a weird middle-endian format, where 64-bit doubles are stored as two little-endian words, but most-significant word first.
(Yes, big-endian ARM is a whole other complicated issue, but I'm deliberately ignoring it here since I don't see an armeb-* triplet or the -mbig-endian option anywhere)
Consider 12.025 as an IEEE754 double:
64-bit value: 0x40280ccccccccccd.
little-endian byte order: cd cc cc cc cc 0c 28 40
as little-endian words: 0xcccccccd 0x40280ccc
Now in FPA format that would be:
as little-endian words: 0x40280ccc 0xcccccccd
byte order: cc 0c 28 40 cd cc cc cc
Trying to interpret that as a pure little-endian 64-bit value yields 0xcccccccd40280ccc, which just so happens to be the IEEE754 representation of -9.255965e+61. Fancy that!
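You can sanity-check that arithmetic on any little-endian IEEE754 machine; here's a small sketch that reinterprets the FPA byte sequence worked out above:
#include <cstdio>
#include <cstring>

int main()
{
    // The FPA byte layout of 12.025 derived above.
    const unsigned char fpaBytes[8] = { 0xcc, 0x0c, 0x28, 0x40, 0xcd, 0xcc, 0xcc, 0xcc };

    // Read it back as if it were a plain little-endian IEEE754 double,
    // which is presumably what the parsing code ends up doing.
    double misread;
    std::memcpy(&misread, fpaBytes, sizeof misread);
    std::printf("%.17g\n", misread);   // about -9.2559641157289301e+61
    return 0;
}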
From a quick look around the code, it may strictly be more of an incompatibility than a bug, since RapidJSON does seem to explicitly assume IEEE754 format for floating-point values. To its credit, even though the parsing code looks pretty hairy, I do see unions rather than type-punning pointers. However, even if it's not relying on undefined behaviour, it's still relying on implementation-defined behaviour (the format of floating-point types), and unfortunately this compiler's implementation doesn't match that expectation.

Serializing binary struct gcc vs cl

Full disclosure: this is homework. Although it is completed and fully working, I'm searching for a nicer solution.
I have a binary file, which was created by a program compiled within Visual Studio (I believe). The structure looks something like this.
struct Record {
    char   c;
    double d;
    time_t t;
};
The size of this structure on Windows with Visual Studio 2008 is 24 bytes: 1 + 8 + 8 plus padding. So there's some padding going on. The same structure on Linux with gcc gives 16 bytes: 1 + 8 + 4 plus padding. To line these up I added some padding and changed time_t to another type. So then my struct looks like this.
struct Record {
    char      c;
    char      __padding[7];
    double    d;
    long long t;
};
This now works and gcc gives its size as 24 bytes, but it seems a little dirty. So, two questions:
Why is this implemented differently between the two compilers?
Are there any __attribute__ ((aligned))-type options or any other cleaner solutions for this?
The difference stems from whether doubles are 32-bit aligned or 64-bit aligned by default. On a 32-bit machine, having a double on a 64-bit boundary may have some benefit, but probably not a huge one. VC is simply more careful about this than gcc.
The bottom line is that if you are using structs for serialization you should ALWAYS make them packed (i.e. 1-byte aligned) and then do the alignment/padding by hand. This way your code is sure to be compatible across platforms.
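As a sketch of what that can look like for question 2 (the exact pragmas/attributes vary by compiler, so treat this as an illustration rather than the one true way):
#include <cstdint>

// Pack the struct and spell out the padding and field widths yourself so
// the on-disk layout is the same 24 bytes under MSVC and gcc.
#if defined(_MSC_VER)
#pragma pack(push, 1)
#endif
struct RecordOnDisk {
    char    c;
    char    padding[7];   // the 7 bytes MSVC inserts before the double
    double  d;
    int64_t t;            // fixed 64-bit timestamp instead of time_t
}
#if defined(__GNUC__)
__attribute__((packed))
#endif
;
#if defined(_MSC_VER)
#pragma pack(pop)
#endif

static_assert(sizeof(RecordOnDisk) == 24, "on-disk record must be 24 bytes");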