I am using VS 2012 and programming in C++. I have a wide string:
const wchar_t *str = L"Hello, world!";
Technically, I read the string from a file, but I don't know whether that makes a difference. When I look at str in the Memory window, it looks like this:
00 48 00 65 00 6c 00 6c 00 6f 00 2c 00 20 00 77 00 6f 00 72 00 6c 00 64 00 21 00
As you can see, the string is stored in memory as big-endian.
When I hover my mouse over the string I get:
L"䠀攀氀氀漀Ⰰ 眀漀爀氀搀℀"
And after I reverse the endianness of str the memory looks like:
48 00 65 00 6c 00 6c 00 6f 00 2c 00 20 00 77 00 6f 00 72 00 6c 00 64 00 21 00 00
And the hover over looks like:
L"Hello, world!"
It seems that the debugger displays UTF-16 in little-endian by default. My program reads big-endian files so it is very tedious to keep reversing the endianness of all strings to debug them. Is there any way to change the endianness of the debugger's display?
Except for debugging, I can do all my processing in big-endian.
It's not only the debugger. The wchar_t functions of Visual Studio are little-endian, as the host is. When you want to process the data, you need to convert the strings to little-endian anyway.
It's worth making this change even if you write the strings back to a file with a different endianness. Strings are defined as a byte sequence; applying your own endianness to a string looks strange anyhow.
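A minimal sketch of that conversion, assuming the raw file bytes are available in a std::vector<unsigned char> (the function name decodeUtf16BE is hypothetical). It assembles each big-endian byte pair into one native wchar_t, so the result displays normally in the debugger. It assumes one UTF-16 code unit per wchar_t, as in Visual C++; surrogate pairs are copied through as two units.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode raw UTF-16BE bytes (as read from the file)
// into a native-endian std::wstring. Assumes one UTF-16 code unit per
// wchar_t (true for Visual C++); surrogate pairs pass through unchanged.
std::wstring decodeUtf16BE(const std::vector<unsigned char>& bytes)
{
    std::wstring out;
    out.reserve(bytes.size() / 2);
    // Walk the buffer two bytes at a time; a trailing odd byte is ignored.
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Big-endian: the first byte of each pair is the high-order byte.
        const unsigned unit = (static_cast<unsigned>(bytes[i]) << 8) | bytes[i + 1];
        out.push_back(static_cast<wchar_t>(unit));
    }
    return out;
}
```

For example, the bytes 00 48 00 69 decode to L"Hi".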
Your best shot at getting this to work is to define your own type and create a debugger type visualizer for it (see Customizing the Visual Studio Debugger Display of Your Data).
Or maybe you can quick-hack it by shifting the address by 1 byte in the Watch window, e.g. by watching (wchar_t*)((char*)str + 1).
You're working with a non-native string format that just happens to "feel" similar to the native format, so you are tempted to think there should be a way to do it. But to the debugger, it's just a foreign binary format. The debugger is not designed to handle foreign endianness, just as it is not designed to visualize an OGG stream packet.
If you want to use available tools for manipulating native-endian Unicode strings, you'll need to convert to native-endian Unicode format.
As has been pointed out, VS uses the native endianness, which is little-endian on Intel/AMD. The problem is that you're not reading the strings correctly; you should imbue the std::wistream with a locale that reads UTF-16BE (since this is apparently the encoding form you're trying to read). The std::wistream (or rather the backing std::wfilebuf) will then automatically do the code conversion on the fly when reading and writing.
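A minimal sketch of that approach, assuming the standard facet std::codecvt_utf16<wchar_t>, which decodes big-endian UTF-16 unless std::little_endian is passed in its mode parameter. The function name readUtf16BEFile is hypothetical. Note that <codecvt> has been deprecated since C++17, although major standard libraries still ship it.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: read a whole UTF-16BE file into a native std::wstring.
// std::codecvt_utf16<wchar_t> decodes big-endian UTF-16 by default (no
// std::little_endian flag). <codecvt> is deprecated since C++17 but still shipped.
std::wstring readUtf16BEFile(const std::string& path)
{
    std::wifstream in;
    // Imbue before opening so the facet is in place before any I/O.
    // The locale takes ownership of the facet allocated with new.
    in.imbue(std::locale(in.getloc(), new std::codecvt_utf16<wchar_t>));
    in.open(path, std::ios::binary);   // binary: no newline translation of raw bytes
    if (!in)
        return std::wstring();
    return std::wstring(std::istreambuf_iterator<wchar_t>(in),
                        std::istreambuf_iterator<wchar_t>());
}
```

The stream is opened in binary mode so the C runtime does not apply newline translation to the raw UTF-16 bytes before the facet sees them.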
You can set the endianness of the Memory window using the context menu. Right-click in the Memory window and check "Big Endian".
Context: I am extending an old console application that queries an SQL Server and stores the SELECT result in a DBF table. The application is written in native C++. It includes oledb.h and uses the documented ATL classes (atldbcli.h):
CCommand<CManualAccessor, CRowset> cmd;
CTable<CManualAccessor, CRowset> dstTable;
Then a common buffer is allocated and later shared by the source table and dstTable. Binding the columns to suitable parts of the buffer performs the conversion when copying to the destination table. So far, so good.
What I need: I need to implement an extension where the source table content is interpreted. The values of a row from the source table should be used to build another SQL SELECT command. Strings are no problem. However, I need to get the value of a field that is defined as NUMERIC(20, 0) in the database. The column type is indeed DBTYPE_NUMERIC, colInfo.bPrecision shows 20, and colInfo.bScale is zero. The length of that part of the buffer is 19 bytes.
Buffer content: The buffer for the field shows the value
04 00 01 10 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00
I know that the exact value of the field should be 7440, that is, 0x1d10. I can see it from the fourth byte of the buffer onward. For a quick-and-dirty hack I can get the value; however, I would like to understand the details so I can implement it properly for any NUMERIC(x, y)...
What do the first three bytes mean? Are there any ready-to-use functions in oledb.h or atldbcli.h to get the value?
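For reference, oledb.h declares the layout of a DBTYPE_NUMERIC value as the DB_NUMERIC struct: a precision byte, a scale byte, a sign byte (1 for positive, 0 for negative), and a 16-byte val array holding the unsigned magnitude with the least significant byte first, 19 bytes in total. Read that way, the buffer above is precision 0x04, scale 0x00, sign 0x01 (positive), and magnitude 10 1d 00 ... = 0x1d10 = 7440. The sketch below decodes such raw bytes without depending on oledb.h; the function name decodeDbNumeric is hypothetical, and returning double is a simplification, since a NUMERIC(20, 0) value can exceed 2^53, where double is no longer exact.

```cpp
#include <cmath>

// Hypothetical helper: decode the 19 raw bytes of an OLE DB DB_NUMERIC.
//   byte 0      precision
//   byte 1      scale (digits to the right of the decimal point)
//   byte 2      sign: 1 = positive, 0 = negative
//   bytes 3..18 unsigned 128-bit magnitude, least significant byte first
// Value = sign * magnitude / 10^scale. Returning double is a simplification:
// magnitudes above 2^53 are no longer exact.
double decodeDbNumeric(const unsigned char* buf)
{
    const int scale = buf[1];
    const bool positive = (buf[2] == 1);
    double magnitude = 0.0;
    // Horner's scheme from the most significant byte (buf[18]) down to buf[3].
    for (int i = 15; i >= 0; --i)
        magnitude = magnitude * 256.0 + buf[3 + i];
    const double value = magnitude / std::pow(10.0, scale);
    return positive ? value : -value;
}
```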
Consider a device in the system, something under /dev/hdd[sg][nvme]xx.
Open the device, get the file descriptor, and start working with it (read(v)/write(v)/lseek, etc.). At some point you may get EIO. How do you retrieve the underlying error reported by the device driver?
EDIT001: In case it is impossible using unistd functions, maybe there are other ways to work with block devices that can provide more low-level information, like sg_scsi_sense_hdr?
You can't get any more error detail out of the POSIX functions. You're on the right track with the SCSI generic stuff, though. But, boy, it's hairy. Check out the example in sg3_utils of how to do a SCSI READ(16). It lets you look at the sense data when it comes back:
https://github.com/hreinecke/sg3_utils/blob/master/examples/sg_simple16.c
Of course, this technique doesn't work with NVMe drives. (At least, not to my knowledge).
One concept I've played with in the past is to use normal POSIX/libc block I/O functions like pread and pwrite until I get an EIO out. At that point, you can bring in the SCSI-generic versions to try to figure out what happened. In the ideal case, a pread or lseek/read fails with EIO. You then turn around and re-issue it using a SG READ (10) or (16). If it's not just a transient failure, this may return sense data that your application can use.
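A sketch of that fallback for Linux, assuming a SCSI-backed device (/dev/sdX or /dev/sgN) opened by the caller and a known logical block size. It uses the SG_IO ioctl and sg_io_hdr_t from <scsi/sg.h> to re-issue the failed range as a READ(10) (opcode 0x28), then pulls the sense key, ASC, and ASCQ out of fixed-format (0x70/0x71) or descriptor-format (0x72/0x73) sense data. The names SenseInfo, parseSense, sgRead10, and readBlocks are hypothetical, READ(10) limits the LBA to 32 bits, and the sg_simple16.c example linked above remains the better reference for the details.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <scsi/sg.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Sense key, additional sense code (ASC) and qualifier (ASCQ); -1 = unknown.
struct SenseInfo {
    int key = -1;
    int asc = -1;
    int ascq = -1;
};

// Extract key/ASC/ASCQ from fixed-format (0x70/0x71) or
// descriptor-format (0x72/0x73) SCSI sense data.
bool parseSense(const unsigned char* sb, int len, SenseInfo& out)
{
    if (len < 4)
        return false;
    const int code = sb[0] & 0x7f;
    if ((code == 0x70 || code == 0x71) && len >= 14) {
        out.key = sb[2] & 0x0f;
        out.asc = sb[12];
        out.ascq = sb[13];
        return true;
    }
    if (code == 0x72 || code == 0x73) {
        out.key = sb[1] & 0x0f;
        out.asc = sb[2];
        out.ascq = sb[3];
        return true;
    }
    return false;
}

// Re-issue a read as SCSI READ(10) via SG_IO. Returns true if the read
// succeeded; on failure, fills `sense` when the device returned sense data.
bool sgRead10(int fd, std::uint32_t lba, std::uint16_t blocks, void* buf,
              unsigned blockSize, SenseInfo& sense)
{
    unsigned char cdb[10] = {0x28, 0,
        (unsigned char)(lba >> 24), (unsigned char)(lba >> 16),
        (unsigned char)(lba >> 8), (unsigned char)lba,
        0, (unsigned char)(blocks >> 8), (unsigned char)blocks, 0};
    unsigned char sb[32] = {};
    sg_io_hdr_t io;
    std::memset(&io, 0, sizeof io);
    io.interface_id = 'S';
    io.dxfer_direction = SG_DXFER_FROM_DEV;
    io.cmd_len = sizeof cdb;
    io.cmdp = cdb;
    io.mx_sb_len = sizeof sb;
    io.sbp = sb;
    io.dxfer_len = blocks * blockSize;
    io.dxferp = buf;
    io.timeout = 20000;                        // milliseconds
    if (ioctl(fd, SG_IO, &io) < 0)
        return false;                          // SG_IO itself rejected (e.g. not SCSI)
    if ((io.info & SG_INFO_OK_MASK) == SG_INFO_OK)
        return true;
    if (io.sb_len_wr > 0)
        parseSense(sb, io.sb_len_wr, sense);
    return false;
}

// Normal path is pread(); only on EIO fall back to SG_IO for sense data.
// Note: errno may be overwritten by the fallback.
ssize_t readBlocks(int fd, std::uint32_t lba, std::uint16_t blocks, void* buf,
                   unsigned blockSize, SenseInfo& sense)
{
    const ssize_t n = pread(fd, buf, std::size_t(blocks) * blockSize,
                            off_t(lba) * blockSize);
    if (n < 0 && errno == EIO)
        sgRead10(fd, lba, blocks, buf, blockSize, sense);
    return n;
}
```

Fed the raw sense data from the sg_read output below (70 00 05 ... 25 00 ...), parseSense yields sense key 5 (Illegal Request), ASC 0x25, ASCQ 0x00, which is the "Logical unit not supported" condition.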
Here's an example, using the command-line sg_read program. I have an iSCSI attached disk that I'm reading and writing. On the target, I remove its LUN mapping. dd reports EIO:
# dd if=/dev/sdb of=/tmp/output bs=512 count=1 iflag=direct
dd: error reading ‘/dev/sdb’: Input/output error
but sg_read reports some more useful information:
[root#localhost src]# sg_read blk_sgio=1 bs=512 cdbsz=10 count=512 if=/dev/sdb odir=1 verbose=10
Opened /dev/sdb for SG_IO with flags=0x4002
read cdb: 28 00 00 00 00 00 00 00 80 00
duration=9 ms
reading: SCSI status: Check Condition
Fixed format, current; Sense key: Illegal Request
Additional sense: Logical unit not supported
Raw sense data (in hex):
70 00 05 00 00 00 00 0a 00 00 00 00 25 00 00 00
00 00
sg_read: SCSI READ failed
Some error occurred, remaining block count=512
0+0 records in
You can see the Logical unit not supported additional sense code in the above output, indicating that there's no such LU at the target.
Possible? Yes. But as you can see from the code in sg_simple16.c, it's not easy!
I'm trying to send data from Qt to R. I am new to the QtNetwork module and relatively new to Qt overall. As such I am also trying to figure out how QIODevice encodes data for the purposes of reading and writing.
If I run the Fortune Server Example and connect to it with the following code in R:
connection <- socketConnection(host="localhost", port=50743, open="rb", timeout=10)
readBin(connection, what="raw", n = 1000)
the following raw hexadecimal vector is returned
00 00 00 56 00 59 00 6f 00 75 00 20 00 77 00 69 00 6c 00 6c 00 20 00 66 00 65 00 65 00 6c 00 20 00 68 00 75 00 6e 00 67 00 72 00 79 00 20 00 61 00 67 00 61 00 69 00 6e 00 20 00 69 00 6e 00 20 00 61 00 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 68 00 6f 00 75 00 72 00 2e
Removing the first five bytes and all the remaining null characters and converting to char, I get:
"You will feel hungry again in another hour."
So what I want to know is: where do all the characters that are not part of the fortune come from? The fourth byte seems to be the byte length of the message from the sixth byte to the end; the rest of the "non-fortune" characters are all null.
I read that QByteArray terminates each byte with a null character and QByteArray is converted to a QBuffer before being written by QTcpSocket, is that what is happening here? QBuffer adds the length of the message (but what of the other four bytes) and every second byte of a QByteArray is the null character? Also, the last byte is not null (did the readBin operation consume it/ how did readBin know where the message ended)?
Is this the only way to write data to the socket? If I wanted to transmit values of type double would I have to convert them to QByteArray to transmit them in this fashion? Is there not some non-text way of transmitting data through a socket?
Any enlightenment would be much appreciated!
EDIT:
Thanks for the answer! For completeness' sake, here is how you might decode the string in R:
connection <- socketConnection(host="localhost", port=50743, open="rb", timeout=10)
# Read the first 32 bits, which contain the size of the string in bytes
len.raw <- readBin(connection, what="raw", n = 4)
# convert to integer
len <- strtoi(paste(c("0x",len.raw),collapse=""))
# Read raw message
msg.raw <- readBin(connection, what="raw", n = len)
# convert to char using UTF-16BE
msg <- iconv(list(msg.raw),from="UTF-16BE")
close(connection)
cat(msg)
If you take a look at how the Fortune Server Example is implemented, you can see that it uses QDataStream to serialize fortunes (QStrings) over the socket:
QByteArray block;
QDataStream out(&block, QIODevice::WriteOnly);
out.setVersion(QDataStream::Qt_4_0);
out << fortunes.at(qrand() % fortunes.size());
So, the question is reduced to "How does QDataStream serialize QStrings?", and this is answered extensively in the documentation page about serializing Qt data types. You can see that a QString's serialization looks like this:
If the string is null: 0xFFFFFFFF (quint32)
Otherwise: The string length in bytes (quint32) followed by the data in UTF-16
And this is exactly what you are seeing in your question. The first four bytes are the string length in bytes, and the "nulls" you see later appear because the text is encoded as UTF-16 in QDataStream's default big-endian byte order, so every ASCII character is preceded by a 00 byte.
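To make that layout concrete, here is a minimal C++ sketch of the reading side, independent of Qt. It assumes QDataStream's default big-endian byte order and handles a single string; the function name decodeQDataStreamString is hypothetical. Applied to the bytes in the question, the length prefix 00 00 00 56 gives 86 bytes, i.e. 43 UTF-16 code units, which is exactly the length of "You will feel hungry again in another hour."

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode one QString as written by QDataStream with the
// default big-endian byte order: a quint32 byte length, then that many bytes
// of UTF-16BE. A length of 0xFFFFFFFF marks a null QString.
// Returns false if the buffer is truncated or the length is odd.
bool decodeQDataStreamString(const std::vector<unsigned char>& in,
                             std::u16string& out, bool& isNull)
{
    if (in.size() < 4)
        return false;
    // quint32 length prefix, most significant byte first.
    const std::uint32_t len = (std::uint32_t(in[0]) << 24) | (std::uint32_t(in[1]) << 16)
                            | (std::uint32_t(in[2]) << 8) | std::uint32_t(in[3]);
    out.clear();
    isNull = (len == 0xFFFFFFFFu);
    if (isNull)
        return true;
    if (len % 2 != 0 || in.size() - 4 < len)
        return false;
    // Each UTF-16 code unit is two bytes, high byte first.
    for (std::size_t i = 4; i < 4 + std::size_t(len); i += 2)
        out.push_back(char16_t((in[i] << 8) | in[i + 1]));
    return true;
}
```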
Is this the only way to write data to the socket? If I wanted to transmit values of type double would I have to convert them to QByteArray to transmit them in this fashion? Is there not some non-text way of transmitting data through a socket?
You can use any serialization format you like. QDataStream is widely used in Qt since it supports most Qt data types out of the box. This has nothing to do with using QByteArray; you can let QDataStream write to the socket directly. QDataStream is actually a binary (non-text) format, as you can see. If you want a textual, human-readable format, you can use JSON.
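On the question of sending a double: a minimal sketch, assuming an IEEE 754 host and a QDataStream left at its default big-endian byte order and double floating-point precision, in which case a double goes over the wire as its 8-byte IEEE 754 representation, most significant byte first. The helper names encodeDoubleBE and decodeDoubleBE are hypothetical; in Qt itself you would simply write out << value.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the 8 bytes a default-configured QDataStream emits
// for a double (IEEE 754 bit pattern, most significant byte first).
std::array<unsigned char, 8> encodeDoubleBE(double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // reinterpret without aliasing UB
    std::array<unsigned char, 8> out{};
    for (int i = 7; i >= 0; --i) {             // fill from the last byte backwards
        out[i] = static_cast<unsigned char>(bits & 0xFF);
        bits >>= 8;
    }
    return out;
}

// Inverse: rebuild the double from 8 big-endian bytes.
double decodeDoubleBE(const unsigned char* in)
{
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | in[i];
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

On the R side, readBin(connection, what="double", n=1, size=8, endian="big") should read such a value.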
But if you are aiming to send data from Qt to R using QDataStream, you'll have to write your own QDataStream deserializer for R. I would recommend using a common data serialization format that has implementations in both C++ and R (instead of reinventing the wheel). I believe JSON meets this criterion, and if you want a binary format, msgpack might be interesting for you, since it supports a lot of programming languages (including R and C++).
I'm using wxWidgets in my program for directory management and for compressing/uncompressing collections of files. While building my file system, I've noticed that I get memory leaks on every run. After a lot of testing, I realized that any time I use any function related to wxFileName, I get a memory leak. I'm using wxWidgets 3.0.1, and my standalone example is as follows.
#include <wx/filename.h>
int main()
{
wxFileName::Mkdir("Test");
return 0;
}
The result is the same if I make an instance of the wxFileName class.
How do I stop wxWidgets from leaking memory? I want to be able to package large collections of files into one file and read the data from them with various other libraries (by extracting the zip to a temporary folder and reading the data from there). I haven't been able to get any other library to zip/unzip entire folders, so I really need to be able to use wxWidgets without a memory leak.
I read in another thread that the Visual Studio debugger falsely identifies these memory leaks, but I ran the program through AQtime and it confirmed that there was indeed a memory leak.
The exact debug output involving the memory leak is as follows:
Detected memory leaks!
Dumping objects ->
{1087} normal block at 0x009B4BC0, 64 bytes long.
Data: <\+= d+= l+= t+= > 5C 2B 3D 00 64 2B 3D 00 6C 2B 3D 00 74 2B 3D 00
{1086} normal block at 0x009B4880, 772 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{1085} normal block at 0x009B4680, 28 bytes long.
Data: < H > 80 48 9B 00 C1 00 00 00 00 00 00 00 CD CD CD CD
Object dump complete.
After a bit of digging (it WOULD be the digging I did AFTER posting the question) I found that when you're using wxWidgets without creating a wxWidgets app object, you need to use the following two functions:
wxInitialize()
and
wxUninitialize()
So the fixed version of my code is as follows:
#include <wx/app.h>
#include <wx/filename.h>
int main()
{
if (!wxInitialize()) // set up wxWidgets without a wxApp object
return 1;
wxFileName::Mkdir("Waka Waka");
wxUninitialize(); // release what wxInitialize() allocated
return 0;
}
If you are using wxWidgets purely for file management, I suggest calling these functions either in the constructor and destructor of whatever class handles files, or at the beginning and end of your program's main loop.
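Calling the pair in a constructor and destructor is the RAII idiom. A minimal, library-agnostic sketch follows; the class name LibraryGuard is hypothetical. With wxWidgets you would pass captureless lambdas wrapping the two calls, which avoids depending on the exact overload set of wxInitialize(). wxWidgets also provides a class for this purpose, wxInitializer; check the documentation for your version before relying on it.

```cpp
// Hypothetical RAII guard: runs `init` on construction and, if it succeeded,
// `uninit` on destruction, so every exit path from the scope cleans up.
// wxWidgets usage (assumed):
//   LibraryGuard wx([] { return wxInitialize(); }, [] { wxUninitialize(); });
//   if (!wx.ok()) return 1;
class LibraryGuard {
public:
    LibraryGuard(bool (*init)(), void (*uninit)())
        : uninit_(uninit), ok_(init()) {}
    ~LibraryGuard() { if (ok_) uninit_(); }

    // Non-copyable: a copy would run the cleanup twice.
    LibraryGuard(const LibraryGuard&) = delete;
    LibraryGuard& operator=(const LibraryGuard&) = delete;

    bool ok() const { return ok_; }

private:
    void (*uninit_)();
    bool ok_;
};
```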
Does anyone know if there is a way to dump only a chunk of memory to disk using VS? Basically, I want to give it an address and a length and have it write the memory to disk. That way I can do a binary diff.
I'm kind of surprised VS won't let you do that from the Memory dump window...
You might be able to get what you want (or close to it) with the VS command window:
>Tools.LogCommandWindowOutput c:\temp\testdump.log /overwrite
>Debug.ListMemory /Count:16 0x00444B20
0x00444B20 00 00 00 00 00 00 00 00 13 00 12 00 86 07 19 00 ................
>Tools.LogCommandWindowOutput /off
If you're willing to use WinDbg (or ntsd/cdb), you can use the .writemem debugger command to do exactly what you want, e.g. .writemem c:\temp\chunk.bin 0x00444B20 L0x400 to write 0x400 bytes starting at that address.
I believe you can only save a complete binary minidump. However, you can use the Debug Memory window and copy/paste to a text file to do memory diffs.
OK, this I have tried in VS 2008, but I believe VS 2005 should allow the same:
If the memory is a string (if it doesn't contain zero bytes), you can put the following into a watch window: (unsigned char*)(ptr),1024 to see 1kB in the text visualizer. However, this stops at zero bytes, so if you have binary data, this won't work.
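If you can rebuild the program, another workaround is to compile in a small helper and call it while stopped at a breakpoint, for example from the Immediate window, which can evaluate function calls. The helper name dumpMemory and the file paths are hypothetical; this is a sketch, not a Visual Studio feature. It writes the raw bytes, zeros included, so the output can go straight into a binary diff tool.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: write `len` bytes starting at `addr` to `path` as raw
// binary. From the Immediate window, while stopped at a breakpoint:
//   dumpMemory((void*)0x00444B20, 1024, "c:\\temp\\chunk.bin")
bool dumpMemory(const void* addr, std::size_t len, const char* path)
{
    std::FILE* f = std::fopen(path, "wb");   // "b": no newline translation
    if (!f)
        return false;
    const std::size_t written = std::fwrite(addr, 1, len, f);
    const bool closed = (std::fclose(f) == 0);
    return written == len && closed;
}
```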