I'm writing an application that needs to uncompress data compressed by another application (which is outside my control - I cannot make changes to its source code). The producer application uses zlib to compress data using the z_stream mechanism. It uses Z_FULL_FLUSH frequently (probably too frequently, in my opinion, but that's another matter). This third party application is also able to uncompress its own data, so I'm pretty confident that the data itself is correct.
In my test, I'm using this third party app to compress the following simple text file (in hex):
48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0d 0a
The compressed bytes I receive from the app look like this (again, in hex):
78 9c f2 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 00 00 ff ff
If I try and compress the same data, I get very similar results:
78 9c f3 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 24 e9 04 55
There are two differences that I can see:
First, the fourth byte is F2, rather than F3, so the deflate "final block" bit has not been set. I assume this is because the stream interface never knows when the end of the incoming data will be, so never sets that bit?
Finally, the last four bytes in the external data are 00 00 FF FF, whereas in my test data they are 24 E9 04 55. Searching around I found on this page
http://www.bolet.org/~pornin/deflate-flush.html
...that this is a signature of a sync or full flush.
When I try and decompress my own data using the uncompress() function, everything works perfectly. However, when I try and decompress the external data, the uncompress() call fails with a return code of Z_DATA_ERROR, indicating corrupt data.
I have a few questions:
Should I be able to use the zlib "uncompress" function to uncompress data that has been compressed with the z_stream method?
In the example above, what is the significance of the last four bytes? Given that both the externally compressed data stream and my own test data stream are the same length, what do my last four bytes represent?
Cheers
Thanks to the zlib authors, I have found the answer. The third party app is generating zlib streams that are not finished correctly:
78 9c f2 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 00 00 ff ff
That is a partial zlib stream, consisting of a zlib header and a partial deflate stream. There are two blocks, neither of which is a last block. The second block is an empty stored block, used as a marker when flushing. A zlib decoder would correctly decode what's there, and then continue to look for data after those bytes.
78 9c f3 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 24 e9 04 55
That is a complete zlib stream, consisting of a zlib header, a single block marked as the last block, and a zlib trailer. The trailer is the Adler-32 checksum of the uncompressed data.
So my decompression is failing, probably because the Adler-32 trailer is missing, or because the decompression code keeps looking for more data that does not exist.
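To illustrate the difference between the two byte sequences, here is a minimal sketch using Python's zlib module rather than the C API from the question. The helper name inflate_available is made up for this example. It feeds the bytes to a streaming decompressor, which returns whatever has been decoded so far and reports whether the end of the stream was reached, instead of insisting on a finished stream the way a one-shot call does.

```python
import zlib

# Bytes quoted in the question: zlib header, one non-final block, then the
# empty stored block (00 00 ff ff) left behind by the flush. No trailer.
FLUSHED = bytes.fromhex(
    "78 9c f2 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 00 00 ff ff"
)
# The same text as a complete stream: final block plus Adler-32 trailer.
COMPLETE = bytes.fromhex(
    "78 9c f3 48 cd c9 c9 57 08 cf 2f ca 49 51 e4 e5 02 00 24 e9 04 55"
)


def inflate_available(data):
    """Return (bytes decoded so far, True if the stream end was reached)."""
    decoder = zlib.decompressobj()
    decoded = decoder.decompress(data)
    return decoded, decoder.eof


if __name__ == "__main__":
    print(inflate_available(FLUSHED))
    print(inflate_available(COMPLETE))
```

In C, the equivalent idea would be to drive inflate() yourself and accept the output even though Z_STREAM_END never arrives; treat that as a suggestion to verify against the zlib manual rather than a tested recipe.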
A solution is described here:
http://technology.amis.nl/2010/03/13/utl_compress-gzip-and-zlib/
It provides decompression and compression functions for a compressed database blob (or stream) that starts with the 78 9C signature.
I'm trying to send data from Qt to R. I am new to the QtNetwork module and relatively new to Qt overall. As such I am also trying to figure out how QIODevice encodes data for the purposes of reading and writing.
If I run the Fortune Server Example and connect to it with the following code in R:
connection <- socketConnection(host="localhost", port=50743, open="rb", timeout=10)
readBin(connection, what="raw", n = 1000)
the following raw hexadecimal vector is returned
00 00 00 56 00 59 00 6f 00 75 00 20 00 77 00 69 00 6c 00 6c 00 20 00 66 00 65 00 65 00 6c 00 20 00 68 00 75 00 6e 00 67 00 72 00 79 00 20 00 61 00 67 00 61 00 69 00 6e 00 20 00 69 00 6e 00 20 00 61 00 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 68 00 6f 00 75 00 72 00 2e
Removing the first five bytes and all the remaining null characters and converting to char I get:
"You will feel hungry again in another hour."
So what I want to know is where do all the characters that are not part of the fortune come from? The fourth byte seems to be the byte length of the message from the sixth byte to the end, the rest of the "non-fortune" characters are all null.
I read that QByteArray terminates each byte with a null character and QByteArray is converted to a QBuffer before being written by QTcpSocket, is that what is happening here? QBuffer adds the length of the message (but what of the other four bytes) and every second byte of a QByteArray is the null character? Also, the last byte is not null (did the readBin operation consume it/ how did readBin know where the message ended)?
Is this the only way to write data to the socket? If I wanted to transmit values of type double would I have to convert them to QByteArray to transmit them in this fashion? Is there not some non-text way of transmitting data through a socket?
Any enlightenment would be much appreciated!
EDIT:
Thanks for the answer! For completeness' sake, here is how you might decode the string in R:
connection <- socketConnection(host="localhost", port=50743, open="rb", timeout=10)
# Read first 32 bits, which contains the size of the string in bytes
len.raw <- readBin(connection, what="raw", n = 4)
# convert to integer
len <- strtoi(paste(c("0x",len.raw),collapse=""))
# Read raw message
msg.raw <- readBin(connection, what="raw", n = len)
# convert to char using UTF-16BE
msg <- iconv(list(msg.raw),from="UTF-16BE")
close(connection)
cat(msg)
If you take a look at how the Fortune Server Example is implemented, you can see that it uses QDataStream to serialize fortunes (QStrings) over the socket:
QByteArray block;
QDataStream out(&block, QIODevice::WriteOnly);
out.setVersion(QDataStream::Qt_4_0);
out << fortunes.at(qrand() % fortunes.size());
So, the question is reduced to "How does QDataStream serialize QStrings?", and this is answered extensively in the documentation page about serializing Qt data types. You can see that a QString's serialization looks like this:
If the string is null: 0xFFFFFFFF (quint32)
Otherwise: The string length in bytes (quint32) followed by the data in UTF-16
And this is exactly what you are seeing in your question. The first four bytes are the string length in bytes, and the "nulls" you are seeing later appear because of using UTF-16 encoding.
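The layout described above can be checked with a few lines of Python. This is only a sketch: decode_qstring is a name invented for the example, and it assumes QDataStream's default big-endian byte order, which matches the bytes in the question. The sample is rebuilt from the 00 00 00 56 length prefix followed by the fortune encoded as UTF-16BE.

```python
import struct


def decode_qstring(payload):
    """Decode one QString: quint32 byte length, then UTF-16BE data."""
    (length,) = struct.unpack(">I", payload[:4])
    if length == 0xFFFFFFFF:
        return None  # marker for a null QString
    return payload[4:4 + length].decode("utf-16-be")


fortune = "You will feel hungry again in another hour."
# 0x56 = 86 bytes = 43 characters at two bytes each.
sample = bytes.fromhex("00 00 00 56") + fortune.encode("utf-16-be")

if __name__ == "__main__":
    print(decode_qstring(sample))
```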
Is this the only way to write data to the socket? If I wanted to transmit values of type double would I have to convert them to QByteArray to transmit them in this fashion? Is there not some non-text way of transmitting data through a socket?
You can use any serialization format you like. QDataStream is widely used in Qt since it supports most Qt data types out of the box. This has nothing to do with using QByteArray; you can let QDataStream write to the socket directly. QDataStream is, actually, a binary (non-text) format, as you can see. If you want a textual, human-readable format, you can use JSON.
But if you are aiming to send data from Qt to R using QDataStream, you'll have to write your own QDataStream deserializer for R. I would recommend using some common data serialization that has implementations in C++ and R (in lieu of re-inventing the wheel). I believe JSON meets this criterion, and if you want to use a binary format, msgpack might be interesting for you, since it supports a lot of programming languages (including R and C++).
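On the question of sending doubles without going through text, here is a small sketch of one possible binary framing. This is a made-up layout for illustration, not QDataStream's own format: a big-endian 32-bit count followed by that many IEEE 754 64-bit values. The function names are hypothetical. On the R side, readBin with what="double", size=8 and endian="big" should be able to read such values, though I have not tested that pairing.

```python
import struct


def pack_doubles(values):
    """Frame: big-endian uint32 count, then that many big-endian float64s."""
    return struct.pack(">I", len(values)) + struct.pack(
        ">%dd" % len(values), *values
    )


def unpack_doubles(frame):
    """Inverse of pack_doubles for a single complete frame."""
    (count,) = struct.unpack_from(">I", frame, 0)
    return list(struct.unpack_from(">%dd" % count, frame, 4))


if __name__ == "__main__":
    frame = pack_doubles([1.5, -2.25, 0.0])
    print(frame.hex(), unpack_doubles(frame))
```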
I've been experimenting with Pican2 and the python-can libraries and I've been able to read the bus and interpret many messages in my car. The problem is when I send a message to the bus (for example, turn on A/C), it quickly appears once in the candump printout and then reverts to its previous state. For example:
[436] 00 08 00 10 FE 00 00 01
[436] 04 10 00 10 FE 00 00 01
[436] 00 08 00 10 FE 00 00 01
[436] 00 08 00 10 FE 00 00 01
...
04 10 occurs when the A/C is on and the fan speed is at level 1; this is the data I am sending. 00 08 means the A/C is off, and it overrides my CAN message on its own.
It seems as though I have to send the message in a loop for it to take. Is there something I am missing? I feel like I should just be able to send the message once and have the CAN bus accept it.
Many functions controlled by CAN messages need periodic messages to keep them doing what they are doing. In your log, it looks like there are periodic messages controlling the fan.
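A rough sketch of what "periodic" means in practice: resend the same frame at a fixed interval instead of once. To keep the example runnable without hardware, the send function is passed in as a parameter. send_periodically and the 0.1 s period are assumptions for illustration; with python-can you would pass something that wraps bus.send, or look at the library's own periodic-send support. The real period is whatever you observe between 0x436 frames in candump.

```python
import time


def send_periodically(send, frame, period_s, repeats, sleep=time.sleep):
    """Call send(frame) `repeats` times, waiting period_s after each send."""
    for _ in range(repeats):
        send(frame)
        sleep(period_s)


if __name__ == "__main__":
    # Arbitration ID and payload taken from the log in the question.
    ac_on = (0x436, bytes.fromhex("04 10 00 10 FE 00 00 01"))
    sent = []
    # Stand-in for a real bus: just record what would have been sent.
    send_periodically(sent.append, ac_on, period_s=0.1, repeats=5)
    print(len(sent), "frames queued")
```

Note that the module that normally owns ID 0x436 will probably keep transmitting its own state as well, so the two senders may compete on the bus.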
Further reading: https://www.sans.org/reading-room/whitepapers/threats/hacking-bus-basic-manipulation-modern-automobile-through-bus-reverse-engineering-37825
I have a UDP packet embedded inside an IP packet. I can correctly compute the IP checksum, but I am not able to calculate the UDP checksum properly. Can someone explain how the UDP checksum is computed?
[45 00 00 53 00 80 00 00 40 11 66 16 0A 00 00 03 0A 00 00 02] CA B1 CA B1 00 3F DF A5
The bytes enclosed in brackets are the IP header; its checksum field is 66 16.
**UDP Packet**
CA B1 Source port
CA B1 Destination port
00 3F Length
DF A5 Checksum
How was the checksum "DF A5" derived? I did a 16-bit addition and took the one's complement, but I still do not get that value. Do I need to take the IP header into account as well when calculating the UDP checksum?
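For reference, the UDP checksum as defined in RFC 768 is not computed over the UDP header alone. It covers a pseudo-header built from the IP header (source address, destination address, a zero byte, protocol number 17, and the UDP length), then the UDP header with its checksum field zeroed, then the payload. The sketch below follows that definition; the function names are invented for the example. The payload was not posted in the question, so DF A5 cannot be reproduced here, and the demo checks the IP header checksum instead.

```python
import struct


def ones_complement_sum16(data):
    """Sum 16-bit big-endian words with end-around carry (pads odd length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data):
    """One's complement of the one's-complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def udp_checksum(src_ip, dst_ip, udp_segment):
    """Checksum over pseudo-header + UDP header (checksum zeroed) + payload."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    zeroed = udp_segment[:6] + b"\x00\x00" + udp_segment[8:]
    # A computed value of zero is transmitted as all ones.
    return internet_checksum(pseudo + zeroed) or 0xFFFF


if __name__ == "__main__":
    # IP header from the question with its checksum field set to zero.
    ip_header = bytes.fromhex(
        "45 00 00 53 00 80 00 00 40 11 00 00 0A 00 00 03 0A 00 00 02"
    )
    print(hex(internet_checksum(ip_header)))  # 0x6616
```

To check DF A5, pass the source address 0A 00 00 03, the destination address 0A 00 00 02, and the full 63-byte UDP segment (header plus payload) to udp_checksum.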
I am using VS 2012 and programming in C++. I have a wide string
wchar_t *str = L"Hello, world!";
Technically I read the string from a file but I don't know if that makes a difference. When I look at str in the memory window it looks like this:
00 48 00 65 00 6c 00 6c 00 6f 00 2c 00 20 00 77 00 6f 00 72 00 6c 00 64 00 21 00
As you can see the string is stored in memory as big-endian.
When I hover my mouse over the string I get:
L"䠀攀氀氀漀Ⰰ 眀漀爀氀搀℀"
And after I reverse the endianness of str the memory looks like:
48 00 65 00 6c 00 6c 00 6f 00 2c 00 20 00 77 00 6f 00 72 00 6c 00 64 00 21 00 00
And the hover over looks like:
L"Hello, world!"
It seems that the debugger displays UTF-16 in little-endian by default. My program reads big-endian files so it is very tedious to keep reversing the endianness of all strings to debug them. Is there any way to change the endianness of the debugger's display?
Except for debug purposes I can do all my processing in big endian.
It's not only the debugger. The wchar_t functions in Visual Studio are little-endian, as the host is. When you want to process the data, you need to convert the string to little-endian anyway.
It is worth making this conversion even if you write the strings out to a file with a different endianness. Strings are defined as a byte sequence, so applying your own endianness to a string looks strange anyhow.
Your best shot in getting this to work is to define your own type and create a debugger type visualizer for it (see Customizing the Visual Studio Debugger Display of Your Data, or here).
Or maybe you can quick-hack it by shifting the address by 1 byte in the Watch window.
You're working with a non-native string format that just happens to "feel" similar to the native format, so you are tempted to think there should almost be a way to do it. But to the debugger, it's just a foreign binary format. The debugger is not designed to handle foreign endianness, just as it does not handle visualizing an OGG stream packet.
If you want to use available tools for manipulating native-endian Unicode strings, you'll need to convert to native-endian Unicode format.
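The conversion itself is just a swap of every pair of bytes. As a language-neutral illustration, here is a short Python sketch (the helper name utf16_swap_bytes is made up) applied to the bytes from the Memory window in the question. In the C++ program the same swap would be done once, right after reading each string from the file.

```python
def utf16_swap_bytes(data):
    """Swap each pair of bytes, turning UTF-16BE into UTF-16LE or back."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)


# Big-endian bytes as shown in the question's memory dump.
big_endian = bytes.fromhex(
    "00 48 00 65 00 6c 00 6c 00 6f 00 2c 00 20"
    " 00 77 00 6f 00 72 00 6c 00 64 00 21"
)

if __name__ == "__main__":
    print(utf16_swap_bytes(big_endian).decode("utf-16-le"))
```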
As has been pointed out, VS uses the native endianness, which is little endian on an Intel/AMD. The problem is that you're not reading the strings correctly; you should imbue the std::istream with a locale which reads UTF-16BE (since this is apparently the encoding form you're trying to read). std::istream (or rather the backing std::filebuf) will automatically do the code translation on the fly when reading and writing.
You can set the endianness of the Memory window using the context menu. Right-click in the Memory window and check "Big Endian".
We are migrating our C++ source from VS2008 to VS 2010. We are having issues due to incorrect lib files.
Is there any way to determine whether a lib file was built using VS 2010 or VS 2008?
Strictly speaking, you won't be able to get it from the lib file directly, since those are just containers for .obj files (or 'pseudo' object files in the case of import libraries). It's possible to have a library that contains object files created by different compilers, though I doubt you'll see that very often, if ever.
However, you may be able to coax the information out of the object files contained in the library.
I don't know how reliable this information is, but it appears that object files produced by MSVC contain version information about the compiler used to build them. The object file contains a section with the name ".debug$S", which will contain debugging information. However, even if you've built the object file without debugging information, there will still be a small ".debug$S" section, which might look like the following for a simple 'hello world' program compiled with VS 2008 SP1 (Compiler Version 15.00.30729.01):
RAW DATA #2
00000000: 04 00 00 00 F1 00 00 00 56 00 00 00 18 00 01 11 ....ñ...V.......
00000010: 00 00 00 00 63 3A 5C 74 65 6D 70 5C 68 65 6C 6C ....c:\temp\hell
00000020: 6F 2E 6F 62 6A 00 3A 00 3C 11 00 22 00 00 07 00 o.obj.:.<.."....
00000030: 0F 00 00 00 09 78 01 00 0F 00 00 00 09 78 01 00 .....x.......x..
00000040: 4D 69 63 72 6F 73 6F 66 74 20 28 52 29 20 4F 70 Microsoft (R) Op
00000050: 74 69 6D 69 7A 69 6E 67 20 43 6F 6D 70 69 6C 65 timizing Compile
00000060: 72 00 00 00 r...
Note that if you convert the components of the compiler version, 15.00.30729.01, to 16-bit hex numbers, you'll get (displayed in little endian):
0f 00 00 00 09 78 01 00
Which is a string you'll notice shows up twice in the ".debug$S" section at offsets 0x30 and 0x38.
VS 2010 SP1 (compiler version 16.00.40219.01) produces the following ".debug$S":
RAW DATA #2
00000000: 04 00 00 00 F1 00 00 00 56 00 00 00 18 00 01 11 ....ñ...V.......
00000010: 00 00 00 00 43 3A 5C 74 65 6D 70 5C 68 65 6C 6C ....C:\temp\hell
00000020: 6F 2E 6F 62 6A 00 3A 00 3C 11 00 22 00 00 07 00 o.obj.:.<.."....
00000030: 10 00 00 00 1B 9D 01 00 10 00 00 00 1B 9D 01 00 ................
00000040: 4D 69 63 72 6F 73 6F 66 74 20 28 52 29 20 4F 70 Microsoft (R) Op
00000050: 74 69 6D 69 7A 69 6E 67 20 43 6F 6D 70 69 6C 65 timizing Compile
00000060: 72 00 00 00 r...
where you'll note the compiler version data 10 00 00 00 1B 9D 01 00.
Similar signatures are produced by VS 2003 through VS 2012 compilers (VC6 does not produce a ".debug$S" section, and I don't have the means to test VS 2002). However, the offsets of the information differ at times (and may differ even for the same compiler depending on the actual options used and file being compiled).
I'm unaware of a tool that will easily extract this information, but some scripts that string together the lib tool and/or dumpbin could probably be cobbled together pretty easily. Microsoft's "PE and COFF Specification" document may be of some help if you want to pull apart libraries and object files yourself, though the document had no real information about the .debug$S section other than that it exists and contains debugging information.
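As a starting point for such a script, here is a hedged Python sketch. It assumes, based only on the two dumps above, that the four 16-bit little-endian version components sit immediately before the "Microsoft (R) Optimizing Compiler" string. That layout is undocumented and may not hold for other compilers or options. The function name is made up, and extracting the raw ".debug$S" bytes from the library (for example from dumpbin output) is left out.

```python
import struct

MARKER = b"Microsoft (R) Optimizing Compiler"


def compiler_version_before_marker(section):
    """Return (major, minor, build, revision) found just before MARKER."""
    idx = section.find(MARKER)
    if idx < 8:
        return None  # marker missing, or no room for 8 bytes before it
    return struct.unpack("<4H", section[idx - 8:idx])


# Tail of the VS 2008 SP1 dump: the version appears twice, then the string.
vs2008_tail = (
    bytes.fromhex("0F 00 00 00 09 78 01 00 0F 00 00 00 09 78 01 00")
    + MARKER
    + b"\x00"
)

if __name__ == "__main__":
    print(compiler_version_before_marker(vs2008_tail))
```

A major version of 15 would then suggest VS 2008 and 16 would suggest VS 2010, going by the two samples shown.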
Note that as far as I know this information is undocumented, and my reverse engineering of it is sketchy to say the least, and may change or not hold for all circumstances. I'm truly uncertain of how reliable this information is, but it's a start if no other better information shows up.