QDataStream and quint16 serialization weirdness

I've written a small Packet class consisting of some quint8s, quint16s, QByteArrays and a quint32. It has a toByteArray() method that returns a serialized version of the object, conforming to a protocol specification.
Packet Spec
Protocol identifier [4 bytes] (Version 1 = 0x74697331)
Session ID, a MD5 hash salted with the timestamp + user IP. [16 bytes]
Command ID [1 byte]
Argument Size [2 bytes]
Args [1-8,192 bytes]
CRC-B (X-25) [2 bytes]
Most of the data serializes fine. The exception is the last quint16 (my crc), which seems to get clobbered.
I don't believe the problem is in the declaration of my class. I've re-created the serialization function in the code sample below, which demonstrates the bug without my packet class (but with the same final QByteArray layout).
(More) Minimal reproducible testcase
#include <iostream>
#include <QByteArray>
#include <QDataStream>
#include <QDebug>
#include <QIODevice>
using namespace std;

int main()
{
    QByteArray arr;

    quint32 m_protocolId = 0x74697331;

    QByteArray m_sessionId;
    m_sessionId.resize(16);
    m_sessionId.fill(0);

    quint8 m_commandId = 0x1;
    quint16 m_argsSize = 0x0e;

    QByteArray args;
    args.append("test").append('\0');
    args.append("1234qwer").append('\0');

    quint16 m_crc;
    m_crc = 0xB5A2;

    QDataStream out(&arr, QIODevice::WriteOnly);
    out.setByteOrder(QDataStream::LittleEndian);
    out << m_protocolId;
    out.writeRawData(m_sessionId.data(), 16);
    out << m_commandId;
    out << m_argsSize;
    out.writeRawData(args.data(), args.length());
    out << m_crc;

    foreach (char c, arr)
    {
        qDebug() << QString::number((int)c, 16).toAscii().data();
    }
    return 0;
}
Here's the output I get:
73
69
74
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
e
0
74
65
73
74
0
31
32
33
34
71
77
65
72
0
ffffffffffffffa2
ffffffffffffffb5
Those last two should be 0xa2, 0xb5. I guess it's some sort of alignment issue. Is there any way to correct this while still conforming to the packet spec?

I think you just need to tweak your debug output. Try:
..
foreach (unsigned char c, arr)
..
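For completeness, a minimal sketch of that tweak applied to the loop from the testcase; iterating with unsigned char (or casting to quint8) stops the byte from being sign-extended before QString::number() formats it, so the last two bytes print as a2 and b5 rather than ffffffffffffffa2:
// Same loop as in the question, but the byte is never widened as a negative int.
foreach (unsigned char c, arr)
{
    qDebug() << QString::number(c, 16).toAscii().data();
}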


Why does Microsoft's implementation of std::string require 40 bytes on the stack?

Having recently watched this video about facebook's implementation of string, I was curious to see the internals of Microsoft's implementation. Unfortunately, the string file (in %VisualStudioDirectory%/VC/include) doesn't seem to contain the actual definition, but rather just conversion functions (e.g. atoi) and some operator overloads.
I decided to do some poking and prodding at it from user-level programs. The first thing I did, of course, was to test sizeof(std::string). To my surprise, std::string takes 40 bytes! (On 64-bit machines anyways.) The previously mentioned video goes into detail about how facebook's implementation only requires 24 bytes and gcc's takes 32 bytes, so this was shocking to say the least.
We can dig a little deeper by writing a simple program that prints off the contents of the data byte-by-byte (including the c_str address), as such:
#include <cstdint>
#include <iostream>
#include <string>

int main()
{
    std::string test = "this is a very, very, very long string";

    // Print contents of std::string test, word by word.
    char* data = reinterpret_cast<char*>(&test);
    for (size_t wordNum = 0; wordNum < sizeof(std::string); wordNum += sizeof(uint64_t))
    {
        for (size_t i = 0; i < sizeof(uint64_t); i++)
            std::cout << (int)(data[wordNum + i]) << " ";
        std::cout << std::endl;
    }

    // Print the value of the address returned by test.c_str().
    // (Doing this byte-by-byte to match the above values).
    const char* testAddr = test.c_str();
    char* dataAddr = reinterpret_cast<char*>(&testAddr);
    std::cout << "c_str address: ";
    for (size_t i = 0; i < sizeof(const char*); i++)
        std::cout << (int)(dataAddr[i]) << " ";
    std::cout << std::endl;
}
This prints out:
48 33 -99 -47 -55 1 0 0
16 78 -100 -47 -55 1 0 0
-52 -52 -52 -52 -52 -52 -52 -52
38 0 0 0 0 0 0 0
47 0 0 0 0 0 0 0
c_str address: 16 78 -100 -47 -55 1 0 0
Examining this, we can see that the second word contains the address that points to the allocated data for the string, the third word is garbage (a buffer for Short String Optimization), the fourth word is the size, and the fifth word is the capacity. But what about the first word? It appears to be an address, but what for? Shouldn't everything already be accounted for?
For the sake of completeness, the following output shows SSO (the string is set to "Short String"). Note that the first word still seems to represent a pointer:
0 36 -28 19 123 1 0 0
83 104 111 114 116 32 83 116
114 105 110 103 0 -52 -52 -52
12 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0
c_str address: 112 -9 79 -108 23 0 0 0
EDIT: Ok, so having done more testing, it appears that the size of std::string actually decreases to 32 bytes when compiled for release, and the first word is no longer there. But I'm still really interested in knowing why that is the case, and what that extra pointer is used for in debug mode.
Update: As per the tip by the user Yuushi, the extra word appears to be related to Debug Iterator Support. This was verified when I turned off Debug Iterator Support (an example for doing this is shown here) and the size of std::string was reduced to 32 bytes, with the first word now missing.
However, it would still be really interesting to see how Debug Iterator Support uses that additional pointer to check for incorrect iterator use.
Visual Studio 2015 uses xstring instead of string to define std::basic_string.
NOTE: This answer is applied for VS2015 only, VS2013 uses a different implementation, however, they are more or less the same.
It's implemented as:
template<class _Elem,
class _Traits,
class _Alloc>
class basic_string
: public _String_alloc<_String_base_types<_Elem, _Alloc> >
{
// This class has no member data
}
_String_alloc uses a _Compressed_pair<_Alty, _String_val<_Val_types> > to store its data. In std::string, _Alty is std::allocator<char> and _Val_types is _Simple_types<char>; because std::is_empty<std::allocator<char>>::value is true, sizeof _Compressed_pair<_Alty, _String_val<_Val_types> > is the same as sizeof _String_val<_Val_types>.
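To see why the empty allocator costs nothing, here is a tiny illustrative sketch (compressed_pair_sketch is my own stand-in, not the MSVC type): because the empty first type is stored as a base class, the empty-base optimisation applies and the "pair" is only as large as its second member.
#include <iostream>
#include <memory>

// Stand-in for the idea behind _Compressed_pair: the (empty) allocator is a
// base class, so it occupies no storage of its own.
template<class Empty, class T>
struct compressed_pair_sketch : private Empty {
    T second;
};

int main()
{
    std::cout << sizeof(std::allocator<char>) << '\n';                               // 1 (empty, but sizeof is never 0)
    std::cout << sizeof(compressed_pair_sketch<std::allocator<char>, long>) << '\n'; // sizeof(long) on mainstream compilers
}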
The class _String_val inherits from _Container_base, which is a typedef of _Container_base0 when _ITERATOR_DEBUG_LEVEL == 0 and of _Container_base12 otherwise. The difference between them is that _Container_base12 contains a pointer to _Container_proxy for debugging purposes. Besides that, _String_val also has these members:
union _Bxty
{   // storage for small buffer or pointer to larger one
    _Bxty()
    {   // user-provided, for fancy pointers
    }

    ~_Bxty() _NOEXCEPT
    {   // user-provided, for fancy pointers
    }

    value_type _Buf[_BUF_SIZE];
    pointer _Ptr;
    char _Alias[_BUF_SIZE];   // to permit aliasing
} _Bx;

size_type _Mysize;   // current length of string
size_type _Myres;    // current storage reserved for string
Here _BUF_SIZE is 16, and pointer_type and size_type align well together on this system, so no extra padding is necessary.
Hence, when _ITERATOR_DEBUG_LEVEL == 0, sizeof std::string is:
_BUF_SIZE + 2 * sizeof size_type
otherwise it's
sizeof pointer_type + _BUF_SIZE + 2 * sizeof size_type
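A quick way to sanity-check that arithmetic (the values assume 64-bit MSVC, where _BUF_SIZE is 16 and size_type is 8 bytes, so 16 + 2*8 = 32 in release and 8 + 16 + 2*8 = 40 with the debug _Container_proxy pointer):
#include <iostream>
#include <string>

int main()
{
    // Prints 32 when _ITERATOR_DEBUG_LEVEL == 0 (release)
    // and 40 in MSVC debug builds, per the breakdown above.
    std::cout << "sizeof(std::string) = " << sizeof(std::string) << '\n';
}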

UDP receive data as unsigned char

I am trying to receive some data over the network using UDP and parse it.
Here is the code,
char recvline[1024];
int n=recvfrom(sockfd,recvline,1024,0,NULL,NULL);
for(int i=0;i<n;i++)
cout << hex <<static_cast<short int>(recvline[i])<<" ";
This printed the output:
19 ffb0 0 0 ff88 d 38 19 48 38 0 0 2 1 3 1 ff8f ff82 5 40 20 16 6 6 22 36 6 2c 0 0 0 0 0 0 0 0
But I am expecting output like this:
19 b0 0 0 88 d 38 19 48 38 0 0 2 1 3 1 8f 82 5 40 20 16 6 6 22 36 6 2c 0 0 0 0 0 0 0 0
The ff shouldn't be there in the printed output.
Actually I have to parse this data character by character, like:
parseCommand(recvline);
and the parse code looks like this:
void parseCommand(char *msg) {
    int commId = *(msg + 1);
    switch (commId) {
    case 0xb0: // do some operation
        break;
    case 0x20: // do another operation
        break;
    }
}
And while debugging I am getting commId=-80 on watch.
Note:
On Linux I get the expected output with this code; note that I have used unsigned char instead of char for the read buffer.
unsigned char recvline[1024];
int n=recvfrom(sockfd,recvline,1024,0,NULL,NULL);
Whereas on Windows, recvfrom() does not accept the second argument as unsigned (it gives a build error), so I chose char.
It looks like you might be getting the correct values, but your cast to short int during printing sign-extends your char value, causing ff to be propagated to the top byte if the top bit of your char is 1 (i.e. it is negative). You should first cast it to an unsigned type, then extend to int, so you need 2 casts:
cout << hex << static_cast<short int>(static_cast<uint8_t>(recvline[i]))<<" ";
I have tested this and it behaves as expected.
In response to your extension: the data read is fine, it is a matter of how you interpret it. To parse correctly you should:
uint8_t commId = static_cast<uint8_t>(*(msg + 1));
switch (commId) {
case 0xb0: // do some operation
    break;
case 0x20: // do another operation
    break;
}
Because you store your data in a signed data type, conversion/promotion to bigger data types will first sign-extend the value (filling the high-order bits with the value of the MSB), even if it then gets converted to an unsigned data type.
One solution is to define recvline as uint8_t[] in the first place and cast it to char* when passing it to the recvfrom function. That way, you only have to cast it once and you are using the same code in your Windows and Linux versions. Also, uint8_t[] is (at least to me) a clear indication that you are using the array as raw memory instead of a string of some kind.
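A rough sketch of that suggestion (sockfd is the already-created socket from the question): the buffer itself is uint8_t, so no sign extension can ever happen, and the cast to char* is confined to the recvfrom() call, which keeps the code identical on Windows and Linux.
uint8_t recvline[1024];
// Cast only at the call site; on Linux the void* parameter accepts it without the cast too.
int n = recvfrom(sockfd, reinterpret_cast<char*>(recvline), sizeof(recvline), 0, NULL, NULL);
for (int i = 0; i < n; i++)
    cout << hex << static_cast<int>(recvline[i]) << " ";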
Another possibility is to simply perform a bitwise AND: (recvline[i] & 0xff). Thanks to automatic integral promotion this doesn't even require a cast.
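A sketch of the masking approach, using the same recvline and n as above: the & 0xff keeps only the low byte after integral promotion, so the sign extension never shows up in the output.
// Works even though recvline is plain char: the char is promoted to int,
// then masked down to its low byte.
for (int i = 0; i < n; i++)
    cout << hex << (recvline[i] & 0xff) << " ";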
Personal Note:
It is really annoying that the C and C++ standards don't provide a separate type for raw memory (yet), but with any luck we'll get a byte type in a future standard revision.

boost serialization hexadecimal decimal encoding of data

I am new to boost serialization but this seems very strange to me.
I have a very simple class with two members
int number // always = 123
char buffer[?] // buffer with ? size
Sometimes I set the size to buffer[31] and then serialize the class:
22 serialization::archive 8 0 0 1 1 0 0 0 0 123 0 0 31 0 0 0 65 65
We can see the 123 and the 31, so no issue here; both are in decimal format.
Now I change buffer to buffer[1024], so I expected to see:
22 serialization::archive 8 0 0 1 1 0 0 0 0 123 0 0 1024 0 0 0 65 65 ---
This is the actual outcome:
22 serialization::archive 8 0 0 1 1 0 0 0 0 123 0 0 0 4 0 0 65 65 65
Has boost switched to hex for the buffer size only?
Notice the other value is still decimal.
So what happens if I switch number from 123 to 1024?
I would imagine 040?
22 serialization::archive 8 0 0 1 1 0 0 0 0 1024 0 0 0 4 0 0 65 65
If this is by design, why does the 31 not get converted to 1F? It's not consistent.
This causes problems in our load function for split_free; we were doing this:
unsigned int size;
ar >> size;
but as you might guess, when this is 040, it truncates to zero :(
What is the recommended solution to this?
I was using boost 1.45.0, but I tested this on boost 1.56.0 and it is the same.
EDIT: sample of the serialization function
template<class Archive>
void save(Archive& ar, const MYCLASS& buffer, unsigned int /*version*/) {
    ar << boost::serialization::make_array(
        reinterpret_cast<const unsigned char*>(buffer.begin()), buffer.length());
}
MYCLASS is just a wrapper around a char*, with the first element an unsigned int that keeps the length, approximating a UNICODE_STRING:
http://msdn.microsoft.com/en-gb/library/windows/desktop/aa380518(v=vs.85).aspx
The code is the same if the length is 1024 or 31 so I would not have expected this to be a problem.
I don't think Boost "switched to hex". I honestly don't have any experience with this, but it looks like boost is serializing the length as an array of bytes, each of which can only hold numbers from 0 through 255. 1024 would be a byte with the value 4 followed by a byte with the value 0.
"why does the 31 not get converted to 1F ? its not consistent" - your assumptions are creating false inconsistencies. Stop assuming you can read the serialization archive format when actually you're just guessing.
If you want to know, trace the code. If not, just use the archive format.
If you want "human accessible form", consider the xml_oarchive.

How to convert a HexBytes Array to a String in C/C++ on Arduino?

I worked out how to convert a String into a hex byte array; now I need to convert the new array back into a string, because the function Sha256.write needs a char. What would be the way to do this?
char hexstring[] = "020000009ecb752aac3e6d2101163b36f3e6bd67d0c95be402918f2f00000000000000001036e4ee6f31bc9a690053320286d84fbfe4e5ee0594b4ab72339366b3ff1734270061536c89001900000000";
int i;
int n;
uint8_t bytearray[80];
Serial.println("Starting...");

char tmp[3];
tmp[2] = '\0';
int j = 0;

// GET ARRAY: convert each pair of hex digits into one byte
for (i = 0; i < strlen(hexstring); i += 2) {
    tmp[0] = hexstring[i];
    tmp[1] = hexstring[i + 1];
    bytearray[j] = strtol(tmp, 0, 16);
    j += 1;
}
for (i = 0; i < 80; i += 1) {
    Serial.println(bytearray[i]);
}

int _batchSize;
unsigned char hash[32];
SHA256_CTX ctx;
int idx;
Serial.println("SHA256...");

for (_batchSize = 100000; _batchSize > 0; _batchSize--) {
    bytearray[76] = nonce;
    // Sha256.write(bytearray);
    sha256_init(&ctx);
    sha256_update(&ctx, bytearray, 80);
    sha256_final(&ctx, hash);
    sha256_init(&ctx);
    sha256_update(&ctx, hash, 32);
    sha256_final(&ctx, hash); // is this correct? I need this in bytes too
    // print_hash(hash);

    int zeroBytes = 0;
    for (int i = 31; i >= 28; i--, zeroBytes++)
        if (hash[i] > 0)
            break;

    if (zeroBytes == 4) { // SOLUTION FOUND, NOW I NEED THIS AS A STRING
        printf("0x");
        for (n = 0; n < 32; n++)
            Serial.println(printf("%02x", hash[n])); // ERROR :(
    }

    // increase
    if (++nonce == 4294967295)
        nonce = 0;
}
}
}
output array on Serial port:
2
0
0
0
158
203
117
42
172
62
109
33
1
22
59
54
243
230
189
103
208
201
91
228
2
145
143
47
0
0
0
0
0
0
0
0
16
54
228
238
111
49
188
154
105
0
83
50
2
134
216
79
191
228
229
238
5
148
180
171
114
51
147
102
179
255
23
52
39
0
97
83
108
137
0
25
0
0
0
0
How do I convert this back to a hex string?
UPDATED
This solution works for me, thanks all!
void printHash(uint8_t* hash) {
    int id;
    for (id = 0; id < 32; id++) {
        Serial.print("0123456789abcdef"[hash[id] >> 4]);
        Serial.print("0123456789abcdef"[hash[id] & 0xf]);
    }
    Serial.println();
}
Skip to the section Addressing your code... at bottom for most relevant content
(this stuff up here is barely useful blither)
The purpose of your function:
Sha256.write((char *)bytearray);
I believe is to write more data to the running hash. (from this)
Therefore, in the context of your question ("how to convert this to a hex string back?"), I am not sure how this relates to the way you are using it.
Let me offer another approach for the sake of illustrating how you might go about returning the array of ints back into the form of a "hexadecimal string":
From Here
Here is a code fragment that will calculate the digest for the string "abc"
SHA256_CTX ctx;
u_int8_t results[SHA256_DIGEST_LENGTH];
char *buf;
int n;

buf = "abc";
n = strlen(buf);
SHA256_Init(&ctx);
SHA256_Update(&ctx, (u_int8_t *)buf, n);
SHA256_Final(results, &ctx);

/* Print the digest as one long hex value */
printf("0x");
for (n = 0; n < SHA256_DIGEST_LENGTH; n++)
    printf("%02x", results[n]);
putchar('\n');
resulting in:
"0xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad".
In this example, the array I believe you want is contained in u_int8_t results.
There is not enough description in your post to be sure this will help; let me know in the comments and I will try to address further questions.
Added after your edit:
Continuing from the example above, to put the array contents of results back into a string, you can do something like this:
char *newString;
// 2 hex chars per byte, plus the "0x" prefix and a terminating '\0'
newString = malloc(sizeof(char) * (SHA256_DIGEST_LENGTH * 2 + 3));
newString[0] = '\0';
strcat(newString, "0x");
for (i = 0; i < SHA256_DIGEST_LENGTH; i++)
{
    // append each byte to the end of the string built so far
    sprintf(newString + strlen(newString), "%02x", results[i]);
}
// use newString for stuff...
free(newString);
Addressing your code, and your question directly:
Your code block:
for (_batchSize = 100000; _batchSize > 0; _batchSize--) {
    bytearray[76] = _batchSize;
    Sha256.write((char *)bytearray); // here is the error
}
is not necessary if all you want to do is to convert an array of int into a "hexadecimal string"
Your byte array, defined as:
uint8_t bytearray[80];
Already contains all the necessary values at this point, as you illustrated with your latest edit. If you want to return this data to a "hexadecimal string" form, then this will do that for you: (replacing result with your bytearray)
char *newString;
// 2 hex chars per byte, plus the "0x" prefix and a terminating '\0'
// (adjust the length calculation if your array size differs)
newString = malloc(sizeof(char) * (sizeof(bytearray) * 2 + 3));
newString[0] = '\0';
strcat(newString, "0x");
for (i = 0; i < sizeof(bytearray) / sizeof(bytearray[0]); i++)
{
    // append each byte to the end of the string built so far
    sprintf(newString + strlen(newString), "%02x", bytearray[i]);
}
// use newString for stuff...
free(newString);

C++, Writing vector<char> to ofstream skips whitespace

Despite my sincerest efforts, I cannot seem to locate the bug here. I am writing a vector to an ofstream. The vector contains binary data. However, for some reason, when a whitespace character (0x10, 0x11, 0x12, 0x13, 0x20) is supposed to be written, it is skipped.
I have tried using iterators, and a direct ofstream::write().
Here is the code I'm using. I've commented out some of the other methods I've tried.
void
write_file(const std::string& file,
           std::vector<uint8_t>& v)
{
    std::ofstream out(file, std::ios::binary | std::ios::ate);
    if (!out.is_open())
        throw file_error(file, "unable to open");
    out.unsetf(std::ios::skipws);

    /* ostreambuf_iterator ...
    std::ostreambuf_iterator<char> out_i(out);
    std::copy(v.begin(), v.end(), out_i);
    */

    /* ostream_iterator ...
    std::copy(v.begin(), v.end(), std::ostream_iterator<char>(out, ""));
    */

    out.write((const char*) &v[0], v.size());
}
EDIT: And the code to read it back.
void
read_file(const std::string& file,
          std::vector<uint8_t>& v)
{
    std::ifstream in(file);
    v.clear();
    if (!in.is_open())
        throw file_error(file, "unable to open");
    in.unsetf(std::ios::skipws);
    std::copy(std::istream_iterator<char>(in), std::istream_iterator<char>(),
              std::back_inserter(v));
}
Here is an example input:
30 0 0 0 a 30 0 0 0 7a 70 30 0 0 0 32 73 30 0 0 0 2 71 30 0 0 4 d2
And this is the output I am getting when I read it back:
30 0 0 0 30 0 0 0 7a 70 30 0 0 0 32 73 30 0 0 0 2 71 30 0 0 4 d2
As you can see, 0x0a is being omitted, ostensibly because it's whitespace.
Any suggestions would be greatly appreciated.
You forgot to open the file in binary mode in the read_file function.
Rather than muck around with writing vector<>s directly, boost::serialization is a more effective way, using boost::archive::binary_oarchive.
I think 0x0a is treated as a newline. I still have to think about how to get around this.
The istream_iterator by design skips whitespace. Try replacing your std::copy with this:
std::copy(
    std::istreambuf_iterator<char>(in),
    std::istreambuf_iterator<char>(),
    std::back_inserter(v));
The istreambuf_iterator goes directly to the streambuf object, which will avoid the whitespace processing you're seeing.
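Putting the two answers together, a minimal sketch of read_file (reusing the file_error exception from the question) that opens the stream in binary mode and fills the vector through istreambuf_iterator, so no byte value is ever treated as whitespace:
void
read_file(const std::string& file,
          std::vector<uint8_t>& v)
{
    // Binary mode avoids any newline translation on Windows;
    // istreambuf_iterator reads raw chars without whitespace skipping.
    std::ifstream in(file, std::ios::binary);
    if (!in.is_open())
        throw file_error(file, "unable to open");
    v.assign(std::istreambuf_iterator<char>(in),
             std::istreambuf_iterator<char>());
}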