Issues with QDataStream using QString and QByteArray - c++

QByteArray ba;
QDataStream ds(&ba,QIODevice::WriteOnly);
ds<<quint8(1)<<quint16(2)<<quint32(3); //1+2+4
qDebug()<<"size:"<<ba.size(); // 7
I use QDataStream to write three numbers and ba.size() is 7, as expected. But I'm confused about this:
QByteArray ba;
QDataStream ds(&ba,QIODevice::WriteOnly);
QString s="a";
ds<<quint8(1)<<quint16(2)<<quint32(3)<<s; //1+2+4+a
qDebug()<<"size:"<<ba.size(); // 13
The QString holds a single character, yet ba's size grows by 6. Why is that? sizeof(QString) is 4.

Let's analyze the difference between the two outputs:
"\x01\x00\x02\x00\x00\x00\x03"
"\x01\x00\x02\x00\x00\x00\x03\x00\x00\x00\x02\x00""a"
-----------------------------------------------------
\x00\x00\x00\x02\x00""a
And for that, let's review the source code:
QDataStream &operator<<(QDataStream &out, const QString &str)
{
if (out.version() == 1) {
out << str.toLatin1();
} else {
if (!str.isNull() || out.version() < 3) {
if ((out.byteOrder() == QDataStream::BigEndian) == (QSysInfo::ByteOrder == QSysInfo::BigEndian)) {
out.writeBytes(reinterpret_cast<const char *>(str.unicode()), sizeof(QChar) * str.length());
} else {
QVarLengthArray<ushort> buffer(str.length());
const ushort *data = reinterpret_cast<const ushort *>(str.constData());
for (int i = 0; i < str.length(); i++) {
buffer[i] = qbswap(*data);
++data;
}
out.writeBytes(reinterpret_cast<const char *>(buffer.data()), sizeof(ushort) * buffer.size());
}
} else {
// write null marker
out << (quint32)0xffffffff;
}
}
return out;
}
That method uses the writeBytes() method,
and according to the docs:
QDataStream &QDataStream::writeBytes(const char *s, uint len)
Writes the length specifier len and the buffer s to the stream and
returns a reference to the stream.
The len is serialized as a quint32, followed by len bytes from s. Note
that the data is not encoded.
That is, besides the data itself, it writes the length of the buffer in bytes as a quint32 (4 bytes), and the buffer is sizeof(QChar) x length of the QString bytes long.
With that in mind, the result is easier to read:
\x00\x00\x00\x02    \x00""a
----------------    -------
bytes in buffer     buffer
In general you can use the following formula to calculate the size of the stored data:
length stored data = 4 + 2 x length of string
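To check the formula without Qt, here is a minimal sketch in plain C++ that mimics the layout described above. The helper names appendU32BE and encodeLikeQString are made up for illustration, and the sketch assumes a non-null string and the default big-endian byte order; it is not Qt's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit value in big-endian order (QDataStream's default byte order).
void appendU32BE(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Hypothetical helper: a 32-bit byte count followed by the UTF-16 code units,
// each written high byte first. Null strings are not handled here.
std::vector<std::uint8_t> encodeLikeQString(const std::u16string& s) {
    std::vector<std::uint8_t> out;
    appendU32BE(out, static_cast<std::uint32_t>(s.size() * 2));
    for (char16_t c : s) {
        out.push_back(static_cast<std::uint8_t>(c >> 8));
        out.push_back(static_cast<std::uint8_t>(c & 0xFF));
    }
    return out;
}
```

For the one-character string u"a" this produces 4 + 2 x 1 = 6 bytes, matching the growth from 7 to 13 seen in the question.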

According to the Qt documentation for QDataStream, this is how strings are stored and retrieved:
a char * string is written as a 32-bit integer equal to the length of
the string including the '\0' byte, followed by all the characters of
the string including the '\0' byte. When reading a char * string, 4
bytes are read to create the 32-bit length value, then that many
characters for the char * string including the '\0' terminator are
read.
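Note that this passage describes char * strings. A QString is written differently: a 32-bit length in bytes followed by the UTF-16 data, with no '\0' terminator. So in your case it is 4 bytes for the length + 2 bytes for "a" in UTF-16, which sums to 6 bytes.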

Related

How do I write a char[] to a binary file in C++?

I am trying to write a char[] to a binary file. This is my method for an int, a float, a double, and a char*:
friend ofstream& operator<<(ofstream& outB, Test& t) {
// write the int, float and double as they are
outB.write((char*)&t.i, sizeof(int));
outB.write((char*)&t.f, sizeof(float));
outB.write((char*)&t.d, sizeof(double));
// write the string as chars (+1 for \0)
outB.write(t.s.c_str(), t.s.size() + 1);
// write the char array
int nrCHAR = strlen(t.c) + 1;
outB.write((char*)&nrCHAR, sizeof(int));
outB.write(t.c, strlen(t.c) + 1);
return outB;
}
Can I do it like this?
outB.write(c, strlen(c) + 1);
where c is my char[].
Usually, text fields are variable length, so you will have to write the length and the text. Here's my suggestion:
const uint32_t length = strlen(c);
outB.write((char *) &length, sizeof(length));
outB.write(c, length);
When reading, you can do the reverse:
uint32_t length = 0; // not const: read() fills it in
inpB.read((char *) &length, sizeof(length));
char * text = new char [length + 1];
inpB.read(&text[0], length);
text[length] = '\0';
A nice feature of writing the length first is that you can allocate dynamic memory for the string before reading the text. Also, since the length of the text is known, a block read can be performed.
When using a sentinel character, like '\0', you have to read character by character until the sentinel is read. This is slow, really slow. You may also have to reallocate your array, since you don't know the size of the string.
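The length-first approach can be sketched as a pair of helpers working on std::string rather than a raw char[]. The names writeString and readString are made up for this example, and the length is written in the machine's native byte order, so the file is only readable on a platform with the same endianness.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write the length as a fixed-width 32-bit value, then the raw characters.
void writeString(std::ostream& out, const std::string& s) {
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof(length));
    out.write(s.data(), length);
}

// Read the length first, size the string once, then block-read the text.
// Returns false if the stream ran out of data.
bool readString(std::istream& in, std::string& s) {
    std::uint32_t length = 0;
    if (!in.read(reinterpret_cast<char*>(&length), sizeof(length))) return false;
    s.resize(length);
    if (length == 0) return true;
    return static_cast<bool>(in.read(&s[0], length));
}
```

Both helpers take generic streams, so they work with a std::ofstream/std::ifstream opened with std::ios::binary or with a std::stringstream for testing. No '\0' terminator is stored, because std::string adds its own.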

Print the results of MD5 function

I want to print the MD5 hash of a string. For this I have written the following function:
std::string generateHashMD5(std::string text)
{
unsigned char * resultHash;
resultHash = MD5((const unsigned char*)text.c_str(), text.size(), NULL);
std::string result;
result += (char *) resultHash;
return result;
}
Now I want to print the result of this function. I tried two versions of such a function:
void printHash(std::string hash)
{
for (unsigned i = 0; i < hash.size(); i++)
{
int val = (short) hash[i];
std::cout<<std::hex<<val<<':';
}
std::cout<<std::endl;
}
std::string printHash(std::string hash)
{
char arrayResult[200];
for(int i = 0; i < 16; i++)
sprintf(&arrayResult[i*2], "%02x", (unsigned short int)hash[i]);
std::string result;
result += arrayResult;
return result;
}
Unfortunately, neither version shows the correct result. What should be changed in these functions, or where are the mistakes?
You are misusing std::string as a buffer:
result += (char *) resultHash;
This treats resultHash as a C string, so if there is a \0 byte in the middle, not all of the data is copied. If there is no \0 byte, you copy too much and get undefined behavior. You should use the constructor that takes a size:
std::string result( reinterpret_cast<const char *>( resultHash ), blocksize );
where blocksize is 16 for MD5. But I would recommend using std::array<uint8_t,blocksize> or std::vector<uint8_t> instead of std::string, as using std::string as a buffer is very confusing.
If MD5 returns a byte array, then in
result += (char *) resultHash;
return result;
the conversion to a string loses everything after the first 0 byte, because the string treats its input as a null-terminated string. So use a vector, or construct the string with an explicit number of characters.
Still, there is not enough information to say exactly what is wrong.
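For the printing side, a minimal sketch that keeps the digest as raw bytes and converts it to hex without going through a signed char (casting a negative char to short or int sign-extends it, which is one way output like ffffff80 appears). The function name toHex is an assumption for this example; it takes any byte vector, for instance the 16 bytes of an MD5 digest.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Convert raw bytes to lowercase hex, two digits per byte.
// Working on uint8_t avoids sign extension of bytes >= 0x80.
std::string toHex(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

A digest stored as std::vector<std::uint8_t>(resultHash, resultHash + 16) can be passed straight to this function, and embedded zero bytes are preserved because nothing treats the data as a C string.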

Alternate way to get the Byte Length of a Hex string

I have created a function to count the byte length of an incoming hex string, then convert that length into hexadecimal. It first assigns the byte length of the incoming string to an int, then I convert the int to a string. After assigning the byte length to an int, I check whether it is more than 255; if it is, I insert a zero so that 2 bytes (4 hex digits) are returned instead of 3 hex digits.
I do the following:
1) Take in the hex string and divide its length by 2.
static int ByteLen(std::string sHexStr)
{
return (sHexStr.length() / 2);
}
2) Take in the hex string, then convert the length to a hex-format string with itoa()
static std::string ByteLenStr(std::string sHexStr)
{
//Assign the length to an int
int iLen = ByteLen(sHexStr);
std::string sTemp = "";
std::string sZero = "0";
std::string sLen = "";
char buffer [1000];
if (iLen > 255)
{
//returns the number passed converted to hex base-16
//if it is over 255 then it will insert a 0 infront
//so to have 2 bytes instead of 3-bits
sTemp = itoa (iLen,buffer,16);
sLen = sTemp.insert(0,sZero);
return sLen;
}
else{
return itoa (iLen,buffer,16);
}
}
I convert the length to hexadecimal. This seems to work fine; however, I am looking for a simpler way to format the text, like I would in C# with the ToString("X2") method. Is this it for C++, or does my method work well enough?
Here is how I would do it in C#:
public static int ByteLen(string sHexStr)
{
return (sHexStr.Length / 2);
}
public static string ByteLenStr(string sHexStr)
{
int iLen = ByteLen(sHexStr);
if (iLen > 255)
return iLen.ToString("X4");
else
return iLen.ToString("X2");
}
My logic may be off a bit in C++, but the C# method is good enough for me in what I want to do.
Thank you for your time.
static std::string ByteLenStr(const std::string& sHexStr)
{
int iLen = ByteLen(sHexStr);
char buffer[16];
snprintf(buffer, sizeof(buffer), (iLen > 255) ? "%04x" : "%02x", iLen);
return buffer;
}
snprintf formats text in a buffer using a format string and a variable list of arguments. We are using the %x format code to convert an int argument into a hex string. In this instance, we have two format strings to choose from:
When iLen > 255, we want the number to be four digits long. %04x means format as a hex string, with zero-padding at the beginning up to four places.
Otherwise, we want the number to be two digits long. %02x means format as a hex string, with zero-padding up to two places.
We use the ternary operator to select which format string we use. Finally, iLen is passed as the single argument which will be used to provide the value that is formatted by the function.
For a pure C++ solution that does not use any C functions, try using a std::stringstream to help with formatting:
static std::string ByteLenStr(std::string sHexStr)
{
//Assign the length to an int
int iLen = ByteLen(sHexStr);
//return the number converted to hex base-16
//if it is over 255 then insert a 0 in front
//so to have 2 bytes instead of 3-bits
std::stringstream ss;
ss.fill('0');
ss.width((iLen > 255) ? 4 : 2);
ss << std::right << std::hex << iLen;
return ss.str();
}
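One detail worth checking against the C# version: ToString("X2") yields uppercase digits, while %x (and std::hex on its own) yields lowercase. A minimal self-contained sketch using %X follows; the names byteLen and byteLenStr are mine, chosen to mirror the question. With the stringstream variant, adding std::uppercase should have the same effect.

```cpp
#include <cstdio>
#include <string>

// Number of bytes represented by a hex string (two hex digits per byte).
int byteLen(const std::string& hex) {
    return static_cast<int>(hex.length() / 2);
}

// Format the byte count as 2 uppercase hex digits, or 4 when it exceeds 255,
// mirroring ToString("X2") / ToString("X4") from the C# version.
std::string byteLenStr(const std::string& hex) {
    const int len = byteLen(hex);
    char buffer[16];
    std::snprintf(buffer, sizeof(buffer), (len > 255) ? "%04X" : "%02X", len);
    return buffer;
}
```

For example, a 20-character hex string represents 10 bytes and formats as "0A", while a 600-character one represents 300 bytes and formats as "012C".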

Read blocks of a binary file buffer into different types

I am trying to read a binary file into memory, and then use it like so:
struct myStruct {
std::string mystring; // is 40 bytes long
uint myint1; // is 4 bytes long
};
typedef unsigned char byte;
byte *filedata = ReadFile(filename); // reads file into memory, closes the file
myStruct aStruct;
aStruct.mystring = filedata.????
I need a way of accessing the binary file with an offset, and getting a certain length at that offset.
This is easy if I store the binary file data in a std::string (filedata.substr(offset, len)), but I figured that using a std::string to store binary data is not a good way of doing things.
Reasonably extensive (IMO) searching hasn't turned anything relevant up, any ideas? I am willing to change storage type (e.g. to std::vector) if you think it is necessary.
If you're not going to use a serialization library, then I suggest adding serialization support to each class:
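struct My_Struct
{
std::string my_string;
unsigned int my_int;
void Load_From_Buffer(unsigned char const *& p_buffer)
{
// The text is stored as a nul-terminated C-style string.
my_string = std::string(reinterpret_cast<char const *>(p_buffer));
p_buffer += my_string.length() + 1; // +1 to account for the terminating nul character.
// Copy the bytes out; the buffer may not be aligned for unsigned int (needs <cstring>).
std::memcpy(&my_int, p_buffer, sizeof(my_int));
p_buffer += sizeof(my_int);
}
};
unsigned char * const buffer = ReadFile(filename);
unsigned char const * p_buffer = buffer; // pointer-to-const, to match Load_From_Buffer
My_Struct my_variable;
my_variable.Load_From_Buffer(p_buffer);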
Some other useful interface methods:
unsigned int Size_On_Stream(void) const; // Returns the size the object would occupy in the stream.
void Store_To_Buffer(unsigned char *& p_buffer); // Stores object to buffer, increments pointer.
With templates you can extend the serialization functionality:
void Load_From_Buffer(std::string& s, unsigned char const *& p_buffer)
{
s = std::string(reinterpret_cast<char const *>(p_buffer));
p_buffer += s.length() + 1;
}
template <typename T>
void Load_From_Buffer(T& object, unsigned char const *& p_buffer)
{
object.Load_From_Buffer(p_buffer);
}
Edit 1: Reason not to write structure directly
In C and C++, the size of a structure may not be equal to the sum of the size of its members.
Compilers are allowed to insert padding, or unused space, between members so that the members are aligned on an address.
For example, a 32-bit processor likes to fetch things on 4 byte boundaries. Having one char in a structure followed by an int would make the int on relative address 1, which is not a multiple of 4. The compiler would pad the structure so that the int lines up on relative address 4.
Structures may contain pointers or items that contain pointers.
For example, the std::string type may have a size of 40, although the string may contain 3 characters or 300. It has a pointer to the actual data.
Endianness.
With multibyte integers, some processors store the Most Significant Byte (MSB) first, a.k.a. Big Endian (the way humans read numbers), while others store the Least Significant Byte first, a.k.a. Little Endian. The Little Endian format takes less circuitry to read than the Big Endian.
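The padding point can be checked on your own compiler with a small sketch. The struct Sample and the two helper functions are made up for this illustration; the amount of padding is platform-dependent, so the code measures it rather than assuming a particular value.

```cpp
#include <cstddef>
#include <cstdint>

// One char followed by a 32-bit integer: the members sum to 5 bytes,
// but the compiler is free to pad so that `value` is suitably aligned.
struct Sample {
    char tag;
    std::int32_t value;
};

// Sum of the member sizes, i.e. what a field-by-field writer would emit.
constexpr std::size_t packedSize() {
    return sizeof(char) + sizeof(std::int32_t);
}

// Bytes of padding the compiler added to this struct on the current platform.
constexpr std::size_t paddingBytes() {
    return sizeof(Sample) - packedSize();
}
```

On many common platforms sizeof(Sample) comes out as 8, so dumping the struct with a single write() would emit 3 padding bytes that a reader built with a different compiler or settings might not expect.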
Edit 2: Variant records
When outputting things like arrays and containers, you must decide whether you want to output the full container (including unused slots) or only the items in the container. Outputting only the items in the container would use a variant record technique.
Two techniques for outputting variant records: quantity followed by items or items followed by a sentinel. The latter is how C-style strings are written, with the sentinel being a nul character.
The other technique is to output the quantity of items, followed by the items. So if I had 6 numbers, 0, 1, 2, 3, 4, 5, the output would be:
6 // The number of items
0
1
2
3
4
5
In a Store_To_Buffer method like the one above, I would create a temporary to hold the quantity, write that out, then follow with each item from the container.
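A rough sketch of the quantity-followed-by-items technique for a container of 32-bit integers. The names storeItems and loadItems are hypothetical, values are copied in native byte order, and the caller is assumed to supply a buffer that is large enough.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Store the item count, then each item, advancing the buffer pointer.
void storeItems(const std::vector<std::uint32_t>& items, unsigned char*& p) {
    const std::uint32_t count = static_cast<std::uint32_t>(items.size());
    std::memcpy(p, &count, sizeof(count));
    p += sizeof(count);
    for (std::uint32_t item : items) {
        std::memcpy(p, &item, sizeof(item));
        p += sizeof(item);
    }
}

// Read the count back, then exactly that many items.
std::vector<std::uint32_t> loadItems(const unsigned char*& p) {
    std::uint32_t count = 0;
    std::memcpy(&count, p, sizeof(count));
    p += sizeof(count);
    std::vector<std::uint32_t> items(count);
    for (std::uint32_t& item : items) {
        std::memcpy(&item, p, sizeof(item));
        p += sizeof(item);
    }
    return items;
}
```

For the six numbers 0 to 5 this writes 4 bytes for the count plus 6 x 4 bytes for the items, 28 bytes in total, and the reader knows how much to allocate before touching the first item.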
You could overload the std::ostream output operator and std::istream input operator for your structure, something like this:
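#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
struct Record {
std::string name;
int value;
};
std::istream& operator>>(std::istream& in, Record& record) {
char name[40] = { 0 };
int32_t value(0);
in.read(name, 40);
in.read(reinterpret_cast<char*>(&value), 4);
// Keep only the characters before the '\0' padding.
record.name.assign(name, std::find(name, name + 40, '\0'));
record.value = value;
return in;
}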
std::ostream& operator<<(std::ostream& out, const Record& record) {
std::string name(record.name);
name.resize(40, '\0');
out.write(name.c_str(), 40);
out.write(reinterpret_cast<const char*>(&record.value), 4);
return out;
}
int main(int argc, char **argv) {
const char* filename("records");
Record r[] = {{"zero", 0 }, {"one", 1 }, {"two", 2}};
int n(sizeof(r)/sizeof(r[0]));
std::ofstream out(filename, std::ios::binary);
for (int i = 0; i < n; ++i) {
out << r[i];
}
out.close();
std::ifstream in(filename, std::ios::binary);
std::vector<Record> rIn;
Record record;
while (in >> record) {
rIn.push_back(record);
}
for (std::vector<Record>::iterator i = rIn.begin(); i != rIn.end(); ++i){
std::cout << "name: " << i->name << ", value: " << i->value
<< std::endl;
}
return 0;
}

Get string from starting index to end from char Array

I have an object with a char array, where the first 5 bytes (char in C++) are additional data and everything afterwards is a string message.
So my question is how can I get a string from starting index 5 way up to the last byte?
I know there is memccpy, but it requires an ending char, which I can't know beforehand.
I am aware there is a string object in C++, but the idea is to send back and forth a byte array which contains the data and message. So in a sense I serialize and deserialize back and forth.
Any suggestions?
Edit:
Packet * Packet::create(byte const data[])
{
//Combine the first 4 byte values into a uint32 (most significant byte first)
unsigned int length = data[0] << 24 | data[1] << 16 | data[2] << 8 | data[3] << 0;
//Element at index 4 is the packet type
PacketType type = (PacketType)data[4];
string packetData;
packetData.clear();
char * cdata;
//Check packet data is present
if(sizeof(data) > 5)
{
//string s((char)data);
//packetData = s.substr(4, s.length() - 4);
strncat(cdata,data+5,sizeof(data)-5);
packetData.append(cdata);
}
//Create new packet;
Packet * packet = new Packet(length,type,packetData);
return packet;
};
It won't accept data[] even when I cast it to char.
Isn't the argument a pointer?
Edit:
Packet * Packet::create(char const * data)
{
//Combine the first 4 byte values into a uint32 (most significant byte first)
unsigned int length = data[0] << 24 | data[1] << 16 | data[2] << 8 | data[3] << 0;
//Element at index 4 is the packet type
PacketType type = (PacketType)data[4];
//Set packet data, if available
string packetData = (sizeof(data) > 5) ? string(data+5):"";
Packet * packet = new Packet(length,type,packetData);
return packet;
};
I still have to test this, but I had to use char. How do I use my own typedef in this situation?
Also what is the difference between
"char * data"
and
"char data[]"
I thought arrays and pointers are one and the same thing.
You mentioned "I know there is memccpy, but it requires an ending char, which I can't know beforehand". Does that mean your serialized data has neither the size of the data nor a delimiter? Without that, how do you expect
"string packetData = (sizeof(data) > 5) ? string(data+5):"";"
to work?
For the serialization you could send the size of your data as well in the header. Then use the simple memcpy.
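A sketch of that idea, under an assumed layout: bytes 0-3 hold the payload length (most significant byte first), byte 4 holds the packet type, and the payload follows. ParsedPacket and parsePacket are hypothetical names, not the question's Packet class. The total buffer size is passed in explicitly, because inside a function sizeof(data) on a pointer or array parameter only gives the size of a pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical decoded form of the packet from the question.
struct ParsedPacket {
    std::uint32_t length;  // payload size taken from the header
    std::uint8_t type;     // packet type byte
    std::string message;   // payload bytes
};

// Assumed layout: bytes 0-3 = payload length (big-endian), byte 4 = type,
// bytes 5.. = payload. `size` is the total number of bytes available.
bool parsePacket(const unsigned char* data, std::size_t size, ParsedPacket& out) {
    if (size < 5) return false;  // header incomplete
    const std::uint32_t length = (std::uint32_t(data[0]) << 24) |
                                 (std::uint32_t(data[1]) << 16) |
                                 (std::uint32_t(data[2]) << 8) |
                                 std::uint32_t(data[3]);
    if (length > size - 5) return false;  // payload truncated
    out.length = length;
    out.type = data[4];
    // Copy exactly `length` bytes; no terminating character is needed.
    out.message.assign(reinterpret_cast<const char*>(data + 5), length);
    return true;
}
```

Using unsigned char for the bytes also sidesteps the sign-extension surprises that shifting plain char values can cause.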
Use strcpy with charArray + 5 as the source parameter.
You should also know about strlen, which gives you the string's length [it might be needed to allocate the char[] if there is no known upper bound for it].
EDIT: code snippet:
#include <iostream>
#include <cstring>
using namespace std;
int main() {
char in[] = "XXXXXqwerty";
//dynamic allocation using strlen() if you don't have upper bound for in
char* out = new char[strlen(in) - 4];
strcpy(out,in+5);
cout << out;
delete[] out;
return 0;
}