The difference between QDataStream and QByteArray - C++

QTemporaryFile tf;
tf.open ();
QDataStream tfbs (&tf);
tfbs << "hello\r\n" << "world!\r\n";
const int pos = int (tf.pos ());   // 25

QByteArray ba;
ba.append ("hello\r\n");
ba.append ("world!\r\n");
const int size = ba.size ();       // 15
Basically my question is, what am I doing wrong? Why is pos > size? Should I not be using << ? Should I not be using QDataStream?
Edit: Is there a way to configure QDataStream or QTemporaryFile so that the << operator doesn't prepend strings with 32bit lengths and store the null terminators in the file? Calling QDataStream::writeBytes when I just have a series of quoted strings and QStrings makes for very ugly code.

The answer is in the docs. I'm not going to go over QByteArray, as I believe it's fairly obvious that it is working as expected.
The QDataStream operator<<(const char*) overload evaluates to the writeBytes() function.
Its documentation says:
Writes the length specifier len and the buffer s to the stream and
returns a reference to the stream. The len is serialized as a quint32,
followed by len bytes from s. Note that the data is not encoded.
So for "hello\r\n", I would expect the output to be:
0,0,0,8,'h','e','l','l','o','\r','\n',0
The 4-byte length, followed by the bytes from the string. The string-ending NULL is probably also being added to the end, which would account for the otherwise mysterious extra two bytes.

So I ended up writing my own helper class to serialize my data:
class QBinaryStream
{
public:
    QBinaryStream (QIODevice& iod) : m_iod (iod) {}

    QBinaryStream& operator << (const char* data)
    {
        m_iod.write (data);
        return *this;
    }

    QBinaryStream& operator << (const QString& data)
    {
        return operator << (data.toUtf8 ());
    }

    QBinaryStream& operator << (const QByteArray& data)
    {
        m_iod.write (data);
        return *this;
    }

private:
    QIODevice& m_iod;
};

Should I not be using QDataStream?
In your case maybe QTextStream or even QString would do.
The QTextStream class provides a convenient interface for reading and
writing text.
QTextStream can operate on a QIODevice, a QByteArray or a QString.
Using QTextStream's streaming operators, you can conveniently read and
write words, lines and numbers.
As for QByteArray, QString should be preferred to it whenever possible:
The QByteArray class provides an array of bytes.
QByteArray can be used to store both raw bytes (including '\0's) and
traditional 8-bit '\0'-terminated strings. Using QByteArray is much
more convenient than using const char *. Behind the scenes, it always
ensures that the data is followed by a '\0' terminator, and uses
implicit sharing (copy-on-write) to reduce memory usage and avoid
needless copying of data.
In addition to QByteArray, Qt also provides the QString class to store
string data. For most purposes, QString is the class you want to use.
It stores 16-bit Unicode characters, making it easy to store
non-ASCII/non-Latin-1 characters in your application. Furthermore,
QString is used throughout in the Qt API. The two main cases where
QByteArray is appropriate are when you need to store raw binary data,
and when memory conservation is critical (e.g., with Qt for Embedded
Linux).

Related

Qt QDataStream: operator>> for quint16 - I didn't get it at all

I have this code:
QByteArray portnoStr = "41034";
quint16 portno;
QDataStream stream(&portnoStr, QIODevice::ReadOnly);
stream >> portno;
std::cout << "portno: " << portno << "\n";
And, completely unexpectedly, it prints
portno: 13361
I looked at the Qt code (4.x and 5.x):
inline QDataStream &QDataStream::operator>>(quint16 &i)
{ return *this >> reinterpret_cast<qint16&>(i); }
Now I understand why it gives me this result,
but I cannot understand why QDataStream has such a strange implementation.
QDataStream is not meant for converting data from one type to another in order to display text. From the docs:
You can also use a data stream to read/write raw unencoded binary data. If you want a "parsing" input stream, see QTextStream.
The QDataStream class implements the serialization of C++'s basic data types, like char, short, int, char *, etc. Serialization of more complex data is accomplished by breaking up the data into primitive units.
A data stream cooperates closely with a QIODevice. A QIODevice represents an input/output medium one can read data from and write data to. The QFile class is an example of an I/O device.
You're using cout to print encoded binary data, which is interpreted as an integer: the stream reads the first two bytes of "41034", '4' (0x34) and '1' (0x31), as a big-endian quint16, giving 0x3431 == 13361. That data is meant for reading and writing to IO devices, not printing.
Regarding reinterpret_cast to a qint16: since QDataStream simply writes raw binary data, pretending an unsigned int is signed has no effect on the output to the data stream. This is just a cheap way of reusing code: the bits are ultimately written as bits, regardless of type. It's up to you to cast them back to the appropriate data type (quint16) when reading back out from the data stream.

How to parse a sequence of integers stored in a text buffer?

Parsing text consisting of a sequence of integers from a stream in C++ is easy enough: just decode them. When the data is received somehow and is readily available within a program, e.g., receiving a base64 encoded text (the decoding isn't the problem), the situation is a bit different. The data is sitting in a buffer within the program and only needs to be decoded, not read. Of course, a std::istringstream could be used:
std::vector<int> parse_text(char* begin, char* end)
{
    std::istringstream in(std::string(begin, end));
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}
Since a lot of these buffers are received and they can be fairly big, it is desirable not to copy the actual content of the character array and, ideally, to also avoid creating a stream for each buffer. Thus, the question becomes:
Given a buffer of chars containing a sequence of (space-separated; dealing with other separators is easily done, e.g., using a suitable manipulator) integers, how can they be decoded without copying the sequence and, if possible, without even creating a std::istream?
Avoiding a copy of the buffer is easily done with a custom stream buffer which simply sets up the get area to use the buffer. The stream buffer doesn't even need to override any of the virtual functions; it just sets up its internal buffer:
class imemstream
    : private virtual std::streambuf
    , public std::istream
{
public:
    imemstream(char* begin, char* end)
        : std::streambuf()
        , std::istream(static_cast<std::streambuf*>(this))
    {
        this->setg(begin, begin, end);
    }
};

std::vector<int> parse_data_via_istream(char* begin, char* end)
{
    imemstream in(begin, end);
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}
This approach avoids copying the buffer and uses the ready-made std::istream functionality. However, it does create a stream object. With a suitable update function, the stream/stream buffer can be extended to reset the buffer and process multiple buffers.
To avoid creation of the stream, the underlying functionality from std::num_get<...> could be used. The actual parsing is done by one of the std::locale facets. The numeric parsing for std::istream is done by std::num_get<char, std::istreambuf_iterator<char>>. This facet isn't much help here as it consumes a sequence specified by std::istreambuf_iterator<char>s, but a std::num_get<char, char const*> facet can be instantiated. It won't be part of the default std::locale, but it is easy to create a corresponding std::locale and install it, e.g., as the global std::locale object first thing in main():
int main()
{
    std::locale::global(std::locale(std::locale(),
                                    new std::num_get<char, char const*>()));
    ...
Note that the std::locale object will clean up the added facet, i.e., there is no need to add any clean-up code: the facets are reference-counted and released when the last std::locale holding a particular facet disappears. To actually use the facet it, unfortunately, needs a std::ios_base object, which can only really be obtained from some stream object. However, any stream can be used (although in a multi-threaded system it should probably be a separate stream object per thread to avoid accidental race conditions):
char const* skipspace(char const* it, char const* end)
{
    return std::find_if(it, end,
                        [](unsigned char c){ return !std::isspace(c); });
}

std::vector<int> parse_data_via_facet(std::ios_base& fmt,
                                      char const* it, char const* end)
{
    std::vector<int> rc;
    std::num_get<char, char const*> const& ng
        = std::use_facet<std::num_get<char, char const*>>(std::locale());
    std::ios_base::iostate error;
    for (long tmp;
         (it = ng.get(skipspace(it, end), end, fmt, error, tmp))
             , error == std::ios_base::goodbit; ) {
        rc.push_back(int(tmp));
    }
    return rc;
}
Most of this is just about error handling and skipping leading whitespace: normally, std::istream provides facilities to automatically skip whitespace for formatted input and deals with the necessary error protocol. There is potentially a small advantage to the approach outlined above with respect to getting the facet just once per buffer and avoiding creation of a std::istream::sentry object, as well as avoiding creation of a stream. Of course, the code assumes that some stream can be used to pass in its std::ios_base& subobject to provide parsing flags like the base to be used.
OK, this is quite a bit of code for something which strtol() could mostly do, too. The approach using std::num_get<char, char const*> has some flexibility which isn't offered by strtol():
Since the std::locale's facets are used, which can be overridden to parse arbitrary representation formats, e.g., Roman numerals, it is more flexible with respect to input formats.
It is easy to set up the use of thousands separators or to change the representation of the decimal point (just replace the std::numpunct<char> facet in the std::locale used by fmt).
The buffer doesn't have to be null-terminated. For example, a contiguous sequence of characters made up of 8-digit values can be parsed by feeding it and it+8 as the range when calling std::num_get<char, char const*>::get().
However, strtol() is probably a good approach for most uses. On the other hand, the above provides an alternative which may be useful in some contexts.
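For comparison, a sketch of the strtol() route mentioned above (assuming, as strtol() requires, that the buffer is null-terminated):

```cpp
#include <cstdlib>
#include <vector>

// Parse space-separated integers with strtol(). Unlike the num_get<>
// approach above, this requires a null-terminated buffer.
std::vector<long> parse_with_strtol(const char* it)
{
    std::vector<long> rc;
    for (char* next = nullptr; ; it = next) {
        long v = std::strtol(it, &next, 10);
        if (next == it)   // no digits consumed: end of input (or garbage)
            break;
        rc.push_back(v);
    }
    return rc;
}
```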

Storing integer to QByteArray using only 4 bytes

It takes 4 bytes to represent an integer. How can I store an int in a QByteArray so that it only takes 4 bytes?
QByteArray::number(..) converts the integer to string thus taking up more than 4 bytes.
QByteArray((const char*)&myInteger,sizeof(int)) also doesn't seem to work.
There are several ways to place an integer into a QByteArray, but the following is usually the cleanest:
QByteArray byteArray;
QDataStream stream(&byteArray, QIODevice::WriteOnly);
stream << myInteger;
This has the advantage of allowing you to write several integers (or other data types) to the byte array fairly conveniently. It also allows you to set the endianness of the data using QDataStream::setByteOrder.
Update
While the solution above will work, the method used by QDataStream to store integers can change in future versions of Qt. The simplest way to ensure that it always works is to explicitly set the version of the data format used by QDataStream:
QDataStream stream(&byteArray, QIODevice::WriteOnly);
stream.setVersion(QDataStream::Qt_5_10); // Or use earlier version
Alternately, you can avoid using QDataStream altogether and use a QBuffer:
#include <QBuffer>
#include <QByteArray>
#include <QtEndian>
...
QByteArray byteArray;
QBuffer buffer(&byteArray);
buffer.open(QIODevice::WriteOnly);
myInteger = qToBigEndian(myInteger); // Or qToLittleEndian, if necessary.
buffer.write((char*)&myInteger, sizeof(qint32));
@Primož Kralj did not get around to posting a solution with his second method, so here it is:
int myInt = 0xdeadbeef;
QByteArray qba(reinterpret_cast<const char *>(&myInt), sizeof(int));
qDebug("QByteArray has bytes %s", qPrintable(qba.toHex(' ')));
prints:
QByteArray has bytes ef be ad de
on an x64 machine.
Recently I faced the same problem with a small variation: I had to store a vector of unsigned short into a QByteArray. The trick with QDataStream did not work for an unknown reason. So, my solution is:
QVector<uint16_t> d = {1, 2, 3, 4, 5};
QByteArray dd((char*)d.data(), d.size() * sizeof(uint16_t));
The way to get the vector back is:
QVector<uint16_t> D;
for (int i = 0; i < dd.size() / int(sizeof(uint16_t)); ++i) {
    D.push_back(*(const uint16_t*)(dd.constData() + i * sizeof(uint16_t)));
}

Creating an input stream from constant memory

I have some data in a buffer pointed to by a const char* pointer. The data is just an ASCII string. I know its size. I would like to be able to read it in the same way data is read from streams. I'm looking for a solution that would allow me to write code like this:
// for example, data points to a string "42 3.14 blah"
MemoryStreamWrapper in(data, data_size);
int x;
float y;
std::string w;
in >> x >> y >> w;
Important condition: the data must not be copied or altered in any way (otherwise I'd just use a string stream. To my best knowledge, it isn't possible to create a string stream from a const char pointer without copying the data.)
The way to do this is to create a suitable stream buffer. This can, e.g., be done like this:
#include <streambuf>
#include <istream>
struct membuf: std::streambuf {
    membuf(char const* base, size_t size) {
        char* p(const_cast<char*>(base));
        this->setg(p, p, p + size);
    }
};

struct imemstream: virtual membuf, std::istream {
    imemstream(char const* base, size_t size)
        : membuf(base, size)
        , std::istream(static_cast<std::streambuf*>(this)) {
    }
};
The only somewhat awkward thing is the const_cast<char*>() in the stream buffer: the stream buffer won't change the data but the interface still requires char* to be used, mainly to make it easier to change the buffer in "normal" stream buffers. With this, you can use imemstream as a normal input stream:
imemstream in(data, size);
in >> value;
The only way would be to subclass std::istream (which also requires subclassing std::streambuf) to create your own stream class that reads from constant memory.
It's not as easy as it sounds because the C++ standard library stream classes are pretty messy and badly designed. I don't think it's worth it unless you need it to scale a lot.

encrypting and serializing stl string and other containers

I have data in stl containers (vector). Each node in the vector is a structure which also contains stl strings.
struct record
{
    string name;
    string location;
    int salary;
};

vector<record> employees;
I want to serialize employees but I also want to encrypt it before serializing.
my encryption function looks like this:
Encode(const char * inBfr, const int in_size, char ** outBfr, int& out_size )
By searching, it looks like the C++ standard doesn't require the memory of my structure to be contiguous, so I can't just grab the memory of the employees variable. Is there any other smart way that I can use this encoding function with my stl-based structures/containers? It is good for me that Encode works on plain char* buffers, so I know exactly what goes in and out, but stl structures are not plain buffers, and I am trying to find a nice way to use stl with this function.
I am also opening to using any other stl containers if that helps.
Although the elements of a std::vector<T> are guaranteed to be laid out contiguously, this doesn't really help: the record you have may include padding and, more importantly, will store the std::string's content external to the std::string object (if the small string optimization is used, the value may be embedded inside the std::string, but the object will also contain a couple of bytes which are not part of the std::string's value). Thus, your best option is to format your record and encrypt the formatted string.
The formatting is straightforward, but personally I would encapsulate the encoding function into a simple std::streambuf so that the encryption can be done by a filtering stream buffer. Given the signature you gave, this could look something like this:
class encryptbuf
    : public std::streambuf {
    std::streambuf* d_sbuf;
    char            d_buffer[1024];
public:
    encryptbuf(std::streambuf* sbuf)
        : d_sbuf(sbuf) {
        this->setp(this->d_buffer, this->d_buffer + sizeof(this->d_buffer) - 1);
    }
    int overflow(int c) {
        if (c != std::char_traits<char>::eof()) {
            *this->pptr() = std::char_traits<char>::to_char_type(c);
            this->pbump(1);
        }
        return this->pubsync()
            ? std::char_traits<char>::eof()
            : std::char_traits<char>::not_eof(c);
    }
    int sync() {
        char* out(0);
        int   size(0);
        Encode(this->pbase(), this->pptr() - this->pbase(), &out, size);
        this->d_sbuf->sputn(out, size);
        delete[] out; // dunno: it seems the output buffer is allocated but how?
        this->setp(this->pbase(), this->epptr());
        return this->d_sbuf->pubsync();
    }
};

int main() {
    encryptbuf   sbuf(std::cout.rdbuf());
    std::ostream eout(&sbuf);
    eout << "print something encoded to standard output\n" << std::flush;
}
Now an output operator for your records that just prints to a std::ostream can be combined with this stream buffer to produce encoded output.
It's probably easiest to serialize your structure into a string, then encrypt the string. For example:
std::ostringstream buffer;
buffer << a_record.name << "\n" << a_record.location << "\n" << a_record.salary;
encode(buffer.str().c_str(), buffer.str().length(), /* ... */);
If it were me, I'd probably write encode (or at least a wrapper for it) to take input (and probably produce output) in a vector, string, or stream though.
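Such a wrapper could look like the sketch below. The real Encode() is not shown in the question, so a stand-in is defined here (it just XORs with a fixed byte, purely so the sketch is self-contained); only the wrapper itself is the point:

```cpp
#include <string>

// Stand-in for the question's Encode() -- the real implementation differs.
// Assumption: the output buffer is allocated by the callee with new[].
void Encode(const char* inBfr, const int in_size, char** outBfr, int& out_size)
{
    out_size = in_size;
    *outBfr = new char[in_size];
    for (int i = 0; i < in_size; ++i)
        (*outBfr)[i] = inBfr[i] ^ 0x5a;   // placeholder "encryption"
}

// A small wrapper taking and returning std::string, as suggested above:
std::string encode_string(const std::string& plain)
{
    char* out = nullptr;
    int   out_size = 0;
    Encode(plain.c_str(), static_cast<int>(plain.size()), &out, out_size);
    std::string result(out, out_size);
    delete[] out;   // assumption: Encode() allocates its output with new[]
    return result;
}
```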
If you want to get ambitious, there are other possibilities. First of all, @MooingDuck raises a good point that it's often worthwhile to overload operator<< for the class, instead of working with the individual items all the time. This will typically be a small function similar to what's above:
std::ostream& operator<<(std::ostream& os, record const& r) {
    return os << r.name << "\n" << r.location << "\n" << r.salary;
}
Using this, you'd just have:
std::ostringstream os;
os << a_record;
encode(os.str().c_str(), os.str().length(), /* ... */);
Second, if you want to get really ambitious, you can put the encryption into (for one example) a codecvt facet, so you can automatically encrypt all the data as you write it to a stream, and decrypt it as you read it back in. Another possibility is to build the encryption into a filtering streambuf object instead. The codecvt facet is probably the method that should theoretically be preferred, but the streambuf is almost certainly easier to implement, with less unrelated "stuff" involved.