How to zlib compress a QByteArray? - c++

I would like to maintain interoperability with every other application on the planet (including web applications) when compressing text. Since qCompress and qUncompress seem to go against the grain, I'm trying to use zlib directly from my Qt application.
I will accept the simplest (most minimal) answer that shows me how to use the zlib library with a QByteArray directly OR modify the output of qCompress so that it can be used outside of a Qt application.
Here's my embarrassing attempt:
QByteArray tdata = QString("Oh noes!").toUtf8();
QByteArray cdata;
uLongf len = 12 + 1.002*tdata.length();
compress(&cdata, &len, &tdata, tdata.length());
And the error:
error: cannot convert 'QByteArray*' to 'Bytef*' for argument '1' to 'int compress(Bytef*, uLongf*, const Bytef*, uLong)'
Then I tried using QByteArray::constData()
compress(cdata.constData(), &len, &tdata, tdata.length());
But got the following error:
error: invalid conversion from 'const char*' to 'Bytef*'
I have no idea what a Bytef is so I start looking in the zlib sources to investigate. But all I can find for this is in QtSources/src/3rdparty/zlib/zconf.h
# define Bytef z_Bytef
So now I'm just lost.

Based on this note in the qUncompress documentation, I think it's pretty easy.
Note: If you want to use this function to uncompress external data that was compressed using zlib, you first need to prepend a four byte header to the byte array containing the data. The header must contain the expected length (in bytes) of the uncompressed data, expressed as an unsigned, big-endian, 32-bit integer.
So you can probably just compress it like this:
QByteArray tdata = QString("Oh noes!").toUtf8();
QByteArray compressedData = qCompress(tdata);
compressedData.remove(0, 4);
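Going the other way also works: to feed externally produced zlib data to qUncompress, prepend the four-byte big-endian length header the note describes. A minimal round-trip sketch (the uncompressed size must be known or transmitted separately):
QByteArray tdata = QString("Oh noes!").toUtf8();
QByteArray zlibData = qCompress(tdata);
zlibData.remove(0, 4); // now a raw zlib stream, readable by any zlib consumer

quint32 expectedSize = tdata.size(); // the receiver must know the uncompressed length
QByteArray header(4, '\0');
header[0] = char((expectedSize >> 24) & 0xFF); // big-endian, as the note requires
header[1] = char((expectedSize >> 16) & 0xFF);
header[2] = char((expectedSize >> 8) & 0xFF);
header[3] = char(expectedSize & 0xFF);
QByteArray plain = qUncompress(header + zlibData); // plain == tdata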

Here is some code I once wrote. It takes as input a pointer to a byte array, the number of bytes to compress, and the compression level, and then uses zlib to compress the input. The result is appended to a std::string.
#include <zlib.h>
#include <string>
#include <vector>
#include <stdexcept>
#include <cstring>
#include <algorithm>

enum compressionLevel
{
    clFast,
    clSmall,
    clDefault
};

const size_t ChunkSize = 262144; //256k default size for chunks fed to zlib

void compressZlib(const char *s, size_t nbytes, std::string &out, compressionLevel l /*= clDefault*/)
{
    int level = Z_DEFAULT_COMPRESSION;
    switch (l)
    {
    case clDefault:
        level = Z_DEFAULT_COMPRESSION; break;
    case clSmall:
        level = Z_BEST_COMPRESSION; break;
    case clFast:
        level = Z_BEST_SPEED; break;
    }

    z_stream strm;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    int ret = deflateInit(&strm, level);
    if (ret != Z_OK)
    {
        // "..." + ret would be pointer arithmetic on a string literal,
        // not concatenation, so build the message properly
        throw std::runtime_error("Error while initializing zlib, error code " + std::to_string(ret));
    }

    size_t toCompress = nbytes;
    char *readp = (char*)s;
    size_t writeOffset = out.size();
    out.reserve(out.size() + (size_t)(nbytes * 0.7)); // rough guess at the compressed size

    std::vector<char> writeBuf(ChunkSize); // one scratch buffer instead of new[]/delete[] per chunk
    char *writep = &writeBuf[0];

    while (toCompress > 0)
    {
        size_t toRead = std::min(toCompress, ChunkSize);
        int flush = toRead < toCompress ? Z_NO_FLUSH : Z_FINISH;
        strm.avail_in = toRead;
        strm.next_in = (Bytef*)readp;
        do {
            strm.avail_out = ChunkSize;
            strm.next_out = (Bytef*)writep;
            deflate(&strm, flush);
            size_t written = ChunkSize - strm.avail_out;
            out.resize(out.size() + written);
            memcpy(&(out[writeOffset]), writep, written);
            writeOffset += written;
        } while (strm.avail_out == 0);
        readp += toRead;
        toCompress -= toRead;
    }
    (void)deflateEnd(&strm);
}
Maybe this helps you solve your problem. Using QByteArray::constData() you can call this function directly, as shown below.
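For instance, a minimal sketch of the call (the function appends to out, so start from an empty string):
QByteArray tdata = QString("Oh noes!").toUtf8();
std::string out;
compressZlib(tdata.constData(), tdata.length(), out, clDefault);
// wrap the compressed bytes back into a QByteArray if needed
QByteArray cdata(out.data(), int(out.size()));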

Just to help you out with the last section of your question here:
I have no idea what a Bytef is so I start looking in the zlib sources to investigate.
For the definitions of Byte and Bytef, look at lines 332 and 333 of zconf.h, as well as line 342:
332 #if !defined(__MACTYPES__)
333 typedef unsigned char Byte; /* 8 bits */
...
338 #ifdef SMALL_MEDIUM
339 /* Borland C/C++ and some old MSC versions ignore FAR inside typedef */
340 # define Bytef Byte FAR
341 #else
342 typedef Byte FAR Bytef;
The definition of FAR is for mixed-mode MSDOS programming, otherwise it is not defined as anything (see lines 328-330 of zconf.h).
Thus the zlib typedefs Bytef and Byte are basically the same as unsigned char on most platforms. Therefore you should be able to do the following:
QByteArray tdata = QString("Oh noes!").toUtf8();
QByteArray cdata(compressBound(tdata.length()), '\0');
uLongf len = compressBound(tdata.length());
compress(reinterpret_cast<Bytef*>(cdata.data()), &len,
         reinterpret_cast<const Bytef*>(tdata.constData()), tdata.length());
cdata.resize(len); // compress() updates len to the actual compressed size
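For completeness, the reverse direction with uncompress() looks like this; it is only a sketch, and it assumes the receiver knows the uncompressed size (here taken from tdata):
uLongf plainLen = tdata.length(); // must be known to the receiver somehow
QByteArray plain(int(plainLen), '\0');
uncompress(reinterpret_cast<Bytef*>(plain.data()), &plainLen,
           reinterpret_cast<const Bytef*>(cdata.constData()), cdata.length());
plain.resize(int(plainLen));
// plain == tdata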

Related

Alternate reading as char* and wchar_t*

I'm trying to write a program that parses ID3 tags, for educational purposes (so please explain in depth, as I'm trying to learn). So far I've had great success, but stuck on an encoding issue.
When reading the mp3 file, the default encoding for all text is ISO-8859-1. All header info (frame IDs etc) can be read in that encoding.
This is how I've done it:
ifstream mp3File("../myfile.mp3");
mp3File.read(mp3Header, 10); // char mp3Header[10];
// .... Parsing the header
// After reading the main header, we get into the individual frames.
// Read the first 10 bytes from buffer, get size and then read data
char encoding[1];
while (1) {
    char frameHeader[10] = {0};
    mp3File.read(frameHeader, 10);
    ID3Frame frame(frameHeader); // Parses frameHeader
    if (frame.frameId[0] == 'T') { // Text Information Frame
        mp3File.read(encoding, 1); // Get encoding
        if (encoding[0] == 1) {
            // We're dealing with UCS-2 encoded Unicode with BOM
            char data[frame.size];
            mp3File.read(data, frame.size);
        }
    }
}
This is bad code, because data is a plain char array; its contents look like this (non-printable bytes shown as hex):
data = [0xFF, 0xFE, 'C', 0, 'r', 0, 'a', 0, 'z', 0, 'y', 0]
Two questions:
What are the first two bytes? - Answered.
How can I read wchar_t from my already open file? And then get back to reading the rest of it?
Edit Clarification: I'm not sure if this is the correct way to do it, but essentially what I wanted to do was: read the first 11 bytes into a char array (header + encoding), then the next 12 bytes into a wchar_t array (the name of the song), and then the next 10 bytes into a char array (the next header). Is that possible?
I figured out a decent solution: create a new wchar_t buffer and add the characters from the char array in pairs.
#include <cstdint>  /* uint8_t, uint16_t */
#include <cstdlib>  /* malloc */

/* The original left these constants undefined; any two distinct values work. */
#define LITTLEENDIAN 0
#define BIGENDIAN    1

wchar_t* charToWChar(char* cArray, int len) {
    char wideChar[2];
    wchar_t wideCharW;
    // len/2 units: (len - 2)/2 characters after the BOM, plus the terminator
    wchar_t *wArray = (wchar_t *) malloc(sizeof(wchar_t) * len / 2);
    int counter = 0;
    int endian = BIGENDIAN;
    // Check endianness from the BOM
    if ((uint8_t) cArray[0] == 255 && (uint8_t) cArray[1] == 254)
        endian = LITTLEENDIAN;  // 0xFF 0xFE
    else if ((uint8_t) cArray[1] == 255 && (uint8_t) cArray[0] == 254)
        endian = BIGENDIAN;     // 0xFE 0xFF
    for (int j = 2; j < len; j += 2) {
        switch (endian) {
        case LITTLEENDIAN: { wideChar[0] = cArray[j]; wideChar[1] = cArray[j + 1]; } break;
        default:
        case BIGENDIAN:    { wideChar[1] = cArray[j]; wideChar[0] = cArray[j + 1]; } break;
        }
        wideCharW = (uint16_t)((uint8_t)wideChar[1] << 8 | (uint8_t)wideChar[0]);
        wArray[counter] = wideCharW;
        counter++;
    }
    wArray[counter] = '\0';
    return wArray; // caller must free()
}
Usage:
if (encoding[0] == 1) {
    // We're dealing with UCS-2 encoded Unicode with BOM
    char data[frame.size];
    mp3File.read(data, frame.size);
    wcout << charToWChar(data, frame.size) << endl;
}

Need to convert 16 bit data to 8 bit

int main()
{
    char str[200] = {0};
    char out[500] = {0};
    str[0] = 0x00; str[1] = 0x52; str[2] = 0x00; str[3] = 0x65;
    str[4] = 0x00; str[5] = 0x73; str[6] = 0x00; str[7] = 0x74;
    for (int i = 0; i < sizeof(str); i++)
        cout << "-" << str[i];
    changeCharEncoding("UCS-2", "ISO8859-1", str, out, sizeof(out));
    cout << "\noutput : " << out;
    for (int i = 0; i < sizeof(out); i++)
        cout << ":" << out[i];
}
//encoding function
int changeCharEncoding(const char *from_charset, const char *to_charset, const char *input, char *output, int out_size)
{
    size_t input_len = 8;
    size_t output_len = out_size;
    iconv_t l_cd;
    if ((l_cd = iconv_open(to_charset, from_charset)) == (iconv_t) -1)
    {
        return -1;
    }
    int rc = iconv(l_cd, (char **)&input, &input_len, (char **)&output, &output_len);
    if (rc == -1)
    {
        iconv_close(l_cd);
        return -2;
    }
    else
    {
        iconv_close(l_cd);
    }
}
Please suggest a method to convert 16-bit data to 8-bit. I have tried doing it with iconv. Also, if there is something else that can do the same, please suggest that too.
It looks like you are trying to convert between UTF-16 and UTF-8 encoding:
Try changing your call of changeCharEncoding() to:
changeCharEncoding("UTF-16","UTF-8",str,out,sizeof(out));
The resulting UTF-8 output should be
刀攀猀琀
On a side note: there are several things in your code that you should consider improving. For example, both changeCharEncoding and main are declared to return an int, but your implementations do not return one on every path.
Generally speaking, you cannot convert arbitrary 16-bit data into 8-bit data; you will lose some of it.
If you're trying to convert encodings, the same rule applies: some symbols cannot be represented in an 8-bit encoding, so they will be lost. Depending on the platform you can use different functions:
Windows: WideCharToMultiByte
*nix: iconv
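A minimal sketch of the Windows route (the codepage CP_ACP and the helper name are illustrative choices, not part of the question):
#include <windows.h>

// Convert a null-terminated UTF-16 string to the system 8-bit ANSI codepage.
int narrowWin(const wchar_t *in, char *out, int outBytes)
{
    // cchWideChar = -1: the input is null-terminated;
    // the returned byte count includes the terminator
    return WideCharToMultiByte(CP_ACP, 0, in, -1, out, outBytes, NULL, NULL);
}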
I suspect you have an endianness problem: Try changing this
changeCharEncoding("UCS-2","ISO8859-1",str,out,sizeof(out));
to this
changeCharEncoding("UCS-2BE","ISO8859-1",str,out,sizeof(out));

QDataStream won’t work with custom crafted char array

I have an application which consists of two primary modules. One is written in C and uses the standard C runtime library; the other is written in Qt C++. They communicate with each other over IPC. The C module creates a char array, fills it with data, and sends it to the Qt module. I want to deserialize the received data using QDataStream, but my efforts haven't yielded any results yet. Here's a simple example of what I'm trying to achieve:
unsigned int pointer = 0;
const int IPC_MSG_LEN = 500;
const int IPC_MSG_HEADER = 200;
const int SOMETHING = 1443;
char api = 55;
char msg[IPC_MSG_LEN] = {0};
memcpy_s(msg, IPC_MSG_LEN, &IPC_MSG_HEADER, sizeof(int));
pointer = sizeof(unsigned int);
memcpy_s(&msg[pointer], IPC_MSG_LEN - pointer, &api, sizeof(char));
++pointer;
memcpy_s(&msg[pointer], IPC_MSG_LEN - pointer, &SOMETHING, sizeof(int));
QByteArray arr(msg, IPC_MSG_LEN);
QDataStream ds(&arr, QIODevice::ReadOnly);
qint32 header = 0, aa = 0;
qint8 t_api = 0;
ds >> header; //Doesn't work
ds >> t_api; //Works
ds >> aa; //Doesn't work
As you can see, the code is pretty simple, but the header and aa variables are deserialized to random numbers. However, t_api (a one-byte variable) has the correct value assigned.
So what's the problem with this code? Does QDataStream use a private data format which is not compatible with the one I'm using? Should I write my own QIODevice implementation, or is there a quick fix I'm not aware of? :)
Thanks, I appreciate your help.
UPDATE
Thank you very much guys, your solution worked perfectly with those primitive data types, but the problem is that I also need to be able to serialize/deserialize char* strings too.
wchar_t* name1 = L"something";
memcpy_s(&msg[pointer], IPC_MSG_LEN - pointer, name1, (wcslen(name1) + 1) * 2);
char* ai = new char[500];
ds >> ai; //ai becomes NULL here :|
Is there a way to achieve that? Thanks again
QDataStream::setByteOrder(QDataStream::LittleEndian);
#include <QDebug>
#include <QByteArray>
#include <QDataStream>
#include <QString>
#include <QObject>
#include <vector>

template<typename T> void writePtr(char*& dst, T data){
    *reinterpret_cast<T*>(dst) = data;
    dst += sizeof(T);
}

int main(int argc, char** argv){
    const size_t ipcSize = 512;
    std::vector<char> buffer(ipcSize, 0);
    quint32 sendVal1 = 0x12345678, recvVal1 = 0;
    quint8  sendVal2 = 0xee,       recvVal2 = 0;
    quint32 sendVal3 = 0x9999abcd, recvVal3 = 0;

    char* dst = &buffer[0];
    writePtr(dst, sendVal1);
    writePtr(dst, sendVal2);
    writePtr(dst, sendVal3);

    // pass the length explicitly; the plain char* constructor
    // would stop at the first zero byte
    QByteArray byteArray(&buffer[0], int(buffer.size()));
    QDataStream stream(&byteArray, QIODevice::ReadOnly);
    stream.setByteOrder(QDataStream::LittleEndian);
    stream >> recvVal1 >> recvVal2 >> recvVal3;

    qDebug() << QString(QObject::tr("sent: %1, received: %2")).arg(sendVal1, 8, 16).arg(recvVal1, 8, 16);
    qDebug() << QString(QObject::tr("sent: %1, received: %2")).arg(sendVal2, 2, 16).arg(recvVal2, 2, 16);
    qDebug() << QString(QObject::tr("sent: %1, received: %2")).arg(sendVal3, 8, 16).arg(recvVal3, 8, 16);
    return 0;
}
but the problem is that I also need to be able to serialize/deserialize char* strings too.
Qt data serialization format is explained (in detail) here. You MUST read that document if you want to use QDataStream for IPC. Qt has nice documentation, so use it.
Also this is not a char* string:
wchar_t* name1 = L"something";
It is a wchar_t* string.
wchar_t has a different size depending on the compiler: either 4 or 2 bytes per wchar_t. That is a problem for IPC. Unlike wchar_t, char is guaranteed to be 1 byte big.
So either encode the entire string as UTF-8 (or use an 8-bit encoding with a known codepage) and write it as raw data in a QByteArray-compatible format:
void writeDataPtr(char*& ptr, const char* data, quint32 size){
    if (!data){
        size = 0xffffffff; // QByteArray's marker for a null array
        writePtr(ptr, size);
        return;
    }
    writePtr(ptr, size); // length prefix first (missing in the original), then the raw bytes
    memcpy(ptr, data, size);
    ptr += size;
}
Then use QString::fromUtf8 to decode it (or QTextCodec, if you decided to use another 8-bit encoding instead of UTF-8). Or, if you can ensure that your wchar_t* string is UTF-16-compliant and sizeof(wchar_t) == 2, dump it in a QString-compatible format, as sketched below.
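A sketch of that last option, assuming sizeof(wchar_t) == 2 and a stream left at its default big-endian setting (QDataStream stores a QString as a quint32 byte count followed by the UTF-16 data; the helper name is mine):
#include <cwchar> // wcslen

void writeQStringPtr(char*& ptr, const wchar_t* str){
    quint32 bytes = quint32(wcslen(str) * 2);
    // quint32 length prefix, big-endian to match QDataStream's default
    *ptr++ = char((bytes >> 24) & 0xFF);
    *ptr++ = char((bytes >> 16) & 0xFF);
    *ptr++ = char((bytes >> 8) & 0xFF);
    *ptr++ = char(bytes & 0xFF);
    for (size_t i = 0; i < bytes / 2; ++i){ // each UTF-16 unit, big-endian
        *ptr++ = char((str[i] >> 8) & 0xFF);
        *ptr++ = char(str[i] & 0xFF);
    }
}
On the receiving side a plain ds >> someQString should then decode it.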
By the way, if I were you, I'd avoid memcpy_s. It is not part of the C++ standard, which is a very good reason to avoid it.
What I want is to read a wchar_t*/char* from QDataStream until the stream position gets to the null terminating character.
If this is homework, tag your post accordingly.
One of those should work:
QString readWcharString(QDataStream& stream){
    QVector<ushort> rawData;
    ushort tmp;
    do {
        stream >> tmp;
        rawData.push_back(tmp);
    } while (tmp);
    return QString::fromUtf16(rawData.data());
}
or
QString readWcharString(QDataStream& stream){
    QVector<wchar_t> rawData;
    ushort tmp;
    do {
        stream >> tmp;
        rawData.push_back(tmp);
    } while (tmp);
    return QString::fromWCharArray(rawData.data());
}
QDataStream stores numbers in big-endian format by default.
You can change that with:
ds.setByteOrder(QDataStream::ByteOrder(QSysInfo::ByteOrder));
which will use the detected host endianness instead.

How to parse data in C++

final byte LOGIN_REQUEST = 1;
long deviceId = 123456789;
String nickname = "testid";
Socket mSocket = new Socket("localhost", 12021);

byte[] bString = nickname.getBytes();
int sLength = bString.length;
// 1 + 8 + 4 fixed bytes, plus the nickname bytes
// (the original allocated only 1 byte, which would overflow on putLong)
ByteBuffer bBuffer = ByteBuffer.allocate(13 + sLength);
bBuffer.order(ByteOrder.LITTLE_ENDIAN);
//1
bBuffer.put(LOGIN_REQUEST);
//8
bBuffer.putLong(deviceId);
//4
bBuffer.putInt(sLength);
bBuffer.put(bString);
I am sending byte data like this and I want to parse it on my linux server using c++
In C++, I am reading
char *pdata = new char[BUF_SIZE];
int dataLength = read(m_events[i].data.fd, pdata, BUF_SIZE);
and pushing pdata onto the pthread's queue. I think I have to read the first byte to see the type of the packet, then read the next 8 bytes to get the device id, and so on.
Please give me some references or tutorials for doing this in C++ code.
Thanks in advance.
The code below will effectively do the trick (a Java int is always 32 bits).
#include <inttypes.h>
#include <stdlib.h> /* malloc */
#include <string.h> /* memcpy */

// Declare variables
unsigned char login_req;
int64_t device_id;
uint32_t name_len;
char* name_str;

// Populate variables
login_req = pdata[0];
memcpy( &device_id, pdata+1, 8 );
memcpy( &name_len, pdata+9, 4 );
name_str = (char*)malloc( name_len + 1 );
memcpy( name_str, pdata+13, name_len );
name_str[name_len] = '\0';
Note: I am glossing over some things, namely:
It does not handle the case where BUF_SIZE is too small.
It does not handle the case where the machine running the C program is not little-endian. If it is big-endian, you would need to swap the bytes after the memcpy calls for device_id and name_len (see the sketch below).
It does not type cast the memcpy calls to avoid possible compiler warnings.
This solution is pure C and will work in C++ too.
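A sketch of that byte swap for a big-endian host; the helper names are mine, written out by hand to stay portable:
#include <inttypes.h>

/* Reverse the byte order of 32- and 64-bit values. */
static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}
static uint64_t swap64(uint64_t v) {
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* On a big-endian host, after the memcpy calls above: */
/* device_id = (int64_t)swap64((uint64_t)device_id); */
/* name_len  = swap32(name_len); */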
If you have control over the Java code, and there's not a standard network protocol you have to implement, then I would recommend a different approach. Instead of pushing bytes around at this low a level, you can use a higher level serialization library, such as Google Protocol Buffers. There are both C++ tutorials and Java tutorials, which should get you started.
Using the iostream library is pretty much the same as using basic read and write in C, without the intermediate buffer, i.e.:
#include <inttypes.h>
#include <iostream>
#include <cstdlib>

void readData( std::istream& input ) // streams must be passed by reference
{
    // Declare variables
    unsigned char login_req;
    int64_t device_id;
    uint32_t name_len;
    char* name_str;

    // Populate variables
    input.read( reinterpret_cast<char*>(&login_req), 1 );
    input.read( reinterpret_cast<char*>(&device_id), 8 );
    input.read( reinterpret_cast<char*>(&name_len), 4 );
    name_str = (char*)malloc( name_len + 1 );
    input.read( name_str, name_len );
    name_str[name_len] = '\0';
}
Once again I am not error checking the istream::read calls or worrying about endian issues. Trying to keep it simple.

Serialization/Deserialization of a struct to a char* in C

I have a struct
struct Packet {
    int senderId;
    int sequenceNumber;
    char data[MaxDataSize];

    char* Serialize() {
        char *message = new char[MaxMailSize];
        message[0] = senderId;
        message[1] = sequenceNumber;
        for (unsigned i = 0; i < MaxDataSize; i++)
            message[i + 2] = data[i];
        return message;
    }

    void Deserialize(char *message) {
        senderId = message[0];
        sequenceNumber = message[1];
        for (unsigned i = 0; i < MaxDataSize; i++)
            data[i] = message[i + 2];
    }
};
I need to convert this to a char* of maximum length MaxMailSize > MaxDataSize for sending over the network, and then deserialize it at the other end.
I can't use tpl or any other library.
Is there any way to make this better? I am not that comfortable with this. Or is this the best we can do?
Since this is to be sent over a network, I strongly advise you to convert the data into network byte order before transmitting, and back into host byte order when receiving. This is because byte ordering is not the same everywhere, and once your bytes are not in the right order it may become very difficult to reverse them (depending on the programming language used on the receiving side). The byte-ordering functions are defined along with sockets and are named htons(), htonl(), ntohs() and ntohl() (in those names, h means 'host', i.e., your computer; n means 'network'; s means 'short', a 16-bit value; l means 'long', a 32-bit value).
Beyond that you are on your own with serialization; C and C++ have no automatic way to perform it. Some software can generate the code for you, like the ASN.1 implementation asn1c, but such tools are difficult to use because they involve much more than just copying data over the network.
Depending on whether you have enough space or not... you might simply use streams :)
std::string Serialize() {
    std::ostringstream out;
    char version = '1';
    out << version << senderId << '|' << sequenceNumber << '|' << data;
    return out.str();
}

void Deserialize(const std::string& iString)
{
    std::istringstream in(iString);
    char version = 0, check1 = 0, check2 = 0;
    in >> version;
    switch (version)
    {
    case '1':
        in >> senderId >> check1 >> sequenceNumber >> check2 >> data;
        break;
    default:
        ; // Handle unknown version
    }
    // You can check here that 'check1' and 'check2' both equal '|'
}
I readily admit it takes more space... or that it might.
Actually, on a 32-bit architecture an int usually covers 4 bytes (4 chars). Serializing an int with streams only takes more than 4 chars if the value is greater than 9999, which usually gives some room.
Also note that you should probably include some guards in your stream, just to check when you read it back that everything is all right.
Versioning is probably a good idea; it does not cost much and allows for unplanned later development.
You can have a class representing the object you use in your software, with all the niceties, member functions, and whatever you need. Then you have a 'serialized' struct that is more of a description of what will end up on the network.
To ensure the compiler does what you tell it to do, you need to instruct it to 'pack' the structure. The directive used here is for gcc; see your compiler's documentation if you're not using gcc.
The serialize and deserialize routines then just convert between the two, taking care of byte order and details like that.
#include <arpa/inet.h> /* ntohl htonl */
#include <string.h>    /* memcpy */

const int MaxDataSize = 256; /* assumption: defined elsewhere in the real code */

class Packet {
    int senderId;
    int sequenceNumber;
    char data[MaxDataSize];
public:
    void* Serialize();
    void Deserialize(void *message);
};

struct SerializedPacket {
    int senderId;
    int sequenceNumber;
    char data[MaxDataSize];
} __attribute__((packed));

void* Packet::Serialize() {
    struct SerializedPacket *s = new SerializedPacket();
    s->senderId = htonl(this->senderId);
    s->sequenceNumber = htonl(this->sequenceNumber);
    memcpy(s->data, this->data, MaxDataSize);
    return s;
}

void Packet::Deserialize(void *message) {
    struct SerializedPacket *s = (struct SerializedPacket*)message;
    this->senderId = ntohl(s->senderId);
    this->sequenceNumber = ntohl(s->sequenceNumber);
    memcpy(this->data, s->data, MaxDataSize);
}
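A hypothetical usage sketch of the round trip (a real sender would push the bytes through a socket instead):
Packet p;
// ... fill p ...
void *wire = p.Serialize(); // sizeof(SerializedPacket) bytes, ready to send
Packet q;
q.Deserialize(wire); // fields are back in host byte order
delete static_cast<SerializedPacket*>(wire);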
int senderId;
int sequenceNumber;
...
char *message = new char[MaxMailSize];
message[0] = senderId;
message[1] = sequenceNumber;
You're overwriting values here. senderId and sequenceNumber are both ints and will take up more than sizeof(char) bytes on most architectures. Try something more like this:
char * message = new char[MaxMailSize];
int offset = 0;
memcpy(message + offset, &senderId, sizeof(senderId));
offset += sizeof(senderId);
memcpy(message + offset, &sequenceNumber, sizeof(sequenceNumber));
offset += sizeof(sequenceNumber);
memcpy(message + offset, data, MaxDataSize);
EDIT:
fixed code written in a stupor. Also, as noted in the comments, any such packet is not portable due to endian differences.
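For symmetry, the receiving side can be unpacked the same way (a sketch; the same endianness caveat applies):
int offset = 0;
memcpy(&senderId, message + offset, sizeof(senderId));
offset += sizeof(senderId);
memcpy(&sequenceNumber, message + offset, sizeof(sequenceNumber));
offset += sizeof(sequenceNumber);
memcpy(data, message + offset, MaxDataSize);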
To answer your question generally: C++ has no reflection mechanism, so manual serialize and deserialize functions defined on a per-class basis are the best you can do. That being said, the serialization function you wrote will mangle your data. Here is a correct implementation:
char * message = new char[MaxMailSize];
int net_senderId = htonl(senderId);
int net_sequenceNumber = htonl(sequenceNumber);
memcpy(message, &net_senderId, sizeof(net_senderId));
memcpy(message + sizeof(net_senderId), &net_sequenceNumber, sizeof(net_sequenceNumber));
As mentioned in other posts, senderId and sequenceNumber are both of type int, which is likely to be larger than char, so these values will be truncated.
If that's acceptable, then the code is OK. If not, then you need to split them into their constituent bytes. Given that the protocol you are using will specify the byte order of multi-byte fields, the most portable and least ambiguous way of doing this is through shifting.
For example, let's say that senderId and sequenceNumber are both 2 bytes long and the protocol requires that the higher byte goes first:
char* Serialize() {
    char *message = new char[MaxMailSize];
    message[0] = senderId >> 8;
    message[1] = senderId;
    message[2] = sequenceNumber >> 8;
    message[3] = sequenceNumber;
    memcpy(&message[4], data, MaxDataSize);
    return message;
}
I'd also recommend replacing the for loop with memcpy (if available), as it's unlikely to be less efficient, and it makes the code shorter.
Finally, this all assumes that char is one byte long. If it isn't, then all the data will need to be masked, e.g.:
message[0] = (senderId >> 8) & 0xFF;
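The matching deserializer reassembles the values with shifts, masking each byte so sign extension from char cannot creep in (a sketch under the same 2-byte assumption):
void Deserialize(const char *message) {
    senderId = ((message[0] & 0xFF) << 8) | (message[1] & 0xFF);
    sequenceNumber = ((message[2] & 0xFF) << 8) | (message[3] & 0xFF);
    memcpy(data, &message[4], MaxDataSize);
}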
You can use Protocol Buffers for defining and serializing structs and classes. This is what Google uses internally, and it has a very compact wire format.
http://code.google.com/apis/protocolbuffers/