Convert QByteArray to std::vector<unsigned char> - c++

I tried to convert QByteArray to std::vector<unsigned char> using this code:
unsigned char* buffer = (unsigned char*)byteArrayBuffer.constData();
std::vector<unsigned char>::size_type size = strlen((const char*)buffer);
std::vector<unsigned char> bufferToCompress(buffer, buffer + size);
but, assuming that byteArrayBuffer is a QByteArray filled with data, I think it doesn't work well on line unsigned char* buffer = (unsigned char*)byteArrayBuffer.constData(); because byteArrayBuffer.size() returns a different value than bufferToCompress.size().
How can I get it working?

I'm not familiar with Qt, but surely you just want
std::vector<unsigned char> bufferToCompress(
    byteArrayBuffer.begin(), byteArrayBuffer.end());
Note: strlen is not particularly useful in C++; it tells you the length of a C-style null-terminated string (by searching memory until it either finds a zero-valued byte or falls off the end of accessible memory and crashes), but it can't tell you the size of an array, which is what you'd need here. Also, using evil C-style casts to force invalid code to compile is never a good idea.
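To make the difference concrete, here is a minimal sketch (plain C++, no Qt needed) showing strlen under-reporting the size of binary data that contains a zero byte, which is exactly why byteArrayBuffer.size() and bufferToCompress.size() disagree:
#include <cstring>
#include <iostream>
#include <vector>

int main()
{
    // Binary data with an embedded zero byte, as compressed or image data typically has.
    const unsigned char raw[] = { 'a', 'b', 0, 'c', 'd' };

    // strlen stops at the first zero byte and reports 2...
    std::cout << std::strlen(reinterpret_cast<const char*>(raw)) << '\n';

    // ...while the pointer-range constructor copies all 5 bytes.
    std::vector<unsigned char> bufferToCompress(raw, raw + sizeof raw);
    std::cout << bufferToCompress.size() << '\n';
}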

As I see at http://qt-project.org/doc/qt-4.8/qbytearray.html, QByteArray does not have begin/end methods, but it does have data/length. The resulting code may look like this:
const unsigned char* begin = reinterpret_cast<unsigned char*>(byteArrayBuffer.data());
const unsigned char* end = begin + byteArrayBuffer.length();
std::vector<unsigned char> bufferToCompress( begin, end );

Related

What is the proper way to encode a std::wstring to UTF-16 in a std::vector<unsigned char>?

I am attempting to encode a std::wstring to UTF-16 and pass it to a function which takes a pair of vector iterators. To accomplish this, I have tried the following.
std::vector<unsigned char> HashAlgorithm::ComputeHash(std::wstring value)
{
    std::wstring_convert<std::codecvt_utf16<wchar_t>> converter;
    std::string encodedString = converter.to_bytes(value);
    std::vector<unsigned char> encodedBytes(
        reinterpret_cast<unsigned char const *>(encodedString.c_str()),
        reinterpret_cast<unsigned char const *>(encodedString.c_str() + encodedString.size()));
    std::vector<unsigned char> hashedBytes = this->ComputeHash(encodedBytes.begin(), encodedBytes.end());
    return hashedBytes;
}
It works fine for the most part, except I know something is wrong because in debug mode I am seeing an assertion failure on the return of hashedBytes, which smells like some kind of stack corruption.
What is causing this error and how can I prevent it?
EDIT #1
Here are the contents of support functions that I am using. I've been trying to break it down to figure out where the assertion is originating and why, but I've not been able to get a minimal reproduction yet.
std::vector<unsigned char> HashAlgorithm::ComputeHash(std::vector<unsigned char>::const_iterator begin, std::vector<unsigned char>::const_iterator end)
{
    this->Process(begin, end);
    std::vector<unsigned char> hashedBytes = this->Complete();
    return hashedBytes;
}

void HashAlgorithm::Process(std::vector<unsigned char>::const_iterator begin, std::vector<unsigned char>::const_iterator end)
{
    NTSTATUS status = BCryptHashData(this->hash, const_cast<unsigned char *>(&(*begin)), std::distance(begin, end), 0);
}

std::vector<unsigned char> HashAlgorithm::Complete()
{
    std::vector<unsigned char> result(this->outputSize);
    NTSTATUS status = BCryptFinishHash(this->hash, result.data(), (ULONG)result.size(), 0);
    return result;
}
std::wstring is not binary compatible between Microsoft VC++ 2010 and 2015.
The problem is that std::wstring in the library code (VS 2010) and in the client code (VS 2015) differ in size by 4 bytes: the newer std::wstring is 32 bytes, while the older one is 28 bytes. When these variables are passed around by value, stack corruption occurs in the first 4 bytes of the smaller std::wstring and trips the stack canaries used to guard against stack-based exploits.
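If you need to confirm a mismatch like this, one quick diagnostic (a sketch; the exact numbers depend on the compiler version and build flags) is to print sizeof(std::wstring) from a small translation unit built with each toolset and compare the output:
#include <iostream>
#include <string>

int main()
{
    // Build and run this with each compiler involved; if the values differ,
    // passing std::wstring by value across that boundary will corrupt the stack.
    std::cout << "sizeof(std::wstring) = " << sizeof(std::wstring) << '\n';
}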
To ensure you don't lose any data along the way you should hash the bytes directly:
std::vector<unsigned char> myClass::ComputeHash(std::wstring value)
{
    auto size_of_data = value.size() * sizeof(value[0]);
    auto pointer_to_data = reinterpret_cast<unsigned char const *>(value.data());
    std::vector<unsigned char> encodedBytes(pointer_to_data, pointer_to_data + size_of_data);
    std::vector<unsigned char> hashedBytes = this->ComputeHash(encodedBytes.begin(), encodedBytes.end());
    return hashedBytes;
}
Try adding a banana (🍌 \U0001F34C) to see what happens to your data as you step through, e.g. std::wstring my_unicode_string{L"Test string 🍌\n"}; or std::wstring wstr = L"z\u00df\u6c34\U0001F34C"; // L"zß水🍌". The second example might be better if your .cpp file isn't saved as Unicode text.
You will probably get an exception thrown by to_bytes, because only code points in the basic multilingual plane can be encoded into a single wchar_t. And if it does do the conversion for you, it might map different higher code points to similar bytes, which would lead to the same hash for different strings.
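A sketch of that failure mode, assuming a platform where wchar_t is 16 bits (e.g. Windows; note that std::wstring_convert was deprecated in C++17):
#include <codecvt>
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

int main()
{
    std::wstring_convert<std::codecvt_utf16<wchar_t>> converter;
    try {
        // U+1F34C lies outside the basic multilingual plane; with 16-bit
        // wchar_t it is stored as a surrogate pair, which this facet
        // cannot encode as a single code point, so to_bytes throws.
        std::string bytes = converter.to_bytes(L"z\u00df\u6c34\U0001F34C");
        std::cout << "converted " << bytes.size() << " bytes\n";
    } catch (const std::range_error& e) {
        std::cout << "to_bytes failed: " << e.what() << '\n';
    }
}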

Passing string to function which accepts pointer to char

I've been working with the OpenSSL library in C for a long time, but now I need to migrate to C++. OpenSSL's docs describe the MD5 function like this.
unsigned char *MD5(const unsigned char *d, unsigned long n,
unsigned char *md);
I want to pass a variable of type string to that function, but it accepts only char *.
Is it possible to pass a string to a parameter of type char * directly in C++? (I don't want to use extra manipulation of the string variable.)
You could use the c_str member function that std::string sports. Example:
std::string data;
// load data somehow
unsigned char md[16] = { };
unsigned char *ret = MD5(reinterpret_cast<const unsigned char*>(data.c_str()),
                         data.size(),
                         md);
If you want to do away with the ugly cast operator, define a string class that holds unsigned chars instead of chars and use that.
typedef std::basic_string<unsigned char> ustring;
ustring data;
unsigned char *ret = MD5(data.c_str(), data.size(), md);
Just a little note, which may save you a headache later on: MD5 takes an unsigned char pointer as a parameter. This is a clue that it's actually not a string, but a pointer to bytes.
In your program if you start storing byte vectors in a std::string, you're eventually going to initialise a string with a byte vector containing a zero, which opens the possibility of a bug that's difficult to detect down the line.
It is safer to store all your byte vectors in a std::vector<unsigned char> (or std::vector<uint8_t>), because this forces safe initialisation.
std::vector<unsigned char> plaintext;
// initialise plaintext here
std::vector<unsigned char> my_hash(16);
MD5(plaintext.data(), plaintext.size(), &my_hash[0]);
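As a short sketch of the embedded-zero pitfall described above: constructing a std::string from a plain char pointer silently truncates at the first zero byte, while the iterator-range constructors keep everything:
#include <iostream>
#include <string>
#include <vector>

int main()
{
    const char bytes[] = { 'a', 'b', 0, 'c', 'd' };

    std::string truncated(bytes);                                 // stops at the zero: size() == 2
    std::vector<unsigned char> safe(bytes, bytes + sizeof bytes); // keeps all 5 bytes

    std::cout << truncated.size() << ' ' << safe.size() << '\n';  // prints "2 5"
}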

Contents of an untyped object copied into vector<unsigned char>

I'm trying to write the contents of an untyped object that holds the bytes of an image into a vector filled with unsigned char. Sadly, I cannot get it to work. Maybe someone could point me in the right direction?
Here is what I have at the moment:
vector<unsigned char> SQLiteDB::BlobData(int clmNum){
    // I get the data of the image
    const void* data = sqlite3_column_blob(pSQLiteConn->pRes, clmNum);
    vector<unsigned char> bytes;
    // return the size of the image in bytes
    int size = getBytes(clmNum);
    unsigned char b[size];
    memcpy(b, data, size);
    for(int j = 0; j < size; j++){
        bytes.push_back(b[j]);
    }
    return bytes;
}
If I try to trace the contents of the bytes vector, it's all empty.
So the question is, how can I get the data into the vector?
You should use the vector's constructor that takes a couple of iterators:
const unsigned char* data = static_cast<const unsigned char*>(
    sqlite3_column_blob(pSQLiteConn->pRes, clmNum));
vector<unsigned char> bytes(data, data + getBytes(clmNum));
Directly write into the vector, no need for additional useless copies:
bytes.resize(size);
memcpy(bytes.data(), data, size);
Note that the resize approach zero-initialises the buffer before memcpy overwrites it, so using the constructor as Maxim demonstrates, or vector::insert, is better:
const unsigned char* data = static_cast<const unsigned char*>(
    sqlite3_column_blob(pSQLiteConn->pRes, clmNum));
bytes.insert(bytes.end(), data, data + getBytes(clmNum));
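Putting this together, the whole function collapses to a few lines. A sketch, using SQLite's sqlite3_column_bytes in place of the getBytes wrapper (the blob pointer is only valid until the next sqlite3_step or sqlite3_finalize call, so copy it out immediately):
std::vector<unsigned char> SQLiteDB::BlobData(int clmNum)
{
    const unsigned char* data = static_cast<const unsigned char*>(
        sqlite3_column_blob(pSQLiteConn->pRes, clmNum));
    int size = sqlite3_column_bytes(pSQLiteConn->pRes, clmNum);
    return std::vector<unsigned char>(data, data + size);
}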

const char * to vector<unsigned char> Initialisation

I understand that using a vector is a good way to store binary data when using C++ and the STL. However, for my unit tests I'd like to initialise the vector using a const char* C string variable.
I'm attempting to use a variant of the code found here - Converting (void*) to std::vector<unsigned char> - to do this:
const char* testdata = "the quick brown fox jumps over the lazy dog.";
unsigned char* buffer = (unsigned char*)testdata;
typedef vector<unsigned char> bufferType;
bufferType::size_type size = strlen((const char*)buffer);
bufferType vec(buffer, size);
However the VC++ compiler is not happy with the line initialising the vector, stating:
error C2664: 'std::vector<_Ty>::vector(unsigned int,const _Ty &)' : cannot convert parameter 1 from 'char *' to 'unsigned int'
I appreciate the extreme n00bity of this question and am fully prepared for much criticism on the code above :)
Thanks in advance,
Chris
It should be
bufferType vec(buffer, buffer + size);
not
bufferType vec(buffer, size);
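Applied to the original snippet, a corrected version looks like this (a sketch; strlen is acceptable here only because the test data is a genuine null-terminated C string):
#include <cstring>
#include <vector>

int main()
{
    const char* testdata = "the quick brown fox jumps over the lazy dog.";
    const unsigned char* buffer = reinterpret_cast<const unsigned char*>(testdata);

    typedef std::vector<unsigned char> bufferType;
    bufferType::size_type size = std::strlen(testdata);

    // first and one-past-last pointers, not first pointer and count
    bufferType vec(buffer, buffer + size);
}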
std::transform is useful for just this sort of problem. You can use it to "transform" one piece of data at a time. See documentation here:
http://www.cplusplus.com/reference/algorithm/transform/
The following code works in VS2010. (I created a std::string from your const char* array, but you could probably avoid that if you really wanted to.)
#include <algorithm>
#include <string>
#include <vector>

int main(int, char*[])
{
    // Initial test data
    const char* testdata = "the quick brown fox jumps over the lazy dog.";

    // Transform from 'const char*' to 'vector<unsigned char>'
    std::string input(testdata);
    std::vector<unsigned char> output(input.length());
    std::transform(input.begin(), input.end(), output.begin(),
        [](char c)
        {
            return static_cast<unsigned char>(c);
        });

    // Use the transformed data in 'output'...
    return 0;
}
Here is what worked for me:
// Fetch data into vector
std::vector<char> buffer = <myMethod>.getdata();
// Get a char pointer to the data in the vector
char* buf = buffer.data();
// cast from char pointer to unsigned char pointer
unsigned char* membuf = reinterpret_cast<unsigned char*>(buf);
// now convert to vector<unsigned char> buffer
std::vector<unsigned char> vec(membuf, membuf + buffer.size());
// display vector<unsigned char>
CUtils::<myMethodToShowDataBlock>(vec);
What you intended to do seems to be something like:
bufferType vec(testdata, std::next(testdata, strlen(testdata)));
There is no need for the intermediate buffer variable. The conversion from char to unsigned char will happen implicitly.
Note that this does not grab the terminating '\0' character from testdata, so if you wanted to be able to do something like cout << vec.data(), you wouldn't be able to. If you want that, you could do bufferType vec(testdata, std::next(testdata, strlen(testdata) + 1)), or you may just want to consider doing:
basic_string<unsigned char> vec(testdata, std::next(testdata, strlen(testdata)));
Which will preserve a hidden '\0'. Because this is not a string, you won't be able to do cout << vec, but cout << vec.data() will work. I've created a Live Example of each of these.

Casting an unsigned int + a string to an unsigned char vector

I'm working with the NetLink socket library ( https://sourceforge.net/apps/wordpress/netlinksockets/ ), and I want to send some binary data over the network in a format that I specify.
The format I have planned is pretty simple and is as follows:
Bytes 0 and 1: an opcode of the type uint16_t (i.e., an unsigned integer always 2 bytes long)
Bytes 2 onward: any other data necessary, such as a string, an integer, a combination of each, etc. The other party will interpret this data according to the opcode. For example, if the opcode is 0, which represents "log in", this data will consist of a one-byte integer telling you how long the username is, followed by a string containing the username, followed by a string containing the password. For opcode 1, "send a chat message", the entire data here could be just a string for the chat message.
Here's what the library gives me to work with for sending data, though:
void send(const string& data);
void send(const char* data);
void rawSend(const vector<unsigned char>* data);
I'm assuming I want to use rawSend() for this.. but rawSend() takes unsigned chars, not a void* pointer to memory? Isn't there going to be some loss of data here if I try to cast certain types of data to an array of unsigned chars? Please correct me if I'm wrong.. but if I'm right, does this mean I should be looking at another library that has support for real binary data transfer?
Assuming this library does serve my purposes, how exactly would I cast and concatenate my various data types into one std::vector? What I've tried is something like this:
#define OPCODE_LOGINREQUEST 0
std::vector<unsigned char>* loginRequestData = new std::vector<unsigned char>();
uint16_t opcode = OPCODE_LOGINREQUEST;
loginRequestData->push_back(opcode);
// and at this point (not shown), I would push_back() the individual characters of the strings of the username and password.. after one byte worth of integer telling you how many characters long the username is (so you know when the username stops and the password begins)
socket->rawSend(loginRequestData);
Ran into some exceptions, though, on the other end when I tried to interpret the data. Am I approaching the casting all wrong? Am I going to lose data by casting to unsigned chars?
Thanks in advance.
I like how they make you create a vector (which must use the heap and thus execute in unpredictable time) instead of just falling back to the C standard (const void* buffer, size_t len) tuple, which is compatible with everything and can't be beat for performance. Oh, well.
You could try this:
void send_message(uint16_t opcode, const void* rawData, size_t rawDataSize)
{
    vector<unsigned char> buffer;
    buffer.reserve(sizeof(uint16_t) + rawDataSize);

#if BIG_ENDIAN_OPCODE
    buffer.push_back(opcode >> 8);
    buffer.push_back(opcode & 0xFF);
#elif LITTLE_ENDIAN_OPCODE
    buffer.push_back(opcode & 0xFF);
    buffer.push_back(opcode >> 8);
#else
    // Native order opcode
    buffer.insert(buffer.end(), reinterpret_cast<const unsigned char*>(&opcode),
                  reinterpret_cast<const unsigned char*>(&opcode) + sizeof(uint16_t));
#endif

    const unsigned char* base(reinterpret_cast<const unsigned char*>(rawData));
    buffer.insert(buffer.end(), base, base + rawDataSize);
    socket->rawSend(&buffer); // Why isn't this API using a reference?!
}
This uses insert which should optimize better than a hand-written loop with push_back(). It also won't leak the buffer if rawSend tosses an exception.
NOTE: Byte order must match for the platforms on both ends of this connection. If it does not, you'll need to either pick one byte order and stick with it (Internet standards usually do this, and you use the htonl and htons functions) or you need to detect byte order ("native" or "backwards" from the receiver's POV) and fix it if "backwards".
I would use something like this:
#define OPCODE_LOGINREQUEST 0
#define OPCODE_MESSAGE 1
void addRaw(std::vector<unsigned char> &v, const void *data, const size_t len)
{
    const unsigned char *ptr = static_cast<const unsigned char*>(data);
    v.insert(v.end(), ptr, ptr + len);
}

void addUint8(std::vector<unsigned char> &v, uint8_t val)
{
    v.push_back(val);
}

void addUint16(std::vector<unsigned char> &v, uint16_t val)
{
    val = htons(val);
    addRaw(v, &val, sizeof(uint16_t));
}

void addStringLen(std::vector<unsigned char> &v, const std::string &val)
{
    uint8_t len = static_cast<uint8_t>(std::min<size_t>(val.length(), 255));
    addUint8(v, len);
    addRaw(v, val.c_str(), len);
}

void addStringRaw(std::vector<unsigned char> &v, const std::string &val)
{
    addRaw(v, val.c_str(), val.length());
}

void sendLogin(const std::string &user, const std::string &pass)
{
    std::vector<unsigned char> data;
    data.reserve(
        sizeof(uint16_t) +
        sizeof(uint8_t) + std::min<size_t>(user.length(), 255) +
        sizeof(uint8_t) + std::min<size_t>(pass.length(), 255)
    );
    addUint16(data, OPCODE_LOGINREQUEST);
    addStringLen(data, user);
    addStringLen(data, pass);
    socket->rawSend(&data);
}

void sendMsg(const std::string &msg)
{
    std::vector<unsigned char> data;
    data.reserve(
        sizeof(uint16_t) +
        msg.length()
    );
    addUint16(data, OPCODE_MESSAGE);
    addStringRaw(data, msg);
    socket->rawSend(&data);
}
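On the receiving side the same framing has to be undone in the same order. Here is a minimal sketch of a parser for the login message under the assumptions above (two-byte opcode in network byte order, then two length-prefixed strings; parseLogin is a hypothetical helper, and the buffer would come from whatever the library's receive callback hands you):
#include <arpa/inet.h> // htons/ntohs; use <winsock2.h> on Windows
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

bool parseLogin(const std::vector<unsigned char> &buf, std::string &user, std::string &pass)
{
    size_t pos = 0;

    // Opcode: two bytes, network byte order.
    if (buf.size() < pos + sizeof(uint16_t)) return false;
    uint16_t opcode;
    std::memcpy(&opcode, &buf[pos], sizeof(opcode));
    opcode = ntohs(opcode);
    pos += sizeof(opcode);
    if (opcode != OPCODE_LOGINREQUEST) return false;

    // Two length-prefixed strings: username, then password.
    for (std::string *out : { &user, &pass }) {
        if (buf.size() < pos + 1) return false;
        uint8_t len = buf[pos++];
        if (buf.size() < pos + len) return false;
        out->assign(reinterpret_cast<const char*>(&buf[pos]), len);
        pos += len;
    }
    return true;
}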
std::vector<unsigned char>* loginRequestData = new std::vector<unsigned char>();
uint16_t opcode = OPCODE_LOGINREQUEST;
loginRequestData->push_back(opcode);
If unsigned char is 8 bits long (which it is on most systems), you will be losing the higher 8 bits of opcode every time you push. You should be getting a warning for this.
The decision for rawSend to take a vector is quite odd; a general library would work at a different level of abstraction. I can only guess that it is this way because rawSend makes a copy of the passed data and guarantees its lifetime until the operation has completed. If not, then it is just a poor design choice, made worse by the fact that it takes the argument by pointer... You should treat this data as a container of raw memory. There are some quirks to get right, but here is how you would be expected to work with POD types in this scenario:
data->insert(data->end(), reinterpret_cast<char const*>(&opcode),
             reinterpret_cast<char const*>(&opcode) + sizeof(opcode));
This will work:
#define OPCODE_LOGINREQUEST 0
std::vector<unsigned char>* loginRequestData = new std::vector<unsigned char>();
uint16_t opcode = OPCODE_LOGINREQUEST;
unsigned char *opcode_data = (unsigned char *)&opcode;
for(size_t i = 0; i < sizeof(opcode); i++)
    loginRequestData->push_back(opcode_data[i]);
socket->rawSend(loginRequestData);
This will also work for any POD type.
Yeah, go with rawSend since send probably expects a NULL terminator.
You don't lose anything by casting to char instead of void*. Memory is memory. Types are never stored in memory in C++ except for RTTI info. You can recover your data by casting to the type indicated by your opcode.
If you can decide the format of all your sends at compile time, I recommend using structs to represent them. I've done this before professionally, and this is simply the best way to clearly store the formats for a wide variety of messages. And it's super easy to unpack on the other side; just cast the raw buffer into the struct based on the opcode!
struct MessageType1 {
    uint16_t opcode;
    int myData1;
    int myData2;
};

MessageType1 msg;
// ... fill in msg.opcode, msg.myData1, msg.myData2 here ...
std::vector<unsigned char> vec;
const unsigned char* begin = reinterpret_cast<const unsigned char*>(&msg);
vec.insert(vec.end(), begin, begin + sizeof(msg));
socket->rawSend(&vec); // rawSend is the binary-safe call in this library's API
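And the unpacking the answer mentions would look roughly like this on the receiver (a sketch; handleMessage is hypothetical, it assumes both ends were built with the same compiler, struct packing, and byte order, which is precisely the constraint of the struct approach, and it assumes opcode 1 identifies MessageType1):
#include <cstdint>
#include <cstring>

void handleMessage(const char* buffer, size_t len)
{
    if (len < sizeof(uint16_t)) return;
    uint16_t opcode;
    std::memcpy(&opcode, buffer, sizeof(opcode));

    if (opcode == 1 && len >= sizeof(MessageType1)) {
        // memcpy instead of a direct pointer cast sidesteps any
        // alignment problems with the raw network buffer.
        MessageType1 msg;
        std::memcpy(&msg, buffer, sizeof(msg));
        // ... use msg.myData1 and msg.myData2 ...
    }
}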
The struct approach is the best, neatest way to send and receive, but the layout is fixed at compile time.
If the format of the messages is not decided until runtime, use a char array:
char buffer[2048];
*((uint16_t*)buffer) = opcode;
// now memcpy into it
// or placement-new to construct objects in the buffer memory
int usedBufferSpace = 24; // or whatever

std::vector<unsigned char> vec;
vec.insert(vec.end(), buffer, buffer + usedBufferSpace);
socket->rawSend(&vec);