Casting an unsigned int + a string to an unsigned char vector - c++

I'm working with the NetLink socket library ( https://sourceforge.net/apps/wordpress/netlinksockets/ ), and I want to send some binary data over the network in a format that I specify.
The format I have planned is pretty simple and is as follows:
Bytes 0 and 1: an opcode of the type uint16_t (i.e., an unsigned integer always 2 bytes long)
Bytes 2 onward: any other data necessary, such as a string, an integer, a combination of each, etc. The other party will interpret this data according to the opcode. For example, if the opcode is 0, which represents "log in", this data will consist of a one-byte integer telling you how long the username is, followed by a string containing the username, followed by a string containing the password. For opcode 1, "send a chat message", the entire data here could be just a string for the chat message.
Here's what the library gives me to work with for sending data, though:
void send(const string& data);
void send(const char* data);
void rawSend(const vector<unsigned char>* data);
I'm assuming I want to use rawSend() for this... but rawSend() takes unsigned chars, not a void* pointer to memory. Isn't there going to be some loss of data here if I try to cast certain types of data to an array of unsigned chars? Please correct me if I'm wrong, but if I'm right, does this mean I should be looking at another library that has support for real binary data transfer?
Assuming this library does serve my purposes, how exactly would I cast and concatenate my various data types into one std::vector? What I've tried is something like this:
#define OPCODE_LOGINREQUEST 0
std::vector<unsigned char>* loginRequestData = new std::vector<unsigned char>();
uint16_t opcode = OPCODE_LOGINREQUEST;
loginRequestData->push_back(opcode);
// and at this point (not shown), I would push_back() the individual characters of the strings of the username and password.. after one byte worth of integer telling you how many characters long the username is (so you know when the username stops and the password begins)
socket->rawSend(loginRequestData);
I ran into some exceptions on the other end, though, when I tried to interpret the data. Am I approaching the casting all wrong? Am I going to lose data by casting to unsigned chars?
Thanks in advance.

I like how they make you create a vector (which must use the heap and thus execute in unpredictable time) instead of just falling back to the C standard (const void* buffer, size_t len) tuple, which is compatible with everything and can't be beat for performance. Oh, well.
You could try this:
void send_message(uint16_t opcode, const void* rawData, size_t rawDataSize)
{
    vector<unsigned char> buffer;
    buffer.reserve(sizeof(uint16_t) + rawDataSize);
#if BIG_ENDIAN_OPCODE
    buffer.push_back(opcode >> 8);
    buffer.push_back(opcode & 0xFF);
#elif LITTLE_ENDIAN_OPCODE
    buffer.push_back(opcode & 0xFF);
    buffer.push_back(opcode >> 8);
#else
    // Native order opcode
    buffer.insert(buffer.end(), reinterpret_cast<const unsigned char*>(&opcode),
                  reinterpret_cast<const unsigned char*>(&opcode) + sizeof(uint16_t));
#endif
    const unsigned char* base(reinterpret_cast<const unsigned char*>(rawData));
    buffer.insert(buffer.end(), base, base + rawDataSize);
    socket->rawSend(&buffer); // Why isn't this API using a reference?!
}
This uses insert, which should optimize better than a hand-written loop with push_back(). It also won't leak the buffer if rawSend throws an exception.
NOTE: Byte order must match for the platforms on both ends of this connection. If it does not, you'll need to either pick one byte order and stick with it (Internet standards usually do this, and you use the htonl and htons functions) or you need to detect byte order ("native" or "backwards" from the receiver's POV) and fix it if "backwards".

I would use something like this:
#define OPCODE_LOGINREQUEST 0
#define OPCODE_MESSAGE 1
void addRaw(std::vector<unsigned char> &v, const void *data, const size_t len)
{
    const unsigned char *ptr = static_cast<const unsigned char*>(data);
    v.insert(v.end(), ptr, ptr + len);
}
void addUint8(std::vector<unsigned char> &v, uint8_t val)
{
    v.push_back(val);
}
void addUint16(std::vector<unsigned char> &v, uint16_t val)
{
    val = htons(val);
    addRaw(v, &val, sizeof(uint16_t));
}
void addStringLen(std::vector<unsigned char> &v, const std::string &val)
{
    // clamp to 255 so the length fits in one byte; note that std::min
    // needs both arguments to have the same type
    uint8_t len = static_cast<uint8_t>(std::min<std::string::size_type>(val.length(), 255));
    addUint8(v, len);
    addRaw(v, val.c_str(), len);
}
void addStringRaw(std::vector<unsigned char> &v, const std::string &val)
{
    addRaw(v, val.c_str(), val.length());
}
void sendLogin(const std::string &user, const std::string &pass)
{
    // reserve (rather than size-construct) so the helpers append
    // into an empty vector instead of after zero-filled elements
    std::vector<unsigned char> data;
    data.reserve(
        sizeof(uint16_t) +
        sizeof(uint8_t) + std::min<std::string::size_type>(user.length(), 255) +
        sizeof(uint8_t) + std::min<std::string::size_type>(pass.length(), 255)
    );
    addUint16(data, OPCODE_LOGINREQUEST);
    addStringLen(data, user);
    addStringLen(data, pass);
    socket->rawSend(&data);
}
void sendMsg(const std::string &msg)
{
    std::vector<unsigned char> data;
    data.reserve(sizeof(uint16_t) + msg.length());
    addUint16(data, OPCODE_MESSAGE);
    addStringRaw(data, msg);
    socket->rawSend(&data);
}

std::vector<unsigned char>* loginRequestData = new std::vector<unsigned char>();
uint16_t opcode = OPCODE_LOGINREQUEST;
loginRequestData->push_back(opcode);
If unsigned char is 8 bits long (which it is on most systems), you will be losing the upper 8 bits of opcode every time you push. Your compiler should be warning you about this.
The decision for rawSend to take a vector is quite odd; a general-purpose library would work at a different level of abstraction. I can only guess that it is this way because rawSend makes a copy of the passed data and guarantees its lifetime until the operation has completed. If not, then it is just a poor design choice; add to that the fact that it takes the argument by pointer. You should treat this vector as a container of raw memory. There are some quirks to get right, but here is how you would be expected to work with POD types in this scenario:
data->insert( data->end(), reinterpret_cast< char const* >( &opcode ), reinterpret_cast< char const* >( &opcode ) + sizeof( opcode ) );

This will work:
#define OPCODE_LOGINREQUEST 0
std::vector<unsigned char>* loginRequestData = new std::vector<unsigned char>();
uint16_t opcode = OPCODE_LOGINREQUEST;
unsigned char *opcode_data = reinterpret_cast<unsigned char *>(&opcode);
for(size_t i = 0; i < sizeof(opcode); i++)
    loginRequestData->push_back(opcode_data[i]);
socket->rawSend(loginRequestData);
This will also work for any POD type.

Yeah, go with rawSend since send probably expects a NULL terminator.
You don't lose anything by casting to char instead of void*. Memory is memory. Types are never stored in memory in C++ except for RTTI info. You can recover your data by casting to the type indicated by your opcode.
If you can decide the format of all your sends at compile time, I recommend using structs to represent them. I've done this before professionally, and this is simply the best way to clearly store the formats for a wide variety of messages. And it's super easy to unpack on the other side; just cast the raw buffer into the struct based on the opcode!
struct MessageType1 {
    uint16_t opcode;
    int myData1;
    int myData2;
};
MessageType1 msg;
std::vector<unsigned char> vec;
const unsigned char* begin = reinterpret_cast<const unsigned char*>(&msg);
vec.insert( vec.end(), begin, begin + sizeof(msg) );
socket->rawSend(&vec);
The struct approach is the best, neatest way to send and receive, but the layout is fixed at compile time.
If the format of the messages is not decided until runtime, use a char array:
char buffer[2048];
*((uint16_t*)buffer) = opcode;
// now memcpy into it
// or placement-new to construct objects in the buffer memory
int usedBufferSpace = 24; //or whatever
std::vector<unsigned char> vec;
vec.insert( vec.end(), buffer, buffer + usedBufferSpace );
socket->rawSend(&vec);

Related

Convert Char array to uint8_t vector

For some project I need to send encoded messages, but I can only pass a vector of uint8_t to be sent, and I have a char array (with numbers and a string I converted to hexadecimal in it) and a pointer to the array. I encode the message, which is an object, into the buffer; then I have to send it, decode it, etc.
char buffer[1024];
char *p = buffer;
size_t bufferSize = sizeof(buffer);
Encode(msg, p, bufferSize);
std::vector<uint8_t> encodedmsg; // here I need to put my message in the buffer
Send(encodedmsg.data(), encodedmsg.size()); // only takes uint8_t
Here is the prototype of send :
uint32_t Send(const uint8_t * buffer, const std::size_t bufferSize)
I already looked at some questions, but none of them covers putting it into a vector or converting it to uint8_t.
I thought about memcpy, reinterpret_cast, or maybe a for loop, but I don't really know how to do it without any loss.
Thanks,
Actually, your code suggests that the Send() function takes a pointer to uint8_t, not a std::vector<uint8_t>.
And since char and uint8_t have the same size, you could just do:
Send(reinterpret_cast<uint8_t*>(p), bufferSize);
But if you want to do everything "right" you could do this:
encodedmsg.resize(bufferSize);
std::transform(p, p + bufferSize, encodedmsg.begin(), [](char v) {return static_cast<uint8_t>(v);});

Function that dynamically construct a byte array and return length

I need to create an encoder function in a class
bool encodeMsg(unsigned char* buffer, unsigned short& len);
This class has some fixed-length members and some variable-length vectors (of different structures).
I have to encode a byte stream based on some sequence of these member variables.
Here is a scalable version:
class test
{
public:
    test();
    ~test();
    bool encodeMsg(unsigned char* buffer);
    bool decodeMsg(const unsigned char* buffer, unsigned short len);
private:
    unsigned char a; // 0x12
    unsigned char b; // 0x34
    unsigned char c; // 0x56
};
what I want is 0x123456 in my buffer when I encode.
Questions:
How should I allocate memory, as it is not known before calling this function?
Is there a way to map the class object's memory that basically gives what I want?
I know this is a very basic question, but I want to know the optimal and conventional method to do it.
How should I allocate memory, as it is not known before calling this function?
Given you current code, the caller should allocate the memory:
unsigned char buffer[3];
unsigned short len = sizeof buffer;
my_test_object.encodeMsg(buffer, len);
Is there a way to map the class object's memory that basically gives what I want?
That's very vague. If you use a (possibly compiler-specific) #pragma or attribute to ensure the character values occupy 3 contiguous bytes in memory, and as long as you don't add any virtual functions to the class, you can implement encodeMsg() using:
memcpy(buffer, (unsigned char*)this + offsetof(test, a), 3);
But, what's the point? I can't imagine that memcpy would ever be faster than the "nice" way to write it:
buffer[0] = a;
buffer[1] = b;
buffer[2] = c;
If you actually mean something more akin to:
test* p = reinterpret_cast<test*>(buffer);
*p = *this;
That will have undefined behaviour, and may write up to sizeof(test) bytes into the buffer, which is quite likely to be 4 rather than 3; that could cause buffer overruns in client code, remove an already-set NUL terminator, etc. Hackish and dangerous.
Taking a step back: if you have to ask these sorts of questions, you should be worrying about adopting good programming practice; only once you're a master of this kind of thing should you be worrying about what's optimal. For developing good habits, you might want to look at the Boost serialization library and get comfortable with it first.
If you can change the interface of your encodeMsg() function you could store the byte stream in a vector.
bool test::encodeMsg(std::vector<unsigned char>& buffer)
{
    // if speed is important you can fill the buffer some other way
    buffer.push_back(a);
    buffer.push_back(b);
    buffer.push_back(c);
    return true;
}
If encodeMsg() can't fail (does not need to return bool) you can create and return the vector in it like this:
std::vector<unsigned char> test::encodeMsg()
{
    std::vector<unsigned char> buffer;
    // if speed is important you can fill the buffer some other way
    buffer.push_back(a);
    buffer.push_back(b);
    buffer.push_back(c);
    return buffer;
}
The C++ way would be to use streams. Just implement the insertion operator << for encoding like this:
std::ostream& operator<<(std::ostream& os, const test& t)
{
    os << t.a;
    os << t.b;
    os << t.c;
    return os;
}
Same with the extraction operator >> for decoding:
std::istream& operator>>(std::istream& is, test& t)
{
    is >> t.a;
    is >> t.b;
    is >> t.c;
    return is;
}
This moves memory management to the stream and caller. If you need a special encoding for the types then derive your codec from istream and ostream and use those.
The memory and the size can be retrieved from the stream when using a stringstream like this
test t;
std::ostringstream strm;
strm << t;
std::string result = strm.str();
auto size = result.length(); // size
auto array = result.data(); // the byte array
For classes that are trivially copyable (std::is_trivially_copyable<test>::value == true), encoding and decoding are actually straightforward (assuming you have already allocated the memory for buffer):
bool encodeMsg(unsigned char* buffer, unsigned short& len) {
    auto* ptr = reinterpret_cast<unsigned char*>(this);
    len = sizeof(test);
    memcpy(buffer, ptr, len);
    return true;
}
bool decodeMsg(const unsigned char* buffer) {
    auto* ptr = reinterpret_cast<unsigned char*>(this);
    memcpy(ptr, buffer, sizeof(test));
    return true;
}
or shorter
bool encodeMsg(unsigned char* buffer, unsigned short& len) {
    len = sizeof(test);
    memcpy(buffer, (unsigned char*)this, len);
    return true;
}
bool decodeMsg(const unsigned char* buffer) {
    memcpy((unsigned char*)this, buffer, sizeof(test));
    return true;
}
You may end up copying 4 bytes instead of 3, though, if the compiler adds padding.
As far as interpreting something directly as a byte array goes: casting a pointer from test* to unsigned char* and accessing the object through it is legal, but not the other way round. So what you could write is:
unsigned char* encodeMsg(unsigned short& len) {
    len = sizeof(test);
    return reinterpret_cast<unsigned char*>(this);
}
bool decodeMsg(const unsigned char* buffer) {
    auto* ptr = reinterpret_cast<unsigned char*>(this);
    memcpy(ptr, buffer, sizeof(test));
    return true;
}

Convert char* to uint8_t

I transfer messages through a CAN protocol.
To do so, the CAN message needs data of type uint8_t, so I need to convert my char* to uint8_t. With my research on this site, I produced this code:
char* bufferSlidePressure = ui->canDataModifiableTableWidget->item(6,3)->text().toUtf8().data();//My char*
/* Conversion */
uint8_t slidePressure [8];
sscanf(bufferSlidePressure,"%c",
&slidePressure[0]);
As you may see, my char* must fit in slidePressure[0].
My problem is that even though I have no error during compilation, the data in slidePressure is totally incorrect. Indeed, I tested it with a char* = 0 and got unknown characters... So I think the problem must come from the conversion.
My data can be bool, uchar, ushort, and float.
Thanks for your help.
Is your string an integer? E.g. char* bufferSlidePressure = "123";?
If so, I would simply do:
uint8_t slidePressure = (uint8_t)atoi(bufferSlidePressure);
Or, if you need to put it in an array:
slidePressure[0] = (uint8_t)atoi(bufferSlidePressure);
Edit: Following your comment, if your data could be anything, I guess you would have to copy it into the buffer of the new data type. E.g. something like:
/* in case you'd expect a float*/
float slidePressure;
memcpy(&slidePressure, bufferSlidePressure, sizeof(float));
/* in case you'd expect a bool*/
bool isSlidePressure;
memcpy(&isSlidePressure, bufferSlidePressure, sizeof(bool));
/*same thing for uint8_t, etc */
/* in case you'd expect char buffer, just a byte to byte copy */
char * slidePressure = new char[ size ]; // or a stack buffer
memcpy(slidePressure, (const char*)bufferSlidePressure, size ); // no sizeof, since sizeof(char)=1
uint8_t is 8 bits of memory, and can store values from 0 to 255
char is probably 8 bits of memory
char * is probably 32 or 64 bits of memory containing the address of a different place in memory in which there is a char
First, make sure you don't try to put the memory address (the char *) into the uint8_t; store what it points to instead:
char from;
char * pfrom = &from;
uint8_t to;
to = *pfrom;
Then work out what you are really trying to do ... because this isn't quite making sense. For example, a float is probably 32 or 64 bits of memory. If you think there is a float somewhere in your char * data you have a lot of explaining to do before we can help :/
char * is a pointer, not a single character. It is possible that it points to the character you want.
uint8_t is unsigned but on most systems will be the same size as a char and you can simply cast the value.
You may need to manage the memory and lifetime of what your function returns. This could be done with vector< unsigned char> as the return type of your function rather than char *, especially if toUtf8() has to create the memory for the data.
Your question is totally ambiguous.
ui->canDataModifiableTableWidget->item(6,3)->text().toUtf8().data();
That is a lot of cascading calls. We have no idea what any of them do or whether they are yours or not. It looks dangerous.
A safer example, the C++ way:
const char* bufferSlidePressure = "123";
std::string buffer(bufferSlidePressure);
std::stringstream stream;
stream << buffer;
int n = 0;
// convert to int
if (!(stream >> n)){
    //could not convert
}
Also, if Boost is available:
int n = boost::lexical_cast<int>( buffer );

Nice representation of byte array and its size

How would you represent a byte array and its size nicely? I'd like to store (in main memory or within a file) raw byte arrays (unsigned chars) in which the first 2/4 bytes represent the size. But operations on such an array do not look good:
void func(unsigned char *bytearray)
{
    int size;
    memcpy(&size, bytearray, sizeof(int));
    //rest of operation when we know bytearray size
}
How can I avoid that? I'm thinking about a simple structure:
struct bytearray
{
int size;
unsigned char *data;
};
bytearray *b = reinterpret_cast<bytearray*>(new unsigned char[10]);
b->data = reinterpret_cast<unsigned char*>(&(b->size) + 1);
And I've got access to the size and data parts of the bytearray. But it still looks ugly. Could you recommend another approach?
Unless you have some overwhelming reason to do otherwise, just do the idiomatic thing and use std::vector<unsigned char>.
You're effectively re-inventing the "Pascal string". However
b->data = reinterpret_cast<unsigned char*>(&(b->size) + 1);
won't work at all, because &(b->size) + 1 points back into the struct itself (at or just before the data member, depending on padding), so the pointer would point at its own storage and its value would be overwritten by the data.
You should be able to use an array of unspecified size (a flexible array member, which is a common compiler extension in C++) as the last element of a structure:
struct bytearray
{
    int size;
    unsigned char data[];
};
bytearray *b = reinterpret_cast<bytearray*>(::operator new(sizeof (bytearray) + 10));
b->size = 10;
//...
::operator delete(b);
Unlike std::vector, this actually stores the size and data together, so you can, for example, write it to a file in one operation. And memory locality is better.
Still, the fact that std::vector is already tested and many useful algorithms are implemented for you makes it very attractive.
I would use std::vector<unsigned char> to manage the memory, and write a conversion function to create some iovec like structure for you at the time that you need such a thing.
iovec make_iovec (std::vector<unsigned char> &v) {
    iovec iv = { &v[0], v.size() };
    return iv;
}
Using iovec, if you need to write both the length and data in a single system call, you can use the writev call to accomplish it.
ssize_t write_vector(int fd, std::vector<unsigned char> &v) {
    uint32_t len = htonl(v.size());
    iovec iv[2] = { { &len, sizeof(uint32_t) }, make_iovec(v) };
    return writev(fd, iv, 2);
}

How to cast from unsigned long to void*?

I am trying to pwrite some data at some offset of a file with a given file descriptor. My data is stored in two vectors. One contains unsigned longs and the other chars.
I thought of building a void* that points to the bit sequence representing my unsigned longs and chars, and passing it to pwrite along with the accumulated size. But how can I cast an unsigned long to a void*? (I guess I can figure it out for chars then.) Here is what I'm trying to do:
void writeBlock(int fd, int blockSize, unsigned long offset){
    void* buf = malloc(blockSize);
    // here I should be trying to build buf out of vul and vc,
    // where vul and vc are my unsigned long and char vectors, respectively.
    pwrite(fd, buf, blockSize, offset);
    free(buf);
}
Also, if you think my above idea is not good, I'll be happy to read suggestions.
You cannot meaningfully cast an unsigned long to a void *. The former is a numeric value; the latter is the address of unspecified data. Most systems implement pointers as integers with a special type (that includes any system you're likely to encounter in day-to-day work) but the actual conversion between the types is considered harmful.
If what you want to do is write the value of an unsigned int to your file descriptor, you should take the address of the value by using the & operator:
unsigned int *addressOfMyIntegerValue = &myIntegerValue;
pwrite(fd, addressOfMyIntegerValue, sizeof(unsigned int), ...);
You can loop through your vector or array and write them one by one with that. Alternatively, it may be faster to write them en masse using std::vector's contiguous memory feature:
std::vector<unsigned int> myVector = ...;
unsigned int *allMyIntegers = &myVector[0];
pwrite(fd, allMyIntegers, sizeof(unsigned int) * myVector.size(), ...);
unsigned long i;
void* p = (void*)&i;
It can be cast using the following code:
unsigned long my_long;
pwrite(fd, (void*)&my_long, ...);
Like this:
std::vector<unsigned long> v1;
std::vector<char> v2;
void * p1 = reinterpret_cast<void*>(&v1[0]);
void * p2 = reinterpret_cast<void*>(&v2[0]);
Write sizes v1.size() * sizeof(unsigned long) and v2.size().