stream data into a byte array

I have to admit that I am a bit confused at the moment, so sorry if the question isn't quite clear or is trivial (actually I hope it is the latter).
I am sending an array of bytes across the network and would like to do something like this on the sender side:
const size_t max_size = 100;
uint8_t buffer[max_size];
idontknowwhat_t x{buffer};
uint16_t size = 11; // total number of bytes in the buffer
uint16_t id_a,id_b,id_c; // some ids
uint8_t a,b,c; // some data
x << size << id_a << a << id_b << b << id_c << c;
someMethodToSend(buffer,size);
and on the receiver side something like this:
const size_t max_size = 100;
uint8_t buffer[max_size];
someMethodToReceive(buffer);
idontknowwhat_t x{buffer};
uint16_t size;
x >> size;
for (uint16_t i=0; i<(size-2)/3; i++) { // each record is 3 bytes: a 2-byte id plus 1 byte of data
uint16_t id;
uint8_t data;
x >> id >> data;
std::cout << id << " " << data;
}
So my aim is basically to avoid ugly casts and manually incrementing a pointer while being able to have uint8_t and uint16_t (and possibly also uint32_t) in the buffer. The data I put in the buffer here is just an example, and I am aware that I need to take care of the byte order when sending over the network (and it would be fine if I had to do this "manually").
Is there something that I can use in place of my hypothetical idontknowwhat_t ?

You cannot really avoid the ugly casts, but you can at least hide them in the idontknowwhat_t class's operator>> and operator<< functions. And by using templates, you can keep the number of casts in your code to the bare minimum.
class idontknowwhat_t
{
uint8_t* _data;
public:
idontknowwhat_t(uint8_t* buffer)
: _data(buffer)
{}
template<typename insert_type>
idontknowwhat_t& operator<<(insert_type value)
{
*reinterpret_cast<insert_type*>(_data) = value;
_data += sizeof(insert_type);
return *this;
}
template<typename extract_type>
idontknowwhat_t& operator>>(extract_type& value)
{
value = *reinterpret_cast<extract_type*>(_data);
_data += sizeof(extract_type);
return *this;
}
};
I think this will actually work directly with your code. In this example, the idontknowwhat_t class does not own the buffer and simply keeps a raw pointer to the next bit of data it expects to read or write. For real-life purposes I would recommend letting the idontknowwhat_t class manage the buffer memory.
In addition, none of the code on this page actually takes care of the data's endianness, which would definitely be the idontknowwhat_t class's responsibility. There is a boost library for that. I'm not documenting that library's use here, since I think it distracts from the question's real point.
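For what it's worth, here is a minimal sketch of how the same interface could handle byte order itself, without reinterpret_cast and without relying on the buffer being suitably aligned. The class name ByteStream, the explicit size argument and the position() helper are my own assumptions, not part of the question. It only accepts unsigned integer types and always writes the most significant byte first (network order).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical replacement for idontknowwhat_t: a non-owning cursor over a
// caller-supplied buffer. Values are stored most significant byte first.
class ByteStream {
    std::uint8_t* data_;
    std::size_t   size_;
    std::size_t   pos_ = 0;

public:
    ByteStream(std::uint8_t* buffer, std::size_t size)
        : data_(buffer), size_(size) {}

    // Number of bytes written or read so far.
    std::size_t position() const { return pos_; }

    // Write an unsigned integer in big-endian (network) byte order.
    template <typename T>
    ByteStream& operator<<(T value) {
        static_assert(std::is_unsigned<T>::value, "unsigned integers only");
        assert(pos_ + sizeof(T) <= size_ && "write past end of buffer");
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            const std::size_t shift = 8 * (sizeof(T) - 1 - i);
            data_[pos_++] = static_cast<std::uint8_t>(value >> shift);
        }
        return *this;
    }

    // Read an unsigned integer that was stored in big-endian byte order.
    template <typename T>
    ByteStream& operator>>(T& value) {
        static_assert(std::is_unsigned<T>::value, "unsigned integers only");
        assert(pos_ + sizeof(T) <= size_ && "read past end of buffer");
        T result = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            result = static_cast<T>((result << 8) | data_[pos_++]);
        }
        value = result;
        return *this;
    }
};
```

Because the bytes are assembled with shifts, the result is the same on little-endian and big-endian hosts. The asserts only guard debug builds; production code would probably want to report the overflow some other way.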

Have you tried std::list? You could group the elements into types and put them into lists with the appropriate type. Then you could create an std::list of std::lists.

Related

Extending the functionality of std::vector<uint8_t> - namespace, composition or inheritance?

Very often I need to provide a uint8_t array to some third-party library. Usually the third-party library asks for a uint8_t*, together with a length argument. Generally I use a std::vector<uint8_t> and use its data() and size() methods to get this information which works a treat. Now I've often found myself wanting to create this vector<uint8_t> using the << operator, similar to how std::stringstream works, for example:
uint8_t first = 8;
uint8_t second = 3;
std::vector<uint8_t> raw;
raw << first
<< second;
Often I need to mix integers of different sizes - a few one-byte header bytes, then one four-byte value, then a one-byte crc. This << overload automatically takes care of this, for example:
uint32_t value = 0;
std::vector<uint8_t> raw;
raw << value;
int sz = raw.size(); // sz = 4
The operator<< function would look somewhat like the following. Keep in mind that in order to split up into individual bytes I'd either define multiple operator<< overloads, one for each type, or make a std::is_arithmetic restricted template. I'm not showing this for simplicity.
std::vector<uint8_t>& operator << (std::vector<uint8_t>& msg, uint8_t const& value)
{
msg.push_back(value);
return msg;
}
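For illustration, the std::is_arithmetic restricted template mentioned above might look roughly like this. It is a sketch under my own assumptions: the operator lives in the serial namespace discussed below, and the bytes are appended in the host's native byte order with no endianness conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

namespace serial {

// Append the bytes of any arithmetic value to the vector, in the host's
// native byte order (no endianness conversion is attempted here).
template <typename T,
          typename = typename std::enable_if<std::is_arithmetic<T>::value>::type>
std::vector<std::uint8_t>& operator<<(std::vector<std::uint8_t>& msg, T value)
{
    std::uint8_t bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    msg.insert(msg.end(), bytes, bytes + sizeof(T));
    return msg;
}

} // namespace serial
```

Since the operator is not in namespace std, argument-dependent lookup will not find it for a plain std::vector; the calling scope needs something like `using serial::operator<<;` before `raw << first << value;` compiles.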
Now I obviously want to restrict this functionality. Not every std::vector<uint8_t> should have this functionality. One solution would be to define the operator<< in namespace serial and whenever the functionality is needed write using namespace serial; in the required scope. While not a bad solution, I still think this is a little confusing. In the same scope I may have a different std::vector<uint8_t> for which this functionality is not needed.
I'd ideally create a new type, Message which allows for this functionality so the code becomes:
Message msg1;
msg1 << 4; // OK, I've defined this
uint8_t* ptr1 = msg1.data(); // get pointer to first element - needs to be defined in the Message class.
std::vector<uint8_t> msg2;
msg2 << 4; // not OK, not part of std::vector<uint8_t>
uint8_t* ptr2 = msg2.data(); // get pointer to first element, fine as it's in std::vector
Composition
Now I could use composition to make this struct, like this:
struct Message
{
std::vector<uint8_t> raw;
};
However that means that whenever you want to call a method of the vector (size(), data(), begin(), etc...) you need to call msg.raw.size(), msg.raw.data(), msg.raw.begin() which isn't particularly elegant (in my opinion). Obviously you can add functions to the Message struct that replicate the original functionality, like:
size_t size() const { return raw.size(); }
size_t size() const noexcept { return raw.size(); }
However, given the size of std::vector that's a lot of functions, not to mention you'd have to change them when std::vector changes. I don't necessarily need all functions that std::vector has to offer, but where to draw the line?
Inheritance
As far as I know - you do not, ever, inherit from standard types. Then I saw this answer by Richard Hodges, who seems to have a pretty good reputation, give this as a solution to a different question:
// Edge is now a type, in the global namespace...
struct Edge : std::pair<VertexName, VertexName> {
using std::pair<VertexName, VertexName>::pair;
};
Does this mean I could do the following?
struct Message : std::vector<uint8_t> {
using std::vector<uint8_t>::vector;
};
Message msg1;
msg1 << 8; // works (provided I define the operator<< as shown above for Message)
int sz = msg1.size(); // works as Message is a std::vector, result: 4
std::vector<uint8_t> msg2;
msg2 << 8; // doesn't work, as intended
What about if I want to add a variable to it, so it becomes:
enum class Endian
{
lsb,
msb
};
struct Message : std::vector<uint8_t>
{
using std::vector<uint8_t>::vector;
Endian m_endian;
};
Concrete question: can I do the second approach as it suits my needs best, or will I be in trouble as I inherit from std::vector? Any advice on the best approach would be very much appreciated.

One solution I've used in the past for similar situations is to overload operator-> to give access to the underlying object you want to wrap.
struct Message
{
std::vector<uint8_t> raw;
std::vector<uint8_t>* operator->() {
return &raw;
}
};
When you return a pointer from operator-> that pointer will in turn also get dereferenced, so you can access any native vector functions using that.
Message m;
std::cout << static_cast<const void*>(m->data()) << " " << m->size(); // cast so the pointer is not printed as a C string
And if you add your own methods you can access them as usual.
m.myOwnMethod();
You can also add your overloads of operator<< for Message.
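Putting the pieces together, a sketch of the wrapper with a const overload of operator-> and a streaming operator that only applies to Message could look like this. The single uint8_t overload is just an example of mine; the real set of overloads is up to you.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Message
{
    std::vector<std::uint8_t> raw;

    // Forward member access to the wrapped vector: m->size(), m->data(), ...
    std::vector<std::uint8_t>*       operator->()       { return &raw; }
    const std::vector<std::uint8_t>* operator->() const { return &raw; }
};

// Only Message gets the streaming operator; plain vectors are unaffected.
inline Message& operator<<(Message& msg, std::uint8_t value)
{
    msg.raw.push_back(value);
    return msg;
}
```

With this, `Message m; m << 4 << 8;` appends two bytes and `m->size()` returns 2, while `std::vector<uint8_t> v; v << 4;` still fails to compile, as intended.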

Is it worth using bit shifting to store multiple small data members in a single byte?

In C++, the smallest size of any object or primitive data type is 1 byte. However, I frequently use enumerated types with only a few possible values. It recently came up in a project for one of my courses that I had to store many structs containing two different, small enumerated types. So, of course, I made the underlying type of the enumerated types unsigned chars, and made the structs each 2 bytes.
However, since each enumerated type had far fewer than 16 possible values, I realized I could use bit shifting to store them in only 1 byte.
Here is something like what I'm talking about:
enum utensil : unsigned char {fork, spoon, spork};
enum dish : unsigned char {plate, bowl, box};
enum food : unsigned char {soup, salad, entree};
enum dessert: unsigned char {cake, ice_cream, fudge};
/* A class containing one of each of the four enums
above, but only taking up 1 byte of memory */
class TakeOut{
private:
unsigned char data = 0;
void clear_utensil(){
data = data & 0b00111111;
}
void clear_dish(){
data = data & 0b11001111;
}
void clear_food(){
data = data & 0b11110011;
}
void clear_dessert(){
data = data & 0b11111100;
}
public:
utensil get_utensil() const{
return utensil((data & 0b11000000) >> 6);
}
dish get_dish() const{
return dish((data & 0b00110000) >> 4);
}
food get_food() const{
return food((data & 0b00001100) >> 2);
}
dessert get_dessert() const{
return dessert(data & 0b00000011);
}
void set_utensil(utensil in){
clear_utensil();
data = data | ((unsigned char)(in) << 6);
}
void set_dish(dish in){
clear_dish();
data = data | ((unsigned char)(in) << 4);
}
void set_food(food in){
clear_food();
data = data | ((unsigned char)(in) << 2);
}
void set_dessert(dessert in){
clear_dessert();
data = data | (unsigned char)(in);
}
};
Should I avoid doing this on 'real' projects if the opportunity ever presents itself again? It's complicated, sure, but if I have to store a lot of TakeOut objects, maybe it's worth it for a small sacrifice of time in accessing data members.

You do not need bit-shifting for this; you can use bit-fields in structures. This gives you the same functionality with significantly less typing (which means less maintenance, fewer errors and more fun!):
enum utensil_t : unsigned char {fork, spoon, spork};
enum dish_t : unsigned char {plate, bowl, box};
enum food_t : unsigned char {soup, salad, entree};
enum dessert_t: unsigned char {cake, ice_cream, fudge};
struct TakeOut {
utensil_t utensil : 2;
dish_t dish : 2;
food_t food : 2;
dessert_t dessert : 2;
};
However, please note that when you do this, you are trading performance for size, and most likely you do not want this trade unless you are dealing with a lot of these objects in a very constrained environment.
Language-lawyer note: technically, a C++ compiler is not required to pack bit-fields, as their layout is implementation-defined. In practice I do not know of any implementation that doesn't pack them. You can easily protect yourself from an unusual implementation with:
static_assert(sizeof(TakeOut) == 1, "Sanity, please!");
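If the exact bit positions matter (for example because the byte is written to a file or sent to another machine), the implementation-defined layout of bit-fields becomes a problem and manual shifting is the safer route. The repetition in the question's class can be reduced with two small helpers. This is only a sketch; get_field and set_field are names I made up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: read or replace a `width`-bit field that starts
// `shift` bits above the least significant bit of a single byte.
inline std::uint8_t get_field(std::uint8_t byte, unsigned shift, unsigned width)
{
    assert(width >= 1 && shift + width <= 8);
    const unsigned mask = (1u << width) - 1u;
    return static_cast<std::uint8_t>((byte >> shift) & mask);
}

inline std::uint8_t set_field(std::uint8_t byte, unsigned shift, unsigned width,
                              std::uint8_t value)
{
    assert(width >= 1 && shift + width <= 8);
    const unsigned mask = ((1u << width) - 1u) << shift;
    // Clear the old field, then OR in the new value (truncated to `width` bits).
    return static_cast<std::uint8_t>(
        (byte & ~mask) | ((static_cast<unsigned>(value) << shift) & mask));
}
```

A setter such as set_utensil then becomes `data = set_field(data, 6, 2, in);` and the matching getter `return utensil(get_field(data, 6, 2));`.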

C++ variable length arrays in struct

I am writing a program for creating, sending, receiving and interpreting ARP packets. I have a structure representing the ARP header like this:
struct ArpHeader
{
unsigned short hardwareType;
unsigned short protocolType;
unsigned char hardwareAddressLength;
unsigned char protocolAddressLength;
unsigned short operationCode;
unsigned char senderHardwareAddress[6];
unsigned char senderProtocolAddress[4];
unsigned char targetHardwareAddress[6];
unsigned char targetProtocolAddress[4];
};
This only works for hardware addresses with length 6 and protocol addresses with length 4. The address lengths are given in the header as well, so to be correct the structure would have to look something like this:
struct ArpHeader
{
unsigned short hardwareType;
unsigned short protocolType;
unsigned char hardwareAddressLength;
unsigned char protocolAddressLength;
unsigned short operationCode;
unsigned char senderHardwareAddress[hardwareAddressLength];
unsigned char senderProtocolAddress[protocolAddressLength];
unsigned char targetHardwareAddress[hardwareAddressLength];
unsigned char targetProtocolAddress[protocolAddressLength];
};
This obviously won't work since the address lengths are not known at compile time. Template structures aren't an option either since I would like to fill in values for the structure and then just cast it from (ArpHeader*) to (char*) in order to get a byte array which can be sent on the network or cast a received byte array from (char*) to (ArpHeader*) in order to interpret it.
One solution would be to create a class with all header fields as member variables, a function to create a byte array representing the ARP header which can be sent on the network and a constructor which would take only a byte array (received on the network) and interpret it by reading all header fields and writing them to the member variables. This is not a nice solution though since it would require a LOT more code.
In contrast, a similar structure for a UDP header, for example, is simple since all header fields have a known constant size. I use
#pragma pack(push, 1)
#pragma pack(pop)
around the structure declaration so that I can actually do a simple C-style cast to get a byte array to be sent on the network.
Is there any solution I could use here which would be close to a structure or at least not require a lot more code than a structure?
I know the last field in a structure (if it is an array) does not need a specific compile-time size, can I use something similar like that for my problem? Just leaving the sizes of those 4 arrays empty will compile, but I have no idea how that would actually function. Just logically speaking it cannot work since the compiler would have no idea where the second array starts if the size of the first array is unknown.

You want a fairly low level thing, an ARP packet, and you are trying to find a way to define a datastructure properly so you can cast the blob into that structure. Instead, you can use an interface over the blob.
struct ArpHeader {
mutable std::vector<uint8_t> buf_;
template <typename T>
struct ref {
uint8_t * const p_;
ref (uint8_t *p) : p_(p) {}
operator T () const { T t; memcpy(&t, p_, sizeof(t)); return t; }
T operator = (T t) const { memcpy(p_, &t, sizeof(t)); return t; }
};
template <typename T>
ref<T> get (size_t offset) const {
if (offset + sizeof(T) > buf_.size()) throw std::out_of_range("ArpHeader: offset past end of buffer"); // needs <stdexcept>
return ref<T>(&buf_[0] + offset);
}
ref<uint16_t> hwType() const { return get<uint16_t>(0); }
ref<uint16_t> protType () const { return get<uint16_t>(2); }
ref<uint8_t> hwAddrLen () const { return get<uint8_t>(4); }
ref<uint8_t> protAddrLen () const { return get<uint8_t>(5); }
ref<uint16_t> opCode () const { return get<uint16_t>(6); }
uint8_t *senderHwAddr () const { return &buf_[0] + 8; }
uint8_t *senderProtAddr () const { return senderHwAddr() + hwAddrLen(); }
uint8_t *targetHwAddr () const { return senderProtAddr() + protAddrLen(); }
uint8_t *targetProtAddr () const { return targetHwAddr() + hwAddrLen(); }
};
If you need const correctness, you remove mutable, create a const_ref, and duplicate the accessors into non-const versions, and make the const versions return const_ref and const uint8_t *.

Short answer: you just cannot have variable-sized types in C++.
Every type in C++ must have a known (and stable) size at compile time, i.e. sizeof must give a consistent answer. Note that you can have types that hold a variable amount of data (e.g. std::vector<int>) by using the heap, yet the size of the actual object is always constant.
So, you can never produce a type declaration that you would cast and get the fields magically adjusted. This goes deeply into the fundamental object layout - every member (aka field) must have a known (and stable) offset.
Usually, the issue is solved by writing (or generating) member functions that parse the input data and initialize the object's data. This is basically the age-old data serialization problem, which has been solved countless times in the last 30 or so years.
Here is a mockup of a basic solution:
class packet {
public:
// simple things
uint16_t hardware_type() const;
// variable-sized things
size_t sender_address_len() const;
bool copy_sender_address_out(char *dest, size_t dest_size) const;
// initialization
bool parse_in(const char *src, size_t len);
private:
uint16_t hardware_type_;
std::vector<char> sender_address_;
};
Notes:
the code above shows the very basic structure that would let you do the following:
packet p;
if (!p.parse_in(input, sz))
return false;
the modern way of doing the same thing via RAII would look like this:
if (!packet::validate(input, sz))
return false;
packet p = packet::parse_in(input, sz); // static function
// returns an instance or throws
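To make the mockup a little more concrete, here is a sketch of what parse_in could look like for a deliberately simplified layout. The layout is an assumption of mine and is not the real ARP header: a 2-byte big-endian hardware type, a 1-byte address length, then that many address bytes. The sender_address() accessor is also an addition for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class packet {
public:
    std::uint16_t hardware_type() const { return hardware_type_; }
    std::size_t   sender_address_len() const { return sender_address_.size(); }
    const std::vector<char>& sender_address() const { return sender_address_; }

    // Assumed toy layout (not the real ARP layout):
    //   bytes 0-1: hardware type, big-endian
    //   byte  2  : length N of the sender address
    //   bytes 3..: N bytes of sender address
    // Returns false, leaving *this unchanged, if the input is too short.
    bool parse_in(const char* src, std::size_t len)
    {
        const std::size_t fixed = 3;
        if (src == nullptr || len < fixed) return false;

        const auto* p = reinterpret_cast<const unsigned char*>(src);
        const std::size_t addr_len = p[2];
        if (len - fixed < addr_len) return false;

        hardware_type_ = static_cast<std::uint16_t>((p[0] << 8) | p[1]);
        sender_address_.assign(src + fixed, src + fixed + addr_len);
        return true;
    }

private:
    std::uint16_t     hardware_type_ = 0;
    std::vector<char> sender_address_;
};
```

The important part is that every length read from the wire is checked against the number of bytes actually received before anything is copied.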

If you want to keep access to the data simple and the data itself public, there is a way to achieve what you want without changing the way you access data. First, you can use std::string instead of the char arrays to store the addresses:
#include <string>
using namespace std; // using this to shorten notation. Preferably put 'std::'
// everywhere you need it instead.
struct ArpHeader
{
unsigned char hardwareAddressLength;
unsigned char protocolAddressLength;
string senderHardwareAddress;
string senderProtocolAddress;
string targetHardwareAddress;
string targetProtocolAddress;
};
Then, you can overload the conversion operator operator const char*() and the constructor ArpHeader(const char*) (and of course operator=(const char*) preferably too), in order to keep your current sending/receiving functions working, if that's what you need.
A simplified conversion operator (skipped some fields, to make it less complicated, but you should have no problem in adding them back), would look like this:
operator const char*(){
char* myRepresentation;
unsigned char mySize
= 2+ senderHardwareAddress.length()
+ senderProtocolAddress.length()
+ targetHardwareAddress.length()
+ targetProtocolAddress.length();
// We need to store the size, since it varies
myRepresentation = new char[mySize+1];
myRepresentation[0] = mySize;
myRepresentation[1] = hardwareAddressLength;
myRepresentation[2] = protocolAddressLength;
unsigned int offset = 3; // just to shorten notation
memcpy(myRepresentation+offset, senderHardwareAddress.c_str(), senderHardwareAddress.size());
offset += senderHardwareAddress.size();
memcpy(myRepresentation+offset, senderProtocolAddress.c_str(), senderProtocolAddress.size());
offset += senderProtocolAddress.size();
memcpy(myRepresentation+offset, targetHardwareAddress.c_str(), targetHardwareAddress.size());
offset += targetHardwareAddress.size();
memcpy(myRepresentation+offset, targetProtocolAddress.c_str(), targetProtocolAddress.size());
return myRepresentation;
}
The assignment operator and the constructor can then be defined like this:
ArpHeader& operator=(const char* buffer){
hardwareAddressLength = buffer[1];
protocolAddressLength = buffer[2];
unsigned int offset = 3; // just to shorten notation
senderHardwareAddress = string(buffer+offset, hardwareAddressLength);
offset += hardwareAddressLength;
senderProtocolAddress = string(buffer+offset, protocolAddressLength);
offset += protocolAddressLength;
targetHardwareAddress = string(buffer+offset, hardwareAddressLength);
offset += hardwareAddressLength;
targetProtocolAddress = string(buffer+offset, protocolAddressLength);
return *this;
}
ArpHeader(const char* buffer){
*this = buffer; // Re-using the operator=
}
Then using your class is as simple as:
ArpHeader h1, h2;
h1.hardwareAddressLength = 3;
h1.protocolAddressLength = 10;
h1.senderHardwareAddress = "foo";
h1.senderProtocolAddress = "something1";
h1.targetHardwareAddress = "bar";
h1.targetProtocolAddress = "something2";
cout << h1.senderHardwareAddress << ", " << h1.senderProtocolAddress
<< " => " << h1.targetHardwareAddress << ", " << h1.targetProtocolAddress << endl;
const char* gottaSendThisSomewhere = h1;
h2 = gottaSendThisSomewhere;
cout << h2.senderHardwareAddress << ", " << h2.senderProtocolAddress
<< " => " << h2.targetHardwareAddress << ", " << h2.targetProtocolAddress << endl;
delete[] gottaSendThisSomewhere;
Which should offer you the utility needed, and keep your code working without changing anything out of the class.
Note however that if you're willing to change the rest of the code a bit (meaning the code you've already written outside of the class), jxh's answer should work as fast as this, and is more elegant on the inside.

Simplest way to read binary data from a std::vector<unsigned char>?

I have a lump of binary data in the form of const std::vector<unsigned char>, and want to be able to extract individual fields from that, such as 4 bytes for an integer, 1 for a boolean, etc. This needs to be, as far as possible, both efficient and simple. eg. It should be able to read the data in place without needing to copy it (eg. into a string or array). And it should be able to read one field at a time, like a parser, since the lump of data does not have a fixed format. I already know how to determine what type of field to read in each case - the problem is getting a usable interface on top of an std::vector for doing this.
However, I can't find a simple way to get this data into an easily usable form that gives me useful read functionality. E.g. std::basic_istringstream<unsigned char> gives me a reading interface, but it seems like I need to copy the data into a temporary std::basic_string<unsigned char> first, which is not ideal for bigger blocks of data.
Maybe there is some way I can use a streambuf in this situation to read the data in place, but it would appear that I'd need to derive my own streambuf class to do that.
It occurs to me that I can probably just use sscanf on the vector's data(), and that would seem to be both more succinct and more efficient than the C++ standard library alternatives. EDIT: Having been reminded that sscanf doesn't do what I wrongly thought it did, I actually don't know a clean way to do this in C or C++. But am I missing something, and if so, what?

You have access to the data in a vector through its operator[]. A vector's data is guaranteed to be stored in a single contiguous array, and [] returns a reference to a member of that array. You may use that reference directly, or through a memcpy.
std::vector<unsigned char> v;
...
byteField = v[12];
memcpy(&intField, &v[13], sizeof intField);
memcpy(charArray, &v[20], lengthOfCharArray);
EDIT 1:
If you want something "more convenient" than that, you could try:
template <class T>
void ReadFromVector(T& t, std::size_t offset,
const std::vector<unsigned char>& v) {
memcpy(&t, &v[offset], sizeof(T));
}
Usage would be:
std::vector<unsigned char> v;
...
char c;
int i;
uint64_t ull;
ReadFromVector(c, 17, v);
ReadFromVector(i, 99, v);
ReadFromVector(ull, 43, v);
EDIT 2:
struct Reader {
const std::vector<unsigned char>& v;
std::size_t offset;
Reader(const std::vector<unsigned char>& v) : v(v), offset() {}
template <class T>
Reader& operator>>(T&t) {
memcpy(&t, &v[offset], sizeof t);
offset += sizeof t;
return *this;
}
void operator+=(int i) { offset += i; }
const char *getStringPointer() const { return reinterpret_cast<const char*>(&v[offset]); }
};
Usage:
std::vector<unsigned char> v;
Reader r(v);
int i; uint64_t ull;
r >> i >> ull;
const char *companyName = r.getStringPointer();
r += strlen(companyName);
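Note that none of the versions above checks whether enough bytes are left in the vector. A bounds-checked variant of the same idea might look like the following sketch. CheckedReader and remaining() are names I made up, and the values are still copied in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

class CheckedReader {
    const std::vector<unsigned char>& v_;  // not owned; must outlive the reader
    std::size_t offset_ = 0;

public:
    explicit CheckedReader(const std::vector<unsigned char>& v) : v_(v) {}

    // Number of bytes that have not been consumed yet.
    std::size_t remaining() const { return v_.size() - offset_; }

    // Copy sizeof(T) bytes into t (host byte order) and advance the cursor.
    // Throws std::out_of_range, without advancing, if too few bytes are left.
    template <class T>
    CheckedReader& operator>>(T& t) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types can be read with memcpy");
        if (sizeof(T) > remaining())
            throw std::out_of_range("CheckedReader: not enough bytes left");
        std::memcpy(&t, v_.data() + offset_, sizeof(T));
        offset_ += sizeof(T);
        return *this;
    }
};
```

Using memcpy rather than a pointer cast keeps the read valid even when the field is not aligned for its type inside the vector.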

If your vector stores binary data, you can't use sscanf or similar, they work on text.
Converting a byte to a bool is simple enough:
bool b = my_vec[10];
To extract an unsigned int that's stored in big-endian order (assuming your ints are 32 bits):
unsigned int i = (unsigned int)my_vec[10] << 24 | my_vec[11] << 16 | my_vec[12] << 8 | my_vec[13];
A 16-bit unsigned short would be similar:
unsigned short s = my_vec[10] << 8 | my_vec[11];
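The same shifting pattern can be wrapped in one small template so the width follows from the type. This is only a sketch; read_be is a hypothetical name and it is limited to unsigned integer types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Read an unsigned integer of type T stored most significant byte first,
// starting at `offset`. Throws std::out_of_range if the vector is too short.
template <typename T>
T read_be(const std::vector<unsigned char>& v, std::size_t offset)
{
    static_assert(std::is_unsigned<T>::value, "unsigned integers only");
    if (offset > v.size() || v.size() - offset < sizeof(T))
        throw std::out_of_range("read_be: not enough bytes");
    T result = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        result = static_cast<T>((result << 8) | v[offset + i]);
    return result;
}
```

Usage would then be `auto i = read_be<std::uint32_t>(my_vec, 10);` or `auto s = read_be<std::uint16_t>(my_vec, 10);`, and the result does not depend on the byte order of the machine doing the reading.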

If you can afford the Qt dependency, QByteArray has the fromRawData() named constructor, which wraps existing data buffers in a QByteArray without copying the data. With that byte array, you can then feed a QTextStream.
I'm not aware of any such function in the standard streams library (short of implementing your own streambuf, of course), but I'd love to be proved wrong :)

You can use a struct that describes the data you are trying to extract. You can move data from your vector into the struct like this:
struct MyData {
int intVal;
bool boolVal;
char stringVal[15];
} __attribute__((__packed__));
// assuming all extracted types are prefixed with a one byte indicator.
// Also assumes "vec" is your populated vector
int pos = 0;
while (pos < vec.size()-1) {
switch(vec[pos++]) {
case 0: { // handle int
int intValue;
memcpy(&intValue, &vec[pos], sizeof(int)); // destination first, then source
pos += sizeof(int);
// do something with handled value
break;
}
case 1: { // handle double
double doubleValue;
memcpy(&doubleValue, &vec[pos], sizeof(double)); // destination first, then source
pos += sizeof(double);
// do something with handled value
break;
}
case 2: { // handle MyData
struct MyData data;
memcpy(&data, &vec[pos], sizeof(struct MyData)); // destination first, then source
pos += sizeof(struct MyData);
// do something with handled value
break;
}
default: {
// ERROR: unknown type indicator
break;
}
}
}

Use a for loop to iterate over the vector and use bitwise operators to access each bit group. For example, to access the upper four bits of the first unsigned char in your vector:
int myInt = (vec[0] & 0xF0) >> 4;
To read the fourth bit from the right, right after the chunk we just read:
bool myBool = vec[0] & 0x08;
The three least significant (lowest) bits can be accessed like so:
int myInt2 = vec[0] & 0x07;
You can then repeat this process (using a for loop) for every element in your vector.

Variable sized packet structs with vectors

Lately I've been diving into network programming, and I'm having some difficulty constructing a packet with a variable "data" property. Several prior questions have helped tremendously, but I'm still lacking some implementation details. I'm trying to avoid using variable sized arrays, and just use a vector. But I can't get it to be transmitted correctly, and I believe it's somewhere during serialization.
Now for some code.
Packet Header
class Packet {
public:
void* Serialize();
bool Deserialize(void *message);
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
};
Packet Impl
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
std::vector<char> data;
} Packet;
void* Packet::Serialize(int size) {
Packet* p = (Packet *) malloc(8 + 30);
p->sender_id = htonl(this->sender_id);
p->sequence_number = htonl(this->sequence_number);
p->data.assign(size,'&'); //just for testing purposes
}
bool Packet::Deserialize(void *message) {
Packet *s = (Packet*)message;
this->sender_id = ntohl(s->sender_id);
this->sequence_number = ntohl(s->sequence_number);
this->data = s->data;
}
During execution, I simply create a packet, assign its members, and send/receive accordingly. The above methods are only responsible for serialization. Unfortunately, the data never gets transferred.
Couple of things to point out here. I'm guessing the malloc is wrong, but I'm not sure how else to compute it (i.e. what other value it would be). Other than that, I'm unsure of the proper way to use a vector in this fashion, and would love for someone to show me how (code examples please!) :)
Edit: I've awarded the question to the most comprehensive answer regarding the implementation with a vector data property. Appreciate all the responses!

This trick works with a C-style array at the end of the struct, but not with a C++ vector. There is no guarantee that the C++ vector class will (and it most likely won't) put its contained data in the "header object" that is present in the Packet struct. Instead, that object will contain a pointer to somewhere else, where the actual data is stored.
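You can check this for yourself. The sketch below uses a Packet struct shaped like the one in the question, plus a helper of my own (payload_is_inside_object) that tests whether the vector's elements live within the bytes of the Packet object.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Packet {
    unsigned int sender_id;
    unsigned int sequence_number;
    std::vector<char> data;
};

// True if the vector's elements are stored inside the Packet object itself.
inline bool payload_is_inside_object(const Packet& p)
{
    const auto begin = reinterpret_cast<std::uintptr_t>(&p);
    const auto end   = begin + sizeof(Packet);
    const auto elems = reinterpret_cast<std::uintptr_t>(p.data.data());
    return elems >= begin && elems < end;
}
```

sizeof(Packet) stays the same no matter how many elements data holds, and for a non-empty vector the helper returns false, so copying sizeof(Packet) bytes only ever sends the vector's bookkeeping pointers, never the payload.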

I think you might want to do something like this:
struct PacketHeader
{
unsigned int senderId;
unsigned int sequenceNum;
};
class Packet
{
protected:
PacketHeader header;
std::vector<char> data;
public:
char* serialize(int& packetSize);
void deserialize(const char* data,int dataSize);
};
char* Packet::serialize(int& packetSize)
{
packetSize = this->data.size()+sizeof(PacketHeader);
char* packetData = new char[packetSize];
PacketHeader* packetHeader = (PacketHeader*)packetData;
packetHeader->senderId = htonl(this->header.senderId);
packetHeader->sequenceNum = htonl(this->header.sequenceNum);
char* packetBody = (packetData + sizeof(PacketHeader)); // size of the struct, not of the pointer
for(size_t i=0 ; i<this->data.size() ; i++)
{
packetBody[i] = this->data.at(i);
}
return packetData;
}
void Packet::deserialize(const char* data,int dataSize)
{
const PacketHeader* packetHeader = (const PacketHeader*)data;
this->header.senderId = ntohl(packetHeader->senderId);
this->header.sequenceNum = ntohl(packetHeader->sequenceNum);
this->data.clear();
for(int i=sizeof(PacketHeader) ; i<dataSize ; i++)
{
this->data.push_back(data[i]);
}
}
This code does not include bounds checking and does not free the allocated data: don't forget to delete[] the buffer returned from serialize(). You can also use memcpy instead of a loop to copy the bytes into or out of the std::vector.
Compilers may add padding inside a structure, which causes problems if you send the struct as-is without disabling the padding. If you are using Visual Studio, you can disable it with #pragma pack(1).
Disclaimer: I haven't actually compiled this code, so you might want to recheck it.

I think the problem centres around you trying to 'serialise' the vector that way, and you're probably assuming that the vector's state information gets transmitted. As you've found, it doesn't really work that way: you're trying to move an object across the network, and things like pointers don't mean anything on the other machine.
I think the easiest way to handle this would be to change Packet to the following structure:
struct Packet {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int vector_size;
char data[1];
};
The data[1] bit is an old C trick for variable length array - it has to be the last element in the struct as you're essentially writing past the size of the struct. You have to get the allocation for the data structure right for this, otherwise you'll be in a world of hurt.
Your serialisation function then looks something like this:
void* Packet::Serialize(std::vector<char> &data) {
Packet* p = (Packet *) malloc(sizeof(Packet) + data.size());
p->sender_id = htonl(this->sender_id);
p->sequence_number = htonl(this->sequence_number);
p->vector_size = htonl(data.size());
::memcpy(p->data, data.data(), data.size());
return p;
}
As you can see, we'll transmit the data size and the contents of the vector, copied into a plain C array which transmits easily. Keep in mind that your network sending routine has to calculate the size of the structure properly: you have to send sizeof(Packet) + data.size() bytes, otherwise the vector contents get cut off and you're back in nice buffer overflow territory.
Disclaimer - I haven't tested the code above, it's just written from memory so you might have to fix the odd compilation error.

I think you need to work directly with byte arrays returned by the socket functions.
For these purposes it's good to have two distinct parts of a message in your protocol. The first part is a fixed-size "header". This will include the size of the bytes that follow, the "payload", or data in your example.
So, to borrow some of your snippets and expand on them, maybe you'll have something like this:
typedef struct {
unsigned int sender_id;
unsigned int sequence_number;
unsigned int data_length; // this is new
} PacketHeader;
So then when you get a buffer in, you'll treat it as a PacketHeader*, and check data_length to know how many bytes will appear in the byte vector that follows.
I would also add a few points...
Making these fields unsigned int is not wise. The standards for C and C++ don't specify how big int is, and you want something that will be predictable on all compilers. I suggest the C99 type uint32_t defined in <stdint.h>
Note that when you get bytes from the socket... It is in no way guaranteed to be the same size as what the other end wrote to send() or write(). You might get incomplete messages ("packets" in your terminology), or you might get multiple ones in a single read() or recv() call. It's your responsibility to buffer these if they are short of a single request, or loop through them if you get multiple requests in the same pass.
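To illustrate that last point, here is a sketch of the buffering logic that is independent of any particular socket API. Everything read from the socket is appended to a pending buffer, and complete messages are then peeled off the front. The 12-byte big-endian header layout and the names kHeaderSize, load_be32 and extract_messages are assumptions of mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed wire format: three big-endian 32-bit fields (sender_id,
// sequence_number, data_length) followed by data_length payload bytes.
constexpr std::size_t kHeaderSize = 12;

inline std::uint32_t load_be32(const std::uint8_t* p)
{
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8)  |  std::uint32_t{p[3]};
}

// Removes every complete message from the front of `pending` and returns
// them (header included). Incomplete trailing bytes stay in `pending`
// until more data arrives.
inline std::vector<std::vector<std::uint8_t>>
extract_messages(std::vector<std::uint8_t>& pending)
{
    std::vector<std::vector<std::uint8_t>> messages;
    std::size_t pos = 0;
    while (pending.size() - pos >= kHeaderSize) {
        const std::size_t data_length = load_be32(pending.data() + pos + 8);
        if (pending.size() - pos - kHeaderSize < data_length)
            break;  // payload not fully received yet
        const auto first = pending.begin() + static_cast<std::ptrdiff_t>(pos);
        const auto last =
            first + static_cast<std::ptrdiff_t>(kHeaderSize + data_length);
        messages.emplace_back(first, last);
        pos += kHeaderSize + data_length;
    }
    pending.erase(pending.begin(),
                  pending.begin() + static_cast<std::ptrdiff_t>(pos));
    return messages;
}
```

After each recv() or read() you would append the received bytes to pending and call extract_messages(pending). Real code should also reject an implausibly large data_length, so that a corrupt or malicious header cannot make the buffer grow without bound.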

This cast is very dangerous as you have allocated some raw memory and then treated it as an initialized object of a non-POD class type. This is likely to cause a crash at some point.
Packet* p = (Packet *) malloc(8 + 30);
Looking at your code, I assume that you want to write out a sequence of bytes from the Packet object that the serialize function is called on. In this case you have no need of a second packet object. You can create a vector of bytes of the appropriate size and then copy the data across.
e.g.
void* Packet::Serialize(int size)
{
char* raw_data = new char[sizeof sender_id + sizeof sequence_number + data.size()];
char* p = raw_data;
unsigned int tmp;
tmp = htonl(sender_id);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
tmp = htonl(sequence_number);
std::memcpy(p, &tmp, sizeof tmp);
p += sizeof tmp;
std::copy(data.begin(), data.end(), p);
return raw_data;
}
This may not be exactly what you intended, as I'm not sure what the purpose of your size parameter is. Your interface is also potentially unsafe, as you return a pointer to raw data that I assume is supposed to be dynamically allocated. It is much safer to use an object that manages the lifetime of dynamically allocated memory, so that the caller doesn't have to guess whether and how to deallocate the memory.
Also the caller has no way of knowing how much memory was allocated. This may not matter for deallocation but presumably if this buffer is to be copied or streamed then this information is needed.
It may be better to return a std::vector<char> or to take one by reference, or even make the function a template and use an output iterator.
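As a sketch of the std::vector<char> variant: the version below returns the wire image by value, so its size travels with it and nothing has to be freed by hand. To keep the example free of platform headers I write the big-endian bytes with shifts instead of calling htonl; the fixed-width uint32_t fields and the append_be32 helper are assumptions of mine.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Packet {
    std::uint32_t sender_id = 0;
    std::uint32_t sequence_number = 0;
    std::vector<char> data;

    // Returns the wire image: two big-endian 32-bit fields, then the payload.
    std::vector<char> Serialize() const
    {
        std::vector<char> out;
        out.reserve(8 + data.size());
        append_be32(out, sender_id);
        append_be32(out, sequence_number);
        out.insert(out.end(), data.begin(), data.end());
        return out;
    }

private:
    // Append `value` most significant byte first (network byte order).
    static void append_be32(std::vector<char>& out, std::uint32_t value)
    {
        for (int shift = 24; shift >= 0; shift -= 8)
            out.push_back(static_cast<char>((value >> shift) & 0xFFu));
    }
};
```

The caller can then pass wire.data() and wire.size() straight to the sending function.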
