I am currently developing some software in C++ where I am sending and receiving custom data packets. I want to parse and manage these packets in a well structured manner. Obviously I am first receiving the header and after that the body of the data. The main problem is that I don't like creating a Packet-Object with only the header information and later on adding the body data. What is an elegant way of parsing and storing custom data packets?
Here is a rough sketch of what such a custom data packet could look like:
+-------+---------+---------+----------+------+
| Magic | Command | Options | Bodysize | Body |
+-------+---------+---------+----------+------+
(Let's assume Magic is 4 bytes, Command 1 byte, Options 2 bytes, Bodysize 4 bytes, and the body itself is variable in length.)
How would I parse this without using any third party libraries?
Normally I'd say something like this could be done to store packet data:
#include <array>
#include <cstdint>
#include <vector>
class Packet {
public:
explicit Packet(std::array<char, 10> headerbytes);
void set_body(std::vector<char> data);
std::vector<char> get_body();
int8_t get_command();
int16_t get_options();
bool is_valid();
private:
bool valid;
int8_t _command;
int16_t _options;
int32_t body_size;
std::vector<char> _data;
};
The problem is that I provide the header information first and then add the body data in a hacky way later on. There is a point in time at which the packet object is accessible in an incomplete state.
I first receive the header and after the header was received another receive call is made to read the body.
Would it make sense to have a parser instance that populates the packet object and only makes it accessible once it holds all the needed information? Would it make sense to have a separate class for the header and the body? What would be the best design choice?
I am developing with C++ and for the sending and receiving of data over sockets the boost library is used.
If you don’t want to tie the data reading into one complete constructor (for understandable reasons of separation of concerns), this is a good application for non-polymorphic inheritance:
struct Header {
static constexpr std::size_t SIZE=10;
Header(std::array<char,SIZE>);
std::int8_t get_command() const {return command;}
std::int16_t get_options() const {return options;}
std::int32_t body_size() const {return length;}
private:
std::int8_t command;
std::int16_t options;
std::int32_t length;
};
struct Packet : private Header {
using Body=std::vector<char>;
Packet(const Header &h,Body b) : Header(h),body(std::move(b))
{if(body.size()!=body_size()) throw …;}
using Header::get_command;
using Header::get_options;
const Body& get_body() const {return body;}
private:
Body body;
};
// For some suitable Stream class:
Header read1(Stream &s)
{return {s.read<Header::SIZE>()};}
Packet read2(const Header &h,Stream &s)
{return {h,s.read(h.body_size())};}
Packet read(Stream &s)
{return read2(read1(s),s);}
Note that the private inheritance prevents undefined behavior from deleting a Packet via a Header*, as well as the surely-unintended
const Packet p=read(s);
const Packet q=read2(p,s); // same header?!
Composition would of course work as well, but might result in more adapter code in a full implementation.
If you were really optimizing, you could make a HeaderOnly without the body size and derive Header and Packet from that.
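The Header constructor above is only declared. As a rough sketch of what the decoding step could look like, the function below reads the fields from a raw byte array. The names ParsedHeader and parse_header, the big-endian byte order, and the magic value 'abcd' are all assumptions for illustration. Note that the field sizes listed in the question add up to 11 bytes, which is what this sketch uses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical wire layout: 4-byte magic, 1-byte command, 2-byte options,
// 4-byte body size. Big-endian multi-byte fields are an assumption.
struct ParsedHeader {
    static constexpr std::size_t SIZE = 11;  // 4 + 1 + 2 + 4
    std::uint8_t command;
    std::uint16_t options;
    std::uint32_t body_size;
};

inline ParsedHeader parse_header(const std::array<unsigned char, ParsedHeader::SIZE>& raw)
{
    // Assumed magic value; a real protocol defines its own.
    static const unsigned char magic[4] = {'a', 'b', 'c', 'd'};
    for (std::size_t i = 0; i < 4; ++i)
        if (raw[i] != magic[i])
            throw std::invalid_argument("invalid magic word");

    ParsedHeader h;
    h.command = raw[4];
    // Assemble the fields byte by byte so the result does not depend on
    // the host's endianness or alignment rules.
    h.options = static_cast<std::uint16_t>((raw[5] << 8) | raw[6]);
    h.body_size = (static_cast<std::uint32_t>(raw[7]) << 24) |
                  (static_cast<std::uint32_t>(raw[8]) << 16) |
                  (static_cast<std::uint32_t>(raw[9]) << 8) |
                  static_cast<std::uint32_t>(raw[10]);
    return h;
}
```

Because the function either returns a fully populated header or throws, no half-initialized header is ever visible to the caller.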
For this case I would use the pipeline design pattern creating 3 packet processor classes:
Command (handles magic bytes too)
Options
Body (handles body size too)
all derived from one base class.
typedef unsigned char byte;
namespace Packet
{
namespace Processor
{
namespace Field
{
class Item
{
public:
/// Returns true when the field was fully processed, false otherwise.
virtual bool operator () (const byte*& begin, const byte* const end) = 0;
};
class Command: public Item
{
public:
virtual bool operator () (const byte*& begin, const byte* const end);
};
class Options: public Item
{
public:
virtual bool operator () (const byte*& begin, const byte* const end);
};
class Body: public Item
{
public:
virtual bool operator () (const byte*& begin, const byte* const end);
};
}
class Manager
{
public:
/// Called every time new data is received
void operator () (const byte* begin, const byte* const end)
{
while((*fields[index])(begin, end))
{
incrementIndex();
}
}
protected:
void incrementIndex();
Field::Command command;
Field::Options options;
Field::Body body;
Field::Item* const fields[3] = { &command, &options, &body };
byte index;
};
}
}
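The operator() bodies are not shown above. As a minimal sketch of how one field processor might accumulate bytes that arrive split across several receive calls, consider the template below. FixedField, value() and reset() are hypothetical names, and the big-endian interpretation is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

typedef unsigned char byte;

// Hypothetical fixed-width field that may arrive split across several reads.
template <std::size_t N>
class FixedField
{
    static_assert(N >= 1 && N <= 4, "value() returns at most 32 bits");

public:
    FixedField() : filled_(0) {}

    // Consumes bytes from [begin, end) and advances begin past them.
    // Returns true once all N bytes of the field have been stored.
    bool operator()(const byte*& begin, const byte* const end)
    {
        while (filled_ < N && begin != end)
            data_[filled_++] = *begin++;
        return filled_ == N;
    }

    // Big-endian interpretation of the collected bytes (an assumption).
    // Only meaningful after operator() has returned true.
    std::uint32_t value() const
    {
        std::uint32_t v = 0;
        for (std::size_t i = 0; i < N; ++i)
            v = (v << 8) | data_[i];
        return v;
    }

    // Prepares the field for the next packet.
    void reset() { filled_ = 0; }

private:
    byte data_[N];
    std::size_t filled_;
};
```

A field that returns false has simply run out of input, so the manager can return and wait for the next chunk without losing the bytes already consumed.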
You can use exceptions to prevent creation of incomplete packet objects.
I'd use char pointers instead of vectors for performance.
// not intended to be inherited
class Packet final {
public:
Packet(const char* data, unsigned int data_len) {
if(data_len < header_len) {
throw std::invalid_argument("data too small");
}
const char* dataIter = data;
if(!check_validity(dataIter)) {
throw std::invalid_argument("invalid magic word");
}
dataIter += sizeof(magic);
memcpy(&command, dataIter, sizeof(command)); // can use cast & assignment, too
dataIter += sizeof(command);
memcpy(&options, dataIter, sizeof(options)); // can use cast & assignment, too
dataIter += sizeof(options);
memcpy(&body_size, dataIter, sizeof(body_size)); // can use cast & assignment, too
dataIter += sizeof(body_size);
if( data_len < body_size+header_len) {
throw std::invalid_argument("data body too small");
}
body = new char[body_size];
memcpy(body, dataIter, body_size);
}
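~Packet() {
delete[] body;
}
// the class owns a raw buffer, so forbid copying to avoid a double delete
Packet(const Packet&) = delete;
Packet& operator=(const Packet&) = delete;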
int8_t get_command() const {
return command;
}
int16_t get_options() const {
return options;
}
int32_t get_body_size() const {
return body_size;
}
const char* get_body() const {
return body;
}
private:
// assumes len enough, may add param in_len for robustness
static bool check_validity(const char* in_magic) {
return ( 0 == memcmp(magic, in_magic, sizeof(magic)) );
}
constexpr static char magic[] = {'a','b','c','d'};
int8_t command;
int16_t options;
int32_t body_size;
char* body;
constexpr static unsigned int header_len = sizeof(magic) + sizeof(command)
+ sizeof(options) + sizeof(body_size);
};
Note: this is my first post in SO, so please let me know if something's wrong with the post, thanks.
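If managing the raw buffer by hand feels error-prone, one possible alternative is to let a std::unique_ptr<char[]> own the body. The sketch below is a hypothetical helper (BodyBuffer is not part of the class above) that still performs a single allocation and a single memcpy, but needs no hand-written destructor.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical helper: owns a copy of the body bytes without a manual destructor.
class BodyBuffer {
public:
    BodyBuffer(const char* data, std::size_t len)
        : bytes_(new char[len]), size_(len)
    {
        if (len != 0)
            std::memcpy(bytes_.get(), data, len);
    }

    const char* data() const { return bytes_.get(); }
    std::size_t size() const { return size_; }

private:
    std::unique_ptr<char[]> bytes_;  // move-only, so the buffer is never freed twice
    std::size_t size_;
};
```

Since unique_ptr is move-only, BodyBuffer is implicitly non-copyable but can still be moved, for example into a container of packets.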
I'm guessing you are trying Object-oriented networking. If so, the best solution for such parsing would be Flatbuffers or Cap’n Proto C++ code generator. By defining a schema, you will get state machine code that will parse the packets in an efficient and safe way.
I am writing a kind of buffer parser that takes a vector of unsigned char bytes as input, for example:
Datatype getvalue(vector<unsigned char> buffer)
{
// compute value:
// if the vector contains 2 bytes, an unsigned int will be returned
// if the vector contains 4 bytes, an unsigned long will be returned
// if it contains 12 bytes, a date/time will be returned
return value;
}
You cannot do this.
A function can only return one type. You could use some sort of type erasure (std::variant/std::any and others) but as you are bound to C++11, you can resort to the following: Instead of returning the value from the function pass it to a function...
struct Consumer {
void operator()(int x) { ... }
void operator()(float x) { ... }
void operator()(time_t t) { ... }
};
void getValue(Consumer& c,vector<unsigned char> buffer) {
if (...) {
int data;
c(data);
} else if (...) {
float data;
c(data);
} else if (...) {
time_t data;
c(data);
}
}
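To make the idea concrete, here is a small self-contained sketch of the same pattern. StringConsumer and decode are hypothetical names, and the big-endian decoding of 2-byte and 4-byte buffers is an assumption, since the question does not say how the bytes are laid out.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical consumer that appends every decoded value to a string.
struct StringConsumer {
    std::ostringstream out;
    void operator()(std::uint16_t x) { out << "u16:" << x << ';'; }
    void operator()(std::uint32_t x) { out << "u32:" << x << ';'; }
};

// Decodes big-endian integers (byte order is an assumption) and hands the
// result to whichever overload of the consumer matches its type.
template <class Consumer>
bool decode(Consumer& c, const std::vector<unsigned char>& buffer)
{
    if (buffer.size() == 2) {
        c(static_cast<std::uint16_t>((buffer[0] << 8) | buffer[1]));
        return true;
    }
    if (buffer.size() == 4) {
        std::uint32_t v = 0;
        for (unsigned char b : buffer)
            v = (v << 8) | b;
        c(v);
        return true;
    }
    return false;  // unsupported length: nothing is passed to the consumer
}
```

The caller never sees a value of "unknown" type; overload resolution inside decode picks the right handler at compile time.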
I will store these values in a vector, then do some computation on each value, and then concatenate these values to generate an output string
A vector can only hold elements of a single type (int or float, not a mix), so we are back at square one. However, to append them to a string, all you need is a stringstream:
void getValue(std::stringstream& s,vector<unsigned char> buffer) {
if (...) {
int data;
s << data;
} else if (...) {
float data;
s << data;
} else if (...) {
time_t data;
s << data;
}
}
You might want to use some tagged union type.
A C++ function has one return type, not several of them.
In C++17 consider using the std::variant template.
Or code your own implementation, using some union in your class. Then follow the C++ rule of five (even in C++11).
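A hand-rolled tagged union for C++11 might look roughly like the sketch below. The class name Value and its accessors are hypothetical. Because every alternative here is trivially copyable, the compiler-generated copy operations and destructor are sufficient; the rule of five becomes relevant once an alternative such as std::string is added.

```cpp
#include <cstdint>
#include <ctime>
#include <stdexcept>

// Hypothetical tagged union for the three result kinds from the question.
class Value {
public:
    enum Kind { UInt16, UInt32, Time };

    static Value from_u16(std::uint16_t v) { Value r(UInt16); r.u16_ = v; return r; }
    static Value from_u32(std::uint32_t v) { Value r(UInt32); r.u32_ = v; return r; }
    static Value from_time(std::time_t v)  { Value r(Time);   r.time_ = v; return r; }

    Kind kind() const { return kind_; }

    // Each accessor throws if the stored kind does not match the request.
    std::uint16_t as_u16() const { require(UInt16); return u16_; }
    std::uint32_t as_u32() const { require(UInt32); return u32_; }
    std::time_t   as_time() const { require(Time);  return time_; }

private:
    explicit Value(Kind k) : kind_(k) {}

    void require(Kind k) const
    {
        if (kind_ != k)
            throw std::logic_error("wrong kind requested");
    }

    Kind kind_;  // the tag: records which union member is active
    union {
        std::uint16_t u16_;
        std::uint32_t u32_;
        std::time_t time_;
    };
};
```

Such values can be stored in a single std::vector<Value>, with a switch on kind() deciding how each element is processed.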
Read a good C++ programming book for more, and the C++11 standard n3337.
See also this C++ reference website.
If you compile with a recent GCC, enable all warnings and debug info, so use g++ -Wall -Wextra -g then GDB and perhaps valgrind and the address sanitizer or the Clang static analyzer.
Look for inspiration into the source code of existing open source projects on github or gitlab (e.g. FLTK, Qt, fish, Boost, etc... and those mentioned above)
Perhaps consider passing (as a second or further argument to your getvalue) one or several lambda expressions (so, in practice, some std::function) that process the results of the different types. Think in terms of callbacks and continuation-passing style, leaning more towards a functional programming paradigm.
You are not allowed to return different data types from a single function. However, you can achieve something similar using polymorphism. If all your return types are derived from a single base class, you can make your function return a pointer to that base class, which in turn may point to any one of the derived classes. For example:
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
using namespace std;
class Datatype
{
public:
virtual string gettype() = 0;
virtual void performOp() = 0;
virtual stringstream& concat(stringstream&) = 0;
virtual ~Datatype() {};
};
class UnsignedInt : public Datatype
{
unsigned int val;
string type { "unsigned int" };
public:
UnsignedInt(unsigned int v): val(v) {}
string gettype() {return type;}
void performOp() { /* perform uint-specific operations here, e.g. val = func1(val); */ }
stringstream& concat(stringstream& out) {out<<val; return out;}
unsigned int getval() {return val;}
~UnsignedInt() {}
};
class UnsignedLong : public Datatype
{
unsigned long val;
string type { "unsigned long" };
public:
UnsignedLong(unsigned long v): val(v) {}
string gettype() {return type;}
void performOp() { /* perform ulong-specific operations here, e.g. val = func2(val); */ }
stringstream& concat(stringstream& out) {out<<val; return out;}
unsigned long getval() {return val;}
~UnsignedLong() {}
};
Datatype* getvalue(vector<unsigned char> buffer)
{
if(buffer.size() == 2)
{
// some logic
return new UnsignedInt(23);
}
else if(buffer.size() == 4)
{
//some logic
return new UnsignedLong(2564);
}
else
return nullptr;
}
int main()
{
vector<unsigned char> vec{'2' , '3'};
stringstream out;
Datatype *val = getvalue(vec);
val->performOp(); // performs uint-specific operation
val->concat(out);
delete val; // release the first object before reusing the pointer
out<<',';
vector<unsigned char> vec1{'2' , '5' , '6' , '4'};
val = getvalue(vec1);
val->performOp(); // performs ulong-specific operation
val->concat(out);
cout<<out.str()<<endl;
delete val;
return 0;
}
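Since a raw owning pointer is easy to leak or to dereference when it is null, a variation of the same idea can return std::unique_ptr instead. The sketch below uses hypothetical names (Datum, UIntDatum, ULongDatum, make_datum) and assumes big-endian decoding; it is a reduced illustration, not a drop-in replacement for the classes above.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

class Datum {
public:
    virtual ~Datum() {}
    virtual std::string type() const = 0;
    virtual void write(std::ostream& out) const = 0;
};

class UIntDatum : public Datum {
public:
    explicit UIntDatum(unsigned int v) : val_(v) {}
    std::string type() const override { return "unsigned int"; }
    void write(std::ostream& out) const override { out << val_; }
private:
    unsigned int val_;
};

class ULongDatum : public Datum {
public:
    explicit ULongDatum(unsigned long v) : val_(v) {}
    std::string type() const override { return "unsigned long"; }
    void write(std::ostream& out) const override { out << val_; }
private:
    unsigned long val_;
};

// Returns an empty pointer for unsupported sizes; callers should check it.
std::unique_ptr<Datum> make_datum(const std::vector<unsigned char>& buffer)
{
    if (buffer.size() == 2) {
        unsigned int v = (static_cast<unsigned int>(buffer[0]) << 8) | buffer[1];
        return std::unique_ptr<Datum>(new UIntDatum(v));
    }
    if (buffer.size() == 4) {
        unsigned long v = 0;
        for (unsigned char b : buffer)
            v = (v << 8) | b;
        return std::unique_ptr<Datum>(new ULongDatum(v));
    }
    return std::unique_ptr<Datum>();
}
```

Each returned object is destroyed automatically when its unique_ptr goes out of scope or is reassigned, so no explicit delete is needed.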
For protocol buffers in C++, I am wondering if it is better to contain a protobuf message in my class, or to have it be constructed from and populate an external protobuf message.
I could not find examples describing best practices for this case. I'm particularly worried about performance differences between the two designs.
In my processing, I will have some cases where I am going to read only a few fields from my message and then route the message to another process (possibly manipulating the message before sending it back out), and other cases where my objects will have a long lifetime and be used many times before being serialized again. In the first case, I could likely operate directly on the protobuf message and not even need my class, except to fit into an existing interface.
Here is an example message:
package example;
message Example {
optional string name = 1;
optional uint32 source = 2;
optional uint32 destination = 3;
optional uint32 value_1 = 4;
optional uint32 value_2 = 5;
optional uint32 value_3 = 6;
}
I could see one of the following designs for my class. I know these classes aren't doing anything else but accessing data, but that's not what I'm trying to focus on for this question.
Composition
class Widget
{
public:
Widget() : message_() {}
Widget(const example::Example& other_message)
: message_(other_message) {}
const example::Example& getMessage() const
{ return message_; }
void populateMessage(example::Example& message) const
{ message = message_; }
// Some example inspectors filled out...
std::string getName() const
{ return message_.name(); }
uint32_t getSource() const
{ return message_.source(); }
uint32_t getDestination() const;
uint32_t getValue1() const;
uint32_t getValue2() const;
uint32_t getValue3() const;
// Some example mutators filled out...
void setName(const std::string& new_name)
{ message_.set_name(new_name); }
void setSource(uint32_t new_source)
{ message_.set_source(new_source); }
void setDestination(uint32_t new_destination);
void setValue1(uint32_t new_value);
void setValue2(uint32_t new_value);
void setValue3(uint32_t new_value);
private:
example::Example message_;
};
Standard data members
class Widget
{
public:
Widget();
Widget(const example::Example& other_message)
: name_(other_message.name()),
source_(other_message.source()),
destination_(other_message.destination()),
value_1_(other_message.value_1()),
value_2_(other_message.value_2()),
value_3_(other_message.value_3())
{}
example::Example getMessage() const
{
example::Example message;
populateMessage(message);
return message;
}
void populateMessage(example::Example& message) const
{
message.set_name(name_);
message.set_source(source_);
message.set_destination(destination_);
message.set_value_1(value_1_);
message.set_value_2(value_2_);
message.set_value_3(value_3_);
}
// Some example inspectors filled out...
std::string getName() const
{ return name_; }
uint32_t getSource() const
{ return source_; }
uint32_t getDestination() const;
uint32_t getValue1() const;
uint32_t getValue2() const;
uint32_t getValue3() const;
// Some example mutators filled out...
void setName(const std::string& new_name)
{ name_ = new_name; }
void setSource(uint32_t new_source)
{ source_ = new_source; }
void setDestination(uint32_t new_destination);
void setValue1(uint32_t new_value);
void setValue2(uint32_t new_value);
void setValue3(uint32_t new_value);
private:
std::string name_;
uint32_t source_;
uint32_t destination_;
uint32_t value_1_;
uint32_t value_2_;
uint32_t value_3_;
};
There is no recognized "best practice" here. I have seen plenty of examples of both, and even written programs that worked both ways. Some people have very strong opinions about this, but in my opinion it depends on the use case. For example, as you say, if you plan to forward most of the data to another server, then it makes a lot of sense to keep the protobuf object around. But other times you have a more convenient internal representation -- for example, before protobufs added native support for maps, if you had a protobuf that represented a map as a repeated list of key/value pairs, you might want to convert it to an std::map upfront.
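As an illustration of the map case, the conversion could look something like the sketch below. Entry is a plain stand-in struct, not a generated protobuf class; with real generated code you would iterate over the repeated field and call its accessors instead, and those names depend on your .proto file.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for one key/value pair of a repeated field. The real generated
// class and its accessors depend on your .proto definition.
struct Entry {
    std::string key;
    std::uint32_t value;
};

// Builds a lookup table once so that later reads do not scan the list.
inline std::map<std::string, std::uint32_t> to_map(const std::vector<Entry>& entries)
{
    std::map<std::string, std::uint32_t> result;
    for (const Entry& e : entries)
        result[e.key] = e.value;  // a later duplicate key overwrites an earlier one
    return result;
}
```

Paying the conversion cost up front tends to make sense for long-lived objects that are read many times; for messages that are only inspected briefly and forwarded, it is probably wasted work.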
I have some code that writes a number to a std::string using std::ostringstream:
template<class T>
class Converter
{
private:
static std::string s_buffer;
public:
static const char* Out(const T& val)
{
std::ostringstream os;
os << val;
s_buffer = os.str();
return(s_buffer.data());
}
};
The Converter::Out is called a lot. So much that it even shows up in the profiler. And essentially, what happens here is:
An instance of ostringstream is created
It creates a buffer to write to and writes to it
I copy that buffer to the static string and return it
I think, that if I could get the stream to write directly to the static string, thus avoiding the copy, I may get some performance improvement. But how can I do it - std::ostringstream can accept only const std::string in constructor, which would be a preliminary fill, not the buffer to write to.
Maybe Boost has some alternative, though I didn't find one... :(
You can access the buffer of an ostringstream using the rdbuf() method; unfortunately, access to the underlying character buffer is protected. However, you can easily work around that via inheritance:
template<class T>
class Converter
{
private:
static struct Buf : public std::ostringstream, public std::basic_stringbuf<char>
{
Buf() { static_cast<std::basic_ios<char>&>(*this).rdbuf(this); }
void clear() { setp(pbase(), pbase()); }
char const* c_str() { *pptr() = '\0'; return pbase(); }
} s_buf;
public:
static const char* Out(const T& val)
{
s_buf.clear();
s_buf << val;
return s_buf.c_str();
}
};
If Boost is an option, you can use boost::iostreams::filtering_ostream backed by a string or vector<char>: http://lists.boost.org/boost-users/2012/09/75887.php
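Another option that stays within the standard library is a small custom stream buffer that appends directly to a std::string. The sketch below is one way this might be done; StringAppendBuf and to_cstr are hypothetical names, and like the original Converter it is not thread-safe. Whether it is actually faster should be confirmed with the profiler.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Stream buffer that appends every character straight into a caller-owned
// std::string, so no intermediate copy via str() is needed.
class StringAppendBuf : public std::streambuf {
public:
    explicit StringAppendBuf(std::string& target) : target_(target) {}

protected:
    // Called for single characters, since no put area is set up.
    int_type overflow(int_type ch) override
    {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            target_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Called for character sequences; appends them in one step.
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        target_.append(s, static_cast<std::size_t>(n));
        return n;
    }

private:
    std::string& target_;
};

// Hypothetical replacement for Converter<T>::Out using the buffer above.
template <class T>
const char* to_cstr(const T& val)
{
    static std::string buffer;
    static StringAppendBuf sb(buffer);
    static std::ostream os(&sb);
    buffer.clear();  // typically keeps the capacity, so later calls rarely allocate
    os << val;
    return buffer.c_str();
}
```

Reusing the static stream also avoids constructing a new ostringstream (and its locale) on every call, which is often a noticeable part of the cost.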
I am trying to define a packet whose length is determined during an ns-3 simulation (think of it as a packet sent on the downlink containing schedule information whose length depends on the number of nodes in the network which can join/leave the network during simulation). Does anyone have any idea how I could approach this?
The traditional solution is to send the length first, followed by the data:
+------------+---------------------+
| uint32_t n | n - 4 bytes of data |
+------------+---------------------+
To decode, read the first four bytes, and then use the value in those bytes to determine how much more data there is.
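Outside of any particular simulator, the framing described above can be sketched in plain C++ as follows. encode_frame and decode_frame are hypothetical helper names, and the big-endian length field is an assumption; as in the diagram, the length n counts its own four bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Frame layout from the diagram: a 4-byte length n (counting itself),
// followed by n - 4 payload bytes. Big-endian length is an assumption.
inline std::vector<unsigned char> encode_frame(const std::vector<unsigned char>& payload)
{
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size() + 4);
    std::vector<unsigned char> frame;
    frame.reserve(n);
    for (int shift = 24; shift >= 0; shift -= 8)
        frame.push_back(static_cast<unsigned char>((n >> shift) & 0xFF));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Returns true and fills payload when buf starts with one complete frame;
// returns false if the length is implausible or more bytes are still needed.
inline bool decode_frame(const std::vector<unsigned char>& buf,
                         std::vector<unsigned char>& payload)
{
    if (buf.size() < 4)
        return false;
    std::uint32_t n = 0;
    for (std::size_t i = 0; i < 4; ++i)
        n = (n << 8) | buf[i];
    if (n < 4 || buf.size() < n)
        return false;
    payload.assign(buf.begin() + 4, buf.begin() + n);
    return true;
}
```

The receiver can call decode_frame each time more bytes arrive and keep buffering until it returns true.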
The simplest introduction on how to do this is http://www.nsnam.org/support/faq/miscellaneous/#table
In practice, if you want to extend this code to store a variable-sized data structure, you can do this:
class MyHeader : public Header
{
public:
// new methods
void AppendData (uint16_t data);
std::vector<uint16_t> GetData (void) const;
static TypeId GetTypeId (void);
// overridden from Header
virtual uint32_t GetSerializedSize (void) const;
virtual void Serialize (Buffer::Iterator start) const;
virtual uint32_t Deserialize (Buffer::Iterator start);
virtual void Print (std::ostream &os) const;
private:
std::vector<uint16_t> m_data;
};
I will skip the obvious GetData/AppendData methods. Instead, we can focus on the Serialize/Deserialize methods:
uint32_t
MyHeader::GetSerializedSize (void) const
{
// four bytes for the element count, plus two bytes per stored element
return 4 + m_data.size() * 2;
}
void
MyHeader::Serialize (Buffer::Iterator start) const
{
// write the number of elements first so that Deserialize knows how many to read
start.WriteHtonU32(m_data.size());
for (std::vector<uint16_t>::const_iterator i = m_data.begin(); i != m_data.end(); i++)
{
start.WriteHtonU16 (*i);
}
}
uint32_t
MyHeader::Deserialize (Buffer::Iterator start)
{
uint32_t len = start.ReadNtohU32 ();
for (uint32_t i = 0; i < len; i++) {
m_data.push_back(start.ReadNtohU16());
}
return 4+len*2;
}
I have a message class for which I decided to use the builder design pattern. Each message, when completely built, looks very similar. I use a std::string to hold the information (it's actually just independent chars, so I could have used vector<char>, but the .c_str() was convenient).
The method of construction of each different subtype of message is the same (build header, build cargo, build footer, calc checksum...). This is defined in the MessageBuilder class (and inherited by the custom message builder classes):
class MessageBuilder
{
public:
// implementation details for all messages
static const int32 MsgDelimeter = 99;
// ...
Message getMsg();
void buildMessage();
protected:
MessageBuilder(uint8 msgID, uint8 cargoLen, uint8 csi, const uint8* cargo, const uint8 length)
: m_msgID(msgID), m_cargoLen(cargoLen), m_csi(csi),
m_cargo(cargo), m_contents(""), m_msg(m_contents)
{ }
// I previously tried passing cargo and length as just a std::string
// if I could get that to work it would also be fine
void buildHeader();
void buildCargo();
void buildFooter();
void resizeContents();
void calculateCheckSum();
// message is built in m_contents
Message::string m_contents;
Message::string m_cargo;
Message m_msg;
// variables specific to msg type
uint8 m_msgID;
uint8 m_cargoLen;
uint8 m_csi;
private:
};
Then to build a specific message, I have a specific class:
class CustomMessageABuilder : public MessageBuilder
{
public:
static const uint8 CustomMessageAID = 187;
// more details
// ...
// what I want to do
// static const uint8 CustomMessageACargo[4] = {0x65, 0xc7, 0xb4, 0x45};
// ** HERE **
CustomMessageABuilder ()
: MessageBuilder(CustomMessageAID,
CustomMessage1CargoLen,
//...
CustomMessageACargo,
CustomMessageALength)
{ }
};
Anyway, what I want to do is pass the only custom string of characters, the cargo, from the CustomMessageABuilder constructor to the MessageBuilder class, where it will be stored in the middle of the message.
The cargo is different for each message, but gets stored in the same way, so all the logic for storing it/creating the cargo is in the base MessageBuilder class. All the differences, like msgID, cargoLen, cargo, ... are constants in the CustomMessageBuilder classes.
This would allow me to keep my message class really simple:
class Message
{
public:
typedef std::string string;
// ctor
Message(string);
// dtor
~Message();
// copy ctor
Message(const Message&);
// assignment operator
Message& operator=(const Message&);
// getters
uint8 getLength() const;
const string& getData() const;
const uint8* getCSTR() const;
// setters
void setData(const string&);
protected:
// ctor
Message() : m_contents("") { }
// contents of entire message
string m_contents;
};
So I guess it all boils down to this:
What is the best way to define a constant array of characters/hex values (each message cargo) for a class, (and still be able to pass it in the initialization list of the constructor)? Or, tell me the obvious way to do this that I am missing.
Note: For other message classes, the cargo will be dynamic content, but always fixed length.
Note2: I will eventually have a director class which will own a CustomMessageBuilder() and tell it to buildMessage().
Any help, advice, criticism etc would be much appreciated.
Static const members can be initialized outside of the class.
#include <iostream>
class A
{
public:
static const unsigned char cargo[4];
};
// unsigned char avoids a narrowing error for values above 0x7f such as 0xc7
const unsigned char A::cargo[4] = {0x65, 0xc7, 0xb4, 0x45};
int main()
{
std::cout << A::cargo[0] << A::cargo[1] << A::cargo[2] << A::cargo[3] << std::endl;
}
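Applied to the builder hierarchy from the question, the same technique could look roughly like this. BuilderBase and BuilderA are simplified, hypothetical stand-ins for MessageBuilder and CustomMessageABuilder, reduced to the cargo handling only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for MessageBuilder.
class BuilderBase {
public:
    const std::string& cargo() const { return cargo_; }

protected:
    // Copies length bytes of cargo; embedded zero bytes are preserved
    // because the length is passed explicitly.
    BuilderBase(const unsigned char* cargo, std::size_t length)
        : cargo_(reinterpret_cast<const char*>(cargo), length) {}

private:
    std::string cargo_;
};

// Hypothetical, simplified stand-in for CustomMessageABuilder.
class BuilderA : public BuilderBase {
public:
    BuilderA() : BuilderBase(Cargo, sizeof(Cargo)) {}

    static const unsigned char Cargo[4];  // declared here ...
};

// ... and defined once at namespace scope (in exactly one .cpp file).
const unsigned char BuilderA::Cargo[4] = {0x65, 0xc7, 0xb4, 0x45};
```

Static members exist before any object is constructed, so it is safe to pass the array to the base class from the initialization list.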