How to avoid breaking backwards compatibility of messages in a DDS system - C++

I'm involved in a project which uses DDS as the communication protocol and C++ as the language. As you know, exchanged messages are called Topics. Sometimes a team must change a topic definition; as a consequence, other software which depends on this topic stops working, and it becomes necessary to update the topic everywhere and recompile. So, my question is: do you know how to avoid breaking backwards compatibility? I have been searching and I found Google Protocol Buffers, where they say this:
"You can add new fields to your message formats without breaking
backwards-compatibility; old binaries simply ignore the new field when
parsing. So if you have a communications protocol that uses protocol
buffers as its data format, you can extend your protocol without
having to worry about breaking existing code."
Any other idea?
Thanks in advance.

The OMG specification Extensible and Dynamic Topic Types for DDS (or DDS-XTypes for short) addresses your problem. Quoted from that specification:
The Type System supports type evolution so that it is possible to
“evolve the type” as described above and retain interoperability
between components that use different versions of the type
Not all DDS implementations currently support XTypes, so you might have to resort to a different solution. For example, you could include a version number in your Topic names to avoid type conflicts between different components. In order to make sure every component receives the data it needs, you can create a service whose responsibility is to forward between the different versions of the Topics as needed. This service has to be aware of the different versions of the Topics and should take care of filling in default values and/or transforming between the different types. Whether or not this is a viable solution depends, among other things, on your system requirements.
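A minimal sketch of the naming-convention part of that idea; the helper below is hypothetical, and the actual create_topic call is vendor-specific, so it is only indicated as pseudo-code in the comments:

#include <string>

// Build versioned Topic names such as "SensorData_v2" so that readers and writers
// built against different versions of the type never match each other by accident.
std::string versionedTopicName(const std::string & baseName, int typeVersion)
{
    return baseName + "_v" + std::to_string(typeVersion);
}

// Usage (pseudo-code; the DDS call itself depends on your vendor's API):
//   topic = participant->create_topic(versionedTopicName("SensorData", 2), ...);
// A forwarding service can then subscribe to "SensorData_v1", fill in default
// values for the new fields, and republish the samples on "SensorData_v2".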
The use of a different type system like Protocol Buffers inside DDS is not recommended if it is just for the purpose of solving the type evolution problem. You would essentially be transporting your PB messages as data opaque to the DDS middleware. This means that you would also lose some of the nice tooling features like dynamic discovery and display of types, because the PB messages are not understood by the DDS middleware. Also, your applications would become more complex because they would be responsible for invoking the right PB de/serialization methods. It is easier to let DDS take care of all that.
Whichever route you take, it is recommended that you manage your data model evolution tightly. If you let just anybody add or remove some attributes to their liking, the situation will become difficult to maintain quickly.

The level of support for Protocol Buffers with DDS depends on how it is implemented. With PrismTech's Vortex, for instance, you don't lose content-awareness, dynamic discovery or display of types, because both the middleware and its tools DO understand the PB messages. With respect to 'populating' your PB-based topic, that conforms to the PB standard and is generated transparently by the protoc compiler. So one could say that if you're familiar with protobuf (and perhaps not with OMG's IDL alternative), you can really benefit from a proper integration of DDS and GPB that retains the global-dataspace advantages in combination with a well-known and popular type system (GPB).

You could try wrapping the DDS layer in your own communication layer.
Define a set of DDS types and DDS profiles that covers all your needs.
Each topic will then be defined as one of those types associated with one of those profiles.
For example, you could have a type for strings and a binary type.
If you use the same compiler for all your software, you can even safely memcpy a C struct into the binary type (a minimal sketch of this follows after the IDL below).
module atsevents {
    valuetype DataType {
        public string<128> mDataId;
        public boolean mCustomField1;
        public boolean mCustomField2;
    };

    valuetype StringDataType : DataType {
        public string<128> mDataKey; //#key
        public string<1024> mData;
    };

    valuetype BinaryDataType : DataType {
        public string<128> mDataKey; //#key
        public sequence<octet, 1024> mData;
        public unsigned long mTypeHash;
    };
};
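A minimal sketch of the memcpy idea, assuming the IDL above has been compiled by your DDS vendor; a plain std::vector<unsigned char> stands in for the generated sequence<octet, 1024> field, since the real accessor names vary per implementation:

#include <cstdint>
#include <cstring>
#include <vector>

// A plain-old-data struct shared by all participants (same compiler, same packing).
struct SensorReading
{
    std::uint32_t sensorId;
    double        value;
};

// Pack the struct into a raw byte buffer that will be copied into BinaryDataType::mData.
std::vector<unsigned char> packReading(const SensorReading & reading)
{
    std::vector<unsigned char> buffer(sizeof(SensorReading));
    std::memcpy(buffer.data(), &reading, sizeof(SensorReading));
    return buffer;
}

// Unpack on the receiving side; only safe when both sides use the same compiler and settings.
SensorReading unpackReading(const std::vector<unsigned char> & buffer)
{
    SensorReading reading;
    std::memcpy(&reading, buffer.data(), sizeof(SensorReading));
    return reading;
}

The mTypeHash field could carry a hash of the struct layout so the receiver can detect a mismatch before unpacking.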

Related

Datastructure Storage in Filesystem

I am trying to write a persistent data structure in C++; however, I feel that I should be able to make it binary-compatible with various other implementations of readers of my data structure, and hence my current idea is to declare the data structure in native memory without any abstraction.
For example, I would allocate a linear block of memory as the data structure (using the new keyword) and then describe what the first byte means, what the second byte means and so on. I know I can do this using a struct, but then the data structure would be bound to one language and other languages would have to use this structure. Also, the implementation might then change from compiler to compiler. I would instead like it to be a memory standard.
Is what I am trying to do sensible? Or am I trying to over-simplify things and should really proceed with a struct? Now onto the C++ part: if you believe that I should be using a struct, then what are the disadvantages of using a full-fledged class?
(I am using a class anyway to wrap around the memory structure and provide functions to it since the datastructure is anyway persistent.)
EDIT
As justin has suggested, I do not need any such advanced interface wrapper around the memory structure, so my last point about the class wrapper is not stated properly. What I mean is that I would like to have a class interface for the memory representation; it does not necessarily have to be a wrapper.
Several file formats I have read/worked with do exactly that -- define a memory standard or layout, then typically back it up with a demonstration in C-like pseudo-structure. Sometimes they will provide struct or class representations, and some are completely abstracted by a library. Of course, these formats go on to document all fields, their sizes, the endianness of the data and so on.
I figure endian-related issues, padding, complexity (e.g. introduced by variations in the data structures) and proper versioning are the biggest sources of errors. Another issue I find is the use of data structures of yesteryear and the inconsistency of data structures used to represent similar functionality -- you may receive a spec and realize it contains several different string representations, all of which are archaic, and somebody has to go on to support all of these (bidirectionally).
Proceeding that route:
You should not commit to a binary representation (or compilable program) if you don't want to support it (attempts at long-lived formats fail or stumble along the way as platforms and toolsets change). Just commit to a formal memory standard at first, then build on top of that with tests and example input files to verify that the representation is serialized and deserialized correctly. A very basic test suite will help ensure your model is portable on all the systems you need, and can point out potential pitfalls or platform-specific considerations you may not have been aware of.
If you really want to provide a compilable representation, I'd stick with a very compliant struct representation -- clients can take that (in memory) representation and turn it into any C++ abstraction/representation they like. That is to say, a serialized representation should probably not reflect that of a representation in memory, apart from trivially simple representations and the intermediate storage of such a representation (flattened and packed structs).
One of the important parts is that you should have tests which confirm that the in-memory object graph you create with these structs is forward- and backward-serializable and deserializable, and supports proper versioning -- it often takes a bit of work to make a complex serialized representation compatible. So you see, this approach just introduces one abstraction layer on top of another. In this regard, you may want to give the C++ abstraction the ability to create itself from the packed in-memory representation, and to ensure that the abstraction can also correctly populate the packed structure without data loss.
Beyond that, is there any need to have a more advanced interface? If there is, then you may want to provide that information.
So yes, the memory standard is the part that you must get correct and stable, and to which all implementations should refer to and test against -- regardless of platform/architecture differences. IOW, you're on the right track ;)
In C++ there's no practical difference between struct and class (besides the default accessibility being public in struct). Traditionally, struct is used when a type only has (public) member variables and no member functions but this is only a convention, not a rule enforced by the compiler.
I'd certainly use a struct/class to describe the data. If someone wants to write a reader of your data structure, they can either import your header file or implement the data structure in their language of choice - in most programming languages this should be pretty simple.
I recommend you start your structure something like this:
typedef struct
{
    int Version;  // struct layout version
    int ByteSize; // byte size of structure for validation
    ...
} MYDATA;
This way when your data structure is being passed around, your code can verify that the allocated structure size matches with how many bytes you'd expect for a given version of your structure. You could then easily introduce new versions of your structure by simply updating the version field and checking for the new size.
When you save your data to disk, make sure that you write it out field-by-field, rather than through a single write (using a pointer and sizeof()), to ensure that other languages won't have to deal with potential padding that your C++ compiler may decide to insert. It's possible to manually lay out the fields in the structure so that there's no padding, but you have to be very, very careful while doing that and it's easy to make mistakes.
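A minimal sketch of that advice, using a hypothetical MYDATA with one example payload field; the path and field names are made up for illustration:

#include <cstdint>
#include <cstdio>

struct MYDATA
{
    std::int32_t Version;  // struct layout version
    std::int32_t ByteSize; // byte size of structure for validation
    std::int32_t Counter;  // example payload field
};

bool SaveMyData(const MYDATA & data, const char * path)
{
    std::FILE * file = std::fopen(path, "wb");
    if (!file)
        return false;

    // Write each field individually so compiler-inserted padding never reaches the disk.
    std::fwrite(&data.Version,  sizeof(data.Version),  1, file);
    std::fwrite(&data.ByteSize, sizeof(data.ByteSize), 1, file);
    std::fwrite(&data.Counter,  sizeof(data.Counter),  1, file);

    std::fclose(file);
    return true;
}

When loading, read Version and ByteSize first and reject (or migrate) anything that does not match what the current code expects.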

Module and classes handling (dynamic linking)

I've run into a bit of an issue, and I'm looking for the best solution concept/theory.
I have a system that needs to use objects. Each object that the system uses has a known interface, likely implemented as an abstract class. The interfaces are known at build time, and will not change. The exact implementation to be used will vary and I have no idea ahead of time what module will be providing it. The only guarantee is that they will provide the interface. The class name and module (DLL) come from a config file or may be changed programmatically.
Now, I have all that set up at the moment using a relatively simple system, set up something like so (rewritten pseudo-code, just to show the basics):
struct ClassID
{
    Module * module;
    int number;
};

class Module
{
    HMODULE module;
    function<IObject * (int)> * createfunc;

    static Module * Load(String filename);

    IObject * CreateClass(int number)
    {
        return (*createfunc)(number);
    }
};

class ModuleManager
{
    bool LoadModule(String filename);

    IObject * CreateClass(String classname)
    {
        ClassID classId = AvailableClasses.find(classname);
        return classId.module->CreateClass(classId.number);
    }

    vector<Module*> LoadedModules;
    map<String, ClassID> AvailableClasses;
};
Modules have a few exported functions to give the number of classes they provide and the names/IDs of those, which are then stored. All classes derive from IObject, which has a virtual destructor, stores the source module and has some methods to get the class' ID, what interface it implements and such.
The only issue with this is each module has to be manually loaded somewhere (listed in the config file, at the moment). I would like to avoid doing this explicitly (outside of the ModuleManager, inside that I'm not really concerned as to how it's implemented).
I would like to have a similar system without having to handle loading the modules, just create an object and (once it's all set up) it magically appears.
I believe this is similar to what COM is intended to do, in some ways. I looked into the COM system briefly, but it appears to be overkill beyond belief. I only need the classes known within my system and don't need all the other features it handles, just implementations of interfaces coming from somewhere.
My other idea is to use the registry and keep a key with all the known/registered classes and their source modules and numbers, so I can just look them up and it will appear that Manager::CreateClass finds and makes the object magically. This seems like a viable solution, but I'm not sure if it's optimal or if I'm reinventing something.
So, after all that, my question is: How to handle this? Is there an existing technology, if not, how best to set it up myself? Are there any gotchas that I should be looking out for?
COM very likely is what you want. It is very broad but you don't need to use all the functionality. For example, you don't need to require participants to register GUIDs, you can define your own mechanism for creating instances of interfaces. There are a number of templates and other mechanisms to make it easy to create COM interfaces. What's more, since it is a standard, it is easy to document the requirements.
One very important thing to bear in mind is that importing/exporting C++ objects requires all participants to be using the same compiler. If you think that ever could be a problem to you then you should use COM. If you are happy to accept that restriction then you can carry on as you are.
I don't know if any technology exists to do this.
I do know that I worked with a system very similar to this. We used XML files to describe the various classes that different modules made available. Our equivalent of ModuleManager would parse the xml files to determine what to create for the user at run time based on the class name they provided and the configuration of the system. (Requesting an object that implemented interface 'I' could give back any of objects 'A', 'B' or 'C' depending on how the system was configured.)
The big gotcha we found was that the system was very brittle and at times hard to debug/understand. Just reading through the code, it was often near impossible to see what concrete class was being instantiated. We also found that maintaining the XML created more bugs and overhead than expected.
If I were to do this again, I would keep the design pattern of exposing classes from DLLs through interfaces, but I would not try to build a central registry of classes, nor would I derive everything from a base class such as IObject.
I would instead make each module responsible for exposing its own factory function(s) to instantiate objects.
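A minimal sketch of that approach, with made-up names; each module exports one C-linkage factory function and the host simply asks every loaded module until one of them returns an object:

#include <cstring>

// ---- shared header, known to both host and plug-in modules ----
struct IObject
{
    virtual ~IObject() {}
    virtual const char * ClassName() const = 0;
};

// The single entry point every module exports; no central registry required.
extern "C" IObject * CreateObjectByName(const char * className);

// ---- inside one plug-in DLL ----
class Widget : public IObject
{
public:
    const char * ClassName() const override { return "Widget"; }
};

extern "C" IObject * CreateObjectByName(const char * className)
{
    if (std::strcmp(className, "Widget") == 0)
        return new Widget();
    return nullptr; // unknown class: the host tries the next module
}

The host resolves CreateObjectByName with GetProcAddress (or dlsym) after loading each module and loops over the loaded modules until it gets a non-null pointer.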

Object-oriented networking

I've written a number of networking systems and have a good idea of how networking works. However I always end up having a packet receive function which is a giant switch statement. This is beginning to get to me. I'd far rather a nice elegant object-oriented way to handle receiving packets but every time I try to come up with a good solution I always end up coming up short.
For example lets say you have a network server. It is simply waiting there for responses. A packet comes in and the server needs to validate the packet and then it needs to decide how to handle it.
At the moment I have been doing this by switching on the packet id in the header and then having a huge bunch of function calls that handle each packet type. With complicated networking systems this results in a monolithic switch statement and I really don't like handling it this way. One way I've considered is to use a map of handler classes. I can then pass the packet to the relevant class and handle the incoming data. The problem I have with this is that I need some way to "register" each packet handler with the map. This means, generally, I need to create a static copy of the class and then in the constructor register it with the central packet handler. While this works it really seems like an inelegant and fiddly way of handling it.
Edit: Equally it would be ideal to have a nice system that works both ways. ie a class structure that easily handles sending the same packet types as receiving them (through different functions obviously).
Can anyone point me towards a better way to handle incoming packets? Links and useful information are much appreciated!
Apologies if I haven't described my problem well as my inability to describe it well is also the reason I've never managed to come up with a solution.
About the way to handle the packet type: for me the map is the best. However I'd use a plain array (or a vector) instead of a map. It would make access time constant if you enumerate your packet types sequentially from 0.
As to the class structure: there are libraries that already do this job, see Available Game network protocol definition languages and code generation. E.g. Google's Protocol Buffers seems promising. It generates a storage class with getters, setters, and serialization and deserialization routines for every message in the protocol description. The protocol description language is reasonably rich.
A map of handler instances is pretty much the best way to handle it. Nothing inelegant about it.
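A minimal sketch of such a handler map with the self-registration trick the question describes, using made-up Packet and handler types:

#include <cstdint>
#include <map>
#include <vector>

struct Packet
{
    std::uint16_t id;
    std::vector<std::uint8_t> payload;
};

class IPacketHandler
{
public:
    virtual ~IPacketHandler() {}
    virtual void Handle(const Packet & packet) = 0;
};

// Central registry: packet id -> handler instance.
std::map<std::uint16_t, IPacketHandler *> & HandlerMap()
{
    static std::map<std::uint16_t, IPacketHandler *> handlers;
    return handlers;
}

// Helper whose constructor performs the registration; one static instance per handler.
struct HandlerRegistrar
{
    HandlerRegistrar(std::uint16_t id, IPacketHandler * handler) { HandlerMap()[id] = handler; }
};

class LoginHandler : public IPacketHandler
{
public:
    void Handle(const Packet & packet) override { /* parse and act on the login packet */ }
};

static LoginHandler     loginHandler;
static HandlerRegistrar loginRegistrar(1 /* packet id */, &loginHandler);

void Dispatch(const Packet & packet)
{
    auto it = HandlerMap().find(packet.id);
    if (it != HandlerMap().end())
        it->second->Handle(packet);
}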
In my experience, table driven parsing is the most efficient method.
Although std::map is nice, I end up using static tables. The std::map cannot be statically initialized as a constant table. It must be loaded during run-time. Tables (arrays of structures) can be declared as data and initialized at compile time. I have not encountered tables big enough where a linear search was a bottleneck. Usually the table size is small enough that the overhead in a binary search is slower than a linear search.
For high performance, I'll use the message data as an index into the table.
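A minimal sketch of the table-driven approach, assuming small, dense message ids so the id itself can index the table; all names are illustrative:

#include <cstddef>
#include <cstdint>

struct Message
{
    std::uint8_t        id;
    const std::uint8_t *body;
    std::size_t         length;
};

void HandlePing(const Message &)  { /* ... */ }
void HandleLogin(const Message &) { /* ... */ }
void HandleChat(const Message &)  { /* ... */ }

using MessageHandler = void (*)(const Message &);

// Statically initialized at compile time: index == message id, no run-time registration.
static const MessageHandler kHandlers[] = { HandlePing, HandleLogin, HandleChat };

void Dispatch(const Message & message)
{
    if (message.id < sizeof(kHandlers) / sizeof(kHandlers[0]))
        kHandlers[message.id](message);
}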
When you are doing OOP, you try to represent everything as an object, right? So your protocol messages become objects too; you'll probably have a base class YourProtocolMessageBase which will encapsulate any message's behavior and from which you will inherit your polymorphically specialized messages. Then you just need a way to turn every message (i.e. every YourProtocolMessageBase instance) into a string of bytes, and a way to do the reverse. Such methods are called serialization techniques; some metaprogramming-based implementations exist.
Quick example in Python:
from socket import *
sock = socket(AF_INET6, SOCK_STREAM)
sock.bind(("localhost", 1234))
sock.listen(1)
rsock, addr = sock.accept()
Server blocks, fire up another instance for a client:
from socket import *
clientsock = socket(AF_INET6, SOCK_STREAM)
clientsock.connect(("localhost", 1234))
Now use Python's built-in serialization module, pickle; client:
import pickle
obj = {1: "test", 2: 138, 3: ("foo", "bar")}
clientsock.send(pickle.dumps(obj))
Server:
>>> import pickle
>>> r = pickle.loads(rsock.recv(1000))
>>> r
{1: 'test', 2: 138, 3: ('foo', 'bar')}
So, as you can see, I just sent over link-local a Python object. Isn't this OOP?
I think the only possible alternative to serializing is maintaining a bimap of IDs ⇔ classes. That looks unavoidable.
You want to keep using the same packet network protocol, but translate the packets into objects in your program, right?
There are several protocols that allow you to treat data as programming objects, but it seems you don't want to change the protocol, just the way it's treated in your application.
Do the packets come with something like a "tag", metadata, an "id" or a "data type" that allows you to map them to a specific object class? If they do, you can create an array that stores the id and the matching class, and generate an object from it.
A more OO way to handle this is to build a state machine using the state pattern.
Handling incoming raw data is parsing, and state machines provide an elegant solution for parsing (you will have to choose between elegance and performance).
You have a data buffer to process; each state has a handle-buffer method that parses and processes its part of the buffer (where already possible) and sets the next state based on the content.
If you want to go for performance, you still can use a state machine, but leave out the OO part.
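A minimal sketch of the state-pattern idea applied to a raw byte buffer; the types are made up, and concrete states (read-header, read-body, ...) are only indicated in the final comment:

#include <cstddef>
#include <cstdint>
#include <vector>

class Parser;

class IParserState
{
public:
    virtual ~IParserState() {}
    // Consume what it can from the buffer, advance the offset, and switch the parser
    // to the next state based on the content seen so far.
    virtual void HandleBuffer(Parser & parser,
                              const std::vector<std::uint8_t> & buffer,
                              std::size_t & offset) = 0;
};

class Parser
{
public:
    explicit Parser(IParserState & initial) : state_(&initial) {}
    void SetState(IParserState & next) { state_ = &next; }

    void Feed(const std::vector<std::uint8_t> & buffer)
    {
        std::size_t offset = 0;
        while (offset < buffer.size())
            state_->HandleBuffer(*this, buffer, offset);
    }

private:
    IParserState * state_;
};

// Concrete states such as ReadHeaderState and ReadBodyState each parse their own part
// of the stream and call parser.SetState(...) when they are done.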
I would use Flatbuffers and/or Cap’n Proto code generators.
I solved this problem as part of my BTech in network security and network programming, and I can assure you it's not one giant packet switch statement. The library is called cross platform networking; I modeled it around the OSI model and simple object serialization. The repository is here: https://bitbucket.org/ptroen/crossplatformnetwork/src/master/
There are countless protocols like NACK, HTTP, TCP, UDP, RTP and Multicast, and they are all invoked via C++ metatemplates. OK, that is the summarized answer; now let me dive a bit deeper and explain how you solve this problem and why this library can help you out, whether you design it yourself or use the library.
First, let's talk about design patterns in general. To keep things nicely organized you first need some design patterns as a way to frame your problem. For my C++ templates I framed it initially around the OSI model (https://en.wikipedia.org/wiki/OSI_model#Layer_7:_Application_layer) down to the transport level (which becomes sockets at that point). To recap OSI:
Application Layer: What it means to the end user. IE signals getting deserialized or serialized and passed down or up from the networking stack
Presentation: Data independence from application and network stack
Session: dialogues between sessions
Transport: transporting the packets
But here's the kicker: when you look at these closely, they aren't design patterns but more like namespaces around transporting data from A to B. So for the end user I designed cross platform networking with the following standardized C++ metatemplate:
template <class TFlyWeightServerIncoming,  // a class representing the server's incoming payload. Note: a flyweight is a design pattern that is a union of types, i.e. putting things together. This is where you pack your incoming objects.
          class TFlyWeightServerOutgoing,  // a class representing the server's outgoing payload of different types
          class TServerSession,            // a hook class that represents how to translate the payload, in the form of a session-layer translation. The key is to stay true to separation of concerns (https://en.wikipedia.org/wiki/Separation_of_concerns)
          class TInitializationParameters> // a class representing initialization of the server (i.e. ports, etc.)
two examples: https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Transport/TCP/TCPTransport.h
https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Transport/HTTP/HTTPTransport.h
And each protocol can be invoked like this:
OSI::Transport::Interface::ITransportInitializationParameters init_parameters;
const size_t defaultTCPPort = 80;
init_parameters.ParseServerArgs(&(*argv), argc, defaultTCPPort, defaultTCPPort);
OSI::Transport::TCP::TCP_ServerTransport<SampleProtocol::IncomingPayload<OSI::Transport::Interface::ITransportInitializationParameters>, SampleProtocol::OutgoingPayload<OSI::Transport::Interface::ITransportInitializationParameters>, SampleProtocol::SampleProtocolServerSession<OSI::Transport::Interface::ITransportInitializationParameters>, OSI::Transport::Interface::ITransportInitializationParameters> tcpTransport(init_parameters);
tcpTransport.RunServer();
citation:
https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Application/Stub/TCPServer/main.cc
I also have, under MVC in the code base, a full MVC implementation that builds on top of this, but let's get back to your question. You mentioned:
"At the moment I have been doing this by switching on the packet id in the header and then having a huge bunch of function calls that handle each packet type."
" With complicated networking systems this results in a monolithic switch statement and I really don't like handling it this way. One way I've considered is to use a map of handler classes. I can then pass the packet to the relevant class and handle the incoming data. The problem I have with this is that I need some way to "register" each packet handler with the map. This means, generally, I need to create a static copy of the class and then in the constructor register it with the central packet handler. While this works it really seems like an inelegant and fiddly way of handling it."
In cross platform network the approach to adding new types is as follows:
After you have defined the server type you just need to write the incoming and outgoing types. The actual mechanism for handling them is embedded within the incoming object type. The methods within it are ToString(), FromString(), size() and max_size(). These deal with keeping the layers below the application layer secure. But since you're defining the object handlers now, you need to write the translation code for the different object types. You'll need at minimum within this object:
1. A list of enumerated object types for the application layer. This could be as simple as numbering them. But for things like the session layer, have a look at session-layer concerns (for instance, RTP has things like jitter and dealing with an imperfect connection, i.e. session concerns). You could also switch from an enumeration to a hash/map, but that's just another way of deciding how to look up the variable.
2. Serialize and deserialize methods for the object (for both incoming and outgoing types).
3. After you serialize or deserialize, the logic to dispatch the data to the appropriate internal design pattern for handling the application layer. This could be a Builder, a Command or a Strategy; it really depends on the use case. In cross platform networking some concerns are delegated to the TServerSession layer and others to the incoming and outgoing classes. It just depends on the separation of concerns.
4. Dealing with performance concerns, i.e. not blocking (which becomes a bigger concern when you scale up the number of concurrent users).
5. Dealing with security concerns (pen testing).
If you're curious you can review my API implementation; it's a single-threaded async Boost reactor implementation, and when you combine it with something like mimalloc (to override new/delete) you can get very good performance. I measured around 50k connections on a single thread easily.
But yeah, it's all about framing your server in good design patterns, separating the concerns and selecting a good model to represent the server design. I believe the OSI model is appropriate for that, which is why I put it into cross platform networking to provide superior object-oriented networking.

Design methods for multiple serialization targets/formats (not versions)

Whether as members, whether perhaps static, separate namespaces, via friend-s, via overloads even, or any other C++ language feature...
When facing the problem of supporting multiple/varying formats, maybe protocols or any other kind of targets for your types, what was the most flexible and maintainable approach?
Were there any conventions or clear cut winners?
A brief note why a particular approach helped would be great.
Thanks.
[ ProtoBufs like suggestions should not cut it for an upvote, no matter how flexible that particular impl might be :) ]
Reading through the already posted responses, I can only agree with a middle-tier approach.
Basically, in your original problem you have 2 distinct hierarchies:
n classes
m protocols
The naive use of a Visitor pattern (as much as I like it) will only lead to n*m methods... which is really gross and a gateway to a maintenance nightmare. I suppose you already noticed this, otherwise you would not ask!
The "obvious" target approach is to go for a n+m solution, where the 2 hierarchies are clearly separated. This of course introduces a middle-tier.
The idea is thus ObjectA -> MiddleTier -> Protocol1.
Basically, that's what Protocol Buffers does, though its problem is a different one (going from one language to another via a protocol).
It may be quite difficult to work out the middle-tier:
Performance issues: a "translation" phase adds some overhead, and here you go from one translation to two. This can be mitigated, but you will have to work on it.
Compatibility issues: some protocols do not support recursion, for example (XML or JSON do, EDIFACT does not), so you may have to settle for a least-common-denominator approach or work out ways of emulating such behaviors.
Personally, I would go for "reimplementing" the JSON language (which is extremely simple) into a C++ hierarchy:
int
strings
lists
dictionaries
Applying the Composite pattern to combine them.
Of course, that is the first step only. Now you have a framework, but you don't have your messages.
You should be able to specify a message in terms of primitives (and really think about versioning right now; it's too late once you need another version). Note that both approaches are valid:
In-code specification: your message is composed of primitives / other messages
Using a code generation script: this seems like overkill here, but for the sake of completeness I thought I would mention it, as I don't know how many messages you really need :)
On to the implementation:
Herb Sutter and Andrei Alexandrescu said in their C++ Coding Standards
Prefer non-member non-friend methods
This applies really well to the MiddleTier -> Protocol step: create a Protocol1 class and then you can have:
Protocol1 myProtocol;
myProtocol << myMiddleTierMessage;
The use of operator<< for this kind of operation is well-known and very common. Furthermore, it gives you a very flexible approach: not all messages are required to implement all protocols.
The drawback is that it won't work for a dynamic choice of the output protocol. In this case, you might want to use a more flexible approach. After having tried various solutions, I settled for using a Strategy pattern with compile-time registration.
The idea is that I use a Singleton which holds a number of Functor objects. Each object is registered (in this case) for a particular Message - Protocol combination. This works pretty well in this situation.
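A minimal sketch of that registry, with made-up MiddleTierMessage and Protocol base types; the key is a (message name, protocol name) pair mapped to a serialization functor, and a small static registrar object per pair can call Register() at start-up:

#include <functional>
#include <map>
#include <string>
#include <utility>

struct MiddleTierMessage { /* primitives combined via the Composite pattern */ };
struct Protocol          { /* base class for Protocol1, Protocol2, ... */ };

using SerializeFn = std::function<void (Protocol &, const MiddleTierMessage &)>;
using RegistryKey = std::pair<std::string, std::string>; // (message name, protocol name)

class SerializerRegistry
{
public:
    static SerializerRegistry & Instance()
    {
        static SerializerRegistry instance; // Meyers singleton
        return instance;
    }

    void Register(const RegistryKey & key, SerializeFn fn)
    {
        serializers_[key] = std::move(fn);
    }

    // Dynamic dispatch: the output protocol is chosen at run time by name.
    void Serialize(const std::string & message, const std::string & protocol,
                   Protocol & out, const MiddleTierMessage & in) const
    {
        auto it = serializers_.find(RegistryKey(message, protocol));
        if (it != serializers_.end())
            it->second(out, in);
    }

private:
    std::map<RegistryKey, SerializeFn> serializers_;
};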
Finally, for the BOM -> MiddleTier step, I would say that a particular instance of a Message should know how to build itself and should require the necessary objects as part of its constructor.
That of course only works if your messages are quite simple and may only be built from few combination of objects. If not, you might want a relatively empty constructor and various setters, but the first approach is usually sufficient.
Putting it all together.
// 1 - Your BOM
class Foo {};
class Bar {};

// 2 - Message class: GetUp
class GetUp
{
    typedef enum {} State;
    State m_state;
};

// 3 - Protocol class: SuperProt
class SuperProt : public Protocol
{
};

// 4 - GetUp to SuperProt serializer
class GetUp2SuperProt : public Serializer
{
};

// 5 - Let's use it
Foo foo;
Bar bar;
SuperProt sp;
GetUp getUp = GetUp(foo, bar);
MyMessage2ProtBase.serialize(sp, getUp); // use GetUp2SuperProt inside
If you need many output formats for many classes, I would try to make it an n + m problem instead of an n * m problem. The first approach that comes to mind is to have the classes reducible to some kind of dictionary, and then have a method to serialize those dictionaries to each output format.
Assuming you have full access to the classes that must be serialized, you need to add some form of reflection to the classes (probably including an abstract factory). There are two ways to do this: 1) a common base class or 2) a "traits" struct. Then you can write your encoders/decoders in terms of the base class/traits struct.
Alternatively, you could require that the class provide a function to export itself to a container of boost::any and provide a constructor that takes a boost::any container as its only parameter. It should be simple to write a serialization function to many different formats if your source data is stored in a map of boost::any objects.
That's two ways I might approach this. It would depend highly on the similarity of the classes to be serialized and on the diversity of target formats which of the above methods I would choose.
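A minimal sketch of the second approach, using std::any as a stand-in for the boost::any mentioned above, with a made-up Person class; the format-specific serializer only needs to understand the generic map, not Person itself:

#include <any>
#include <map>
#include <sstream>
#include <string>
#include <typeinfo>

using AnyMap = std::map<std::string, std::any>;

class Person
{
public:
    Person(std::string name, int age) : name_(std::move(name)), age_(age) {}

    // The "constructor that takes the generic container" side of the contract.
    explicit Person(const AnyMap & fields)
        : name_(std::any_cast<std::string>(fields.at("name"))),
          age_(std::any_cast<int>(fields.at("age"))) {}

    // The "export yourself to the generic container" side of the contract.
    AnyMap ToAnyMap() const
    {
        AnyMap fields;
        fields["name"] = name_;
        fields["age"]  = age_;
        return fields;
    }

private:
    std::string name_;
    int age_;
};

// One of potentially many format-specific serializers working only on the generic map.
std::string ToKeyValueText(const AnyMap & fields)
{
    std::ostringstream out;
    for (const auto & field : fields)
    {
        out << field.first << "=";
        if (field.second.type() == typeid(std::string))
            out << std::any_cast<std::string>(field.second);
        else if (field.second.type() == typeid(int))
            out << std::any_cast<int>(field.second);
        out << "\n";
    }
    return out.str();
}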
I used the OpenH323 library (famous enough among VoIP developers) for long enough to build a number of VoIP-related applications, from a low-density answering machine up to a 32xE1 border controller. It of course went through major rework, so in those days I knew almost everything about this library.
Inside this library was a tool (ASNParser) which converted ASN.1 definitions into container classes. There was also a framework which allowed serialization/deserialization of these containers using higher-layer abstractions. Note they are auto-generated. They supported several encoding rules (BER, PER, XER) for ASN.1, with a very complex ASN syntax and good-enough performance.
What was nice?
Auto-generated container classes which were suitable enough for clear logic implementation.
I managed to rework the whole container layer underneath the ASN object hierarchy with almost no modification to the upper layers.
It was relatively easy to do performance refactoring of the debug features of those ASN classes (I understand the authors didn't expect 20xE1 call signalling to be logged online).
What was not suitable?
It is a non-STL library with lazy copying underneath. It was refactored for speed, but I would have liked STL compatibility there (at least at the time).
You can find the wiki page of the whole project here. You should focus only on the PTLib component, the ASN parser sources, the ASN class hierarchy and the encoding/decoding policy hierarchy.
By the way, look at the "Bridge" design pattern; it might be useful.
Feel free to comment with questions if something seems strange, insufficient or not what you actually asked for.

How to handle different protocol versions transparently in c++?

This is a generic C++ design question.
I'm writing an application that uses a client/server model. Right now I'm writing the server-side. Many clients already exist (some written by myself, others by third parties). The problem is that these existing clients all use different protocol versions (there have been 2-3 protocol changes over the years).
Since I'm re-writing the server, I thought it would be a great time to design my code such that I can handle many different protocol versions transparently. In all protocol versions, the very first communication from the client contains the protocol version, so for every client connection, the server knows exactly which protocol it needs to talk.
The naive method to do this is to litter the code with statements like this:
if (clientProtocolVersion == 1)
// do something here
else if (clientProtocolVersion == 2)
// do something else here
else if (clientProtocolVersion == 3)
// do a third thing here...
This solution is pretty poor, for the following reasons:
When I add a new protocol version, I have to find everywhere in the source tree that these if statements are used, and modify them to add the new functionality.
If a new protocol version comes along, and some parts of the protocol version are the same as another version, I need to modify the if statements so they read if (clientProtoVersion == 5 || clientProtoVersion == 6).
I'm sure there are more reasons why it's bad design, but I can't think of them right now.
What I'm looking for is a way to handle different protocols intelligently, using the features of the C++ language. I've thought about using template classes, possibly with the template parameter specifying the protocol version, or maybe a class hierarchy with one class for each protocol version...
I'm sure this is a very common design pattern, so many many people must have had this problem before.
Edit:
Many of you have suggested an inheritance hierarchy, with the oldest protocol version at the top, like this (please excuse my ASCII art):
IProtocol
^
|
CProtoVersion1
^
|
CProtoVersion2
^
|
CProtoVersion3
... This seems like a sensible thing to do, in terms of reuse. However, what happens when you need to extend the protocol and add fundamentally new message types? If I add virtual methods in IProtocol and implement these new methods in CProtoVersion4, how are these new methods treated in earlier protocol versions? I guess my options are:
Make the default implementation a NO_OP (or possibly log a message somewhere).
Throw an exception, although this seems like a bad idea, even as I'm typing it.
... do something else?
Edit2:
Further to the above issues, what happens when a newer protocol message requires more input than an older version? For example:
In protocol version 1, I might have:
ByteArray getFooMessage(string param1, int param2)
And in protocol version 2 I might want:
ByteArray getFooMessage(string param1, int param2, float param3)
The two different protocol versions now have different method signatures, which is fine, except that it forces me to go through all calling code and change all calls with 2 params to 3 params, depending on the protocol version being used, which is what I'm trying to avoid in the first place!
What is the best way of separating protocol version information from the rest of your code, such that the specifics of the current protocol are hidden from you?
Since you need to dynamically choose which protocol to use, using different classes (rather than a template parameter) for selecting the protocol version seems like the right way to go. Essentially this is Strategy Pattern, though Visitor would also be a possibility if you wanted to get really elaborate.
Since these are all different versions of the same protocol, you could probably have the common stuff in the base class and the differences in the subclasses. Another approach might be to have the base class be for the oldest version of the protocol and then have each subsequent version in a class that inherits from the previous version. This is a somewhat unusual inheritance tree, but the nice thing about it is that it guarantees that changes made for later versions don't affect older versions. (I'm assuming the classes for older versions of the protocol will stabilize pretty quickly and then rarely ever change.)
However you decide to organize the hierarchy, you'd then want to chose the protocol version object as soon as you know the protocol version, and then pass that around to your various things that need to "talk" the protocol.
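A minimal sketch of that chained hierarchy, with a made-up message method; each later version derives from the previous one and only overrides what actually changed:

#include <cstdint>
#include <string>
#include <vector>

using ByteArray = std::vector<std::uint8_t>;

class IProtocol
{
public:
    virtual ~IProtocol() {}
    virtual ByteArray getFooMessage(const std::string & param1, int param2) = 0;
};

// The oldest version implements everything.
class CProtoVersion1 : public IProtocol
{
public:
    ByteArray getFooMessage(const std::string & param1, int param2) override
    {
        ByteArray bytes;
        // ... encode param1 and param2 the version-1 way ...
        return bytes;
    }
};

// Later versions inherit from the previous version and override only what changed.
class CProtoVersion2 : public CProtoVersion1
{
public:
    ByteArray getFooMessage(const std::string & param1, int param2) override
    {
        ByteArray bytes;
        // ... encode the same message the version-2 way ...
        return bytes;
    }
};

The server picks the concrete class once, right after reading the protocol version from the first client message, and from then on only talks to the IProtocol interface.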
I have used (and heard of others using) templates to solve this problem too. The idea is that you break your different protocols up into basic atomic operations and then use something like boost::fusion::vector to build protocols out of the individual blocks.
The following is an extremely rough (lots of pieces missing) example:
// These are the kind of atomic operations that we can do:
struct read_string { /* ... */ };
struct write_string { /* ... */ };
struct read_int { /* ... */ };
struct write_int { /* ... */ };
// These are the different protocol versions
typedef vector<read_string, write_int> Protocol1;
typedef vector<read_int, read_string, write_int> Protocol2;
typedef vector<read_int, write_int, read_string, write_int> Protocol3;
// This type *does* the work for a given atomic operation
struct DoAtomicOp {
    void operator () (read_string & rs) const { ... }
    void operator () (write_string & ws) const { ... }
    void operator () (read_int & ri) const { ... }
    void operator () (write_int & wi) const { ... }
};

template <typename Protocol> void doProtWork ( ... ) {
    Protocol prot;
    for_each (prot, DoAtomicOp (...));
}
Because the protocol version is dynamic, you'll need a single top-level switch statement to determine which protocol to use.
void doWork (int protocol ...) {
    switch (protocol) {
    case PROTOCOL_1:
        doProtWork<Protocol1> (...);
        break;
    case PROTOCOL_2:
        doProtWork<Protocol2> (...);
        break;
    case PROTOCOL_3:
        doProtWork<Protocol3> (...);
        break;
    };
}
To add a new protocol (that uses existing atomic operations) you only need to define the protocol sequence:
typedef vector<read_int, write_int, write_int, write_int> Protocol4;
And then add a new entry to the switch statement.
I'd tend towards using different classes to implement adapters for the different protocols to the same interface.
Depending on the protocol and the differences, you might get some gain using TMP for state machines or protocol details, but generating six sets of whatever code uses the six protocol versions is probably not worth it; runtime polymorphism is sufficient, and for most cases TCP IO is probably slow enough not to want to hard code everything.
Maybe oversimplified, but this sounds like a job for inheritance?
A base class IProtocol which defines what a protocol does (and possibly some common methods), and then one implementation of IProtocol for each protocol version you have?
You need a protocol factory that returns a protocol handler for the appropriate version:
ProtocolHandler& handler = ProtocolFactory::getProtocolHandler(clientProtocolVersion);
You can then use inheritance to only update the parts of the protocol that have been changed between versions.
Notice that the ProtocolHandler (base class) is basically a Strategy pattern. As such it should not maintain its own state (if state is required, it is passed in via the methods and the strategy updates the properties of that state object).
Because the strategy does not maintain state, we can share the ProtocolHandler between any number of threads, and as such ownership does not need to leave the factory object. Thus the factory just needs to create one handler object for each protocol version it understands (this can even be done lazily). Because the factory object retains ownership, you can return a reference to the ProtocolHandler.
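A minimal sketch of such a factory, assuming a made-up ProtocolHandler interface; handlers are created lazily, owned by the factory, shared, and returned by reference as described above (add locking around the map if handlers can be created from several threads at once):

#include <map>
#include <memory>
#include <stdexcept>

class ProtocolHandler
{
public:
    virtual ~ProtocolHandler() {}
    // Stateless strategy: any per-connection state is passed in by the caller.
    virtual void handleMessage(/* connection state, raw message, ... */) = 0;
};

class ProtocolHandlerV1 : public ProtocolHandler
{
public:
    void handleMessage() override { /* version 1 behaviour */ }
};

class ProtocolHandlerV2 : public ProtocolHandlerV1
{
public:
    void handleMessage() override { /* only what changed in version 2 */ }
};

class ProtocolFactory
{
public:
    static ProtocolHandler & getProtocolHandler(int version)
    {
        static std::map<int, std::unique_ptr<ProtocolHandler>> handlers; // factory owns the handlers

        auto it = handlers.find(version);
        if (it == handlers.end())
        {
            std::unique_ptr<ProtocolHandler> handler;
            switch (version)
            {
            case 1: handler.reset(new ProtocolHandlerV1()); break;
            case 2: handler.reset(new ProtocolHandlerV2()); break;
            default: throw std::runtime_error("unsupported protocol version");
            }
            it = handlers.emplace(version, std::move(handler)).first;
        }
        return *it->second;
    }
};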
I'm going to agree with Pete Kirkham: I think it would be pretty unpleasant to maintain potentially six different versions of the classes to support the different protocol versions. If you can do it, it seems better to have the older versions just implement adapters that translate to the latest protocol, so you only have one working protocol to maintain. You could still use an inheritance hierarchy like the one shown above, but the older protocol implementations would just do an adaptation and then call the newest protocol.