It was tough to summarize the problem in the title, so allow me to clarify the situation here.
I have a class that I'm designing that represents a BER TLV structure. In this specification, the "data" portion of the TLV can contain raw bytes of data OR other nested TLVs. To support both forms, I use the same structure but with two vectors (only one will actually contain something, depending on what we find as we parse the TLV data):
class BerTlv
{
public:
void Parse(std::vector<std::uint8_t> const& bytes_to_parse);
// Assume relevant accessors are provided
private:
// Will be m_data or m_nestedTlvs, but never both
std::vector<std::uint8_t> m_data;
std::vector<BerTlv> m_nestedTlvs;
};
From the outside, after this object is fully constructed (all TLV data parsed), the user will need to detect what kind of data they are dealing with. Basically, they'd have to check m_data.empty(), and if so, use m_nestedTlvs. I'm not really happy with this approach; it smells like it lacks a better design.
I thought of some form of a union, although I do not think a real union would be appropriate here since vector data is heap allocated. So I thought of std::variant:
std::vector<std::variant<BerTlv, std::uint8_t>> m_data;
However, I'm worried this negatively impacts the std::uint8_t case, since that's literally just byte data: it will now become non-contiguous as well. The variant only benefits the nested TLV case, and not by much.
Next I considered using the visitor pattern here, but I can't quite visualize what the interface would look like or how this would improve usability in both cases (raw data vs nested TLVs). Is visitor the right solution here?
Nothing I've thought of so far feels right, so I'm hoping for feedback on a better design approach to this problem. The general problem here is having data members that are sometimes unused or are mutually exclusive. It's a problem I run into in other contexts as well, so it would be great to have a general design approach to such a problem.
Note that I have access to C++14 features and below.
Basically, they'd have to check m_data.empty(), and if so, use m_nestedTlvs.
If the idea is that an object either has an array of bytes or has an array of other objects, then that's the variant you ought to use: variant<vector<std::uint8_t>, vector<BerTlv>>. A vector of variants does not match your specified use case.
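Since the question is limited to C++14, std::variant (which arrived in C++17) is out of reach; boost::variant is the usual substitute. A minimal sketch of the shape this suggests, assuming Boost is available (the VisitContent helper is an invented name, not part of any standard API):
#include <boost/variant.hpp>
#include <cstdint>
#include <utility>
#include <vector>

class BerTlv
{
public:
    void Parse(std::vector<std::uint8_t> const& bytes_to_parse);

    // Callers visit the content without checking which form is active.
    template <typename Visitor>
    void VisitContent(Visitor&& visitor) const
    {
        boost::apply_visitor(std::forward<Visitor>(visitor), m_content);
    }

private:
    // Exactly one alternative is ever populated; boost::recursive_wrapper
    // keeps the self-reference legal while BerTlv is still incomplete.
    boost::variant<std::vector<std::uint8_t>,
                   boost::recursive_wrapper<std::vector<BerTlv>>> m_content;
};
A visitor is then just a function object with one operator() overload per alternative, so the "which kind of data is this?" check disappears from calling code.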
Related
I am currently struggling with the design of an application for the visualization and manipulation of sensor data. I have a database that contains several STL containers with measured data in them.
std::unordered_map<std::string, std::array<uint16_t, 3366>> data1;
std::unordered_map<std::string, QImage> data2;
std::unordered_map<std::string, std::vector<Point3D>> data3;
I also have different views (mostly Qt-based), and each view should be associated with a model which is specific to one of the data sets. So data1 is supposed to be processed and manipulated in a class called model1, which is then displayed by means of a class view1, and so forth.
But I can't seem to find a suitable design structure to incorporate this idea. The models grant access to their processed data, but that data is contained in the different container structures given above. That makes it infeasible to use inheritance with a pure virtual function in the base class, like
virtual std::map<...,...> getModelData() = 0;
The initial idea of this inheritance was to avoid code duplication, but that doesn't seem to be the right solution here. I know that Qt, in its "Model-View" concepts, makes use of its QVariant class to have maximum flexibility in the types being returned. However, I am wondering what the best solution with standard C++ is here. I have read a lot about striving for loose coupling, code reusability, Dependency Inversion, and favouring composition over inheritance, but I have problems putting this theoretical advice into practice and end up with code bloat and repetitive code most of the time. Can you help me?
Maybe you can give more code, but so far I can give a few hints:
- Can you use QMap instead of std::unordered_map? It is easier to work with if you need to interact with a Qt UI.
- Maybe make the second (mapped) type of each map a common base type, like this (code not tested, treat as pseudo-code):
class BaseDataClass
{
public:
int getType();
QImage* getImageData();
std::array<uint16_t, 3366>& getArray();
std::vector<Point3D>& getVector();
private:
int mType;
BaseDataClass(); //hide ctor or make abstract, as you wish
};
You can avoid code duplication with this. Make three new classes that each inherit from BaseDataClass. You can then write a method that iterates over all BaseDataClass objects, checks the type (e.g. 1 = QImage; 2 = array; 3 = vector), and executes the right method according to the type (get the QImage from all type-1 objects, ...). You can also cast the pointer to the right type then, which becomes even more useful if your derived classes gain more and more functionality (like sorting or validating data).
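To illustrate, a rough sketch of one derived class and the dispatch loop described above. It assumes BaseDataClass gains a protected constructor taking the type code (the hidden constructor shown above would need to be protected, not private, for derived classes to build on it); all other names are invented:
#include <QImage>
#include <vector>

class ImageData : public BaseDataClass
{
public:
    ImageData() : BaseDataClass(1) {}   // type code 1 = QImage
    QImage* getImageData() { return &mImage; }
private:
    QImage mImage;
};

// ArrayData (type 2) and VectorData (type 3) follow the same pattern.

void processAll(std::vector<BaseDataClass*> const& items)
{
    for (BaseDataClass* item : items)
    {
        switch (item->getType())
        {
        case 1:
        {
            // safe because the type code identifies the dynamic type
            ImageData* img = static_cast<ImageData*>(item);
            // ... display or manipulate *img->getImageData() ...
            break;
        }
        case 2: /* cast to ArrayData and use getArray() */ break;
        case 3: /* cast to VectorData and use getVector() */ break;
        }
    }
}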
I have tried to carefully read all the advice given in the C++ FAQ on this subject. I implemented my system according to item 36.8, and now, after a few months (with a lot of data serialized), I want to make changes to both the public interface of some of the classes and the inheritance structure itself.
class Base
{
public:
Vector field1() const;
Vector field2() const;
Vector field3() const;
std::string name() const {return "Base";}
};
class Derived : public Base
{
public:
std::string name() const {return "Derived";}
};
I would like to know how to make changes such as:
Split Derived into Derived1 and Derived2, while mapping the original Derived into Derived1 for existing data.
Split Base::field1() into Base::field1a() and Base::field1b() while mapping field1 to field1a and having field1b empty for existing data.
I will have to
deserialize all the gigabytes of my old data
convert them to the new inheritance structure
reserialize them in a new and more flexible way.
I would like to know how to make the serialization more flexible, so that when I decide to make some change in the future, I would not be facing conversion hell like now.
I thought of making a system that would use numbers instead of names to serialize my objects. That is, for example, Base = 1, Derived1 = 2, ..., with a separate number-to-name system that would convert numbers to names, so that when I want to change the name of some class, I would do it only in this separate number-to-name system, without changing the data.
The problems with this approach are:
The system would be brittle. That is, changing anything in the number-to-name system could change the meaning of gigabytes of data.
The serialized data would lose some of its human readability, since in the serialized data, there would be numbers instead of names.
I am sorry for putting so many issues into one question, but I am inexperienced at programming and the problem I am facing seems so overwhelming that I just do not know where to start.
Any general materials, tutorials, idioms, or literature on flexible serialization are most welcome!
It's probably a bit late for that now, but whenever designing a serialization format, you should provide for versioning. This can be mangled into the type information in the stream, or treated as a separate (integer) field. When writing the class out, you always write the latest version. When reading, you have to read both the type and the version before you can construct; if you're using the static map suggested in the FAQ, then the key would be:
struct DeserializeKey
{
    std::string type;
    int version;
    // needs an ordering (e.g. operator<) to serve as a std::map key
};
Given the situation you are in now, the solution is probably to mangle the version into the type name in a clearly recognizable way, say something along the lines of type_name__version; if the type_name isn't followed by two underscores, then use 0. This isn't the most efficient method, but it's usually acceptable, and it will solve the problem with backwards compatibility while providing for evolution in the future.
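A small sketch of that de-mangling step (the helper function name is invented):
#include <string>
#include <utility>

// "Derived1__3" -> {"Derived1", 3}; a bare "Derived1" -> {"Derived1", 0},
// which covers data written before versioning existed.
std::pair<std::string, int> splitTypeAndVersion(std::string const& mangled)
{
    std::string::size_type pos = mangled.rfind("__");
    if (pos == std::string::npos)
        return std::make_pair(mangled, 0);
    return std::make_pair(mangled.substr(0, pos),
                          std::stoi(mangled.substr(pos + 2)));
}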
For your precise questions:
In this case, Derived is just a previous version of Derived1. You can insert the necessary factory function into the map under the appropriate key.
This is just classical versioning. Version 0 of Base has a field1 attribute; when you deserialize it, you use it to initialize field1a, and you initialize field1b empty. The next version of Base has both.
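As a sketch, with invented stand-ins for the question's Vector type and for a Reader that can extract one serialized Vector at a time:
struct Vector { /* ... */ };
struct Reader { Vector readVector(); };

void deserializeBaseFields(Reader& in, int version,
                           Vector& field1a, Vector& field1b)
{
    field1a = in.readVector();      // version 0's field1 maps onto field1a
    if (version == 0)
        field1b = Vector();         // not present in old data: left empty
    else
        field1b = in.readVector();  // newer versions carry both fields
}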
If you mangle the version into the type name, as I suggest above, you shouldn't have to convert any existing data. Long term, of course, either some of the older versions simply disappear from your data sets, so that you can remove the support for them, or your program keeps getting bigger, with support for lots of older versions. In practice, I've usually seen the latter.
Maybe Apache Thrift can help you do it.
I am trying to write a persistent data structure in C++; however, I feel that I should be able to make it binary-compatible with various other implementations of readers of my data structure, and hence my current idea is to declare the data structure in native memory without any abstraction.
For example, I would specify a linear block of memory as a data structure (using the new keyword) and then describe what the first byte means, what the second byte means, and so on. I know I can do this using a struct, but then the data structure would be bound to one language, and other languages would then have to use this structure. Also, the implementation might change from compiler to compiler. I would instead like it to be a memory standard.
Is what I am trying to do somewhat sensible? Or am I trying to over-simplify things, and should I really proceed with a struct data structure? Now, onto the C++ part: if you believe that I should be using a struct data structure, then what are the disadvantages of using a full-fledged class?
(I am using a class anyway to wrap around the memory structure and provide functions for it, since the data structure is persistent anyway.)
EDIT
As justin has suggested, I do not need any such advanced interface wrapper around the memory structure, so my last point about the class wrapper is not stated properly. What I mean is that I would like to have a class interface for the memory representation; it does not necessarily have to be a wrapper.
Several file formats I have read/worked with do exactly that -- define a memory standard or layout, then typically back it up with a demonstration in C-like pseudo-structure. Sometimes they will provide struct or class representations, and some are completely abstracted by a library. Of course, these formats go on to document all fields, their sizes, the endianness of the data and so on.
I figure endian-related issues, padding, complexity (e.g. introduced by variations in the data structures), and proper versioning are the biggest sources of errors. Another issue I find is the use of data structures of yesteryear and the inconsistency of data structures used to represent similar functionality -- you may receive a spec and realize it contains several different string representations, all of which are archaic, and somebody has to go on to support all of these (bidirectionally).
Proceeding along that route:
You should not commit to a binary representation (or compilable program) if you don't want to support it (attempts at long-lived formats often fail or stumble along the way as platforms and toolsets change). Just commit to a formal memory standard at first, then build on top of that with tests and example input files to verify that the representation is serialized and deserialized correctly. A very basic test suite will help ensure your model is portable on all the systems you need, and can point out potential pitfalls or platform-specific considerations you may not have been aware of.
If you really want to provide a compilable representation, I'd stick with a very compliant struct representation -- clients can take that (in memory) representation and turn it into any C++ abstraction/representation they like. That is to say, a serialized representation should probably not reflect that of a representation in memory, apart from trivially simple representations and the intermediate storage of such a representation (flattened and packed structs).
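For instance, a "very compliant" struct mirroring the memory standard might look like this (field names and widths are invented; the real layout is whatever the standard document says):
#include <cstdint>

#pragma pack(push, 1)   // non-standard but widely supported: no padding
struct RecordV1
{
    std::uint32_t magic;        // file signature
    std::uint16_t version;      // layout version
    std::uint16_t flags;
    std::uint32_t payloadSize;  // bytes of payload that follow
};
#pragma pack(pop)

static_assert(sizeof(RecordV1) == 12,
              "layout must match the documented memory standard");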
One of the important parts is that you should have tests which confirm that the in-memory object graph you create with these structs is forward- and backward-serializable and deserializable, and supports proper versioning -- it often takes a bit of work to keep a complex serialized representation compatible. So you see, this approach just introduces one abstraction layer on top of another. In this regard, you may want to give the C++ abstraction the ability to create itself from the packed in-memory representation, and to ensure that that abstraction can also correctly populate the packed structure without data loss.
Beyond that, is there any need to have a more advanced interface? If there is, then you may want to provide that information.
So yes, the memory standard is the part that you must get correct and stable, and the part which all implementations should refer to and test against -- regardless of platform/architecture differences. IOW, you're on the right track ;)
In C++ there's no practical difference between struct and class (besides the default accessibility being public in struct). Traditionally, struct is used when a type only has (public) member variables and no member functions but this is only a convention, not a rule enforced by the compiler.
I'd certainly use a struct/class to describe the data. If someone wants to write a reader of your data structure, they can either import your header file or implement the data structure in their language of choice - in most programming languages this should be pretty simple.
I recommend you start your structure something like this:
typedef struct
{
int Version; // struct layout version
int ByteSize; // byte size of structure for validation
...
} MYDATA;
This way when your data structure is being passed around, your code can verify that the allocated structure size matches with how many bytes you'd expect for a given version of your structure. You could then easily introduce new versions of your structure by simply updating the version field and checking for the new size.
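In code, that check might look like this (the size constants are hypothetical):
bool isValidHeader(MYDATA const* data)
{
    switch (data->Version)
    {
    case 1:  return data->ByteSize == 16;    // hypothetical size of version 1
    // case 2: return data->ByteSize == 24;  // added alongside version 2
    default: return false;                   // unknown version: reject
    }
}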
When you save your data to disk, make sure that you write it out field by field rather than through a single write (using a pointer and sizeof()), to ensure that other languages won't have to deal with any padding that your C++ compiler may decide to insert. It's possible to manually lay out the fields in the structure so that there's no padding, but you have to be very, very careful while doing that, and it's easy to make mistakes.
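A sketch of the field-by-field approach, with an explicit little-endian helper (writeLE32 is invented here; the real format would pin down the byte order):
#include <cstdint>
#include <ostream>

void writeLE32(std::ostream& out, std::uint32_t v)
{
    char bytes[4] = {
        static_cast<char>(v & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 24) & 0xFF)
    };
    out.write(bytes, 4);
}

void save(std::ostream& out, MYDATA const& d)
{
    writeLE32(out, static_cast<std::uint32_t>(d.Version));   // one field at a time...
    writeLE32(out, static_cast<std::uint32_t>(d.ByteSize));  // ...never out.write((char*)&d, sizeof d)
}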
I have about 15~20 member variables which need to be accessed, and I was wondering if it would be good to just make them public instead of giving every one of them get/set functions.
The code would be something like
class A { // a singleton class
public:
static A* get();
B x, y, z;
// ... a lot of other objects that should only have one copy
// and don't change often
private:
A();
virtual ~A();
static A* a;
};
I have also thought about putting the variables into an array, but I don't know the best way to do a lookup table. Would it be better to put them in an array?
EDIT:
Is there a better way than a singleton class to put them in a collection?
The C++ world isn't quite as hung up on "everything must be hidden behind accessors/mutators/whatever-they-decide-to-call-them-today" as some OO-supporting languages.
With that said, it's a bit hard to say what the best approach is, given your limited description.
If your class is simply a 'bag of data' for some other process, then using a struct instead of a class (the only difference is that all members default to public) can be appropriate.
If the class actually does something, however, you might find it more appropriate to group your get/set routines together by function/aspect or interface.
As I mentioned, it's a bit hard to tell without more information.
EDIT: Singleton classes are not smelly code in and of themselves, but you do need to be a bit careful with them. If a singleton is taking care of preference data or something similar, it only makes sense to make individual accessors for each data element.
If, on the other hand, you're storing generic input data in a singleton, it might be time to rethink the design.
You could place them in a POD structure and provide access to an object of that type:
struct VariablesHolder
{
int a;
float b;
char c[20];
};
class A
{
public:
A() : vh()
{
}
VariablesHolder& Access()
{
return vh;
}
const VariablesHolder& Get() const
{
return vh;
}
private:
VariablesHolder vh;
};
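Usage then looks like this: Access() hands out mutable access, while Get() serves const callers:
A obj;
obj.Access().a = 42;            // write through the mutable accessor
float current = obj.Get().b;    // read-only access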
No, that wouldn't be good. Imagine you want to change the way they are accessed in the future, for example, removing one member variable and letting the get/set functions compute its value.
It really depends on why you want to give access to them, how likely they are to change, how much code uses them, how problematic having to rewrite or recompile that code is, how fast access needs to be, whether you need or want virtual access, and what's more convenient and intuitive in the calling code. Wanting to give access to so many things may be a sign of poor design, or it may be 100% appropriate. Get/set functions have much more potential benefit for volatile (unstable, possibly subject to frequent tweaks) low-level code that could be used by a large number of client apps.
Given your edit, an array makes sense if your client is likely to want to access the values in a loop, or if a numeric index is inherently meaningful. For example, if they're chronologically ordered data samples, an index sounds good. In short, arrays make it easier to provide algorithms that work with any or all of the indices; you have to consider whether that's useful to your clients. If not, try to avoid it, as it may make it easier to mistakenly access the wrong values, particularly if, say, two people branch some code, each add an extra value at the end, and then try to merge their changes. Sometimes it makes sense to provide both an array and named access, or an enum with meaningful names for the indices, as in the sketch below.
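A tiny sketch of combining the two (all names invented):
#include <cstddef>

// Named indices plus loop access over the same storage.
enum SampleIndex { Temperature, Pressure, Humidity, SampleCount };

void process(double sample);   // stand-in for real per-sample work

void example(double const (&samples)[SampleCount])
{
    process(samples[Pressure]);                 // named access
    for (std::size_t i = 0; i < SampleCount; ++i)
        process(samples[i]);                    // indexed access
}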
This is a horrible design choice, as it allows any component to modify any of these variables. Furthermore, since access to these variables is done directly, you have no way to impose any invariant on the values, and if suddenly you decide to multithread your program, you won't have a single set of functions that need to be mutex-protected, but rather you will have to go off and find every single use of every single data member and individually lock those usages. In general, one should:
Not use singletons or global variables; they introduce subtle, implicit dependencies between components that allow seemingly independent components to interfere with each other.
Make variables const wherever possible and provide setters only where absolutely required.
Never make variables public (unless you are creating a POD struct, and even then, it is best to create POD structs only as an internal implementation detail and not expose them in the API).
Also, you mentioned that you need to use an array. You can use vector<B> or vector<B*> to create a dynamically-sized array of objects of type B or type B*. Rather than using A::getA() to access your singleton instance, it would be better to have functions that need type A take a parameter of type const A&. This makes the dependency explicit, and it also limits which functions can modify the members of that class (pass A* or A& to functions that need to mutate it).
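A sketch of that explicit-dependency style (names invented):
#include <vector>

class B { /* ... */ };

class A
{
public:
    const std::vector<B>& items() const { return m_items; }
    std::vector<B>&       items()       { return m_items; }
private:
    std::vector<B> m_items;
};

// The dependency is visible in each signature: const A& marks read-only
// users; only functions taking A& (or A*) may mutate the object.
void render(const A& config);
void tweak(A& config);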
As a convention, if you want a data structure to hold several public fields (plain old data), I would suggest using a struct (and using it in tandem with other classes -- builder, flyweight, memento, and other design patterns).
Classes generally mean that you're defining an encapsulated data type, so the OOP rule is to hide data members.
In terms of efficiency, modern compilers optimize away calls to accessors/mutators, so the impact on performance would be non-existent.
In terms of extensibility, methods are definitely a win because derived classes would be able to override these (if virtual). Another benefit is that logic to check/observe/notify data can be added if data is accessed via member functions.
Public members in a base class are generally difficult to keep track of.
Whether as members, perhaps static, in separate namespaces, via friends, or even via overloads, or any other C++ language feature...
When facing the problem of supporting multiple/varying formats, maybe protocols or any other kind of targets for your types, what was the most flexible and maintainable approach?
Were there any conventions or clear cut winners?
A brief note on why a particular approach helped would be great.
Thanks.
[ ProtoBufs like suggestions should not cut it for an upvote, no matter how flexible that particular impl might be :) ]
Reading through the already posted responses, I can only agree with a middle-tier approach.
Basically, in your original problem you have 2 distinct hierarchies:
n classes
m protocols
The naive use of a Visitor pattern (as much as I like it) will only lead to n*m methods... which is really gross and a gateway to a maintenance nightmare. I suppose you already noticed this, otherwise you would not be asking!
The "obvious" target approach is to go for a n+m solution, where the 2 hierarchies are clearly separated. This of course introduces a middle-tier.
The idea is thus ObjectA -> MiddleTier -> Protocol1.
Basically, that's what Protocol Buffers does, though its problem is a different one (going from one language to another via a protocol).
It may be quite difficult to work out the middle-tier:
Performance issues: a "translation" phase adds some overhead, and here you go from one conversion to two; this can be mitigated, but you will have to work on it.
Compatibility issues: some protocols do not support recursion, for example (XML and JSON do, EDIFACT does not), so you may have to settle for a least-common-denominator approach or work out ways of emulating such behavior.
Personally, I would go for "reimplementing" the JSON language (which is extremely simple) into a C++ hierarchy:
ints
strings
lists
dictionaries
Applying the Composite pattern to combine them.
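A bare-bones sketch of such a hierarchy (class names invented; Value is the component, List and Dict are the composites):
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Value
{
public:
    virtual ~Value() {}
};

class Int : public Value
{
public:
    explicit Int(int v) : value(v) {}
    int value;
};

class String : public Value
{
public:
    explicit String(std::string v) : value(std::move(v)) {}
    std::string value;
};

class List : public Value
{
public:
    std::vector<std::unique_ptr<Value>> items;
};

class Dict : public Value
{
public:
    std::map<std::string, std::unique_ptr<Value>> items;
};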
Of course, that is the first step only. Now you have a framework, but you don't have your messages.
You should be able to specify a message in terms of primitives (and really think about versioning right now; it's too late once you need another version). Note that both approaches are valid:
In-code specification: your message is composed of primitives / other messages
Using a code-generation script: this seems like overkill here, but... for the sake of completeness I thought I would mention it, as I don't know how many messages you really need :)
On to the implementation:
Herb Sutter and Andrei Alexandrescu said in their C++ Coding Standards
Prefer non-member non-friend functions
This applies really well to the MiddleTier -> Protocol step: create a Protocol1 class, and then you can have:
Protocol1 myProtocol;
myProtocol << myMiddleTierMessage;
The use of operator<< for this kind of operation is well-known and very common. Furthermore, it gives you a very flexible approach: not all messages are required to implement all protocols.
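For example, with invented stand-ins for the types this answer names:
class Protocol1 { /* wire-format state */ };
class MiddleTierMessage { /* primitives composed as described above */ };

// Non-member, non-friend: each (message, protocol) pair gets its own
// overload, so a message only opts into the protocols it supports.
Protocol1& operator<<(Protocol1& out, MiddleTierMessage const& msg)
{
    // ... translate msg's primitives into Protocol1's wire format ...
    return out;
}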
The drawback is that it won't work for a dynamic choice of the output protocol. In this case, you might want to use a more flexible approach. After having tried various solutions, I settled for using a Strategy pattern with compile-time registration.
The idea is that I use a Singleton which holds a number of Functor objects. Each object is registered (in this case) for a particular Message - Protocol combination. This works pretty well in this situation.
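A condensed sketch of that registry (all names invented; a real key would more likely be a pair of type indices than strings):
#include <functional>
#include <map>
#include <string>
#include <utility>

class SerializerRegistry
{
public:
    using Key = std::pair<std::string, std::string>;   // {message, protocol}
    using Fn  = std::function<void(void const*, void*)>;

    static SerializerRegistry& instance()
    {
        static SerializerRegistry registry;            // the singleton
        return registry;
    }

    void add(Key key, Fn fn) { m_fns[std::move(key)] = std::move(fn); }

    Fn const* find(Key const& key) const
    {
        auto it = m_fns.find(key);
        return it == m_fns.end() ? nullptr : &it->second;
    }

private:
    std::map<Key, Fn> m_fns;
};

// Compile-time registration: a static object whose constructor registers
// one Message - Protocol combination.
struct RegisterGetUp2SuperProt
{
    RegisterGetUp2SuperProt()
    {
        SerializerRegistry::instance().add(
            {"GetUp", "SuperProt"},
            [](void const* msg, void* prot) { /* GetUp2SuperProt logic */ });
    }
} const g_registerGetUp2SuperProt;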
Finally, for the BOM -> MiddleTier step, I would say that a particular instance of a Message should know how to build itself and should require the necessary objects as part of its constructor.
That of course only works if your messages are quite simple and can be built from only a few combinations of objects. If not, you might want a relatively empty constructor and various setters, but the first approach is usually sufficient.
Putting it all together.
// 1 - Your BOM
class Foo {};
class Bar {};
// 2 - Message class: GetUp
class GetUp
{
typedef enum {} State;
State m_state;
};
// 3 - Protocol class: SuperProt
class SuperProt: public Protocol
{
};
// 4 - GetUp to SuperProt serializer
class GetUp2SuperProt: public Serializer
{
};
// 5 - Let's use it
Foo foo;
Bar bar;
SuperProt sp;
GetUp getUp = GetUp(foo,bar);
MyMessage2ProtBase.serialize(sp, getUp); // use GetUp2SuperProt inside
If you need many output formats for many classes, I would try to make it an n + m problem instead of an n * m problem. The first way that comes to mind is to make the classes reducible to some kind of dictionary, and then have a method to serialize those dictionaries to each output format.
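A sketch of that reduction (all names invented):
#include <map>
#include <string>

using Dict = std::map<std::string, std::string>;

class Order
{
public:
    // Each serializable class knows how to flatten itself into a Dict...
    Dict toDict() const
    {
        return {{"id", std::to_string(m_id)}, {"customer", m_customer}};
    }
private:
    int m_id = 0;
    std::string m_customer;
};

// ...and each output format gets exactly one writer that consumes a Dict,
// giving n toDict() methods + m writers instead of n * m serializers.
std::string toJson(Dict const& d);
std::string toXml(Dict const& d);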
Assuming you have full access to the classes that must be serialized, you need to add some form of reflection to the classes (probably including an abstract factory). There are two ways to do this: 1) a common base class or 2) a "traits" struct. Then you can write your encoders/decoders in relation to the base class/traits struct.
Alternatively, you could require that the class provide a function to export itself to a container of boost::any and provide a constructor that takes a boost::any container as its only parameter. It should be simple to write a serialization function to many different formats if your source data is stored in a map of boost::any objects.
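A sketch of the boost::any variant of that idea (all names invented):
#include <boost/any.hpp>
#include <map>
#include <string>

using AnyMap = std::map<std::string, boost::any>;

class Sensor
{
public:
    // Reconstruct from the generic container...
    explicit Sensor(AnyMap const& fields)
        : m_id(boost::any_cast<int>(fields.at("id"))),
          m_name(boost::any_cast<std::string>(fields.at("name")))
    {
    }

    // ...and export back to it, for format-agnostic serializers to consume.
    AnyMap toAnyMap() const
    {
        return {{"id", m_id}, {"name", m_name}};
    }

private:
    int m_id;
    std::string m_name;
};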
That's two ways I might approach this. It would depend highly on the similarity of the classes to be serialized and on the diversity of target formats which of the above methods I would choose.
I used the OpenH323 library (famous enough among VoIP developers) for a long enough time to build a number of VoIP-related applications, ranging from a low-density answering machine up to a 32xE1 border controller. Of course it underwent major rework, so in those days I knew almost everything about this library.
Inside this library was a tool (ASNparser) which converted ASN.1 definitions into container classes. There was also a framework which allowed serialization/de-serialization of these containers using higher-layer abstractions. Note that these are auto-generated. They supported several encoding protocols (BER, PER, XER) for ASN.1, with a very complex ASN syntax and good-enough performance.
What was nice?
Auto-generated container classes which were suitable enough for implementing clear logic.
I managed to rework the whole container layer under the ASN object hierarchy with almost no modification to the upper layers.
It was relatively easy to refactor (for performance) the debug features of those ASN classes (I understand the authors didn't expect the signalling of 20xE1 worth of calls to be logged online).
What was not suitable?
A non-STL library with lazy copying underneath. It was refactored for speed, but I would have liked STL compatibility there (at least at that time).
You can find the Wiki page of the whole project here. You should focus only on the PTlib component, the ASN parser sources, and the ASN class hierarchy / encoding / decoding policy hierarchies.
By the way, look at the "Bridge" design pattern; it might be useful.
Feel free to comment with questions if something seems strange, insufficient, or not what you actually requested.