Flexible application configuration in C++

I am developing a C++ application used to simulate a real-world scenario. Based on this simulation, our team is going to develop, test and evaluate different algorithms working within such a real-world scenario.
We need the possibility to define several scenarios (they might differ in a few parameters, but a future scenario might also require creating objects of new classes) and the possibility to maintain a set of algorithms (which is, again, a set of parameters, but also the definition of which classes are to be created). Parameters are passed to the classes in the constructor.
I am wondering which is the best way to manage all the scenario and algorithm configurations. It should be easily possible to have one developer work on one scenario with "his" algorithm and another developer working on another scenario with "his" different algorithm. Still, the parameter sets might be huge and should be "sharable" (if I defined a set of parameters for a certain algorithm in Scenario A, it should be possible to use the algorithm in Scenario B without copy&paste).
It seems like there are two main ways to accomplish my task:
Define a configuration file format that can handle my requirements. This format might be XML-based or custom. As there is no C#-like reflection in C++, it seems like I have to update the config-file parser each time a new algorithm class is added to the project (in order to convert a string like "MyClass" into a new instance of MyClass). I could create a name for every setup and pass this name as a command line argument.
Pros: no compilation is required to change a parameter and re-run; I can easily store the whole config file with the simulation results.
Cons: seems like a lot of effort, and especially hard because I am using a lot of template classes that have to be instantiated with given template arguments. No IDE support for writing the file (at least not without creating a whole XSD, which I would have to update every time a parameter/class is added).
Wire everything up in C++ code. I am not completely sure how I would do this to separate all the different creation logic but still be able to reuse parameters across scenarios. I think I'd also try to give every setup a (string) name and use this name to select the setup via command line arg.
Pros: type safety, IDE support, no parser needed.
Cons: how can I easily store the setup with the results (maybe some serialization?); needs recompilation after every parameter change.
Now here are my questions:
- What is your opinion? Did I miss important pros/cons?
- Did I miss a third option?
- Is there a simple way to implement the config file approach that gives me enough flexibility?
- How would you organize all the factory code in the second approach? Are there any good C++ examples for something like this out there?
Thanks a lot!

There is a way to do this without templates or reflection.
First, you make sure that all the classes you want to create from the configuration file have a common base class. Let's call this MyBaseClass and assume that MyClass1, MyClass2 and MyClass3 all inherit from it.
Second, you implement a factory function for each of MyClass1, MyClass2 and MyClass3. The signatures of all these factory functions must be identical. An example factory function is as follows.
MyBaseClass * create_MyClass1(Configuration & cfg)
{
// Retrieve config variables and pass them as parameters
// to the constructor (cfg is a reference, so use '.', not '->')
int age = cfg.lookupInt("age");
std::string address = cfg.lookupString("address");
return new MyClass1(age, address);
}
Third, you register all the factory functions in a map. The function-pointer type must match the factory signature exactly.
typedef MyBaseClass* (*FactoryFunc)(Configuration &);
std::map<std::string, FactoryFunc> nameToFactoryFunc;
nameToFactoryFunc["MyClass1"] = &create_MyClass1;
nameToFactoryFunc["MyClass2"] = &create_MyClass2;
nameToFactoryFunc["MyClass3"] = &create_MyClass3;
Finally, you parse the configuration file and iterate over it to find all the entries that specify the name of a class. When you find such an entry, you look up its factory function in the nameToFactoryFunc table and invoke the function to create the corresponding object.
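A minimal, compilable sketch of the same name-to-factory table, assuming a hypothetical Configuration class that is just a string map (the answer does not define one). It swaps the raw function pointers for std::function and std::unique_ptr so ownership is explicit, and it reports unknown class names instead of dereferencing a missing entry.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the answer's Configuration class.
struct Configuration {
    std::map<std::string, std::string> values;
    int lookupInt(const std::string& key) const { return std::stoi(values.at(key)); }
    std::string lookupString(const std::string& key) const { return values.at(key); }
};

// Common base class for everything that can be named in a config file.
struct MyBaseClass {
    virtual ~MyBaseClass() = default;
    virtual std::string describe() const = 0;
};

struct MyClass1 : MyBaseClass {
    MyClass1(int a, std::string addr) : age(a), address(std::move(addr)) {}
    std::string describe() const override {
        return "MyClass1 " + std::to_string(age) + " " + address;
    }
    int age;
    std::string address;
};

struct MyClass2 : MyBaseClass {
    explicit MyClass2(double r) : rate(r) {}
    std::string describe() const override { return "MyClass2"; }
    double rate;
};

// Every factory has the same signature, as in the answer.
using FactoryFunc = std::function<std::unique_ptr<MyBaseClass>(const Configuration&)>;

// The registry: adding a new class means adding one entry here.
std::map<std::string, FactoryFunc>& nameToFactoryFunc() {
    static std::map<std::string, FactoryFunc> table = {
        {"MyClass1", [](const Configuration& cfg) {
             return std::make_unique<MyClass1>(cfg.lookupInt("age"), cfg.lookupString("address"));
         }},
        {"MyClass2", [](const Configuration& cfg) {
             return std::make_unique<MyClass2>(std::stod(cfg.lookupString("rate")));
         }},
    };
    return table;
}

// Look up the factory by class name and build the object.
std::unique_ptr<MyBaseClass> create(const std::string& className, const Configuration& cfg) {
    const auto& table = nameToFactoryFunc();
    auto it = table.find(className);
    if (it == table.end())
        throw std::runtime_error("unknown class in config: " + className);
    return it->second(cfg);
}
```

A scenario file then only needs class names and parameters. Template classes fit the same scheme: register each instantiation you need (for example a hypothetical "Filter<double>") under its own name.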

If you don't use XML, it's possible that boost::spirit could short-circuit at least some of the problems you are facing. Here's a simple example of how config data could be parsed directly into a class instance.

I found this website with a nice template-based factory, which I think I will use in my code.

Related

Where should the user-defined parameters of a framework be?

I am kind of a newbie and I am creating a framework to evolve objects in C++ with an evolutionary algorithm.
An evolutionary algorithm evolves objects and tests them to find the best solution (for example, evolve the weights of a neural network and test it on sample data, so that in the end you get a network with good accuracy without having trained it).
My problem is that there are lots of parameters for the algorithm (type of selection/crossover/mutation, probabilities for each of them...) and since it is a framework, the user should be able to easily access and modify them.
CURRENT SOLUTION
For now, I created a header file parameters.h of this form:
// DON'T CHANGE THESE PARAMETERS
//mutation type
#define FLIP 1
#define ADD_CONNECTION 2
#define RM_CONNECTION 3
// USER DEFINED
static const int TYPE_OF_MUTATION = FLIP;
The user modifies the static constant TYPE_OF_MUTATION, and then my mutation function checks the value of TYPE_OF_MUTATION and calls the right mutation function.
This works well, but it has a few drawbacks:
When I change a parameter in this header and then call "make", the change is not taken into account; I have to call "make clean" and then "make". From what I saw, it is not a problem in the makefile but simply how building works. Even if it did rebuild when I change a parameter, it would mean recompiling the whole project, since these parameters are used everywhere; it is definitely not efficient.
If you want to run the genetic algorithm several times with different parameters, you have to run it once, save the results, change the parameters, run it a second time, and so on.
OTHER POSSIBILITIES
I thought about taking these parameters as arguments of the top-level function. The problem is that the function would then take 20 arguments or so, which doesn't seem very readable...
What I mean about the top-level function is that for now, the evolutionary algorithm is run simply by doing this:
PopulationManager myPop;
myPop.evolveIt();
If I defined the parameters as arguments, we would have something like:
PopulationManager myPop;
myPop.evolveIt(20,10,5,FLIP,9,8,2,3,TOURNAMENT,0,23,4);
You can see how hellish it may be to always pass the parameters in the right order!
CONCLUSION
The frameworks I know make you build your algorithm yourself from pre-defined functions, but the user shouldn't have to go through all the code to change parameters one by one.
It may be useful to indicate that this framework will be used internally, for a definite set of projects.
Any input about the best way to define these parameters is welcome!
If the options do not change I usually use a struct for this:
enum class MutationType {
Flip,
AddConnection,
RemoveConnection
};
struct Options {
// Documentation for mutation_type.
MutationType mutation_type = MutationType::Flip;
// Documentation for integer option.
int integer_option = 10;
};
And then provide a constructor that takes these options.
Options options;
options.mutation_type = MutationType::AddConnection;
PopulationManager population(options);
C++11 makes this really easy, because it allows default member initializers for the options, so a user only needs to set the options that differ from the defaults.
Also note that I used an enum for the options, this ensures that the user can only use correct values.
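A self-contained sketch of the options-struct approach. The extra fields (population_size, mutation_probability) and the body of PopulationManager are hypothetical placeholders, not part of the asker's framework.

```cpp
#include <stdexcept>

enum class MutationType { Flip, AddConnection, RemoveConnection };

struct Options {
    // Which mutation operator to apply.
    MutationType mutation_type = MutationType::Flip;
    // Hypothetical extra options, each with a documented default.
    int population_size = 20;
    double mutation_probability = 0.05;
};

class PopulationManager {
public:
    explicit PopulationManager(const Options& options) : options_(options) {
        if (options_.population_size <= 0)
            throw std::invalid_argument("population_size must be positive");
    }
    const Options& options() const { return options_; }
    void evolveIt() {
        // Placeholder: a real implementation would dispatch on options_.mutation_type.
    }
private:
    Options options_;
};
```

Because Options is a plain value type, running the algorithm with several parameter sets in one program (the asker's second drawback) reduces to looping over a std::vector<Options> and constructing one PopulationManager per entry.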
This is a classic example of polymorphism. In your proposed implementation you switch on a constant to choose which mutation algorithm is applied to the parameter. In C++, the corresponding mechanisms for selecting the appropriate mutation algorithm are templates (static polymorphism) or virtual functions (dynamic polymorphism).
The templates way has the advantage that everything is resolvable at compile time and the resulting mutating algorithm could be inlined entirely, depending on the implementation. What you give up is the ability to dynamically select parameter mutation algorithms at runtime.
The virtual function way has the advantage that you can defer the choice of mutation algorithm until runtime, allowing this to vary based on input from the user or whatnot. The disadvantage is that the mutation algorithm can no longer be inlined and you pay the cost of a virtual function call (an extra level of indirection) when you mutate the parameter.
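A sketch contrasting the two mechanisms on a toy genome. Genome, FlipMutation, AddConnectionMutation, Mutator and makeMutator are hypothetical names chosen for illustration.

```cpp
#include <memory>
#include <vector>

using Genome = std::vector<int>;  // hypothetical: one 0/1 entry per connection

// --- Static polymorphism: the mutation policy is a template argument ---
struct FlipMutation {
    void operator()(Genome& g) const {
        for (int& gene : g) gene = 1 - gene;  // flip every bit
    }
};
struct AddConnectionMutation {
    void operator()(Genome& g) const { g.push_back(1); }  // append a new connection
};

template <typename Mutation>
void mutateStatic(Genome& g) {
    Mutation{}(g);  // resolved at compile time, so it can be inlined
}

// --- Dynamic polymorphism: the mutation policy is chosen at runtime ---
enum class MutationType { Flip, AddConnection };

struct Mutator {
    virtual ~Mutator() = default;
    virtual void mutate(Genome& g) const = 0;
};
struct FlipMutator : Mutator {
    void mutate(Genome& g) const override { FlipMutation{}(g); }
};
struct AddConnectionMutator : Mutator {
    void mutate(Genome& g) const override { AddConnectionMutation{}(g); }
};

// The type could come from a config file or the command line.
std::unique_ptr<Mutator> makeMutator(MutationType type) {
    if (type == MutationType::Flip) return std::make_unique<FlipMutator>();
    return std::make_unique<AddConnectionMutator>();
}
```

mutateStatic<FlipMutation> is fixed when the program is compiled, while makeMutator can take its argument from user input at the cost of one virtual call per mutation.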
If you want to see a real example of how "algorithmic mutation" can work, look at evolve.cpp in my Iterated Dynamics repository on GitHub. This is C code converted to C++, so it uses neither templates nor virtual functions. Instead, it uses function pointers and a switch on a constant to select the appropriate code. However, the idea is the same.
My recommendation would be to see if you can use static polymorphism (templates) first. From your initial description you were fixing the mutation at compile-time anyway, so you're not giving anything up.
If that was just a prototyping phase and you intended to support switching of mutation algorithms at runtime, then look at virtual functions. As the other answer recommended, please shun C-style coding like #define constants and instead use proper enums.
To solve the "long parameter list smell", the idea of packing all the parameters into a structure is a good one. You can achieve more readability on top of that by using the builder pattern to build up the structure of parameters in a more readable way than just assigning a bunch of values into a struct. In this blog post, I applied the builder pattern to the resource description structures in Direct3D. That allowed me to more directly express these "bags of data" with reasonable defaults and directly reveal my intent to override or replace default values with special values when necessary.
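A minimal builder over an options struct, sketched under assumptions: the fields and the validation rules in build() are hypothetical. Each setter names the value it overrides, so the call site reads as intent rather than as a list of positional arguments.

```cpp
#include <stdexcept>

enum class MutationType { Flip, AddConnection, RemoveConnection };

// Plain "bag of data" with defaults.
struct Options {
    MutationType mutation_type = MutationType::Flip;
    int population_size = 20;
    double crossover_probability = 0.7;
};

// Each setter returns *this so calls chain; fields you don't mention keep their defaults.
class OptionsBuilder {
public:
    OptionsBuilder& mutation(MutationType type) { options_.mutation_type = type; return *this; }
    OptionsBuilder& populationSize(int size) { options_.population_size = size; return *this; }
    OptionsBuilder& crossoverProbability(double p) { options_.crossover_probability = p; return *this; }
    // A single place to validate the combination before it is used.
    Options build() const {
        if (options_.population_size <= 0)
            throw std::invalid_argument("population size must be positive");
        if (options_.crossover_probability < 0.0 || options_.crossover_probability > 1.0)
            throw std::invalid_argument("crossover probability must be in [0, 1]");
        return options_;
    }
private:
    Options options_;
};
```

Usage: `Options o = OptionsBuilder().mutation(MutationType::AddConnection).populationSize(50).build();` leaves crossover_probability at its default.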

Should I prefer a const function?

Assume I want to implement class A which must load its "configuration" from a file. And let's assume the "configuration" is a simple map<string, string>.
I can implement the A::LoadConfiguration in two different ways:
void A::LoadConfiguration(string filename)
map<string, string> A::LoadConfiguration(string filename) const
Should I prefer either of the two implementations, and why?
If you prefer the second version, then when users want to get information from a file they will base all their algorithms on the map. If you use the first version, the implementation may be a map but doesn't have to be, so they can base their code around an API which does not have to change even if the internal implementation does.
Consider the situation where you later realize it is far more efficient to use a std::array, for whatever reason: now every program using this code has to change many of its algorithms. With the first version, the change to an array can be handled internally and causes no changes on the outside.
Now if you are planning to make multiple instances of the class you will definitely want to make it a static method because you don't want the file to load every time you call the constructor (especially if the file will not change).
Completely ignoring your suggestions, but this is probably how I would do it (not knowing all your constraints, so ignore me if it does not fit):
class A
{
public:
static A fromConfiguration( string fileName );
/* ... */
};
In most cases, the "configuration" of a class should be set at object creation, so forcing the user to provide it on construction is a good thing (instead of having to remember to do the loading later).
namespace NeatStuff
{
map<string,string> loadSimpleConfiguration( string fileName );
}
If the configuration file format is really simple (and not specific to your class) you can move the actual loading out of the class.
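A sketch combining both suggestions: a named constructor A::fromConfiguration and a free NeatStuff::loadSimpleConfiguration. The key=value file format, the std::istream overload (added so the parser can be used without a file), and the A::get accessor are assumptions for illustration.

```cpp
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

namespace NeatStuff {
// Parses "key=value" lines (an assumed format); blank lines and '#' comments are skipped.
std::map<std::string, std::string> loadSimpleConfiguration(std::istream& in) {
    std::map<std::string, std::string> result;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos)
            throw std::runtime_error("malformed configuration line: " + line);
        result[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return result;
}

std::map<std::string, std::string> loadSimpleConfiguration(const std::string& fileName) {
    std::ifstream file(fileName);
    if (!file) throw std::runtime_error("cannot open " + fileName);
    return loadSimpleConfiguration(file);
}
}  // namespace NeatStuff

class A {
public:
    // Named constructor: an A is never created without its configuration.
    static A fromConfiguration(const std::string& fileName) {
        return A(NeatStuff::loadSimpleConfiguration(fileName));
    }
    explicit A(std::map<std::string, std::string> config) : config_(std::move(config)) {}
    const std::string& get(const std::string& key) const { return config_.at(key); }
private:
    std::map<std::string, std::string> config_;
};
```

Keeping the parser outside the class means other classes can reuse it, while A itself only ever sees a finished map.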
Assuming other classes use the configuration later, I prefer option 1, and an additional GetConfigurationParameter public const method that gets the config value for a particular key. That lets me make other classes which can just ask for some parameter by name without ever caring that it's implemented as a map.
Another reason why I prefer option 1 is that loading a configuration should be distinct from returning it. If I see a name like LoadConfiguration, I assume that it loads the config from somewhere and sets the parameters in the class. I do not assume it returns some description of the configuration, which I'd instead expect from a method like GetConfiguration - but opinions on this will vary for different people of course.
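A sketch of option 1 with the GetConfigurationParameter accessor this answer describes. The key=value format, the std::istream overload and the fallback parameter are assumptions made for illustration.

```cpp
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

class A {
public:
    // Option 1: loading changes the object's state and returns nothing.
    void LoadConfiguration(const std::string& filename) {
        std::ifstream file(filename);
        if (!file) throw std::runtime_error("cannot open " + filename);
        LoadConfiguration(file);
    }
    // Reads "key=value" lines; lines without '=' are ignored.
    void LoadConfiguration(std::istream& in) {
        std::map<std::string, std::string> loaded;
        std::string line;
        while (std::getline(in, line)) {
            const auto eq = line.find('=');
            if (eq != std::string::npos)
                loaded[line.substr(0, eq)] = line.substr(eq + 1);
        }
        config_.swap(loaded);  // replace the previous configuration in one step
    }
    // Callers ask by key and never learn that the storage is a std::map.
    std::string GetConfigurationParameter(const std::string& key,
                                          const std::string& fallback = "") const {
        const auto it = config_.find(key);
        return it != config_.end() ? it->second : fallback;
    }
private:
    std::map<std::string, std::string> config_;
};
```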

Parsing different XML messages: versions

Say we want to parse XML messages into business objects. We split the process into two parts, namely:
- Parsing the XML messages into XML grammar objects.
- Transforming the XML grammar objects into business objects.
The first part is done automatically, generating a grammar object for each node.
So far, the second part is done by following the structure of the XML. For example:
If we have the following (simplified) XML message:
<Main>
<ChildA>XYZ</ChildA>
<ChildB att1="0">
<InnerChild>YUK</InnerChild>
</ChildB>
</Main>
We could find the following classes:
DecodeMain (calls DecodeChildA and DecodeChildB)
DecodeChildA
DecodeChildB (calls DecodeInnerChild)
DecodeInnerChild
The main problem arises when we need to handle different versions of the same message. Say we have a new version where only DecodeInnerChild changes (e.g., we need to add an "a" at the end of the value).
It is really important that the solution stays agile for further versions and is as clean as possible. I considered the following options:
1) Simple inheritance: create two classes of DecodeInnerChild, one for each version.
Shortcoming: I will need to create different classes for every parent class so that it calls the right one.
2) Version parameter: add an object holding the version as a parameter to each method. This way each method knows what to do for each version.
Shortcoming: not clean at all. The code of different versions is mixed.
3) Inheritance + version parameter: for the nodes that change directly (like InnerChild), create two classes with a base class for the common code, and add the version as a parameter in each method. When a node calls another class to decode a child object, it uses one class or the other depending on the version parameter.
4) Some kind of executor pattern (I do not know how to do it): define at the start some kind of specification object that lists all the methods that are going to be used, and pass this object to a class that is in charge of executing them.
How would you do it? Other ideas are welcome.
Thanks in advance. :)
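One possible reading of option 4, sketched under assumptions: a per-version "decoder set" is built once from the message version and passed down, so parent decoders such as decodeChildB are written once and only the node that changed (InnerChild) gets a second implementation. All struct and function names are hypothetical stand-ins for the generated grammar classes and the business objects.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for the generated XML grammar objects.
struct InnerChildXml { std::string value; };
struct ChildBXml { std::string att1; InnerChildXml inner; };

// Hypothetical business objects.
struct InnerChild { std::string value; };
struct ChildB { int att1; InnerChild inner; };

// Version 1 behaviour; later versions override only what changed.
struct InnerChildDecoder {
    virtual ~InnerChildDecoder() = default;
    virtual InnerChild decode(const InnerChildXml& xml) const { return InnerChild{xml.value}; }
};
struct InnerChildDecoderV2 : InnerChildDecoder {
    InnerChild decode(const InnerChildXml& xml) const override {
        return InnerChild{xml.value + "a"};  // the example change: append "a"
    }
};

// The "specification object": one decoder per node type that can vary by version.
struct DecoderSet {
    std::shared_ptr<const InnerChildDecoder> innerChild;
};

DecoderSet decodersFor(int version) {
    DecoderSet set;
    if (version >= 2)
        set.innerChild = std::make_shared<InnerChildDecoderV2>();
    else
        set.innerChild = std::make_shared<InnerChildDecoder>();
    return set;
}

// Parent decoders exist once and delegate version-dependent children to the set.
ChildB decodeChildB(const ChildBXml& xml, const DecoderSet& decoders) {
    return ChildB{std::stoi(xml.att1), decoders.innerChild->decode(xml.inner)};
}
```

With this layout, adding version 3 means adding decoder classes only for the nodes that change and extending decodersFor; no parent decoder is touched.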
Rather than parse XML myself, as a first step I would let something like CodeSynthesis XSD generate all the needed classes for me and work on those. Later, if performance or something else becomes an issue, I would start to look around for more efficient parsers, and only if that is not fruitful would I start to design and write my own parser for the specific case.
Edit:
Sorry, I should have been more specific :P The first part is done automatically; the whole code is generated from the XML schema.
OK, let's discuss, then, how to handle the usual situation where, as software evolves, its input eventually evolves too. I'll put all the silver bullets and magic wands on the table here; whether you implement any of them is totally up to you.
I have a version attribute on most things I create anyway. It is sensible to add one before you hit a backward-compatibility issue that cannot be solved elegantly. Most importantly, it ensures that when old software fails to parse newer input, it produces a complaint that immediately makes sense to everybody.
I usually also add an interface for converters. That way, old software can be equipped with a converter from the newer input version when it fails to parse it, and new software can use the same converter to parse older input. It is also the place to plug in a converter from totally "alien" input. Win-win-win situation. ;)
For the special case of a minor change, I would consider whether it is cheap to make the new DecodeInnerChild internally more flexible, so that it accepts the value with or without the trailing "a" as valid. The converter still has to strip that "a" when converting for older versions.
What often actually happens is that InnerChild splits and both versions end up being used side by side. If there is sufficient behavioral difference between the two InnerChilds, there is no point in avoiding polymorphic InnerChilds. When polymorphism is added then, as you say in your option 1), all containing classes that now have such polymorphic members have to be altered. In such cases the converter should usually either produce a degraded InnerChild or tell the older version that the input is beyond its capabilities.
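A sketch of the version attribute plus converter interface described above, reduced to a hypothetical two-field Message. The hypothetical V1ToV2 converter applies the example change (appending "a"), and upgradeTo fails with an explicit message when the input is newer than what the software supports. Converters in the other direction, for older software, would be a second method on the same interface.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical minimal message: the version attribute plus one decoded value.
struct Message {
    int version;
    std::string innerChild;
};

// Converter interface: turns a message of one version into the next version.
struct Converter {
    virtual ~Converter() = default;
    virtual Message upgrade(const Message& in) const = 0;
};

struct V1ToV2 : Converter {
    Message upgrade(const Message& in) const override {
        return Message{2, in.innerChild + "a"};  // version 2 expects a trailing "a"
    }
};

using ConverterTable = std::map<int, std::shared_ptr<const Converter>>;

// Brings an older message up to `target`; newer input fails with a clear error.
Message upgradeTo(Message msg, int target, const ConverterTable& converters) {
    if (msg.version > target)
        throw std::runtime_error("input version " + std::to_string(msg.version) +
                                 " is newer than supported version " + std::to_string(target));
    while (msg.version < target) {
        auto it = converters.find(msg.version);
        if (it == converters.end())
            throw std::runtime_error("no converter from version " + std::to_string(msg.version));
        msg = it->second->upgrade(msg);  // each converter must raise the version
    }
    return msg;
}
```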

What is the best design pattern to register data "chunks"?

I have a library which can save/load "chunks" on disk; chunks are POD structs with a constant size and a unique static CHUNK_ID field. So Load looks something like this:
void Load(int docId, char* ptr, int type, size_t& size)...
To add a new chunk, you just add a struct with a new CHUNK_ID and use the Save/Load functions on it.
What I want is to force all "chunks" to have functions like PrintHumanReadable, CompareThisTypeOfChunk, etc. (ideally, the program should not compile without such functions). I also want to mark/register/enumerate all chunk structs.
I have a few ideas, but all of them have problems.
Create a base class with the pure virtual functions PrintHumanReadable and CompareThisTypeOfChunk.
Problem: this breaks the POD type and requires rewriting the library.
Implement a factory which creates a chunk struct from its CHUNK_ID. Problem: it still compiles when I add a new chunk without the required functions.
Could you recommend an elegant design solution for my problem?
Implement a simple code generator. You can use something like Mako or Cheetah (both Python libraries). Make a text file containing all the class names, then have the generator build the factory method and a series of methods which aren't really used but which refer to the desired methods in all the classes. This will also make it straightforward to enumerate the classes (again, using generated code).
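To make the generator idea concrete, here is a hand-written sketch of what the generated code for two hypothetical chunks (PositionChunk and VelocityChunk) might look like. checkAllChunks is never called, but compiling it forces every listed chunk to provide PrintHumanReadable and CompareThisTypeOfChunk with the expected signatures and to remain POD; printChunk is the generated dispatch from CHUNK_ID to the concrete type. Non-virtual member functions do not affect POD-ness, so existing Save/Load code should keep working.

```cpp
#include <cstring>
#include <string>
#include <type_traits>

// Hand-written chunks: still POD, since non-virtual member functions don't affect that.
struct PositionChunk {
    static constexpr int CHUNK_ID = 1;
    float x, y, z;
    std::string PrintHumanReadable() const {
        return "Position(" + std::to_string(x) + ", " + std::to_string(y) + ", " +
               std::to_string(z) + ")";
    }
    bool CompareThisTypeOfChunk(const PositionChunk& o) const {
        return x == o.x && y == o.y && z == o.z;
    }
};

struct VelocityChunk {
    static constexpr int CHUNK_ID = 2;
    float dx, dy;
    std::string PrintHumanReadable() const {
        return "Velocity(" + std::to_string(dx) + ", " + std::to_string(dy) + ")";
    }
    bool CompareThisTypeOfChunk(const VelocityChunk& o) const {
        return dx == o.dx && dy == o.dy;
    }
};

// ---- Everything below would be generated from the list of chunk names. ----

// Taking the member addresses fails to compile if a chunk lacks a function
// or declares it with a different signature.
template <typename Chunk>
void requireChunkInterface() {
    static_assert(std::is_trivial<Chunk>::value && std::is_standard_layout<Chunk>::value,
                  "chunks must stay POD");
    std::string (Chunk::*print)() const = &Chunk::PrintHumanReadable;
    bool (Chunk::*compare)(const Chunk&) const = &Chunk::CompareThisTypeOfChunk;
    (void)print;
    (void)compare;
}

// Never needs to run; compiling it instantiates the checks for every chunk.
inline void checkAllChunks() {
    requireChunkInterface<PositionChunk>();
    requireChunkInterface<VelocityChunk>();
}

// Generated dispatch from CHUNK_ID to the concrete type, e.g. for a raw buffer from Load().
inline std::string printChunk(int type, const char* data) {
    switch (type) {
    case PositionChunk::CHUNK_ID: {
        PositionChunk c;
        std::memcpy(&c, data, sizeof c);
        return c.PrintHumanReadable();
    }
    case VelocityChunk::CHUNK_ID: {
        VelocityChunk c;
        std::memcpy(&c, data, sizeof c);
        return c.PrintHumanReadable();
    }
    default:
        return "unknown chunk " + std::to_string(type);
    }
}
```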
The proper design pattern for this is called "use Boost.Serialization". It's really the best tool for writing objects to a format and then reading them back later. It can write in text, binary, and even XML formats (and others if you write a proper stream for them). It can be non-intrusive, so you don't need to modify the objects to serialize them. And so forth.
Once you're using the proper tool for this job, you can then use whatever class hierarchy or other method you like to ensure that the proper functions for an object exist.
If you can't/won't use Boost.Serialization, then you're pretty much stuck with a runtime solution. And since the solution is runtime rather than compile time, there's no way to ensure at compile time that any particular chunk ID has the requisite functions.

Putting all code of a module behind 1 interface. Good idea or not?

I have several modules (mainly C) that need to be redesigned (using C++). Currently, the main problems are:
many parts of the application rely on the functions of the module
some parts of the application might want to overrule the behavior of the module
I was thinking about the following approach:
redesign the module so that it has a clear modern class structure (using interfaces, inheritance, STL containers, ...)
writing a global module interface class that can be used to access any functionality of the module
writing an implementation of this interface that simply maps the interface methods to the correct methods of the correct classes in the module
Other modules in the application that currently directly use the C functions of the module, should be passed [an implementation of] this interface. That way, if the application wants to alter the behavior of one of the functions of the module, it simply inherits from this default implementation and overrules any function that it wants.
An example:
Suppose I completely redesign my module so that I have classes like: Book, Page, Cover, Author, ... All these classes have lots of different methods.
I make a global interface, called ILibraryAccessor, with lots of pure virtual methods
I make a default implementation, called DefaultLibraryAccessor, that simply forwards all methods to the correct method of the correct class, e.g.
DefaultLibraryAccessor::printBook(book) calls book->print()
DefaultLibraryAccessor::getPage(book,10) calls book->getPage(10)
DefaultLibraryAccessor::printPage(page) calls page->print()
Suppose my application has 3 kinds of windows
The first one allows all functionality and as an application I want to allow that
The second one also allows all functionality (internally), but from the application I want to prevent printing separate pages
The third one also allows all functionality (internally), but from the application I want to prevent printing certain kinds of books
When constructing the window, the application passes an implementation of ILibraryAccessor to the window
The first window will get the DefaultLibraryAccessor, allowing everything
I will pass a special MyLibraryAccessor to the second window, and in MyLibraryAccessor, I will overrule the printPage method and let it fail
I will pass a special AnotherLibraryAccessor to the third window, and in AnotherLibraryAccessor, I will overrule the printBook method and check the type of book before I will call book->print().
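A compact sketch of the accessor design from the question. Book, Page, Window and the confidential flag are hypothetical simplifications. The two restricted accessors override a single method each and inherit everything else from DefaultLibraryAccessor; the window only ever sees ILibraryAccessor.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified module classes.
struct Page {
    int number;
    std::string print() const { return "page " + std::to_string(number); }
};

struct Book {
    std::string title;
    bool confidential = false;  // stands in for "certain kinds of books"
    std::vector<Page> pages;
    std::string print() const { return "book " + title; }
    const Page& getPage(int n) const { return pages.at(n); }
};

// The single module-wide interface from the question.
class ILibraryAccessor {
public:
    virtual ~ILibraryAccessor() = default;
    virtual std::string printBook(const Book& book) = 0;
    virtual const Page& getPage(const Book& book, int n) = 0;
    virtual std::string printPage(const Page& page) = 0;
};

// Pure forwarding to the right method of the right class.
class DefaultLibraryAccessor : public ILibraryAccessor {
public:
    std::string printBook(const Book& book) override { return book.print(); }
    const Page& getPage(const Book& book, int n) override { return book.getPage(n); }
    std::string printPage(const Page& page) override { return page.print(); }
};

// Second window: printing separate pages is refused.
class MyLibraryAccessor : public DefaultLibraryAccessor {
public:
    std::string printPage(const Page&) override {
        throw std::runtime_error("printing single pages is not allowed here");
    }
};

// Third window: certain books are refused; everything else uses the default.
class AnotherLibraryAccessor : public DefaultLibraryAccessor {
public:
    std::string printBook(const Book& book) override {
        if (book.confidential) throw std::runtime_error("this book may not be printed");
        return DefaultLibraryAccessor::printBook(book);
    }
};

// A window only sees the interface it was constructed with.
class Window {
public:
    explicit Window(ILibraryAccessor& library) : library_(library) {}
    std::string printFirstPage(const Book& book) {
        return library_.printPage(library_.getPage(book, 0));
    }
private:
    ILibraryAccessor& library_;
};
```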
The advantage of this approach is that, as shown in the example, an application can overrule any method it wants to overrule. The disadvantage is that I get a rather big interface, and the class-structure is completely lost for all modules that wants to access this other module.
Good idea or not?
You could represent the class structure with nested interfaces. E.g. instead of DefaultLibraryAccessor::printBook(book), have DefaultLibraryAccessor::Book::print(book). Otherwise it looks like a good design to me.
Maybe look at the design pattern called "Facade". Use one facade per module. Your approach seems good.
ILibraryAccessor sounds like a known anti-pattern, the "god class".
Your individual windows are probably better off inheriting and overriding at Book/Page/Cover/Author level.
The only thing I'd worry about is a loss of granularity, partly addressed by suszterpatt previously. Your implementations might end up being rather heavyweight and inflexible. If you're sure that you can predict the future use of the module at this point then the design is probably ok.
It occurs to me that you might want to keep the interface fine-grained, but find some way of injecting this kind of display-specific behaviour rather than trying to incorporate it at top level.
If you have n methods in your interface class and m behaviors per method, you get m*(nC1 + nC2 + nC3 + ... + nCn) implementations of your interface (I hope I got my math right :) ). Compare this with the m*n implementations you need if you have a single interface per function. That approach also has added flexibility, which is more important. So, no, I don't think a single interface would do. But you don't have to be extreme about it.
EDIT: I am sure the math is wrong. :(
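For reference, the sum in the answer's expression has a closed form via the binomial theorem. This only simplifies the expression as written; it does not settle which counting model is the right one.

```latex
\sum_{k=1}^{n}\binom{n}{k} = \sum_{k=0}^{n}\binom{n}{k} - \binom{n}{0} = 2^{n} - 1
\qquad\Longrightarrow\qquad
m\left(\binom{n}{1} + \binom{n}{2} + \dots + \binom{n}{n}\right) = m\,(2^{n} - 1)
```

For n = 3 this gives m(3 + 3 + 1) = 7m, versus 3m for one interface per function. Under a different model, where each of the n methods independently picks one of m behaviors, a single combined interface has m^n distinct implementations.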