How can I decrease complexity in library without increasing complexity elsewhere? - c++

I am tasked with maintaining and updating a library which allows a computer to send commands to a hardware device and then receive its response. Currently the code is set up in such a way that every single possible command the device can receive is sent via its own function. Code repetition is everywhere; a DRY advocate's worst nightmare.
Obviously there is much opportunity for improvement. The problem is each command has a different payload. Currently the data that is to be the payload is passed to each command function in the form of arguments. It's difficult to consolidate functionality without pushing the complexity up to the layer that calls the library.
When a response is received from the device, its data is put into an object of a class solely responsible for holding this data; these classes do nothing else. There are hundreds of them. These objects are then used by the app layer to access the returned data.
My objectives:
Thoroughly reduce code repetition
Maintain a similar level of complexity at the application layer
Make it easier to add new commands
My idea:
Have one function to send a command and one to receive (the receiving function is automatically called when a response from the device is detected). Have a struct holding all command/response data which will be passed to sending function and returned by receiving function. Since each command has a corresponding enum value, have a switch statement which sets up any command specific data for sending.
Is my idea the best way to do it? Is there a design pattern I could use here? I've looked and looked but nothing seems to fit my needs.
Thanks in advance! (Please let me know if clarification is necessary)

This reminds me of the REST vs. SOA debate, albeit on a smaller physical scale.
If I understand you correctly, right now you have calls like
device->DoThing();
device->DoOtherThing();
and then sometimes I get a callback like
callback->DoneThing(ThingResult&);
callback->DoneOtherThing(OtherThingResult&);
I suggest that the user is the key component here. Do the current library users like the interface at the level it is designed? Is the interface consistent, even if it is large?
You seem to want to propose
device->Do(ThingAndOtherThingParameters&);
callback->Done(ThingAndOtherThingResult&);
so to have a single entry point with more complex data.
The downside from a library user's perspective may be that now I have to use a manual switch() or similar statement to tell what really happened. Dispatching to the appropriate result callback used to be done for me; now you have made it a burden on the library user.
Unless this bought me, as a user, some flexibility that I actually wanted, I would consider this a step backwards.
For your part as an implementor, one suggestion would be to go to the generic form internally, and then offer both interfaces externally. Perhaps the old specific interface could even be auto-generated somehow.
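A minimal sketch of "generic internally, specific externally" might look like the following. Everything in it is hypothetical: the command IDs, the one-byte length framing, the Transport stand-in for the real device link, and the names set_speed and read_temperature are invented for illustration and are not taken from the library in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical command IDs; the real device defines its own.
enum class CommandId : std::uint8_t { SetSpeed = 0x01, ReadTemperature = 0x02 };

// Stand-in for the real device link: it only records the frames it is given.
struct Transport {
    std::vector<std::vector<std::uint8_t>> sent;
    void write(const std::vector<std::uint8_t>& frame) { sent.push_back(frame); }
};

class Device {
public:
    explicit Device(Transport& t) : transport_(t) {}

    // Specific interface: one thin, typed function per command (unchanged for callers).
    void set_speed(std::uint16_t rpm) {
        send(CommandId::SetSpeed,
             {static_cast<std::uint8_t>(rpm >> 8), static_cast<std::uint8_t>(rpm & 0xFF)});
    }
    void read_temperature() { send(CommandId::ReadTemperature, std::vector<std::uint8_t>{}); }

    // Generic interface: the single place where framing happens.
    void send(CommandId id, const std::vector<std::uint8_t>& payload) {
        assert(payload.size() <= 255 && "payload must fit in a one-byte length field");
        std::vector<std::uint8_t> frame;
        frame.push_back(static_cast<std::uint8_t>(id));
        frame.push_back(static_cast<std::uint8_t>(payload.size()));
        frame.insert(frame.end(), payload.begin(), payload.end());
        transport_.write(frame);
    }

private:
    Transport& transport_;
};
```

With this shape, adding a command means adding one short wrapper, while the repeated framing code lives in send() only. The wrappers are regular enough that they could plausibly be generated from a table.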
Good Luck.

Well, your question implies that there is a balance between the library's complexity and the client's. When those are the only two choices, one almost always goes with making the client's life easier. However, those are rarely really the only two choices.
Now in the text you talk about a command processing architecture where each command has a different set of data associated with it. In the olden days, this would typically be implemented with a big honking case statement in a loop, where each case called a different routine with different parameters and perhaps some setup code. Grisly. McCabe complexity analysers hate this.
These days what you can do with an OO language is use dynamic dispatch. Create a base abstract "command" class with a standard "handle()" method, and have each different command inherit from it to add their own members (to represent the different "arguments" to the different commands). Then you create a big honking array of these at startup, usually indexed by the command ID. For languages like C++ or Ada it has to be an array of pointers to "command" objects, for the dynamic dispatch to work. Then you can just call the appropriate command object for the command ID you read from the client. The big honking case statement is now handled implicitly by the dynamic dispatch.
Where you can get the big savings in this scenario is in subclassing. Do you have several commands that use the exact same parameters? Make a subclass for them, and then derive all of those commands from that subclass. Do you have several commands that have to perform the same operation on one of the parameters? Make a subclass for them with that one method implemented for that operation, and then derive all those commands from that subclass.
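As a rough sketch of that idea, assuming invented command names (OpenChannel, CloseChannel, Reset) and a handle() that returns a string only so the result is easy to inspect:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Abstract base: every command is handled through the same virtual call.
class Command {
public:
    virtual ~Command() = default;
    virtual std::string handle() = 0;
};

// Intermediate class for commands that share the same parameter (a channel number).
class ChannelCommand : public Command {
public:
    explicit ChannelCommand(int channel) : channel_(channel) {}
protected:
    int channel_;
};

class OpenChannel : public ChannelCommand {
public:
    using ChannelCommand::ChannelCommand;
    std::string handle() override { return "open " + std::to_string(channel_); }
};

class CloseChannel : public ChannelCommand {
public:
    using ChannelCommand::ChannelCommand;
    std::string handle() override { return "close " + std::to_string(channel_); }
};

class Reset : public Command {
public:
    std::string handle() override { return "reset"; }
};

// Table indexed by command ID; it holds pointers so that dynamic dispatch works.
std::vector<std::unique_ptr<Command>> make_command_table() {
    std::vector<std::unique_ptr<Command>> table;
    table.push_back(std::make_unique<OpenChannel>(3));   // ID 0
    table.push_back(std::make_unique<CloseChannel>(3));  // ID 1
    table.push_back(std::make_unique<Reset>());          // ID 2
    return table;
}

// Replaces the case statement: look up the ID and let the virtual call do the rest.
std::string dispatch(const std::vector<std::unique_ptr<Command>>& table, std::size_t id) {
    assert(id < table.size() && "unknown command ID");
    return table[id]->handle();
}
```

ChannelCommand plays the role of the shared subclass: the two channel commands inherit the parameter and its constructor instead of repeating them.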

Your first objective should be to produce a library that decouples higher software layers from the hardware. Users of your library shouldn't care that you have a hardware device that can execute a number of functions with different payloads. They should only care what the device does at a higher level. In this sense it is, in my opinion, a good thing that each command is mapped to its own function.
My plan would be:
Identify the objects the higher data layers need to get the job done. Model the objects in C++ classes from their perspective, not from the perspective of the hardware
Define the interface of the library using the above objects
Start the implementation of the library. Perhaps an intermediate layer that maps software objects to hardware objects is necessary
There are many things you can do to reduce code repetition. You can use polymorphism. Define a class with the base functionality and extend it. You can also use utility classes, that implement functions needed for many commands.

Related

C++ compile-time / runtime options and parameters, how to handle?

What is the proper way to handle compile-time and runtime options in a generic library? What is good practice for large software in which there are simply too many options for the user to bother about most of them?
Suppose the task is to write a large library to perform calculations on a number of datasets at the same time. There are numerous ways to perform these calculations, and the library must be highly configurable. Typically, there are options relative to how the calculation is performed as a whole. Then, each dataset has its own set of calculation options. Finally, each calculation has a number of tuning parameters, which must be set as well.
The library itself is generic, but each application which uses that library will use a particular kind of dataset, for which tuning parameters will take on a certain value. Since they will not change throughout the life of the application, I make them known at application compile-time. The way I would implement these tuning parameters in the library is through a Traits class, which contains the tuning parameters as static const elements. Calibration of their final value is part of the development of the application.
The datasets will of course change depending on what the user feeds to the application, and therefore a number of runtime options must be provided as well (with intelligent defaults). Calibration of their default value is also part of the development of the application. I would implement these options as a Config class which contains these options, and can be changed on application startup (e.g. parsing a config text file). It gets passed to the constructor of a lot of the classes in the library. Each class then calls the Config::get_x for their specific option x.
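To make the two mechanisms concrete, here is a minimal sketch of the design as described. The names ImageTraits, Smoother and the "smoother.strength" key are made up, and it uses static constexpr members where the text says static const:

```cpp
#include <cassert>
#include <map>
#include <string>

// Compile-time tuning parameters, fixed per application (hypothetical values).
struct ImageTraits {
    static constexpr int kernel_size = 5;
    static constexpr double tolerance = 1e-6;
};

// Runtime options with defaults; an application may override them at startup,
// for example after parsing a config file.
class Config {
public:
    void set(const std::string& key, double value) { values_[key] = value; }
    double get(const std::string& key, double fallback) const {
        auto it = values_.find(key);
        return it != values_.end() ? it->second : fallback;
    }
private:
    std::map<std::string, double> values_;
};

// A library class takes its tuning from Traits and its options from Config.
template <typename Traits>
class Smoother {
public:
    explicit Smoother(const Config& cfg)
        : strength_(cfg.get("smoother.strength", 1.0)) {
        assert(strength_ >= 0.0 && "strength must be non-negative");
    }
    double weight() const { return strength_ / Traits::kernel_size; }
private:
    double strength_;
};
```

Note that Smoother reads its option once in the constructor and keeps only the value it needs, so the Config object does not have to outlive it.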
The thing I don't really like about this design, is that both Traits and Config classes break encapsulation. Some options relate to some parts of the library. Most of the time, however, they don't. And having them suddenly next to each other annoys me, because they affect separate things in the code, which are often in different abstraction layers.
One solution I was thinking about, is using multiple public inheritance for these different parts. A class which needs to know an option then casts the Config object or calls the relevant Trait parent to access it. Also, this passing along of Config to every class that needs it (or whose members need it) is very inelegant. Maybe Config should be a singleton?
You could have your parameters in a single struct named Config (to keep your words) and make it a singleton.
Encapsulation is important for preserving a class's consistency, because a class is responsible for itself. But in your case, where the Config class must be accessible to everyone, exposing its members is necessary. Furthermore, adding getters and setters to this type of class only adds overhead (in the best case your compiler will probably just inline them).
Also, if you really want a Traits class to implement compile time parameters, you should probably just have an initialization function (like the constructor of your library).

How to make proper design/architecture of partially reusable algorithm? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am very sorry for the long explanation, but it is required for proper understanding.
I am working on computer vision algorithms for industrial tasks. Computer vision algorithms tend to be very complicated. Usually they involve calls to dozens (at the very least) of simpler algorithms (which are not simple either). Those calls form a hierarchy: bigger tasks call smaller ones, which in turn call even smaller ones, and so on.
Let's take, for example, a typical computer vision task: find an object in an image under certain conditions. This task has to be performed in dozens of different applications. Each application has its own set of conditions, so it is impossible to create a single algorithm that works for all of them. But they are pretty similar. Usually it is enough to replace one or two lower-level functions, for example to use a different method for detecting points of interest in the image.
And here comes the problem: for each new application I have had to copy the whole code from one of the existing applications and adapt the relevant parts, which is bad practice. I am trying to eliminate this duplication by creating a system of algorithms that can be used in all applications without changing the code itself. Here is the list of issues the system has to deal with (at least the ones I have identified so far):
1) Arguments provided to the main algorithm should be able to set the 'algorithmic flow' inside the system, i.e. they determine which lower-level algorithms are used and how.
2) Different sub-algorithms that perform the same task may require different inputs. One may need an array of ints, another a pair of doubles, and so on. Algorithms on the higher level should be oblivious to the replacement of one sub-algorithm with another. That means they should not be aware of what arguments they receive and pass down to sub-algorithms. The same is true for the output of a sub-algorithm: it may vary if a different combination of sub-algorithms is used.
3) The system must be extendable. If a new sub-algorithm becomes available (for example: yet another way to find points of interest) the system should be able to call it. I understand that changes might be unavoidable at this point, but I would like to keep them to a minimum. And in any case the system should keep working the same way with previous sets of arguments.
4) The system must be debuggable. The end user of the system should have a reasonable way to dump debug information about the 'algorithmic flow' in their system, so that the algorithm developer is able to recreate the situation. This is not trivial considering requirement (3).
5) There should be a reasonable way to sanity-check the flow of algorithms.
6) I am not going to throw exceptions, but there should be a reasonable way to return the success / fail status of each algorithm. Again, this is not easy because of requirement (3).
7) This one is more 'good to have' than 'must have', but it may be important. Some calculations may be needed by multiple sub-algorithms. For example, calculating the gradients of an image may (or may not) be required for several different tasks. It would be good to have an option to store the results of those calculations in order to reuse them later.
I created some kind of solution to this but it is far from being good. Do you have any recommendations about how this should be done?
Used language: C++
Thank you
I'd just use some tried and true design patterns.
Use a strategy pattern to represent an algorithm that you may wish to swap out for alternatives.
Use a factory to instantiate different algorithm (strategy) instances based on some input parameter or runtime context - I'm a fan of the prototype factory where you have "inert" instances of each object in some lookup table, and based on a key you pass in you can request a clone of the one needed. I like it mainly because it's easiest to extend - you can even add new configured prototype instances to such a factory at runtime.
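A small sketch of a strategy interface combined with a prototype factory could look like this. The detector names and the radius parameter are placeholders, not real computer vision code, and returning nullptr for an unknown key is one assumed way to signal failure without exceptions:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Strategy interface: one interchangeable step of the algorithm.
class DetectorStrategy {
public:
    virtual ~DetectorStrategy() = default;
    virtual std::string name() const = 0;
    virtual std::unique_ptr<DetectorStrategy> clone() const = 0;
};

class CornerDetector : public DetectorStrategy {
public:
    std::string name() const override { return "corner"; }
    std::unique_ptr<DetectorStrategy> clone() const override {
        return std::make_unique<CornerDetector>(*this);
    }
};

class BlobDetector : public DetectorStrategy {
public:
    explicit BlobDetector(double radius) : radius_(radius) {}
    std::string name() const override { return "blob"; }
    double radius() const { return radius_; }
    std::unique_ptr<DetectorStrategy> clone() const override {
        return std::make_unique<BlobDetector>(*this);
    }
private:
    double radius_;
};

// Prototype factory: keeps one "inert" configured instance per key and hands out clones.
class DetectorFactory {
public:
    void add(const std::string& key, std::unique_ptr<DetectorStrategy> prototype) {
        assert(prototype && "prototype must not be null");
        prototypes_[key] = std::move(prototype);
    }
    // Returns nullptr for an unknown key, so callers can report failure without exceptions.
    std::unique_ptr<DetectorStrategy> create(const std::string& key) const {
        auto it = prototypes_.find(key);
        if (it == prototypes_.end()) return nullptr;
        return it->second->clone();
    }
private:
    std::map<std::string, std::unique_ptr<DetectorStrategy>> prototypes_;
};
```

Because add() can be called at any time, a newly configured prototype (say, a BlobDetector with a different radius under a new key) can be registered at runtime without touching the code that calls create().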
Note that the same "strategy" model does not have to serve for everything - it sounds like you might have some higher-level/fuzzy operations which then assemble or chain together low-level/detailed operations. The high level operations could be one type of abstract object while the detailed algorithms are the more concrete strategy instances.
As far as the inputs to the various algorithms, if it varies a lot from algorithm to algorithm you could use an extensible object like a dictionary for parameters so that each algorithm can use just the parameters it needs and ignore the others for an operation. If the dictionary is modifiable during the operation this would also permit upstream algorithms to add parameters for downstream algorithms. Key-value pairs are pretty easy to dump to a log or view in a debugger.
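The parameter dictionary could be sketched roughly as below. It is deliberately simplified: values are limited to double, and the keys ("threshold", "point_count", "match_score") and the two step functions are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Extensible parameter bag: each algorithm reads the keys it knows and ignores the rest.
class Params {
public:
    void set(const std::string& key, double value) { values_[key] = value; }
    bool has(const std::string& key) const { return values_.count(key) != 0; }
    double get(const std::string& key) const {
        auto it = values_.find(key);
        assert(it != values_.end() && "check has() before get()");
        return it->second;
    }
    // Key-value pairs are easy to dump for logging or debugging.
    std::string dump() const {
        std::ostringstream out;
        for (const auto& kv : values_) out << kv.first << '=' << kv.second << '\n';
        return out.str();
    }
private:
    std::map<std::string, double> values_;
};

// Upstream step: uses "threshold" and publishes "point_count" for later steps.
bool detect_points(Params& p) {
    if (!p.has("threshold")) return false;  // status code instead of an exception
    p.set("point_count", p.get("threshold") > 0.5 ? 10.0 : 100.0);
    return true;
}

// Downstream step: only cares about "point_count".
bool match_points(Params& p) {
    if (!p.has("point_count")) return false;
    p.set("match_score", p.get("point_count") / 100.0);
    return true;
}
```

The higher-level code only passes the Params object along, so swapping detect_points for a step that needs different keys does not change any signature in between.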
If each strategy instance has a unique semantic identifier you could easily debug the algorithms that get instantiated and chained together. (I use an audio DSP library that has a function to dump a description of the whole chain of configured audio processors, it's very handy).
If you use a system with strategy patterns and extensible parameters you should also be able to segregate shared algorithms from application-specific algorithms, but still have the same basic framework for instantiating and running them.
hth
I'm going to assume that you are a competent OO programmer with good domain knowledge, and your problem is more about a higher level of organisation of software components (implementing algorithms) than OO generally provides.
The patterns mentioned by @orpheist make perfect sense. Consider them. They will not solve all the problems you list. You should also consider the following.
In what form will the data be for algorithms to access?
Will you need adapters to connect one component to another?
Do you pass the data to the component or the component to the data?
Do you want to assemble a pipeline or group of components to build higher ones, which can then be applied to the data?
Do you need a language (XML, DSL) to express connections and to allow for easy experimentation?
Is performance a dominant issue already, or can you afford more interpretive techniques at this stage?
I think you need to refine some of your questions and provide some more concrete specifics. I also think your questions would be a better fit on programmers.stackexchange than here.

Is it possible to use policy based design together with automated testing?

I am developing a numerical simulations library which is centred around a single collection of data operated on by different computational algorithms. The algorithms are complex, they have different states involving multiple parameters, and are interchangeable (under some semantic restrictions).
To avoid bloated interface of the collection and to enable different implementations etc, I'm thinking about using policy based design. This gives the collection a wide combination of choices between storage structures, algorithms, parameters, internal stuff.
If I imagine that I redesigned my existing generic / object oriented design using policies, how can I choose the optimal algorithms and data structures? Conceptually I need to define the set of policies and a set of verification test cases and execute a parametric study.
This is easy when object oriented programming is used since I can determine all necessary types and their parameters during run-time using e.g. a string-based Abstract Factory with type names stored in the input file, that is then changed by an external script that executes the client application on a family of test cases.
How do I do that with policies, where a combination of N policies ends up in being N different client applications?
How is automated testing done together with policy based design in a professional way?
If you're representing algorithms as policies, you /should/ have a pretty uniform interface already thought up. You could imagine an "AlgorithmPolicy" processing some data from your data store and returning some representation of the results.
"If I imagine that I redesigned my existing generic / object oriented design using policies, how can I choose the optimal algorithms and data structures?"
If your object oriented design currently makes use of the strategy pattern (see also: the Gang of Four book), your policies will simply replace every place that you've used a strategy. Choosing "optimal algorithms" for the different policies you design will simply be a matter of nailing the right conceptual structure / interface for those policies. (If you're going to use many different data stores, make sure that the interface for adding / removing / getting data from them is uniform, for example. Here, it can be helpful to think of three examples and find commonalities... then think of another example and make sure it fits the schema. Iterate until things feel correct.)
You'll still have adequate type checking, it'll just feel a bit different (and you may run into some nasty compile errors occasionally. ;)
Testing will simply be a matter of writing some unit tests for each of the configurations / policy combinations you'd like to cover. You probably should already be writing these tests anyway; the primary difference is that you'll want to try to hit the interfaces you designate rather than targeting specifics.
You can validate different storage methods based on validations of your algorithm policies. (So, if I have some algorithm whose data can be stored in different ways, I can run the algorithm on some test data for each storage mechanism and expect the same results.) Assuming that you've spec'd out the interface correctly, you should only need to write a single test for each additional storage mechanism you add.
Again: It'd be nice to have more details about the structure of the program, what different parameters and such you'd need to pass in. (Is any of this code open source / going to be open sourced?)
From what you've said, in my mind, your complicated-policy process may have an interface like so:
FancyDataStore.Process()
For testing it, I'd write:
MockAlgorithmPolicy - A very simple algorithm that's trivial to validate.
MockInternalStuffPolicy - A very simple internal stuff policy that causes no integrations / reports nothing new.
MockStoragePolicy - A very simple storage policy that meets your interface for storage / doesn't cause many issues.
Write a test that validates the mocks put together...
For each StoragePolicy you create, write an automated test to validate it:
testSomeStoragePolicy() {
    // Real storage policy under test; every other policy is a mock.
    FancyDataStore.Process<MockAlgorithmPolicy, SomeStoragePolicy, MockInternalStuffPolicy>();
    // validate...
}
That should prove that the SomeStoragePolicy works as expected.
Then, for your algorithms, you could write:
testSomeAlgorithmPolicy() {
    // Real algorithm policy under test; every other policy is a mock.
    FancyDataStore.Process<SomeAlgorithmPolicy, MockStoragePolicy, MockInternalStuffPolicy>();
    // validate...
}
etc.
This way, you write basically one test for each policy you end up writing (which seems feasible and not too ridiculous). Additionally, you can always add unit tests to cover other subtle integration issues that may crop up over time.
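In compilable form, the one-test-per-policy idea might be sketched as follows. This is only an illustration under assumptions: the host takes two policies instead of three to keep it short, the policies are template parameters of the class rather than of the Process call, and RingStoragePolicy and SquareAlgorithmPolicy are invented stand-ins for real policies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mock storage: the simplest container that satisfies the storage interface.
struct MockStoragePolicy {
    void store(double v) { data.push_back(v); }
    const std::vector<double>& values() const { return data; }
    std::vector<double> data;
};

// Storage under test (hypothetical): keeps only the most recent N values.
template <std::size_t N>
struct RingStoragePolicy {
    void store(double v) {
        if (data.size() == N) data.erase(data.begin());
        data.push_back(v);
    }
    const std::vector<double>& values() const { return data; }
    std::vector<double> data;
};

// Mock algorithm: trivial to validate by hand.
struct MockAlgorithmPolicy {
    static double apply(double v) { return v; }
};

// Algorithm under test (hypothetical).
struct SquareAlgorithmPolicy {
    static double apply(double v) { return v * v; }
};

// Host class: policies are template parameters, so each combination is a distinct type.
template <typename AlgorithmPolicy, typename StoragePolicy>
class FancyDataStore {
public:
    void process(const std::vector<double>& input) {
        assert(!input.empty() && "process() expects at least one value");
        for (double v : input) storage_.store(AlgorithmPolicy::apply(v));
    }
    const std::vector<double>& values() const { return storage_.values(); }
private:
    StoragePolicy storage_;
};

// One test per real policy, with every other policy mocked.
bool test_square_algorithm() {
    FancyDataStore<SquareAlgorithmPolicy, MockStoragePolicy> store;
    store.process({1.0, 2.0, 3.0});
    return store.values() == std::vector<double>{1.0, 4.0, 9.0};
}

bool test_ring_storage() {
    FancyDataStore<MockAlgorithmPolicy, RingStoragePolicy<2>> store;
    store.process({1.0, 2.0, 3.0});
    return store.values() == std::vector<double>{2.0, 3.0};
}
```

All combinations exercised by the tests are instantiated inside a single test executable, so the number of programs does not grow with the number of policies; only the number of test functions does.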
If you're looking for good books on this subject, I'd suggest reading "Modern C++ Design" by Andrei Alexandrescu; it provides a great primer on policy-based design in C++.

Efficient ways to save and load data from C++ simulation

I would like to know what the best way to save and load C++ data is.
I am mostly interested in saving classes and matrices (not sparse) I use in my simulations.
Right now I just save them as txt files, but if I add a member to a class I then have to modify the function that loads the data (it has to parse and check for the value in the txt file), which I think is not ideal.
What would you recommend in general? (p.s. as I'd like to release my code I'd really like to use only standard c++ or libraries that can be redistributed).
In this case, there is no "best." What is best for you is highly dependent upon your situation. But let's have an example to get you thinking about your details and how deep this rabbit hole can go.
If you absolutely positively must have the fastest save possible without question (and you're willing to pay the price), you can define your own memory management to put all objects into a contiguous array of a common type (such as integers). This allows you to write that array to disk as binary data very rapidly. You might need this in a simulation that uses threads efficiently to load every core/processor to run at real time.
Why is this a rather horrible solution? Because it takes a LOT of work and runs many risks for problems in the name of "optimization."
It requires you to build your own memory management (operator new() and operator delete()) which may need to be thread safe.
If you try to load from this array, you will have to placement new all objects with a unique non-modifying constructor in order to ensure all virtual pointers are set properly. Oh, and you have to track the type of each address to know how to do this.
For portability with other systems and between versions of the binary, you will need to have utilities to convert from the binary format to something generic enough to be cross platform (including repopulating pointers to other objects).
I have done this. It was highly unpleasant. I have no doubt there are still problems with it and I have only listed a few here. But, it was very, very fast and very, very, very problematic.
You must design to your needs. Generally, the first need is "Make it work." Don't worry about efficiency at first, just about something that persists the data accurately, and about having the necessary information known and accessible at the point where you save. Also, you should encapsulate the process of saving and loading. Then, if the need "Make it better" steps in, you should be able to change that one bit of code and the rest should still work. You might even make the save format selectable according to the user's needs, rather than assuming one set of needs for all users.
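As one possible shape for that encapsulation, using only standard C++: the Matrix struct, the MatrixFormat interface and the text layout below are assumptions made for this sketch, not a recommendation of a particular file format.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Dense row-major matrix (hypothetical stand-in for the simulation's own type).
struct Matrix {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;
};

// The rest of the program depends only on this interface, not on a file format.
class MatrixFormat {
public:
    virtual ~MatrixFormat() = default;
    virtual void save(const Matrix& m, std::ostream& out) const = 0;
    virtual bool load(Matrix& m, std::istream& in) const = 0;
};

// First implementation: plain text, "rows cols" followed by the values.
class TextFormat : public MatrixFormat {
public:
    void save(const Matrix& m, std::ostream& out) const override {
        assert(m.data.size() == m.rows * m.cols && "matrix size mismatch");
        out.precision(17);  // enough digits to round-trip a double
        out << m.rows << ' ' << m.cols << '\n';
        for (double v : m.data) out << v << ' ';
        out << '\n';
    }
    // Leaves m untouched and returns false if the input is incomplete or malformed.
    bool load(Matrix& m, std::istream& in) const override {
        Matrix tmp;
        if (!(in >> tmp.rows >> tmp.cols)) return false;
        tmp.data.resize(tmp.rows * tmp.cols);
        for (double& v : tmp.data)
            if (!(in >> v)) return false;
        m = tmp;
        return true;
    }
};

// Convenience check: save to memory, load back, and compare.
bool round_trips(const MatrixFormat& fmt, const Matrix& m) {
    std::stringstream buffer;
    fmt.save(m, buffer);
    Matrix loaded;
    return fmt.load(loaded, buffer) && loaded.rows == m.rows &&
           loaded.cols == m.cols && loaded.data == m.data;
}
```

A binary or library-backed format would then be a second class derived from MatrixFormat, and callers that hold a MatrixFormat reference would not need to change.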
Given all the assumptions, pros and cons listed, you should be able to elaborate your particular needs for this question.
Given that performance is not your concern -- which is a critical part of the answer -- the Boost Serialization library is a great answer.
The link in the comment leads to the documentation. Read the tutorial (which is overkill for what you are initially wanting, but well worth it).
Finally, since you mostly have arrays and matrices, try to encapsulate the entire process of saving and loading so that, should you need to change it later, you are writing a new implementation and choosing between it and the existing one. I expect the added time for the smarts of Boost Serialization would not be great; however, you might find that a future requirement moves you to something else, or to multiple something elses.
The C++ Middleware Writer automates the creation of marshalling functions. When you add a member to a class, it updates the marshalling functions for you.

What's a pattern for getting two "deep" parts of a multi-threaded program talking to each other?

I have this general problem in design, refactoring or "triage":
I have an existing multi-threaded C++ application which searches for data using a number of plugin libraries. With the current search interface, a given plugin receives a search string and a pointer to a QList object. Running on a different thread, the plugin goes out and searches various data sources (locally and on the web) and adds the objects of interest to the list. When the plugin returns, the main program, still on the separate thread, adds this data to the local data store (with further processing), guarding this insertion point using a mutex. Thus each plugin can return data asynchronously.
The Qt-based plugin library is based on message passing. There are a fair number of plugins which are already written and tested for the application and they work fairly well.
I would like to write some more plugins and leverage the existing application.
The problem is that the new plugins will need more information from the application. They will need intermittent access to the local data store itself as they search. So to get this, they would need direct or indirect access to both the hash array storing the data and the mutex which guards multiple access to the store. I assume the access would be encapsulated by adding an extra method in a "catalog" object.
I can see three ways to write these new plugins.
1. When loading a plugin, pass them a pointer to my "catalog" at the start. This becomes an extra, "invisible" interface for the new plugins. This seems quick, easy, completely wrong according to OO but I can't see what the future problems would be.
2. Add a method/message to the existing interface so I have a second function which could be called for the new plugin libraries, the message would pass a pointer to the catalog to the plugins. This would be easy for the plugins but it would complicate my main code and seems generally bad.
3. Redesign the plugin interface. This seems "best" according to OO, could have other added benefits but would require all sorts of rewriting.
So, my questions are
A. Can anyone tell me the concrete dangers of option 1?
B. Is there a known pattern that fits this kind of problem?
Edit1:
A typical function for calling the plugin routines looks like:
elsewhere(spec) {
    QList<CatItem> results;
    plugins->getResults(spec, &results);  // ask every loaded plugin
    use_list(results);
}
...
void PluginHandler::getResults(QString* spec, QList<CatItem>* results)
{
    if (id->count() == 0) return;
    // Forward the request to each loaded plugin through the generic message interface.
    foreach (PluginInfo info, plugins) {
        if (info.loaded)
            info.obj->msg(MSG_GET_RESULTS, (void*) spec, (void*) results);
    }
}
It's repeated throughout the code. I'd rather extend it than break it.
Why is it "completely wrong according to OO"? If your plugin needs access to that object, and it doesn't violate any abstraction you want to preserve, it is the correct solution.
To me it seems like you blew your abstractions the moment you decided that your plugin needs access to the list itself. You just blew up your entire application's architecture. Are you sure you need access to the actual list itself? Why? What do you need from it? Can that information be provided in a more sensible way? One which 1) doesn't increase contention over a shared resource (and the risk of subtle multithreading bugs like race conditions and deadlocks), and 2) doesn't undermine the architecture of the rest of the app (which specifically preserves a separation between the list and its clients, to allow asynchronicity)?
If you think it's bad OO, then it is because of what you're fundamentally trying to do (violate the basic architecture of your application), not how you're doing it.
Well, option 1 is option 3, in the end. You are redesigning your plugin API to receive extra data from the main app.
It's a simple redesign that, as long as the 'catalog' is well implemented and hides every implementation detail of your hash and mutex backing store, is not bad and can serve the purpose well enough IMO.
Now if the catalog leaks implementation details, then you would do better to use messages to query the store, receiving responses with the needed data.
Sorry, I just re-read your question 3 times and I think my answer may have been too simple.
Is your "Catalog" an independent object? If not, you could wrap it as its own object. The Catalog should be completely safe (including threadsafe), or better yet immutable.
With this done, it would be perfectly valid OO to pass your catalog to the new plugins. If you are worried about passing them through many layers, you can create a factory for the catalog.
Sorry if I'm still misunderstanding something, but I don't see anything wrong with this approach. If your catalog is an object outside your control, however, such as a database object or collection then you really HAVE to encapsulate it in something you can control with a nice, clean interface.
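A minimal sketch of such a wrapper follows. It uses std::mutex and std::unordered_map in place of the Qt types from the question so that it stays self-contained, and the Catalog methods and the HistoryPlugin class are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Owns the store and its mutex; callers never see either directly.
class Catalog {
public:
    void insert(const std::string& key, const std::string& item) {
        assert(!key.empty() && "catalog keys must not be empty");
        std::lock_guard<std::mutex> lock(mutex_);
        items_[key] = item;
    }
    // Copies the value out under the lock, so no reference into the store escapes.
    bool lookup(const std::string& key, std::string& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = items_.find(key);
        if (it == items_.end()) return false;
        out = it->second;
        return true;
    }
private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, std::string> items_;
};

// A new-style plugin receives only the narrow Catalog interface.
class HistoryPlugin {
public:
    explicit HistoryPlugin(Catalog& catalog) : catalog_(catalog) {}
    // Returns true if the search term is already in the catalog.
    bool already_known(const std::string& term) const {
        std::string ignored;
        return catalog_.lookup(term, ignored);
    }
private:
    Catalog& catalog_;
};
```

Because every access takes the lock inside Catalog and returns copies, a plugin on another thread cannot forget the mutex or hold on to a pointer into the hash. The cost is that each lookup copies its result.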
If your Catalog is used by many pieces across your program, you might look at a factory (which, at its simplest, degrades to a Singleton). Using a factory you should be able to summon your Catalog with a Catalog.getType("Clothes"); or whatever. That way you are giving out the same object to everyone who wants one without passing it around.
(This is very similar to a singleton, by the way, but coding it as a factory reminds you that there will almost certainly be more than one. Also remember to allow a Catalog.setType("Clothes", ...); for testing.)