Thread static class methods vs global scope - c++

Imagine a functionality of an application that requires up to 5 threads crunching data, these threads use buffers, mutex and events to interact with each other. The performance is critical, and the language is C++.
The functionality can be implemented as one (compilation) unit with one class, and only one instance of this class can be instantiated for the application. The class itself implements 1 of the threads in run() method, which spawns other 4 threads, manages them and gathers them when user closes the application.
What is the advantage of choosing one of the following methods over another (please do let me know of any better approach)?
Add 5 static methods to the class, each running a single thread, mutex and other data shared as static class variables.
Add 5 global functions (no scope) and use global variables, events and mutex (as if it is C)
Change the pattern entirely: add 4 more classes, each implementing one of the threads, and share data via global variables.
Here are some thoughts and issues to be considered (please correct them if they are wrong):
Having threads as class members (static of course), they can rely on the singleton to access non-static member functions, it also gives them a namespace which by itself seems a good idea.
Using static class methods, the class header file soon will contain many static variables (and other helper static methods). Having to declare variables in the class header file may bring additional dependencies to other units that include the header file. If variables were declared globally they could be hidden in a separate header file.
Static class variables should be defined somewhere in the code, so it doubles typing declaration stuff.
Compilers can take advantage of the namespace resolution for more optimized code (as opposed to global variables possibly in different units).
The single unit can potentially be better optimized, whereas whole program optimization is slow and probably less fruitful.
If the unit grows I have to move some part of the code to a separate unit, so I will have one class with multiple (compilation) units. Is this an anti-pattern or not?
If using more than one class, each handling one thread, the same question arises again: static methods or global functions to implement the threads? In addition, this requires more lines of code. That is not a real issue, but is it worth the additional overhead?
Please answer this assuming no library such as Qt, and then assuming that we can rely on QThread and implement one thread per run() method.
Edit1: The number of threads is fixed per design, number 5 is just an example. Please share your thoughts on the approaches/patterns and not on details.
Edit2: I have found this answer (to a different question) very helpful, I guess the first approach misuses classes as namespaces. Second approach can be mitigated if coupled with namespace.

Sources
First, you should read the whole series of concurrency articles from Herb Sutter:
http://herbsutter.com/2010/09/24/effective-concurrency-know-when-to-use-an-active-object-instead-of-a-mutex/
This is the link to the last article's post, which contains the links to all the previous articles.
What's your case?
According to the following article: How Much Scalability Do You Have or Need? ( http://drdobbs.com/parallel/201202924 ), you are in the O(K): Fixed case. That is, you have a fixed set of tasks to be executed concurrently.
By the description of your app, you have 5 threads, each one doing a very different thing, so you must have your 5 threads, perhaps hoping one or some among those can still divide their tasks into multiple threads (and thus, using a thread pool), but this would be a bonus.
I'll let you read the article for more information.
Design questions
About the singleton
Forget the singleton. This is a dumb, overused pattern.
If you really really want to limit the number of instances of your class (and seriously, haven't you something better to do than that?), you should separate the design in two: one class for the data, and one class to wrap the previous class into the singleton limitation.
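A minimal sketch of that split, assuming C++11 or later. The names Engine and EngineSingleton are hypothetical and only illustrate the idea: the data class knows nothing about the one-instance rule, and a thin wrapper enforces it.

```cpp
// Plain class: nothing here restricts how many instances exist,
// so it can be created freely, for example in unit tests.
class Engine   // hypothetical name
{
public:
    void addSample(int p_value) { m_sum += p_value; ++m_count; }
    int  sum() const   { return m_sum; }
    int  count() const { return m_count; }
private:
    int m_sum = 0;
    int m_count = 0;
};

// Thin wrapper whose only job is the "one instance" limitation.
class EngineSingleton
{
public:
    EngineSingleton() = delete;   // never instantiated itself

    static Engine & instance()
    {
        // Function-local static: constructed on first use, and the
        // initialization is thread-safe since C++11.
        static Engine s_engine;
        return s_engine;
    }
};
```

Code that must share the single instance goes through EngineSingleton::instance(), while everything else can take an Engine & parameter and stay unaware of the singleton.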
About compilation units
Make your headers and sources easy to read. If you need to have the implementation of a class into multiple sources, then so be it. I name the source accordingly. For example, for a class MyClass, I would have:
MyClass.hpp : the header
MyClass.cpp : the main source (with constructors, etc.)
MyClass.Something.cpp : source handling with something
MyClass.SomethingElse.cpp : source handling with something else
etc.
About compiler optimisations
Recent compilers are able to inline code from different compilation units (I saw that option on Visual C++ 2008, IIRC). I don't know if whole program optimization works worse than "one unit" compilation, but even if it does, you can still divide your code into multiple sources, and then have one global source include everything. For example:
MyClassA.header.hpp
MyClassB.header.hpp
MyClassA.source.hpp
MyClassB.source.hpp
global.cpp
and then do your includes accordingly. But you should be sure this actually makes your performance better: Don't optimize unless you really need it and you profiled for it.
Your case, but better?
Your question and comments speak about monolithic design more than performance or threading issues, so I could be wrong, but what you need is simple refactoring.
I would use the 3rd method (one class per thread), because with classes comes private/public access, and thus, you can use that to protect the data owned by one thread only by making it private.
The following guidelines could help you:
1 - Each thread should be hidden in one non-static object
You can either use a private static method of that class, or an anonymously namespaced function for that (I would go for the function, but here, I want to access a private function of the class, so I will settle for the static method).
Usually, thread construction functions let you pass a pointer to a function with a void * context parameter, so use that to pass your this pointer to the main thread function.
Having one class per thread helps you isolate that thread, and thus, that thread's data from the outer world: No other thread will be able to access that data as it is private.
Here's some code:
// Some fictitious thread API
typedef void (*MainThreadFunction)(void * p_context) ;
ThreadHandle CreateSomeThread(MainThreadFunction p_function, void * p_context) ;
// class header
class MyClass
{
public :
MyClass() ;
// etc.
void run() ;
private :
ThreadHandle m_handle ;
static void threadMainStatic(void * p_context) ;
void threadMain() ;
} ;
// source
void MyClass::run()
{
this->m_handle = CreateSomeThread(&MyClass::threadMainStatic, this) ;
}
void MyClass::threadMainStatic(void * p_context)
{
static_cast<MyClass *>(p_context)->threadMain() ;
}
void MyClass::threadMain()
{
// Do the work
}
Disclaimer: This wasn't tested in a compiler. Take it as pseudo C++ code more than actual code. YMMV.
2 - Identify the data that is not shared.
This data can be hidden in the private section of the owning object. If it is protected by synchronization, then this protection is overkill (as the data is NOT shared).
3 - Identify the data that is shared
... and verify its synchronization (locks, atomic access)
4 - Each class should have its own header and source
... and protect the access to its (shared) data with synchronization, if necessary
5 - Protect the access as much as possible
If one function is used by a class, and only a class, and does not really need access to the class internals, then it could be hidden in an anonymous namespace.
If one variable is owned by only a thread, hide it in the class as a private variable member.
etc.
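The guidelines above can be sketched with std::thread (C++11) instead of the fictitious thread API. This is an illustration under assumptions, not the answer's own code: Worker, SharedResults, square and runTwoWorkers are hypothetical names. std::thread accepts a member function pointer plus this directly, so the static trampoline is not needed here.

```cpp
#include <mutex>
#include <thread>
#include <vector>

namespace
{
    // Guideline 5: a helper that needs no class internals is hidden in
    // an anonymous namespace, invisible outside this source file.
    int square(int p_value) { return p_value * p_value; }
}

// Guideline 3: shared data, every access goes through the mutex.
class SharedResults
{
public:
    void push(int p_value)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_values.push_back(p_value);
    }
    std::vector<int> snapshot() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_values;
    }
private:
    mutable std::mutex m_mutex;
    std::vector<int>   m_values;
};

// Guideline 1: one non-static object per thread.
class Worker
{
public:
    Worker(SharedResults & p_results, int p_first, int p_last)
        : m_results(p_results), m_first(p_first), m_last(p_last) {}
    ~Worker() { join(); }   // never destroy a joinable std::thread

    void run()  { m_thread = std::thread(&Worker::threadMain, this); }
    void join() { if (m_thread.joinable()) m_thread.join(); }
private:
    void threadMain()
    {
        for (int i = m_first; i < m_last; ++i)
        {
            m_scratch = square(i);       // guideline 2: unshared, no lock
            m_results.push(m_scratch);   // shared, locked inside push()
        }
    }

    SharedResults & m_results;   // shared with the other threads
    int m_first;
    int m_last;
    int m_scratch = 0;           // owned by this thread only
    std::thread m_thread;
};

// Runs two workers and returns the sum of everything they produced.
int runTwoWorkers()
{
    SharedResults results;
    Worker a(results, 0, 3);   // squares of 0, 1, 2
    Worker b(results, 3, 5);   // squares of 3, 4
    a.run();
    b.run();
    a.join();
    b.join();
    int sum = 0;
    for (int v : results.snapshot()) sum += v;
    return sum;
}
```

The order in which the two workers push their values is not deterministic, which is why the sketch only sums them. On some toolchains std::thread needs a threading flag such as -pthread at link time.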

Related

class without instance data, or namespace + globals in a sub-namespace?

I have a few related methods, all of which use some global data (*), and all of which are to be implemented in a header. Should I...
place the methods in a class as static methods, with no instance data members, and the global variable(s) as a static class members?
place the methods in a namespace, and the global variable(s) in a ::detail sub-namespace?
Both of these options seem a bit ugly to me, but I'm wondering whether one of them has methodical benefits I'm missing, which should make me prefer it. I could try to bend-over-backwards and move the mutex into a .cpp file, so that might be a third option but it'll not be pretty...
(*) - in my specific case it's a mutex.
The only added bonus of using a class here is that the compiler can complain if anyone ever reaches for that mutex (if you make it private, of course).
But, in both cases, you can only declare the data in the header, and you must still define it only once in some translation unit. I'd recommend you dispense with the in-header implementation, and hide the static mutex along with function implementation in the same translation unit.
From that point on, you can use a namespace, since it becomes a matter of organization only.
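A minimal sketch of that layout, shown as one file for brevity. The namespace counters and its two functions are hypothetical; std::mutex is assumed as the mutex type. The header part only declares functions, and the mutex lives in an anonymous namespace of the single source file.

```cpp
#include <mutex>

// ---- would go in the header (say, counters.hpp) ----
namespace counters
{
    void increment();
    int  value();
}

// ---- would go in the single source file (counters.cpp) ----
namespace
{
    // Internal linkage: no other translation unit can name these,
    // so nobody outside this file can reach for the mutex.
    std::mutex g_mutex;
    int        g_value = 0;
}

namespace counters
{
    void increment()
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        ++g_value;
    }

    int value()
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        return g_value;
    }
}
```

Because the data is defined exactly once in one translation unit, there is no question of how many copies exist, and clients that include the header do not even need <mutex>.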
From a general "good practice" perspective, I would avoid any global data, and use static data as little as possible (avoid the singleton syndrome).
The first question is: Does it hurt to have several copies of those objects? Usually, if you are not managing hardware access or similar, it does not. Even if you are supposed to use only one, it is good to make the class as general as possible and allow several instances of it.
Also, static / global variables tend to cause issues with threads/processes, as it is sometimes difficult to figure out how many copies of the static object you have.
So as a short answer: Put them in a class, as non-static, and put those "global" data into the class also.
If the reason for making that data global is multiple threads or processes, then probably something is wrong from a design point of view.
If you do not have choice over a more appropriate solution, then I would use the namespace: a class using global parameters would break the encapsulation principle which any programmer would expect from your code.
EDITED:
After the edit to the question, and trying to get closer to what it asks:
Supposing you require a wrapper around a mutex, which is unique per process.
class MyMutexWrapper // very minimal class, you should finish it.
{
MyMutex mutex;
public:
void anyFunction();
};
Once you have this, you may create a single global instance of it and share it through the whole code. A better solution still is to create it in whatever scope owns it and pass it to any component that makes use of it.
You could for example pass a reference.
This approach could seem more complex, but it avoids cross dependencies and makes your code structure cleaner => you could then reuse it, extend it, maintain it...
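As a rough sketch of passing the wrapper by reference: std::mutex stands in for MyMutex here, and the locked() member, the Producer class and demoSharedWrapper() are hypothetical names added for illustration.

```cpp
#include <mutex>

// Minimal wrapper; std::mutex stands in for the MyMutex type.
class MyMutexWrapper
{
public:
    // Runs p_action while holding the mutex.
    template <typename Action>
    void locked(Action p_action)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        p_action();
    }
private:
    std::mutex m_mutex;
};

// A component receives the wrapper (and the data it guards) by
// reference instead of reaching for a global.
class Producer   // hypothetical component
{
public:
    Producer(MyMutexWrapper & p_mutex, int & p_total)
        : m_mutex(p_mutex), m_total(p_total) {}

    void produce(int p_amount)
    {
        m_mutex.locked([&] { m_total += p_amount; });
    }
private:
    MyMutexWrapper & m_mutex;
    int & m_total;
};

int demoSharedWrapper()
{
    MyMutexWrapper mutex;   // created in whatever scope owns it
    int total = 0;
    Producer first(mutex, total);
    Producer second(mutex, total);   // both share the same mutex
    first.produce(1);
    second.produce(2);
    return total;
}
```

The dependency is visible in each constructor signature, so a test can hand a component its own private wrapper without touching any global state.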

Name of this C++ pattern and the reasoning behind it?

In my company's C++ codebase I see a lot of classes defined like this:
// FooApi.h
class FooApi {
public:
virtual void someFunction() = 0;
virtual void someOtherFunction() = 0;
// etc.
};
// Foo.h
class Foo : public FooApi {
public:
virtual void someFunction();
virtual void someOtherFunction();
};
Foo is the only class that inherits from FooApi, and functions that take or return pointers to Foo objects use FooApi * instead. It seems to mainly be used for singleton classes.
Is this a common, named way to write C++ code? And what is the point in it? I don't see how having a separate, pure abstract class that just defines the class's interface is useful.
Edit[0]: Sorry, just to clarify, there is only one class deriving from FooApi and no intention to add others later.
Edit[1]: I understand the point of abstraction and inheritance in general but not this particular usage of inheritance.
The only reason that I can see why they would do this is for encapsulation purposes. The point here is that most other code in the code-base only requires inclusion of the "FooApi.h" / "BarApi.h" / "QuxxApi.h" headers. Only the parts of the code that create Foo objects would actually need to include the "Foo.h" header (and link with the object-file containing the definition of the class' functions). And for singletons, the only place where you would normally create a Foo object is in the "Foo.cpp" file (e.g., as a local static variable within a static member function of the Foo class, or something similar).
This is similar to using forward-declarations to avoid including the header that contains the actual class declaration. But when using forward-declarations, you still need to eventually include the header in order to be able to call any of the member functions. But when using this "abstract + actual" class pattern, you don't even need to include the "Foo.h" header to be able to call the member functions of FooApi.
In other words, this pattern provides very strong encapsulation of the Foo class' implementation (and complete declaration). You get roughly the same benefits as from using the Compiler Firewall idiom. Here is another interesting read on those issues.
I don't know the name of that pattern. It is not very common compared to the other two patterns I just mentioned (compiler firewall and forward declarations). This is probably because this method has quite a bit more run-time overhead than the other two methods.
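A compact sketch of that encapsulation, shown as one file. The accessor getFoo() is a hypothetical addition, and someFunction() returns int here (rather than void as in the question) only so the effect is observable.

```cpp
// ---- FooApi.h: all that client code needs to include ----
class FooApi
{
public:
    virtual ~FooApi() = default;
    virtual int someFunction() = 0;
};

FooApi & getFoo();   // hypothetical accessor, not in the original code

// ---- Foo.cpp: the only place that sees the concrete class ----
namespace
{
    class Foo : public FooApi
    {
    public:
        int someFunction() override { return ++m_calls; }
    private:
        int m_calls = 0;   // implementation detail, invisible to clients
    };
}

FooApi & getFoo()
{
    static Foo s_foo;   // the single instance lives here
    return s_foo;
}

// ---- client code: compiled against FooApi.h only ----
int callTwice()
{
    getFoo().someFunction();
    return getFoo().someFunction();
}
```

Adding or removing private members of Foo changes only Foo.cpp, so clients that include FooApi.h do not need to be recompiled. Every call goes through virtual dispatch, which is the run-time overhead mentioned above.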
This is for when the code is later extended. Let's say NewFoo also extends/implements FooApi. All the current infrastructure will work with both Foo and NewFoo.
It's likely that this has been done for the same reason that pImpl ("pointer to implementation idiom", sometimes called "private implementation idiom") is used - to keep private implementation details out of the header, which means common build systems like make that use file timestamps to trigger code recompilation will not rebuild client code when only implementation has changed. Instead, the object containing the new implementation can be linked against existing client object(s), and indeed if the implementation is distributed in a shared object (aka dynamic link library / DLL) the client application can pick up a changed implementation library the next time it runs (or does a dlopen() or equivalent if it's linking at run-time). As well as facilitating distribution of updated implementation, it can reduce rebuilding times allowing a faster edit/test/edit/... cycle.
The cost of this is that implementations have to be accessed through out-of-line virtual dispatch, so there's a performance hit. This is typically insignificant, but if a trivial function like a get-int-member is called millions of times in a performance critical loop it may be of interest - each call can easily be an order of magnitude slower than inlined member access.
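For comparison, a minimal pImpl sketch, assuming C++11. Widget and its members are hypothetical names. The header part exposes no private details, and calls are ordinary non-virtual calls that pay one pointer indirection instead of virtual dispatch.

```cpp
#include <memory>

// ---- Widget.h (hypothetical): no private details in the header ----
class Widget
{
public:
    Widget();
    ~Widget();                  // defined where Impl is complete
    void setValue(int p_value);
    int  value() const;
private:
    struct Impl;                // forward declaration only
    std::unique_ptr<Impl> m_impl;
};

// ---- Widget.cpp: can change without touching the header ----
struct Widget::Impl
{
    int value = 0;
};

Widget::Widget() : m_impl(new Impl) {}
Widget::~Widget() = default;
void Widget::setValue(int p_value) { m_impl->value = p_value; }
int  Widget::value() const { return m_impl->value; }
```

The destructor is declared in the header and defined in the source file because std::unique_ptr needs the complete Impl type at the point where it deletes it.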
What's the "name" for it? Well, if you say you're using an "interface" most people will get the general idea. That term's a bit vague in C++, as some people use it whenever a base class has virtual methods, others expect that the base will be abstract, lack data members and/or private member functions and/or function definitions (other than the virtual destructor's). Expectations around the term "interface" are sometimes - for better or worse - influenced by Java's language keyword, which restricts the interface class to being abstract, containing no static methods or function definitions, with all functions being public, and only const final data members.
None of the well-known Gang of Four Design Patterns correspond to the usage you cite, and while doubtless lots of people have published (web- or otherwise) corresponding "patterns", they're probably not widely enough used (with the same meaning!) to be less confusing than "interface".
FooApi is an abstract base class; it provides the interface for concrete implementations (Foo).
The point is you can implement functionality in terms of FooApi and create multiple implementations that satisfy its interface and still work with your functionality. You see some advantage when you have multiple descendants - the functionality can work with multiple implementations. One might implement a different type of Foo or for a different platform.
Re-reading my answer, I don't think I should talk about OO ever again.

Class methods VS Class static functions VS Simple functions - Performance-wise?

OK, here's what I want :
I have written several REALLY demanding functions (mostly operating on bitmaps, etc) which have to be as fast as possible
Now, let's also mention that these functions may also be grouped by type, or even by the type of variable on which they operate.
The question is what I should do, apart from the implementation of the algorithms themselves, from a technical point of view, in order not to hurt the speed.
And now, I'm considering the following scenarios :
Create them as simple functions and just pass the necessary parameters as arguments
Create a class (for 'grouping'/organisation purposes) and just declare them as static
Create a class by type, e.g. create a class for working on bitmaps, create a new instance of that class for every bitmap (e.g. Bitmap* myBitmap = new Bitmap(1010);), and operate on it with its inner methods (e.g. myBitmap->getFirstBitSet())
Now, which of these approaches is the fastest? Is there really any difference between straight simple functions and Class-encapsulated static functions, performance-wise? Any other scenario that would be preferable, which I haven't mentioned?
Sidenote : I'm using the clang++ compiler, for Mac OS X 10.6.8. (if that makes any difference)
At CPU level, there is only one kind of function, and it very much resembles the C kind. You could craft your own, but...
As it turns out, C++ being built with efficiency in mind maps most functions directly to call instructions:
a namespace level function is like a regular C function
a static method is like a namespace level function (from a call point of view)
a non-static method is very similar to a static method, except an implicit this parameter is passed on top of the other parameters (one pointer)
All those 3 have the exact same kind of performance.
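The three kinds can be put side by side in a small sketch. The bits namespace and this Bitmap class are hypothetical, loosely following the question. The static_asserts only check the type-level claim: a static method has exactly the pointer type of a free function, while a non-static one is a distinct pointer-to-member type because of the implicit this. They say nothing about timing, which would need measuring.

```cpp
#include <cstdint>
#include <type_traits>

namespace bits
{
    // Namespace-level function: an ordinary C-style function.
    // Returns the index of the lowest set bit, or -1 if none is set.
    inline int firstBitSet(std::uint32_t p_word)
    {
        for (int i = 0; i < 32; ++i)
            if (p_word & (std::uint32_t(1) << i)) return i;
        return -1;
    }
}

class Bitmap   // hypothetical, one 32-bit word only
{
public:
    explicit Bitmap(std::uint32_t p_word) : m_word(p_word) {}

    // Static method: same call shape as a free function.
    static int firstBitSetOf(std::uint32_t p_word)
    {
        return bits::firstBitSet(p_word);
    }

    // Non-static method: same again, plus the hidden this pointer.
    int getFirstBitSet() const { return bits::firstBitSet(m_word); }
private:
    std::uint32_t m_word;
};

static_assert(std::is_same<decltype(&Bitmap::firstBitSetOf),
                           decltype(&bits::firstBitSet)>::value,
              "static method and free function share one pointer type");
static_assert(!std::is_same<decltype(&Bitmap::getFirstBitSet),
                            decltype(&bits::firstBitSet)>::value,
              "non-static method is a pointer-to-member type");
```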
On the other hand, virtual methods have a slight overhead. There was a C++ technical report on performance which estimated the overhead compared to a non-virtual method between 10% and 15% (from memory) for empty functions. Meaning that for any function with meat inside (ie, doing real work), the overhead itself is close to getting lost in the noise. The real cost comes from the inhibition of inlining unless the virtual call can be deduced at compile-time.
There is absolutely no difference between classic old C functions and static methods of classes. The difference is only aesthetic. If you have multiple C functions that have certain relation between them, you can:
group them into a class;
place them into a namespace;
The difference will again be aesthetic. Most likely this will improve readability.
If these C functions share some static data, it would make sense (if possible) to define this data as private static data members of a class. In this case the variant with the class would be preferable over the variant with the namespace.
I would discourage you from creating a dummy instance. This will be misleading to the reader of the source code.
Creating an instance for every bitmap is possible and can even be favorable. Especially if you call methods on this instance several times in a typical scenario.

Best practices for a class with many members

Any opinions on the best way to organize members of a class (esp. when there are many) in C++? In particular, a class has lots of user parameters, e.g. a class that optimizes some function and has a number of parameters such as # of iterations, size of optimization step, specific method to use, optimization function weights etc etc. I've tried several general approaches and seem to always find something non-ideal with each. Just curious about others' experiences.
struct within the class
struct outside the class
public member variables
private member variables with Set() & Get() functions
To be more concrete, the code I'm working on tracks objects in a sequence of images. So one important aspect is that it needs to preserve state between frames (why I didn't just make a bunch of functions). Significant member functions include initTrack(), trackFromLastFrame(), isTrackValid(). And there are a bunch of user parameters (e.g. how many points to track per object tracked, how much a point can move between frames, tracking method used etc etc)
If your class is BIG, then your class is BAD.
A class should respect the Single Responsibility Principle , i.e. : A class should do only one thing, but should do it well. (Well "only one" thing is extreme, but it should have only one role, and it has to be implemented clearly).
Then you create classes that you enrich by composition with those single-role little classes, each one having a clear and simple role.
BIG functions and BIG classes are nests for bugs, misunderstandings, and unwanted side effects (especially during maintenance), because NO MAN can learn 700 lines of code in minutes.
So the policy for BIG classes is: refactor, and compose with little classes that target only what they have to do.
If I had to choose one of the four solutions you listed: private class within a class.
In reality: you probably have duplicate code which should be reused, and your class should be reorganized into smaller, more logical and reusable pieces. As GMan said: refactor your code.
First, I'd partition the members into two sets: (1) those that are internal-only use, (2) those that the user will tweak to control the behavior of the class. The first set should just be private member variables.
If the second set is large (or growing and changing because you're still doing active development), then you might put them into a class or struct of their own. Your main class would then have two methods, GetTrackingParameters and SetTrackingParameters. The constructor would establish the defaults. The user could then call GetTrackingParameters, make changes, and then call SetTrackingParameters. Now, as you add or remove parameters, your interface remains constant.
If the parameters are simple and orthogonal, then they could be wrapped in a struct with well-named public members. If there are constraints that must be enforced, especially combinations, then I'd implement the parameters as a class with getters and setters for each parameter.
ObjectTracker tracker; // invokes constructor which gets default params
TrackerParams params = tracker.GetTrackingParameters();
params.number_of_objects_to_track = 3;
params.other_tracking_option = kHighestPrecision;
tracker.SetTrackingParameters(params);
// Now start tracking.
If you later invent a new parameter, you just need to declare a new member in the TrackerParams and initialize it in ObjectTracker's constructor.
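For completeness, here is one way the declarations behind that usage could look. The enum name TrackingPrecision, the value kDefaultPrecision, the default values and the configureTracker() wrapper are assumptions made for this sketch; the other names come from the snippet above.

```cpp
// Assumed enum; only kHighestPrecision appears in the usage snippet.
enum TrackingPrecision { kDefaultPrecision, kHighestPrecision };

// Simple, orthogonal parameters: a struct with well-named public members.
struct TrackerParams
{
    int               number_of_objects_to_track;
    TrackingPrecision other_tracking_option;
};

class ObjectTracker
{
public:
    // The constructor establishes the defaults (values assumed here).
    ObjectTracker()
    {
        m_params.number_of_objects_to_track = 1;
        m_params.other_tracking_option = kDefaultPrecision;
    }

    TrackerParams GetTrackingParameters() const { return m_params; }
    void SetTrackingParameters(const TrackerParams & p_params)
    {
        m_params = p_params;
    }
private:
    TrackerParams m_params;
};

// The usage from the snippet, wrapped in a function.
int configureTracker()
{
    ObjectTracker tracker;
    TrackerParams params = tracker.GetTrackingParameters();
    params.number_of_objects_to_track = 3;
    params.other_tracking_option = kHighestPrecision;
    tracker.SetTrackingParameters(params);
    return tracker.GetTrackingParameters().number_of_objects_to_track;
}
```

Adding a parameter later means one new member in TrackerParams and one new default in the constructor; the two accessor signatures stay as they are.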
It all depends:
Struct within the class: an internal struct would only be useful if you need to organize VERY many items. And if this is the case, you ought to reconsider your design.
Struct outside the class: an external struct would be useful if it will be shared with other instances of the same or different classes. (A model, or data object class/struct might be a good example)
Public member variables: only ever advisable for trivial, throw-away code.
Private member variables with Set() & Get() functions: this is the standard way of doing things, but it all depends on how you'll be using the class.
Sounds like this could be a job for a template, the way you described the usage.
template <typename FUNCTION, typename METHOD, typename PARAMS>
class FunctionOptimizer;
for example, where PARAMS encapsulates simple optimization run parameters (# of iterations etc) and METHOD contains the actual optimization code. FUNCTION describes the base function you are targeting for optimization.
The main point is not that this is the 'best' way to do it, but that if your class is very large there are likely smaller abstractions within it that lend themselves naturally to refactoring into a less monolithic structure.
However you handle this, you don't have to refactor all at once - do it piecewise, starting small, and make sure the code works at every step. You'll be surprised how much better you quickly feel about the code.
I don't see any benefit whatsoever to making a separate structure to hold the parameters. The class is already a struct - if it were appropriate to pass parameters by a struct, it would also be appropriate to make the class members public.
There's a tradeoff between public members and Set/Get functions. Public members are a lot less boilerplate, but they expose the internal workings of the class. If this is going to be called from code that you won't be able to refactor if you refactor the class, you'll almost certainly want to use Get and Set.
Assuming that the configuration options apply only to this class, use private variables that are manipulated by public functions with meaningful function names. SetMaxInteriorAngle() is much better than SetMIA() or SetParameter6(). Having getters and setters allows you to enforce consistency rules on the configuration, and can be used to compensate for certain amounts of change in the configuration interface.
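A short sketch of a setter that enforces a consistency rule. The class name ShapeFitter, the valid range (0, 180) degrees and the default of 120 are assumptions chosen for illustration; only SetMaxInteriorAngle comes from the text above.

```cpp
#include <stdexcept>

class ShapeFitter   // hypothetical class owning the configuration
{
public:
    // Rejects values outside (0, 180) degrees, so the object can never
    // hold an invalid configuration. The range itself is an assumption.
    void SetMaxInteriorAngle(double p_degrees)
    {
        if (!(p_degrees > 0.0 && p_degrees < 180.0))
            throw std::invalid_argument(
                "max interior angle must be in (0, 180)");
        m_max_interior_angle = p_degrees;
    }

    double GetMaxInteriorAngle() const { return m_max_interior_angle; }
private:
    double m_max_interior_angle = 120.0;   // arbitrary default
};
```

With a public data member, the same check would have to be repeated by every caller, and nothing would stop one of them from skipping it.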
If these are general settings, used by more than one class, then an external class would be best, with private members and appropriate functions.
Public data members are usually a bad idea, since they expose the class's implementation and make it impossible to have any guaranteed relation between them. Walling them off in a separate internal struct doesn't seem useful, although I would group them in the list of data members and set them off with comments.

C++, How to maintain both data locality and well splitted code structure at every layer of a program?

In my recent project I have a class like this:
class layer1 {
myclassa l1dataa; // layer1 data
...
myclassn l1datan;
public:
void l1datatransformsa()
{
myotherclassa l2dataa; // layer2 data
...
myotherclassn l2datan;
many operations; // way too many operations for a single method
}
void l1datatransformsb() {}
};
The method l1datatransformsa invokes local data and is quite long and robust. I would like to divide its code into smaller meaningful portions (methods) which all work on the same local layer2 data. It can be done in few ways, though none of them seems good enough to me, therefore I'm asking for recommendation on how should it be done:
Breaking the code of "many operations" into private methods of class layer1.
Cons: I would have to pass as arguments to those new methods references to all layer2 data, which is not very elegant as there is too many of them
Rewriting the method l1datatransformsa as a nested class of class layer1 with layer2 data declared as its data members. Then it would be possible to split "many operations" into members of the nested class.
Cons: To access layer1 data from nested class I would have to use reference or pointer to the instance of enclosing class. This will make me include many changes in the code of "many operations" and will make the code less clear. It would be even worse if one would think of a need of splitting in the same manner one of methods of nested class.
The basic idea behind all this is to have a comfortable way of keeping your local data close to the functions or methods which use it and only to them at every layer of your program.
ADDED: "many operations" which we want to split work both on almost all data members of class layer1 and all local data layer2. They work on layer2 data sequentially and that's why they can be split easily, though it's a bit awkward 'programistically'.
First of all, you can increase the clarity of your code by defining your class in a header file, using only prototypes for member functions, and writing the member functions in a separate .cpp file. I'm assuming that you combined these for the sake of making it easier to post here.
"The method l1datatransformsa invokes local data and is quite long and robust. I would like to divide its code into smaller meaningful portions (methods) which all work on the same local layer2 data."
You might be approaching this incorrectly. If you are only wanting to break down a large member function for the sake of sanity, then all you need are functions, not members. Every function associated with a class is not required to be a member. Only use members here if you will need to call these sub-routines explicitly and individually from somewhere other than inside another member function. When you write your helper functions in the same .cpp file as your class' member functions, declare them static and they will only operate within the scope of that file (effectively limiting them to that class but without giving them the unobstructed data access of a member function). This is an easy way to enforce restrictions on data access as well as promote modularity. Each sub-function will only operate on data passed through the function's parameters (as opposed to a member function which can access all of the class' member data freely).
If you find yourself needing to pass a large number of parameters to a single function, ask yourself if you should A) store them in a struct instead of independent variables and pass the struct to the function or B) break apart the function into several shorter, more focused functions that perform their task on a sub-set of the variables. If these are member variables and you still want to access them individually but pack them into a struct, don't forget you can make the struct private and write simple getter/setter functions for accessing the individual values.
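A minimal sketch of both suggestions together, shown as one file. The Layer2Data struct, the helper names accumulate and average, and the use of std::vector<int> for the layer1 data are hypothetical. An anonymous namespace is used in place of the static keyword; for functions it gives the same file-local linkage, and it also covers the struct.

```cpp
#include <vector>

// ---- would go in the header ----
class layer1
{
public:
    explicit layer1(std::vector<int> p_data) : l1data(p_data) {}
    int l1datatransformsa() const;
private:
    std::vector<int> l1data;   // layer1 data
};

// ---- would go in the source file ----
namespace
{
    // All layer2 data travels together in one struct.
    struct Layer2Data
    {
        int sum = 0;
        int count = 0;
    };

    // File-local helpers: they see only what is passed to them,
    // not the private members of layer1.
    void accumulate(const std::vector<int> & p_input, Layer2Data & p_l2)
    {
        for (int v : p_input) { p_l2.sum += v; ++p_l2.count; }
    }

    // Integer average; returns 0 for empty input.
    int average(const Layer2Data & p_l2)
    {
        return p_l2.count == 0 ? 0 : p_l2.sum / p_l2.count;
    }
}

int layer1::l1datatransformsa() const
{
    Layer2Data l2;             // local layer2 data, as in the question
    accumulate(l1data, l2);    // step 1
    return average(l2);        // step 2
}
```

The member function stays short and reads as a list of steps, while each helper receives exactly the data it works on through its parameters.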
Keep the functions focused; each should do a single task, and do it well. Small functions are easier to read, test, and debug. Don't be afraid to break up your code into several nested layers (l1datatransformsa calls helper func A, which calls helper func B, etc) if it makes the code clearer. If you can write a relatively short name for the function that describes clearly and exactly what the function does (encryptString() or verifyChecksums() instead of dataProcessingStepFour()), you are probably on the right track.
TL;DR version: I don't think nesting a second class is the answer here. If, as you say, the nested class will need to access members of the parent class, that throws up a flag in my head that there is a better way to organize this (classes should function independently and should never assume that they are a child of an object of a particular type). Personally, I would keep l1datatransformsa relatively brief and use helper functions (not member functions) to do the work. If you need to pass a lot of different variables to helper functions, either use a struct instead of loose variables or re-think whether that sub-function needs all that information or if it can be split into smaller functions that each operate on less data.
I would conceptualize it, then break up data layers based on conceptual actions and models.
-- New answer --
I removed my old answer because I thought you were looking for trivial tips. I think you need to do some reading on the tools and techniques you have available to organize and construct software.
Gang Of Four - Design Pattern Book
Modern C++ Design
Generic Programming
The first book is essential, the second builds up some of the concepts that are introduced in the first in C++. The third is quite academic -- but contains a wealth of information, you can probably ignore it.