Parallelization with OpenMP - stack or heap variables - C++

I have a working parallelized solution, but parallelization improves the execution time only very slightly. I think this is because I new and delete an object inside the loop. I would like that object to be created on the stack, BUT the Command class is abstract and must remain abstract. What can I do to work around that? How can I reduce the time spent in these very long loops?
#pragma omp parallel for reduction(+:functionEvaluation)
for (int i=rowStart;i<rowEnd+1;i++)
{
Model model_(varModel_);
model_.addVariable("i", i);
model_.addVariable("j", 1);
Command* command_ = formulaCommand->duplicate(&model_);
functionEvaluation += command_->execute().toDouble();
delete command_;
}
The problem may also lie elsewhere; advice is welcome.
Thanks and regards.

You may want to play with the private or firstprivate clauses.
Your #pragma would include ...private(varModel, formulaCommand)... or similar, and then each thread would have its own copy of those variables. Using firstprivate will ensure that the thread-specific variable has the initial value copied in, instead of being uninitialized. This would remove the need to new and delete, assuming you can just modify instances for each loop iteration.
This may or may not work as desired, as you haven't provided a lot of detail.
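A minimal sketch of the firstprivate idea, under stated assumptions: Model here is a small hypothetical stand-in, not the asker's class, and sumRows is an invented name. It only shows the shape of the pragma. If the compiler is run without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>

// Hypothetical stand-in for the asker's Model: a small copyable object.
struct Model {
    double scale;
    int i;
    double evaluate() const { return scale * i; }
};

// Sums scale * i for i in [rowStart, rowEnd].
double sumRows(int rowStart, int rowEnd, double scale) {
    assert(rowStart <= rowEnd);
    Model model{scale, 0};            // built once, outside the loop
    double functionEvaluation = 0.0;
    // firstprivate: every thread starts with its own copy of model,
    // initialised from the value above, so nothing is allocated per iteration.
    #pragma omp parallel for firstprivate(model) reduction(+:functionEvaluation)
    for (int i = rowStart; i < rowEnd + 1; ++i) {
        model.i = i;                  // modify the thread-local copy in place
        functionEvaluation += model.evaluate();
    }
    return functionEvaluation;
}
```

Note that firstprivate copies the object named in the clause. If the variable is a pointer, only the pointer is copied and all threads still share the object it points to.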

I think you should try to use a mechanism to reuse allocated memory. You probably don't know the size nor the alignment of the Command object coming, so a "big enough" buffer will not suffice. I'd make your duplicate method take two arguments, the second being the reference to a boost::pool. If the pool object is big enough just construct the new Command object inside it, if it's not expand it, and construct into it. boost::pool will handle alignment issues for you, so you don't have to think about it. This way, you'll have to do dynamic memory allocation only a few times per thread.
By the way, it is generally not good practice to return raw pointers in C++. Use smart pointers instead; it's simply better that way without any buts... Well, there's a but in this case :), since with my suggestion you'd be doing some custom memory management under the hood. Still, the best practice would be to write a custom smart pointer that handles your special case gracefully, without risking the user messing it up. You could of course do like everyone else and make an exception in this case :) (My advice still holds under normal circumstances though; e.g. in the question above, you should normally use something like boost::scoped_ptr.)
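The two suggestions above could be combined roughly as follows. This is a sketch under assumptions, not the answerer's code: it swaps boost::pool for a hand-written growable buffer (Scratch) so that it needs only the standard library, and Scratch, DestroyOnly, Twice and evaluateAll are all hypothetical names. The duplicate signature is also simplified to take an int instead of a Model pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical per-thread scratch buffer. It holds ONE object at a time and
// grows only when a larger object arrives. Over-aligned types are not handled.
class Scratch {
public:
    Scratch() = default;
    Scratch(const Scratch&) = delete;
    Scratch& operator=(const Scratch&) = delete;
    ~Scratch() { ::operator delete(block_); }

    void* reserve(std::size_t bytes) {
        if (bytes > capacity_) {
            ::operator delete(block_);
            block_ = nullptr;
            capacity_ = 0;
            block_ = ::operator new(bytes);
            capacity_ = bytes;
            ++growths_;
        }
        return block_;
    }
    int growths() const { return growths_; }
private:
    void* block_ = nullptr;
    std::size_t capacity_ = 0;
    int growths_ = 0;
};

// Deleter that runs the destructor but leaves the memory to the Scratch.
struct DestroyOnly {
    template <class T> void operator()(T* p) const { p->~T(); }
};

struct Command {
    virtual ~Command() {}
    virtual double execute() const = 0;
    // Constructs a copy inside scratch instead of on the heap.
    virtual std::unique_ptr<Command, DestroyOnly> duplicate(Scratch& scratch, int i) const = 0;
};

struct Twice : Command {
    int i;
    explicit Twice(int i_) : i(i_) {}
    double execute() const override { return 2.0 * i; }
    std::unique_ptr<Command, DestroyOnly> duplicate(Scratch& scratch, int i_) const override {
        void* where = scratch.reserve(sizeof(Twice));
        return std::unique_ptr<Command, DestroyOnly>(new (where) Twice(i_));
    }
};

double evaluateAll(const Command& prototype, int n, Scratch& scratch) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        auto command = prototype.duplicate(scratch, i);  // heap is touched on the first pass only
        total += command->execute();
    }   // the destructor runs at the end of each iteration; the memory is kept
    return total;
}
```

In a parallel loop each thread would need its own Scratch, for example one declared inside the parallel region before the loop.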

Related

Reuse object vs. creating new object

One of our projects deals with tons of data. It selects data from a database and serializes the results into JSON/XML.
Sometimes the number of selected rows can easily reach the 50 million mark.
However, the runtime of the program was too bad in the beginning.
So we refactored the program with one major adjustment:
The working objects for serialization are no longer recreated for every single row; instead, the object is cleared and reinitialized.
For example:
Before:
For every single database row we create an object of DatabaseRowSerializer and call the specific serialize function.
// Loop with all dbRows
{
DatabaseRowSerializer serializer(dbRow);
result.add(serializer.toXml());
}
After:
The constructor of DatabaseRowSerializer doesn't set the dbRow. Instead, this is done by the initDbRow() function.
The main thing here is that only one object is used for the whole runtime. After the serialization of a dbRow, the clear() function is called to reset the object.
DatabaseRowSerializer serializer;
// Loop with all dbRows
{
serializer.initDbRow(dbRow);
result.add(serializer.toXml());
serializer.clear();
}
So my question:
Is this really a good way to handle the problem?
In my opinion init()-functions aren't really smart. And normally a constructor should be used to initialize the possible parameters.
Which way do you generally prefer? Before or after?
On the one hand, this is subjective. On the other, opinion widely agrees that in C++ you should avoid this "init function" idiom because:
It is worse code
You have to remember to "initialise" your object and, if you don't, what state is it in? Your object should never be in a "dead" state. (Don't get me started on "moved-from" objects…) This is why C++ introduced constructors and destructors, because the old C approach was kind of minging and resulting programs are harder to prove correct.
It is unnecessary
There is essentially no overhead in creating a DatabaseRowSerializer every time, unless its constructor does more than your initDbRow function, in which case your two examples are not equivalent anyway.
Even if your compiler doesn't optimise away the unnecessary "allocation", there isn't really an allocation anyway because the object just takes up space on the stack and it has to do that regardless.
So if this change really solved your performance problem, something else was probably going on.
Use your constructors and destructors. Freely and proudly!
That's the common advice when writing C++.
A possible third approach if you did want to make the serializer re-usable for whatever reason, is to move all of its state into the actual operational function call:
DatabaseRowSerializer serializer;
// loop with all dbRows
{
result.add(serializer.toXml(dbRow));
}
You might do this if the serialiser has some desire to cache information, or re-use dynamically-allocated buffers, to aid in performance. That of course adds some state into the serialiser.
If you do this and still don't have any state, then the whole thing can just be a static call:
// loop with all dbRows
{
result.add(DatabaseRowSerializer::toXml(dbRow));
}
…but then it may as well just be a function.
Ultimately we can't know exactly what's best for you, but there are plenty of options and considerations.
Generally I agree with the points raised by LRiO in the other answer.
Just moving the constructor out of the loop isn't a good idea.
However, for this style of loop body:
feed object some data
transform data within object
return transformed data from object
It is, IMHO, often the case that the transforming object will allocate some buffers (on the heap) that potentially can be reused when the second form with the init function is used. In naive implementations, this reuse may not even be deliberate, just a side effect of the implementation.
So, IFF you're seeing a speed up by your refactoring (hoisting the object constructor out of the loop), it may be because the object is now able to re-use some buffers and avoid repeated "redundant" heap allocations for these buffers.
So, in summary:
You do not want the constructor to be hoisted out of the loop for its own sake. But you want all buffers that can be preserved to be preserved across the loop iterations.
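As an illustration of preserving a buffer across iterations, here is a small sketch. RowSerializer and serializeAll are hypothetical names, and the row is reduced to a plain string. The data is passed to toXml directly, so the object never sits in an uninitialised state, yet its internal buffer survives from one row to the next.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical serializer whose only state is a reusable scratch buffer.
class RowSerializer {
public:
    // Builds "<row>value</row>" in the reused buffer and returns a reference to it.
    const std::string& toXml(const std::string& value) {
        buffer_.clear();              // drops the contents; implementations keep the capacity
        buffer_ += "<row>";
        buffer_ += value;
        buffer_ += "</row>";
        return buffer_;
    }
private:
    std::string buffer_;
};

std::vector<std::string> serializeAll(const std::vector<std::string>& rows) {
    std::vector<std::string> result;
    result.reserve(rows.size());
    RowSerializer serializer;         // constructed once; its buffer is reused for every row
    for (const std::string& row : rows) {
        result.push_back(serializer.toXml(row));
    }
    return result;
}
```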

Best practice when calling initialize functions multiple times?

This may be a subjective question, but I'm more or less asking it and hoping that people share their experiences. (As that is the biggest thing which I lack in C++)
Anyways, suppose I have -for some obscure reason- an initialize function that initializes a datastructure from the heap:
void initialize() {
initialized = true;
pointer = new T;
}
Now, if I called the initialize function twice, a memory leak would happen (right?). I can prevent this in multiple ways:
ignore the call (just check whether I am initialized, and if I am, don't do anything)
throw an error
automatically "clean up" the code and then reinitialize the thing.
Now what is generally the "best" method, which helps keep my code manageable in the future?
EDIT: thank you for the answers so far. However, I'd like to know how people handle this in a more generic way: how do people handle "simple" errors which can be ignored (like calling the same function twice when it only makes sense to call it once)?
You're the only one who can truly answer the question: do you consider that the initialize function could eventually be called twice, or would this mean that your program followed an unexpected execution flow?
If the initialize function can be called multiple times: just ignore the call by testing if the allocation has already taken place.
If the initialize function has no decent reason to be called several times: I believe that would be a good candidate for an exception.
Just to be clear, I don't believe cleanup and regenerate to be a viable option (or you should seriously consider renaming the function to reflect this behavior).
This pattern is not unusual for on-demand or lazy initialization of costly data structures that might not always be needed. Singleton is one example, or for a class data member that meets those criteria.
What I would do is just skip the init code if the struct is already in place.
void initialize() {
if (!initialized)
{
initialized = true;
pointer = new T;
}
}
If your program has multiple threads you would have to include locking to make this thread-safe.
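One possible shape for the locked version, as a sketch only. Lazy and Payload are hypothetical names, and a std::unique_ptr doubles as the "initialized" flag so that the object is also released automatically.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical payload type standing in for T.
struct Payload { int value = 42; };

class Lazy {
public:
    // Safe to call from several threads; only the first call allocates.
    void initialize() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!pointer_) {              // the smart pointer itself serves as the flag
            pointer_.reset(new Payload);
            ++constructions_;
        }
    }
    const Payload* get() const { return pointer_.get(); }
    int constructions() const { return constructions_; }
private:
    std::mutex mutex_;
    std::unique_ptr<Payload> pointer_;
    int constructions_ = 0;           // only here so the sketch can be checked
};
```

Only initialize is synchronised here; readers calling get concurrently with a first initialize would need their own synchronisation.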
I'd look at using boost or STL smart pointers.
I think the answer depends entirely on T (and other members of this class). If they are lightweight and there is no side-effect of re-creating a new one, then by all means cleanup and re-create (but use smart pointers). If on the other hand they are heavy (say a network connection or something like that), you should simply bypass if the boolean is set...
You should also investigate boost::optional, this way you don't need an overall flag, and for each object that should exist, you can check to see if instantiated and then instantiate as necessary... (say in the first pass, some construct okay, but some fail..)
The idea of setting a data member later than the constructor is quite common, so don't worry you're definitely not the first one with this issue.
There are two typical use cases:
On demand / Lazy instantiation: if you're not sure it will be used and it's costly to create, then better NOT to initialize it in the constructor
Caching data: to cache the result of a potentially expensive operation so that subsequent calls need not compute it once again
You are in the "Lazy" category, in which case the simpler way is to use a flag or a nullable value:
flag + value combination: reuse of existing class without heap allocation, however this requires default construction
smart pointer: this bypasses the default construction issue, at the cost of a heap allocation. Check the copy semantics you need...
boost::optional<T>: similar to a pointer, but with deep copy semantics and no heap allocation. Requires the type to be fully defined though, so heavier on dependencies.
I would strongly recommend the boost::optional<T> idiom, or if you wish to provide dependency insulation you might fall back to a smart pointer like std::unique_ptr<T> (or boost::scoped_ptr<T> if you do not have access to a C++0x compiler).
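A sketch of the optional-based lazy member. It uses std::optional from C++17 in place of boost::optional so that it depends only on the standard library; Report and its members are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>

class Report {
public:
    // Builds the expensive text on first use only; later calls reuse it.
    const std::string& text() {
        if (!text_) {
            text_.emplace(1000, 'x');   // stand-in for a costly computation
            ++computations_;
        }
        return *text_;
    }
    int computations() const { return computations_; }
private:
    std::optional<std::string> text_;   // empty until first requested; no separate flag needed
    int computations_ = 0;              // only here so the sketch can be checked
};
```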
I think that this could be a scenario where the Singleton pattern could be applied.

Creating a scoped custom memory pool/allocator?

Would it be possible in C++ to create a custom allocator that works simply like this:
{
// Limit memory to 1024 KB
ScopedMemoryPool memoryPool(1024 * 1024);
// From here on all heap allocations ('new', 'malloc', ...) take memory from the pool.
// If the pool is depleted these calls result in an exception being thrown.
// Examples:
std::vector<int> integers(10);
int* a = new int[10];
}
I couldn't find something like this in the boost libraries, or anywhere else.
Is there a fundamental problem that makes this impossible?
You would need to create a custom allocator that you pass in as a template param to vector. This custom allocator would essentially wrap the access to your pool and do whatever size validations that it wants.
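A minimal sketch of such an allocator, assuming a simple linear pool that never reuses memory. ScopedMemoryPool is given a hypothetical implementation here and PoolAllocator is an invented name. It only affects containers that are explicitly given the allocator, not plain new or malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size pool: hands out memory linearly and frees it all at once.
class ScopedMemoryPool {
public:
    explicit ScopedMemoryPool(std::size_t bytes)
        : begin_(static_cast<char*>(::operator new(bytes))), size_(bytes), used_(0) {}
    ~ScopedMemoryPool() { ::operator delete(begin_); }
    ScopedMemoryPool(const ScopedMemoryPool&) = delete;
    ScopedMemoryPool& operator=(const ScopedMemoryPool&) = delete;

    void* allocate(std::size_t bytes, std::size_t align) {
        std::size_t start = (used_ + align - 1) / align * align;  // round up to the alignment
        if (start > size_ || bytes > size_ - start) throw std::bad_alloc();  // pool depleted
        used_ = start + bytes;
        return begin_ + start;
    }
    std::size_t used() const { return used_; }
private:
    char* begin_;
    std::size_t size_;
    std::size_t used_;
};

// Allocator that forwards every request to the pool it was given.
template <class T>
struct PoolAllocator {
    using value_type = T;
    ScopedMemoryPool* pool;

    explicit PoolAllocator(ScopedMemoryPool& p) : pool(&p) {}
    template <class U> PoolAllocator(const PoolAllocator<U>& other) : pool(other.pool) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(pool->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) {}   // memory comes back only when the pool is destroyed
};

template <class T, class U>
bool operator==(const PoolAllocator<T>& a, const PoolAllocator<U>& b) { return a.pool == b.pool; }
template <class T, class U>
bool operator!=(const PoolAllocator<T>& a, const PoolAllocator<U>& b) { return !(a == b); }
```

A container would then be declared as std::vector<int, PoolAllocator<int>> integers(10, 0, PoolAllocator<int>(pool)); and must not outlive the pool.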
Yes you can make such a construct, it's used in many games, but you'll basically need to implement your own containers and call memory allocation methods of that pool that you've created.
You could also experiment with writing a custom allocator for the STL containers, although it seems that that sort of work is generally advised against. (I've done it before and it was tedious, but I don't remember any specific problems.)
Mind you, writing your own memory allocator is not for the faint of heart. You could take a look at Doug Lea's malloc, which provides "memory spaces", which you could use in your scoping construct somehow.
I will answer a different question. Look at the book 'Efficient C++'. One of the things it discusses is implementing this kind of thing. That was for a web server.
For this particular thing you can either work at the C++ layer, by overriding new and supplying custom allocators to the STL,
or work at the malloc level: start with a custom malloc and go from there (like dmalloc).
Is there a fundamental problem that makes this impossible?
Reasoning about program behavior would become fundamentally impossible. All sorts of weird issues will come up. Certain sections of the code may or may not execute, though this will seemingly have no effect on the next sections, which may work unhindered. Certain sections may always fail. Dealing with the standard library or any other third-party library will become extremely difficult. There may be fragmentation at run time at times and at times not.
If intent is that all allocations within that scope occur with that allocator object, then it's essentially a thread-local variable.
So, there will be multithreading issues if you use a static or global variable to implement it. Otherwise, not a bad workaround for the statelessness of allocators.
(Of course, you'll need to pass a second template argument eg vector< int, UseScopedPool >.)

Boost shared_ptr use_count function

My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of boost::shared_ptr<foo> except for the one in the vector. This will drop the reference counter in the shared_ptr to 1. Is it practical to use shared_ptr.use_count() to see if it is equal to 1 to know when all other parts of my app are done with the data?
One additional reason I'm asking the question is that the Boost documentation on shared_ptr recommends not using use_count in production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.
My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.
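The linked-list variant described above might look roughly like this. It is a sketch under assumptions: Foo is a hypothetical stand-in for the large structure, it is not thread-safe, it ignores derived classes, and blocks on the free list are never returned to the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Foo {
public:
    static void* operator new(std::size_t size) {
        assert(size == sizeof(Foo));        // the sketch ignores derived classes
        if (freeList_) {                    // reuse the most recently freed block
            FreeNode* node = freeList_;
            freeList_ = node->next;
            ++reused_;
            return node;
        }
        return ::operator new(size);        // list empty: fall back to the heap
    }
    static void operator delete(void* p) noexcept {
        if (!p) return;
        // The link is stored inside the freed object itself, so the list
        // costs no extra memory.
        freeList_ = new (p) FreeNode{freeList_};
    }
    static int reused() { return reused_; }

    double data[64];                        // stand-in for the large payload
private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* freeList_;
    static int reused_;                     // only here so the sketch can be checked
};

Foo::FreeNode* Foo::freeList_ = nullptr;
int Foo::reused_ = 0;
```

Code that writes new Foo and delete foo, or that holds Foo in a shared_ptr created from new Foo, picks these operators up without any other change.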
If you want to know whether or not the use count is 1, use the unique() member function.
I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.
I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
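A sketch of the custom-deleter approach, written with std::shared_ptr rather than boost::shared_ptr so that it needs only the standard library. FooPool, acquire and idleCount are hypothetical names. The pool keeps idle objects in unique_ptrs, so the shared_ptr handed out tracks only the outstanding processing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Foo { int payload = 0; };

// Hypothetical pool. Not thread-safe, and it must outlive every shared_ptr it hands out.
class FooPool {
public:
    // Hands out an idle Foo if there is one, else creates a new one.
    std::shared_ptr<Foo> acquire() {
        std::unique_ptr<Foo> foo;
        if (idle_.empty()) {
            foo.reset(new Foo);
        } else {
            foo = std::move(idle_.back());
            idle_.pop_back();
        }
        // Custom deleter: runs when the last shared_ptr copy is released and
        // puts the object back on the idle list instead of deleting it.
        // (A production version should make sure this cannot throw.)
        return std::shared_ptr<Foo>(foo.release(), [this](Foo* p) {
            idle_.push_back(std::unique_ptr<Foo>(p));
        });
    }
    std::size_t idleCount() const { return idle_.size(); }
private:
    std::vector<std::unique_ptr<Foo>> idle_;
};
```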
I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. This way you will have full control over it, rather than relying on the shared_ptr's counter which, as you rightly suggest, is not recommended. You can also pre-set your own counter to allow for the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.

How to measure memory used in a block or program with C++

What is the best way to measure the memory used by a C++ program or by a block in a C++ program? The measurement code should be part of the program itself; memory should not be measured from outside. I know the task is difficult, so it does not have to be 100% accurate, but it should at least give me a good impression of the memory usage.
Measuring at the block level will be difficult (at best) unless you're willing to explicitly add instrumentation directly to the code under test.
I wouldn't start with overloads of new and delete at the class level to try to do this. Instead, I'd use overloads of ::operator new and ::operator delete. That's basically the tip of the funnel (so to speak) -- all the other dynamic memory management eventually comes down to calling those (and most do so fairly directly). As such, they will generally do the most to tell you about dynamic memory usage of the program as a whole.
The main time you'd need to deal with overloads of new and delete for an individual class would be if they're already overloaded so they're managing a separate pool, and you care about how much of that pool is in use at a given time. In that case, you'd (just about) need to add instrumentation directly to them, to get something like a high-water mark on their memory usage during a given interval.
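A sketch of the global replacement, under assumptions: the counter names and the AllocationScope helper are hypothetical, only requested bytes are counted (not what the underlying malloc really consumes), frees are not subtracted, and the C++17 aligned overloads are left alone.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters; atomics keep them usable from several threads.
static std::atomic<std::size_t> g_allocationCount{0};
static std::atomic<std::size_t> g_bytesRequested{0};

// Replaces the global allocation function for the whole program.
// The default operator new[] forwards to this one, so arrays are counted too.
void* operator new(std::size_t size) {
    g_allocationCount.fetch_add(1, std::memory_order_relaxed);
    g_bytesRequested.fetch_add(size, std::memory_order_relaxed);
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Marks the start of a block; bytes() reports what was requested since then.
class AllocationScope {
public:
    AllocationScope() : start_(g_bytesRequested.load()) {}
    std::size_t bytes() const { return g_bytesRequested.load() - start_; }
private:
    std::size_t start_;
};
```

Placing an AllocationScope at the top of a block and reading bytes() at the end gives the total requested inside that block, including allocations made by other threads in the meantime.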
Overloading new and delete is probably the way to go, not only for a specific class but perhaps more generally. But you also need to record where each new and delete is made, have some kind of start/reset to mark when you enter the block and when you exit it, and keep the history so that you can inspect it afterwards.
Class-level overrides of the new and delete operators are what you want.
As others have pointed out, you can overload new and delete to measure how much heap was allocated. To do the same for the stack, if you feel adventurous, you'll have to do some ASM. With GCC on x86-64, to get the stack position:
int64_t x = 0;
asm("movq %%rsp, %0;" : "=r" (x) );
This will put in x the address of the stack pointer. Put this in a few places around your code and compare the values before/after entering the block.
Please note that this might need some work to get what you want because of how/when the compiler might allocate memory; it is not as intuitive or trivial as it sounds.