Don't understand this block of code (it runs with no condition) - C++

I'm learning C++ and haven't really seen this in any of the books I've read. I wanted to read and comment code so I can learn better, and came across an odd section of code that runs but does not have a condition. From what I've read (and from my experience with other languages), you need an if, while, for, or something similar to introduce a block.
I'm looking at the TBB threads package, so I'm not sure if it's related to launching threads or C++-specific (if you don't recognize this as something common in C++ then it's probably TBB-specific).
I think I understand what the code inside actually does, but I'm not sure how it's being triggered or run. Any ideas?
Here's the section:
{
    //this is the graph part of the code
    Graph g;
    g.create_random_dag(nodes);
    std::vector<Cell*> root_set;
    g.get_root_set(root_set);
    root_set_size = root_set.size();
    for( unsigned int trial=0; trial<traversals; ++trial ) {
        ParallelPreorderTraversal(root_set);
    }
}
p.s. If it helps, here's the entire file (the above code is in the middle of main()).
#include <cstdlib>
#include "tbb/task_scheduler_init.h"
#include "tbb/tick_count.h"
#include "../../common/utility/utility.h"
#include <iostream>
#include <vector>
#include "Graph.h"

// some forward declarations
class Cell;
void ParallelPreorderTraversal( const std::vector<Cell*>& root_set );

//------------------------------------------------------------------------
// Test driver
//------------------------------------------------------------------------
utility::thread_number_range threads(tbb::task_scheduler_init::default_num_threads);
static unsigned nodes = 1000;
static unsigned traversals = 500;
static bool SilentFlag = false;

//! Parse the command line.
static void ParseCommandLine( int argc, const char* argv[] ) {
    utility::parse_cli_arguments(
        argc, argv,
        utility::cli_argument_pack()
            //"-h" option for displaying help is present implicitly
            .positional_arg(threads,"n-of-threads","number of threads to use; a range of the form low[:high], where low and optional high are non-negative integers or 'auto' for the TBB default.")
            .positional_arg(nodes,"n-of-nodes","number of nodes in the graph.")
            .positional_arg(traversals,"n-of-traversals","number of times to evaluate the graph. Reduce it (e.g. to 100) to shorten example run time\n")
            .arg(SilentFlag,"silent","no output except elapsed time ")
    );
}

int main( int argc, const char* argv[] ) {
    try {
        tbb::tick_count main_start = tbb::tick_count::now(); //tbb counter start
        ParseCommandLine(argc,argv);
        // Start scheduler with given number of threads.
        std::cout << threads << std::endl;
        for( int p=threads.first; p<=threads.last; ++p ) {
            tbb::tick_count t0 = tbb::tick_count::now(); //timer
            tbb::task_scheduler_init init(4); //creates P number of threads
            srand(2); //generates a random number between 0-2?
            size_t root_set_size = 0;
            {
                //this is the graph part of the code
                Graph g;
                g.create_random_dag(nodes);
                std::vector<Cell*> root_set;
                g.get_root_set(root_set);
                root_set_size = root_set.size();
                for( unsigned int trial=0; trial<traversals; ++trial ) {
                    ParallelPreorderTraversal(root_set);
                }
            }
            tbb::tick_count::interval_t interval = tbb::tick_count::now()-t0; //counter done
            if (!SilentFlag) { //output the results
                std::cout << interval.seconds() << " seconds using " << p
                          << " threads (" << root_set_size << " nodes in root_set)\n";
            }
        }
        utility::report_elapsed_time((tbb::tick_count::now()-main_start).seconds());
        return 0;
    } catch(std::exception& e) {
        std::cerr << "unexpected error occurred. \n"
                  << "error description: " << e.what() << std::endl;
        return -1;
    }
}

No, you don't need an if or while statement to introduce a new level of scope. The { symbol opens a new scope level and } ends it. The usual scoping rules apply: variables defined within the new block are undefined outside of it, object destructors run at the end of the block, and a variable named the same as one in an enclosing scope will shadow it.
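A minimal illustration of both rules:
#include <iostream>

int main() {
    int x = 1;
    {                               // new scope begins here
        int x = 2;                  // shadows the outer x
        std::cout << x << '\n';     // prints 2
    }                               // inner x goes out of scope and is destroyed
    std::cout << x << '\n';         // prints 1; the outer x was only shadowed
    return 0;
}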
A common use case is in switch statements. For example,
switch (a)
{
    case 1:
    {
        int i;
    }
    case 2:
    {
        int i; //note reuse of variable with the same name as in case 1
    }
}
Without the { } in the case statements the compiler will complain about multiply defined identifiers.

The pair of { and } creates a local scope. At the end of the scope the compiler automatically invokes the destructor of every stack variable declared within that scope (if it has one).
In your case, destructors for g and root_set will be called at the end of the scope.
One very common use I can think of is to obtain a mutex lock when working with threads. Let's say you have a class named Lock that accepts a mutex object and acquires a lock on it. Then you can surround a critical section of code that needs to be protected from concurrent access as follows:
{
    Lock lock( mutex ); // the Lock constructor will acquire a lock on mutex
    // do stuff
} // Here the Lock destructor runs and releases the lock on mutex,
  // allowing other threads to acquire a lock
The advantage of doing the above is that even if the code within the { ... } block throws an exception, the compiler still invokes Lock's destructor ensuring that the mutex lock is released.
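For concreteness, here is a minimal sketch of what such a Lock class could look like; the Lock class is the answer's hypothetical, built here on std::mutex (the standard library provides this exact pattern as std::lock_guard):
#include <mutex>

// A minimal sketch of the hypothetical Lock class described above.
class Lock {
public:
    explicit Lock(std::mutex& m) : m_(m) { m_.lock(); } // acquire in the constructor
    ~Lock() { m_.unlock(); }                            // release in the destructor
    Lock(const Lock&) = delete;                         // non-copyable: one owner per lock
    Lock& operator=(const Lock&) = delete;
private:
    std::mutex& m_;
};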

If you are referring to the fact that the block of code has an extra set of braces, that is not uncommon in C++ programming when dealing with short-lived objects on the stack, in this case the Graph and std::vector<Cell*> objects. A pair of curly braces creates a new scope. They do not have to be attached to any control statement. So in this case, a temporary scope is being used to ensure the Graph and vector objects get destroyed quickly when they go out of scope. If the extra braces were not present, the objects would not be destroyed until the end of the enclosing loop body, after the timer has already been read.

You can create extra blocks like that. They're used to impose an additional level of scope. In your example, g won't exist before or after that block.

Related

How to use `boost::thread_specific_ptr` with `for_each`

In the code below, Bar is supposed to model a thread-unsafe object that is moderately expensive to create. Foo contains a Bar and is multi-threaded, so it uses a thread_specific_ptr<Bar> to make a per-thread Bar that can be re-used across multiple calls to loop for the same Foo (therefore amortizing the cost of creating a Bar for each thread). Foo always creates a Bar with the same num, so the sanity check is supposed to always pass, yet it fails.
The reason for this is (I think) explained in the requirement for the thread_specific_ptr destructor:
All the thread specific instances associated to this thread_specific_ptr (except maybe the one associated to this thread) must be null.
So the problem is caused by a combination of three things:
Bar objects created in worker threads are not cleaned up when Foo's thread_specific_ptr is cleaned up, and are therefore persisted across iterations of the loop in main (essentially, a memory leak)
The C++ runtime is re-using threads in for_each between iterations of the loop in main
The C++ runtime is re-allocating each Foo in the main loop to the same memory address
The way that thread_specific_ptrs are indexed (by the thread_specific_ptr's memory address and the thread ID) results in old Bars being accidentally reused. I understand the problem; what I don't understand is what to do about it. Note the remark from the docs:
The requirement is due to the fact that in order to delete all these instances, the implementation should be forced to maintain a list of all the threads having an associated specific ptr, which is against the goal of thread specific data.
I'd like to avoid this complexity as well.
How can I use for_each for simple thread management, but also avoid the memory leak? Solution requirements:
It should only create one Bar per thread per Foo (i.e., don't create a new Bar inside the for_each)
Assume Bar is not thread-safe.
If possible, use for_each to make the parallel loop as simple as possible
The loop should actually run in parallel (i.e., no mutex around a single Bar)
Bar objects created by loop should be available for use until the Foo object that created them is destructed, at which point all Bar objects should also be destructed.
The following code compiles and should exit with return code 1 with high probability on a machine with sufficient cores.
#include <boost/thread/tss.hpp>
#include <algorithm>
#include <cstdlib>
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

using namespace std;

class Bar {
public:
    // models a thread-unsafe object
    explicit Bar(int i) : num(i) { }
    int num;
};

class Foo {
public:
    explicit Foo(int i) : num(i) { }

    void loop() {
        vector<int> idxs(32);
        iota(begin(idxs), end(idxs), 0);
        for_each(__pstl::execution::par, begin(idxs), end(idxs), [&](int) {
            if (ptr.get() == nullptr) {
                // no `Bar` exists for this thread yet, so create one
                Bar *tmp = new Bar(num);
                ptr.reset(tmp);
            }
            // Get the thread-local Bar
            Bar &b = *ptr;
            // Sanity check: we ALWAYS create a `Bar` with the same num as `Foo`;
            // see the `if` block above.
            // Therefore, this condition shouldn't ever be true (but it is!)
            if (b.num != num) {
                cout << "NOT THREAD SAFE: Foo index is " << num << ", but Bar index is " << b.num << endl;
                exit(1);
            }
        });
    }

    boost::thread_specific_ptr<Bar> ptr;
    int num;
};

int main() {
    for(int i = 0; i < 100; i++) {
        Foo f(i);
        f.loop();
    }
    return 0;
}
According to the documentation
~thread_specific_ptr();
Requires:
All the thread specific instances associated to this thread_specific_ptr
(except maybe the one associated to this thread) must be null.
This means you are not allowed to destroy Foo until all of its Bars have been destroyed. This is a problem because execution::par does not have to operate on a fresh thread pool, nor does it have to terminate the threads once the for_each() is done.
This gives us enough to answer the question as asked: You can only use thread_specific_ptr alongside execution::par to share data between various iterations on the same thread if:
The thread_specific_ptr is never destroyed. This is required because there is no way to know whether a given iteration of the for_each will be the last one for its assigned thread, and that thread might never get scheduled again.
You are comfortable leaking one instance of the pointed object per thread until the end of the program.
What's going on in your code
We are already in Undefined Behavior land, but the behavior you are seeing can still be explained a bit further. Considering that:
Boost.Thread uses the address of the thread_specific_ptr instance as key of the thread specific pointers. This avoids to create/destroy a key which will need a lock to protect from race conditions. This has a little performance liability, as the access must be done using an associative container.
... and that all 100 instances of Foo will most likely be at the same place in memory, you end up seeing instances of Bar from the previous Foo when the worker threads are recycled, leading your (inaccurate, see below) check to trigger.
Solution: What I think you should do
I would suggest you just drop thread_specific_ptr altogether and manually manage the pool of per-thread/per-Foo Bar instances with an associative container; this makes managing the lifetime of the Bar objects a lot more straightforward:
#include <cassert>
#include <map>
#include <mutex>
#include <thread>

class per_thread_bar_pool {
    std::map<std::thread::id, Bar> bars_;
    // alternatively:
    // std::map<std::thread::id, std::unique_ptr<Bar>> bars_;
    std::mutex mtx_;

public:
    Bar& get(int num) {
        auto tid = std::this_thread::get_id();
        std::unique_lock l{mtx_};

        auto found = bars_.find(tid);
        if(found == bars_.end()) {
            l.unlock(); // Let other threads access the map while `Bar` is being built.
            Bar new_bar(num);
            // auto new_bar = std::make_unique<Bar>(num);
            l.lock();
            assert(bars_.find(tid) == bars_.end());
            found = bars_.emplace(tid, std::move(new_bar)).first;
        }
        return found->second;
        // return *found->second;
    }
};

void loop() {
    per_thread_bar_pool bars;

    vector<int> idxs(32);
    iota(begin(idxs), end(idxs), 0);
    for_each(__pstl::execution::par, begin(idxs), end(idxs), [&](int) {
        Bar& current_bar = bars.get(num);
        // ...
    });
}
thread_specific_ptr already uses std::map<> under the hood (it maintains one per thread). So introducing one here is not that big of a deal.
We do introduce a mutex, but it only comes into play for a simple lookup/insertion into a map, and since constructing Bar is supposed to be so expensive, it will most likely have very little impact. It also has the benefit that multiple instances of Foo do not interact with each other anymore, so you avoid surprising bugs that could occur if you ever end up calling Foo::loop() from multiple threads.
N.B.: if (b.num != num) is not a valid test, since all instances of Bar created by a given Foo share the same num. That should only cause false negatives, though.
Solution: Making your code work (almost)
All this being said, if you are absolutely gung-ho about using thread_specific_ptr and execution::par at the same time, you'll have to do the following:
void loop() {
    static boost::thread_specific_ptr<Bar> ptr; // lives till the end of the program

    vector<int> idxs(32);
    iota(begin(idxs), end(idxs), 0);
    for_each(__pstl::execution::par, begin(idxs), end(idxs), [&](int) {
        if (ptr.get() == nullptr || ptr->num != num) {
            // no `Bar` exists for this thread yet, or it's from a previous run
            Bar *tmp = new Bar(num);
            ptr.reset(tmp);
        }
        // Get the thread-local Bar
        Bar &b = *ptr;
    });
}
However, this will leak up to one Bar per thread, since cleanup only ever happens when we try to reuse a Bar from a previous run. There is no way around this.

Why the following program does not mix the output when mutex is not used?

I have made multiple runs of the program. I do not see that the output is incorrect, even though I do not use the mutex. My goal is to demonstrate the need for a mutex. My thinking is that different threads with different "num" values will be mixed.
Is it because the objects are different?
#include <chrono>
#include <cmath>
#include <future>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

using namespace std;

using VecI = std::vector<int>;

class UseMutexInClassMethod {
    mutex m;
public:
    VecI compute(int num, VecI veci)
    {
        VecI v;
        num = 2 * num - 1;
        for (auto &x : veci) {
            v.emplace_back(pow(x, num));
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
        return v;
    }
};

void TestUseMutexInClassMethodUsingAsync()
{
    const int nthreads = 5;
    UseMutexInClassMethod useMutexInClassMethod;
    VecI vec{ 1,2,3,4,5 };
    std::vector<std::future<VecI>> futures(nthreads);
    std::vector<VecI> outputs(nthreads);
    for (decltype(futures)::size_type i = 0; i < nthreads; ++i) {
        futures[i] = std::async(&UseMutexInClassMethod::compute,
                                &useMutexInClassMethod,
                                i, vec);
    }
    for (decltype(futures)::size_type i = 0; i < nthreads; ++i) {
        outputs[i] = futures[i].get();
        for (auto& x : outputs[i])
            cout << x << " ";
        cout << endl;
    }
}
If you want an example that does fail with a high degree of certainty, you can look at the one below. It sets up a variable called accumulator that is shared by reference across all the futures. This is what is missing in your example: you are not actually sharing any memory. Make sure you understand the difference between passing by reference and passing by value.
#include <vector>
#include <memory>
#include <thread>
#include <chrono>
#include <future>
#include <iostream>
#include <cmath>
#include <mutex>

struct UseMutex {
    int compute(std::mutex & m, int & num)
    {
        for(size_t j = 0; j < 1000; j++)
        {
            ///////////////////////
            // CRITICAL SECTION  //
            ///////////////////////
            // this code currently doesn't trigger the exception
            // because of the lock on the mutex. If you comment
            // out the single line below then the exception *may*
            // get thrown.
            std::scoped_lock lock{m};
            num++;
            std::this_thread::sleep_for(std::chrono::nanoseconds(1));
            num++;
            if(num % 2 != 0)
                throw std::runtime_error("bad things happened");
        }
        return 0;
    }
};

void TestUseMutexInClassMethodUsingAsync()
{
    const int nthreads = 16;
    int accumulator = 0;
    std::mutex m;
    std::vector<UseMutex> vs{nthreads};
    std::vector<std::future<int>> futures(nthreads);
    for (auto i = 0; i < nthreads; ++i) {
        futures[i] = std::async([&,i](){ return vs[i].compute(m, accumulator); });
    }
    for(auto i = 0; i < nthreads; ++i){
        futures[i].get();
    }
}

int main(){
    TestUseMutexInClassMethodUsingAsync();
}
You can comment / uncomment the line
std::scoped_lock lock{m};
which protects the increment of the shared variable num. The rule for this mini program is that at the line
if(num%2!=0)
throw std::runtime_error("bad things happened");
num should be a multiple of two. But as multiple threads are accessing this variable without a lock you can't guarantee this. However if you add a lock around the double increment and test then you can be sure no other thread is accessing this memory during the duration of the increment and test.
Failing
https://godbolt.org/z/sojcs1WK9
Passing
https://godbolt.org/z/sGdx3x3q3
Of course the failing one is not guaranteed to fail but I've set it up so that it has a high probability of failing.
Notes
[&,i](){return vs[i].compute(m,accumulator);};
is a lambda (inline function). The notation [&,i] means it captures everything by reference except i, which it captures by value. This is important because i changes on each loop iteration and we want each future to get a unique value of i.
Is it because the objects are different?
Yes.
Your code is actually perfectly thread safe; no need for a mutex here. You never share any state between threads, except for copying vec from TestUseMutexInClassMethodUsingAsync to compute by std::async (and copying is thread-safe) and moving the computation result from compute's return value to futures[i].get()'s return value. .get() is also thread-safe: it blocks until the compute() method terminates and then returns its computation result.
It's actually nice to see that even a deliberate attempt to get a race condition failed :)
You probably have to fully redo your example to demonstrate how simultaneous* access to a shared object breaks things. Get rid of std::async and std::future, use plain std::thread with capture-by-reference, remove sleep_for (so both threads do a lot of operations instead of one per second), significantly increase the number of operations, and you will get a visible race. It may look like a crash, though.
* - yes, I'm aware that "wall-clock simultaneous access" does not exist in multithreaded systems, strictly speaking. However, it helps to get a rough idea of where to look for visible race conditions for demonstration purposes.
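A minimal sketch of such a reworked demonstration (my own illustration of the suggestion above, using plain std::thread):
#include <iostream>
#include <thread>

int main() {
    int counter = 0;                    // shared, deliberately unprotected
    auto work = [&counter] {            // capture by reference: both threads hit the same int
        for (int i = 0; i < 1000000; ++i)
            ++counter;                  // data race: unsynchronized read-modify-write
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    // With proper synchronization (a mutex or std::atomic<int>) this would
    // print 2000000; without it, the result is typically lower and varies per run.
    std::cout << counter << '\n';
    return 0;
}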
Comments have called out the fact that just not protecting a critical section does not guarantee that the risked behavior actually occurs.
That also applies to multiple runs: while you are not allowed to test a few times and then rely on the repeatedly observed behavior, optimization mechanisms are likely to make the observation recur often enough to be perceived as reproducible.
If you intend to demonstrate the need for synchronization, you need to employ synchronization to steer things toward a near-guaranteed, observable misbehavior caused by the lack of protection.
Allow me to only outline a sequence for that, with a few assumptions about the scheduling mechanism (this is based on a rather simple, single-core, priority-based scheduling environment I encountered professionally in an embedded setting), just to give an insight via a simplified example:
start a lower priority context.
optionally set up proper protection before entering the critical section
start critical section, e.g. by outputting the first half of to-be-continuous output
asynchronously trigger a higher priority context, which is doing that which can violate your critical section, e.g. outputs something which should not be in the middle of the two-part output of the critical section
(in protected case the other context is not executed, in spite of being higher priority)
(in unprotected case the other context is now executed, because of being higher priority)
end critical section, e.g. by outputting the second half of the to-be-continuous output
optionally remove the protection after leaving the critical section
(in protected case the other context is now executed, now that it is allowed)
(in unprotected case the other context was already executed)
Note:
I am using the term "critical section" with the meaning of a piece of code which is vulnerable to being interrupted/preempted/descheduled by another piece of code or another execution of the same code. Specifically for me a critical section can exist without applied protection, though that is not a good thing. I state this explicitly because I am aware of the term being used with the meaning "piece of code inside applied protection/synchronization". I disagree but I accept that the term is used differently and requires clarification in case of potential conflicts.

C++ syntax I don't understand

I've found a C++ code that has this syntax:
void MyClass::method()
{
    beginResetModel();
    {
        // Various stuff
    }
    endResetModel();
}
I have no idea why there are { } after a line ending with ;, but it compiles and runs without problems. Is it possible this has something to do with the fact that the code may be asynchronous (I'm not sure yet)? Or maybe the { } are only here to delimit a part of the code and don't really make a difference, but honestly I doubt it. Does someone have any clue what this syntax means?
More info: there is no other reference to beginResetModel, resetModel, or ResetModel in the whole project (searched with grep). By the way, the project is a Qt one; maybe it's another Qt-related macro I haven't heard of.
Using {} will create a new scope. In your case, any variable created in those braces will cease to exist at the closing }.
beginResetModel();
{
    // Various stuff
}
endResetModel();
The open and close braces in your code are a very important feature in C++, as they delimit a new scope. You can appreciate the power of this in combination with another powerful language feature: destructors.
So, suppose that inside those braces you have code that creates various objects, like graphics models, or whatever.
Assuming that these objects are instances of classes that allocate resources (e.g. textures on the video card), and those classes have destructors that release the allocated resources, you are guaranteed that, at the }, these destructors are automatically invoked.
In this way, all the allocated resources are automatically released, before the code outside the closing curly brace, e.g. before the call to endResetModel() in your sample.
This automatic and deterministic resource management is a key powerful feature of C++.
Now, suppose that you remove the curly braces, and your method looks like this:
void MyClass::method()
{
beginResetModel();
// {
// Various stuff
// }
endResetModel();
}
Now, all the objects created in the Various stuff section of code will be destroyed before the } that terminates MyClass::method(), but after the call to endResetModel().
So, in this case, you end up with the endResetModel() call, followed by other release code that runs after it. This may cause bugs.
On the other hand, the curly braces that define a new scope enclosed in begin/endResetModel() do guarantee that all the objects created inside this scope are destroyed before endResetModel() is invoked.
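As an aside, the begin/end pairing itself can be enforced with the same destructor mechanism. A minimal sketch, assuming C++17; the ScopeExit helper is illustrative, not part of Qt:
#include <utility>

// Illustrative scope guard: runs a callable when the enclosing scope ends.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F f) : f_(std::move(f)) {}
    ~ScopeExit() { f_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
private:
    F f_;
};

void MyClass::method()
{
    beginResetModel();
    ScopeExit guard([this] { endResetModel(); }); // C++17 class template argument deduction
    // Various stuff: objects declared after `guard` are destroyed before it,
    // so endResetModel() still runs last, after their destructors.
}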
{} delimits a scope. That means that any variable declared inside it is not accessible outside of it and is destroyed once the } is reached. Here is an example:
#include <iostream>
using namespace std;

class MyClass {
public:
    ~MyClass(){
        cout << "Destructor called" << endl;
    }
};

int main(){
    {
        int x = 3;
        MyClass foo;
        cout << x << endl; // Prints 3
    } // Here "Destructor called" is printed since foo is cleared from memory
    cout << x << endl; // Compiler error, x isn't defined here
    return 0;
}
Usually scopes are used for functions, loops, if-statements, etc., but you're perfectly allowed to use scopes without any statement before them. This can be particularly useful to declare variables inside a switch (this answer explains why).
As others have pointed out, the curly braces create a new scope, but maybe the interesting thing is why you would want to do that - that is, what the difference is between using it and not using it. There are cases where scopes are obviously necessary, such as with if or for blocks; if you don't create a scope after them you can only have one statement. Another possible reason is that you use a variable in one part of the function and do not want it to be used outside of that part, so you put it into its own scope. However, the main use of scopes outside of control statements has to do with RAII. When you declare an instance variable (not a pointer or reference), it is always initialized; when it goes out of scope, it is always destroyed. This can be used to define blocks that require some setup at the beginning and some teardown at the end (if you are familiar with Python, similar to with blocks).
Take this example:
#include <mutex>

void fun(std::mutex & mutex) {
    // 1. Perform some computations...
    {
        std::lock_guard<std::mutex> lock(mutex);
        // 2. Operations in this scope are performed with the mutex locked
    }
    // 3. More computations...
}
In this example, part 2 is only run after the mutex has been acquired, and the mutex is released before part 3 starts. If you remove the additional scope:
#include <mutex>

void fun(std::mutex & mutex) {
    // 1. Perform some computations...
    std::lock_guard<std::mutex> lock(mutex);
    // 2. Operations in this scope are performed with the mutex locked
    // 3. More computations...
}
In this case the mutex is acquired before starting part 2, but it is held until part 3 is complete (possibly producing more blocking between threads than necessary). Note, however, that in both cases there was no need to specify when the lock is released; std::lock_guard is responsible both for acquiring the lock on construction and for releasing it on destruction (i.e. when it goes out of scope).

Boost::Mutex in class not thread-safe

I'm learning concurrent programming and what I want to do is have a class where each object is responsible for running its own boost::thread. I'm a little over my head with this code because it uses A LOT of functionality that I'm not that comfortable with (dynamically allocated memory, function pointers, concurrency, etc.). For practically every line of code I had to check some references to get it right.
(Yes, all allocated memory is accounted for in the real code!)
I'm having trouble with the mutexes. I declare the mutex static and it seems to get the same value for all the instances (as it should). The code is STILL not thread safe.
The mutex should stop the threads (right?) from progressing any further in case someone else has locked it. The lock is scoped (kind of a neat feature) and it's inside the if statement, so it should lock the other threads out, no? Still, I get console output that clearly suggests it is not thread safe.
Also, I'm not sure I'm using the static variable right. I tried different ways of referring to it (Seller::ticketSaleMutex) but the only thing that worked was "this->ticketSaleMutex", which seems very shady and seems to defeat the purpose of it being static.
Seller.h:
class Seller
{
public:
    // Some variables
private:
    // Other variables
    static boost::mutex ticketSaleMutex; // Mutex declaration
};
Seller.cpp:
boost::mutex Seller::ticketSaleMutex; // Mutex definition

void Seller::StartTicketSale()
{
    ticketSale = new boost::thread(boost::bind(&Seller::SellTickets, this));
}

void Seller::SellTickets()
{
    while (*totalSoldTickets < totalNumTickets)
    {
        if ([Some time tick])
        {
            boost::mutex::scoped_lock(this->ticketSaleMutex);
            (*totalSoldTickets)++;
            std::cout << "Seller " << ID << " sold ticket " << *totalSoldTickets << std::endl;
        }
    }
}
main.cpp:
int main(int argc, char** argv)
{
    std::vector<Seller*> seller;
    const int numSellers = 10;
    int numTickets = 40;
    int *soldTickets = new int;
    *soldTickets = 0;
    for (int i = 0; i < numSellers; i++)
    {
        seller.push_back(new Seller(i, numTickets, soldTickets));
        seller[i]->StartTicketSale();
    }
}
This will create a temporary that is immediately destroyed:
boost::mutex::scoped_lock(this->ticketSaleMutex);
resulting in no synchronization. You need to declare a variable:
boost::mutex::scoped_lock local_lock(this->ticketSaleMutex);
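The same pitfall exists with the standard library's lock types; a minimal sketch, assuming std::mutex (the names are illustrative):
#include <mutex>

std::mutex mtx;
int shared_value = 0;

void buggy() {
    // Unnamed temporary: locks mtx, then unlocks it again at the end
    // of this statement, before the increment runs.
    std::lock_guard<std::mutex>{mtx};
    ++shared_value; // executed WITHOUT the lock held
}

void fixed() {
    // Named object: holds the lock until the end of the enclosing scope.
    std::lock_guard<std::mutex> guard(mtx);
    ++shared_value; // executed with the lock held
}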

Safe multi-thread counter increment

For example, I've got some work that is computed simultaneously by multiple threads.
For demonstration purposes the work is performed inside a while loop. In a single iteration each thread performs its own portion of the work, and before the next iteration begins a counter should be incremented once.
My problem is that the counter is updated by each thread.
As this seems like a relatively simple thing to want to do, I presume there is a 'best practice' or common way to go about it.
Here is some sample code to illustrate the issue and help the discussion along.
(I'm using Boost threads.)
#include <boost/thread/mutex.hpp>

class someTask {
public:
    int mCounter; // initialized to 0
    int mTotal;   // initialized to i.e. 100000
    boost::mutex cntmutex;

    int getCount()
    {
        boost::mutex::scoped_lock lock( cntmutex );
        return mCounter;
    }

    void process( int thread_id, int numThreads )
    {
        while ( getCount() < mTotal )
        {
            // The main task is performed here and is divided
            // into sub-tasks based on the thread_id and numThreads

            // Wait for all threads to get to this point

            cntmutex.lock();
            mCounter++; // < ---- how to ensure this is only updated once?
            cntmutex.unlock();
        }
    }
};
The main problem I see here is that you are reasoning at too low a level. Therefore, I am going to present an alternative solution based on the new C++11 thread API.
The main idea is that you essentially have a schedule -> dispatch -> do -> collect -> loop routine. In your example you try to reason about all this within the do phase which is quite hard. Your pattern can be much more easily expressed using the opposite approach.
First we isolate the work to be done in its own routine:
void process_thread(size_t id, size_t numThreads) {
    // do something
}
Now, we can easily invoke this routine:
#include <future>
#include <thread>
#include <vector>

void process(size_t const total, size_t const numThreads) {
    for (size_t count = 0; count != total; ++count) {
        std::vector< std::future<void> > results;

        // Create all threads, launch the work!
        for (size_t id = 0; id != numThreads; ++id) {
            results.push_back(std::async(process_thread, id, numThreads));
        }

        // The destruction of `std::future`
        // requires waiting for the task to complete (*)
    }
}
(*) See this question.
You can read more about std::async here, and a short introduction is offered here (they appear to be somewhat contradictory on the effect of the launch policy, oh well). It is simpler here to let the implementation decide whether or not to create OS threads: it can adapt depending on the number of available cores.
Note how the code is simplified by removing shared state. Because the threads share nothing, we no longer have to worry about synchronization explicitly!
You protected the counter with a mutex, ensuring that no two threads can access the counter at the same time. Your other option would be using Boost.Atomic, C++11 atomic operations, or platform-specific atomic operations.
However, your code seems to access mCounter without holding the mutex:
while ( mCounter < mTotal )
That's a problem. You need to hold the mutex to access the shared state.
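For the atomic alternative mentioned above, a minimal sketch using C++11 std::atomic (the class shape mirrors the question's code; this makes each access race-free, though it does not by itself make the counter increment only once per iteration):
#include <atomic>

class someTask {
public:
    std::atomic<int> mCounter{0}; // atomic: safe to read and update without a mutex
    int mTotal = 100000;

    void process( int thread_id, int numThreads )
    {
        while ( mCounter < mTotal ) // atomic load, no lock needed
        {
            // ... perform this thread's portion of the work ...
            ++mCounter; // atomic read-modify-write: no lost updates
        }
    }
};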
You may prefer to use this idiom (see the sketch after this list):
Acquire lock.
Do tests and other things to decide whether we need to do work or not.
Adjust accounting to reflect the work we've decided to do.
Release lock. Do work. Acquire lock.
Adjust accounting to reflect the work we've done.
Loop back to step 2 unless we're totally done.
Release lock.
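A minimal sketch of that idiom, assuming C++11 (do_work and the other names are illustrative, not from the question):
#include <mutex>

std::mutex m;
int next_task = 0;
const int total_tasks = 100000;

void do_work(int /*task*/) { /* the actual per-task computation */ }

void worker() {
    std::unique_lock<std::mutex> lock(m);     // 1. acquire lock
    while (next_task < total_tasks) {         // 2. decide whether there is work to do
        int my_task = next_task++;            // 3. adjust accounting for the work we take
        lock.unlock();                        // 4. release lock...
        do_work(my_task);                     //    ...do the work...
        lock.lock();                          //    ...then acquire the lock again
        // 5./6. accounting already reflects the work done; loop back to the test
    }
}                                             // 7. lock released when `lock` goes out of scope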
You need to use a message-passing solution. This is more easily enabled by libraries like TBB or PPL. PPL is included for free in Visual Studio 2010 and above, and TBB can be downloaded for free under a FOSS licence from Intel.
concurrent_queue<unsigned int> done;
std::vector<Work> work;
// fill work here
parallel_for(size_t(0), work.size(), [&](size_t i) {
    processWorkItem(work[i]);
    done.push(i);
});
It's lockless and you can have an external thread monitor the done variable to see how much, and what, has been completed (see the sketch below).
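A hedged sketch of what such monitoring might look like, assuming PPL's concurrency::concurrent_queue and its non-blocking try_pop:
#include <cstddef>

// Illustrative progress monitor; assumes the `done` queue and `work`
// vector declared in the snippet above.
void report_progress() {
    unsigned int finished;
    std::size_t completed = 0;
    while (done.try_pop(finished)) { // non-blocking pop of completed indices
        ++completed;
    }
    // e.g. report completed / work.size() as a progress percentage
}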
I would like to disagree with David on doing multiple lock acquisitions to do the work.
Mutexes are expensive: with more threads contending for a mutex, it basically falls back to a system call, which results in a user-space to kernel-space context switch, with the calling thread(s) forced to sleep. Thus, a lot of overhead.
So if you are using a multiprocessor system, I would strongly recommend using spin locks instead [1].
So what I would do is:
=> Get rid of the scoped lock acquisition used just to check the condition.
=> Make your counter volatile to support the above.
=> In the while loop, do the condition check again after acquiring the lock.
class someTask {
public:
    volatile int mCounter; // initialized to 0 : make your counter volatile
    int mTotal;            // initialized to i.e. 100000
    boost::mutex cntmutex;

    void process( int thread_id, int numThreads )
    {
        while ( mCounter < mTotal ) // compare without acquiring the lock
        {
            // The main task is performed here and is divided
            // into sub-tasks based on the thread_id and numThreads

            cntmutex.lock();
            // Now compare again to make sure that the condition still holds.
            // This saves all those acquisitions and lock releases we would
            // otherwise do just to check whether the condition was true.
            if( mCounter < mTotal )
            {
                mCounter++;
            }
            cntmutex.unlock();
        }
    }
};
[1] http://www.alexonlinux.com/pthread-mutex-vs-pthread-spinlock