Can someone give me a TBB example showing how to:
set the maximum number of active threads, and
execute tasks that are independent of each other and are written as classes rather than static functions?
Here's a couple of complete examples, one using parallel_for, the other using parallel_for_each.
Update 2014-04-12: These show what I'd consider to be a pretty old fashioned way of using TBB now; I've added a separate answer using parallel_for with a C++11 lambda.
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/task_scheduler_init.h"
#include <iostream>
#include <vector>
struct mytask {
mytask(size_t n)
:_n(n)
{}
void operator()() {
for (int i=0;i<1000000;++i) {} // Deliberately run slow
std::cerr << "[" << _n << "]";
}
size_t _n;
};
struct executor
{
executor(std::vector<mytask>& t)
:_tasks(t)
{}
executor(executor& e,tbb::split)
:_tasks(e._tasks)
{}
void operator()(const tbb::blocked_range<size_t>& r) const {
for (size_t i=r.begin();i!=r.end();++i)
_tasks[i]();
}
std::vector<mytask>& _tasks;
};
int main(int,char**) {
tbb::task_scheduler_init init; // Automatic number of threads
// tbb::task_scheduler_init init(2); // Explicit number of threads
std::vector<mytask> tasks;
for (int i=0;i<1000;++i)
tasks.push_back(mytask(i));
executor exec(tasks);
tbb::parallel_for(tbb::blocked_range<size_t>(0,tasks.size()),exec);
std::cerr << std::endl;
return 0;
}
and
#include "tbb/parallel_for_each.h"
#include "tbb/task_scheduler_init.h"
#include <iostream>
#include <vector>
struct mytask {
mytask(size_t n)
:_n(n)
{}
void operator()() {
for (int i=0;i<1000000;++i) {} // Deliberately run slow
std::cerr << "[" << _n << "]";
}
size_t _n;
};
template <typename T> struct invoker {
void operator()(T& it) const {it();}
};
int main(int,char**) {
tbb::task_scheduler_init init; // Automatic number of threads
// tbb::task_scheduler_init init(4); // Explicit number of threads
std::vector<mytask> tasks;
for (int i=0;i<1000;++i)
tasks.push_back(mytask(i));
tbb::parallel_for_each(tasks.begin(),tasks.end(),invoker<mytask>());
std::cerr << std::endl;
return 0;
}
Both compile on a Debian/Wheezy (g++ 4.7) system with g++ tbb_example.cpp -ltbb (then run with ./a.out)
(See this question for replacing that "invoker" thing with a std::mem_fun_ref or boost::bind).
Here's a more modern use of parallel_for with a lambda; compiles and runs on Debian/Wheezy with g++ -std=c++11 tbb_example.cpp -ltbb && ./a.out:
#include "tbb/parallel_for.h"
#include "tbb/task_scheduler_init.h"
#include <iostream>
#include <vector>
struct mytask {
mytask(size_t n)
:_n(n)
{}
void operator()() {
for (int i=0;i<1000000;++i) {} // Deliberately run slow
std::cerr << "[" << _n << "]";
}
size_t _n;
};
int main(int,char**) {
//tbb::task_scheduler_init init; // Automatic number of threads
tbb::task_scheduler_init init(tbb::task_scheduler_init::default_num_threads()); // Explicit number of threads
std::vector<mytask> tasks;
for (int i=0;i<1000;++i)
tasks.push_back(mytask(i));
tbb::parallel_for(
tbb::blocked_range<size_t>(0,tasks.size()),
[&tasks](const tbb::blocked_range<size_t>& r) {
for (size_t i=r.begin();i<r.end();++i) tasks[i]();
}
);
std::cerr << std::endl;
return 0;
}
If you just want to run a couple of tasks concurrently, it might be easier to just use a tbb::task_group. Example taken from tbb:
#include "tbb/task_group.h"
using namespace tbb;
int Fib(int n) {
if( n<2 ) {
return n;
} else {
int x, y;
task_group g;
g.run([&]{x=Fib(n-1);}); // spawn a task
g.run([&]{y=Fib(n-2);}); // spawn another task
g.wait(); // wait for both tasks to complete
return x+y;
}
}
Note however that
Creating a large number of tasks for a single task_group is not scalable, because task creation becomes a serial bottleneck.
In those cases, use timday's examples with parallel_for or a similar algorithm.
1-
//!
//! Get the default number of threads
//!
int nDefThreads = tbb::task_scheduler_init::default_num_threads();
//!
//! Init the task scheduler with the wanted number of threads
//!
tbb::task_scheduler_init init(nDefThreads);
2-
If your code permits it, perhaps the best way to run independent tasks with TBB is parallel_invoke. A post on the Intel Developer Zone blog explains some cases where parallel_invoke is helpful.
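For comparison, the two requirements in the question (a cap on the number of threads, and tasks written as classes) can also be sketched without TBB. The following is only a minimal illustration using std::thread and an atomic index; SquareTask and run_all are hypothetical names, and unlike the TBB algorithms above it does no work stealing or range splitting.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical task type: any class with operator()() would do.
struct SquareTask {
    explicit SquareTask(std::size_t n) : n_(n) {}
    void operator()() { result = n_ * n_; }
    std::size_t n_;
    std::size_t result = 0;
};

// Runs every task exactly once using at most max_threads worker threads.
template <typename Task>
void run_all(std::vector<Task>& tasks, unsigned max_threads) {
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        // Each worker claims the next unclaimed index until none remain.
        for (std::size_t i = next.fetch_add(1); i < tasks.size();
             i = next.fetch_add(1)) {
            tasks[i]();
        }
    };
    const unsigned n = std::max(1u, max_threads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

Calling run_all(tasks, 2) on a std::vector<SquareTask> would then play roughly the role of the explicit thread count plus parallel_for in the examples above.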
I am currently practicing multithreading in C++. A simplified version of my program follows. I have a global variable Objs, and each task runs its get function on a thread that is then detached.
In practice, get may take a long time to run. If there are many tasks, get is called repeatedly (since each task has its own get function). I wonder whether I can design the program so that once one task has obtained the data through get and written it to Objs.text, the remaining tasks can read it directly, or wait for it, instead of fetching it again.
Can I use std::shared_ptr, std::future, or std::async in C++ to implement this? If so, how should I design the program? Any advice is greatly appreciated.
#include <chrono>
#include <future>
#include <iostream>
#include <memory>
#include <string>
#include <thread>
#include <vector>
using namespace std;
class Info {
public:
Info() { Ids = 10; };
int Ids;
std::string text;
};
Info Objs;
class Module {
public:
Module() {}
virtual void check(int &id){};
virtual void get(){};
};
class task1 : public Module {
public:
task1() { std::cout << "task1" << std::endl; }
void check(int &id) override {
thread s(&task1::get, this);
s.detach();
};
// The function will first do some other work (here, I use sleep to represent
// that) then set the value of Objs.text
void get() override {
// The real work may take several seconds, so a sleep stands in for it
std::this_thread::sleep_for(std::chrono::seconds(5));
Objs.text = "AAAA";
std::cout << Objs.text << std::endl;
};
};
class task2 : public Module {
public:
task2() { std::cout << "task2" << std::endl; }
void check(int &id) override {
thread s(&task2::get, this);
s.detach();
};
// The function will first do some other work (here, I use sleep to represent
// that) then set the value of Objs.text
void get() {
std::this_thread::sleep_for(std::chrono::seconds(5));
Objs.text = "AAAA";
std::cout << Objs.text << std::endl;
};
};
int main() {
std::vector<std::unique_ptr<Module>> modules;
modules.push_back(std::make_unique<task1>());
modules.push_back(std::make_unique<task2>());
for (auto &m : modules) {
m->check(Objs.Ids);
}
std::this_thread::sleep_for(std::chrono::seconds(12));
return 0;
}
This is a plain producer-consumer problem.
You have multiple get() producers, but you have not implemented any consumers yet.
First, each thread should work with its own Info object. If there is only one Info, multithreading gains you nothing. I recommend passing the objects through a concurrent_queue.
Second, detach() is not a good idea, because you can no longer manage the child threads. You'd better use join().
My code sample follows. I used Visual Studio 2022; <concurrent_queue.h> is part of Microsoft's Concurrency Runtime.
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
#include <concurrent_queue.h>
using namespace std;
class Info {
public:
Info() { Ids = 10; };
int Ids;
std::string text;
};
concurrency::concurrent_queue<Info> Objs;
void producer()
{
while (true) {
Info obj;
std::this_thread::sleep_for(std::chrono::seconds(5));
obj.text = "AAAA\n";
Objs.push(obj);
}
}
void consumer()
{
while (true) {
std::this_thread::sleep_for(std::chrono::seconds(1));
Info obj;
bool got_it = Objs.try_pop(obj);
if (got_it) {
std::cout << obj.text;
}
}
}
int main() {
const int NUM_CORES = 6;
std::vector<std::thread> threads;
for (int i = 0; i < NUM_CORES / 2; ++i)
threads.emplace_back(producer);
for (int i = 0; i < NUM_CORES / 2; ++i)
threads.emplace_back(consumer);
for (auto& th : threads) th.join();
}
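The queue-based sample hands each result to a single consumer. If the goal is instead, as in the question, for the slow fetch to run once and for every task to reuse its result, a std::shared_future is one option. The sketch below is a minimal illustration under assumptions: fetch_text, fetch_calls and run_tasks are hypothetical names standing in for get and the tasks.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Counts how often the slow fetch actually runs (for illustration only).
std::atomic<int> fetch_calls{0};

// Hypothetical stand-in for the slow get(): returns the text when done.
std::string fetch_text() {
    ++fetch_calls;
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return "AAAA";
}

// Starts one fetch and lets task_count tasks wait on the same result.
std::vector<std::string> run_tasks(int task_count) {
    // share() turns the one-shot std::future into a copyable std::shared_future.
    std::shared_future<std::string> text =
        std::async(std::launch::async, fetch_text).share();

    std::vector<std::future<std::string>> tasks;
    for (int i = 0; i < task_count; ++i) {
        // Each task captures its own copy of the shared_future.
        tasks.push_back(std::async(std::launch::async, [text, i] {
            // get() blocks until the fetch has finished, then reads the shared value.
            return "task" + std::to_string(i) + ":" + text.get();
        }));
    }

    std::vector<std::string> results;
    for (auto& t : tasks) results.push_back(t.get());
    return results;
}
```

Because every task holds its own copy of the shared_future, no detach() and no global object are needed; the futures returned by std::async also give the caller something to wait on.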
When learning about threads, most examples suggest making std::mutex, std::condition_variable and std::queue global when sharing data between two threads, and that works perfectly well for simple scenarios. However, in real-world scenarios and bigger applications this may soon get complicated, because I could lose track of the global variables, and in C++ globals do not seem to be an appropriate option (maybe I am wrong).
My question is: if I have a producer/consumer problem and I want to put the producer and the consumer in separate classes, they need to share the same mutex and queue. How do I share these objects between them without making them global, and what is the best practice for creating the threads?
Here is a working example of my basic code using global variables.
#include <iostream>
#include <thread>
#include <mutex>
#include <queue>
#include <condition_variable>
std::queue<int> buffer;
std::mutex mtx;
std::condition_variable cond;
const int MAX_BUFFER_SIZE = 50;
class Producer
{
public:
void run(int val)
{
while(true) {
std::unique_lock locker(mtx) ;
cond.wait(locker, []() {
return buffer.size() < MAX_BUFFER_SIZE;
});
buffer.push(val);
std::cout << "Produced " << val << std::endl;
val --;
locker.unlock();
// std::this_thread::sleep_for(std::chrono::seconds(2));
cond.notify_one();
}
}
};
class Consumer
{
public:
void run()
{
while(true) {
std::unique_lock locker(mtx);
cond.wait(locker, []() {
return buffer.size() > 0;
});
int val = buffer.front();
buffer.pop();
std::cout << "Consumed " << val << std::endl;
locker.unlock();
std::this_thread::sleep_for(std::chrono::seconds(1));
cond.notify_one();
}
}
};
int main()
{
std::thread t1(&Producer::run, Producer(), MAX_BUFFER_SIZE);
std::thread t2(&Consumer::run, Consumer());
t1.join();
t2.join();
return 0;
}
Typically, you want to have synchronisation objects packaged alongside the resource(s) they are protecting.
A simple way to do that in your case would be a class that contains the buffer, the mutex, and the condition variable. All you really need is to share a reference to one of those to both the Consumer and the Producer.
Here's one way to go about it while keeping most of your code as-is:
class Channel {
std::queue<int> buffer;
std::mutex mtx;
std::condition_variable cond;
// Since we know `Consumer` and `Producer` are the only entities
// that will ever access buffer, mtx and cond, it's better to
// not provide *any* public (direct or indirect) interface to
// them, and use `friend` to grant access.
friend class Producer;
friend class Consumer;
public:
// ...
};
class Producer {
Channel* chan_;
public:
explicit Producer(Channel* chan) : chan_(chan) {}
// ...
};
class Consumer {
Channel* chan_;
public:
explicit Consumer(Channel* chan) : chan_(chan) {}
// ...
};
int main() {
Channel channel;
std::thread t1(&Producer::run, Producer(&channel), MAX_BUFFER_SIZE);
std::thread t2(&Consumer::run, Consumer(&channel));
t1.join();
t2.join();
}
However (thanks for the prompt, @Ext3h), a better way to go about this would be to encapsulate access to the synchronisation objects as well, i.e. keep them hidden in the class. At that point Channel becomes what is commonly known as a synchronised queue.
Here's what I'd subjectively consider a nicer-looking implementation of your example code, with a few misc improvements thrown in as well:
#include <cassert>
#include <iostream>
#include <thread>
#include <mutex>
#include <queue>
#include <optional>
#include <condition_variable>
template<typename T>
class Channel {
static constexpr std::size_t default_max_length = 10;
public:
using value_type = T;
explicit Channel(std::size_t max_length = default_max_length)
: max_length_(max_length) {}
std::optional<value_type> next() {
std::unique_lock locker(mtx_);
cond_.wait(locker, [this]() {
return !buffer_.empty() || closed_;
});
if (buffer_.empty()) {
assert(closed_);
return std::nullopt;
}
value_type val = buffer_.front();
buffer_.pop();
cond_.notify_one();
return val;
}
void put(value_type val) {
std::unique_lock locker(mtx_);
cond_.wait(locker, [this]() {
return buffer_.size() < max_length_;
});
buffer_.push(std::move(val));
cond_.notify_one();
}
void close() {
std::scoped_lock locker(mtx_);
closed_ = true;
cond_.notify_all();
}
private:
std::size_t max_length_;
std::queue<value_type> buffer_;
bool closed_ = false;
std::mutex mtx_;
std::condition_variable cond_;
};
void producer_main(Channel<int>& chan, int val) {
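// Loop a finite number of times so that the channel can be closed afterwards.
// (An infinite loop is Undefined Behavior only if it has no side effects
// such as I/O or synchronisation.)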
while (val >= 0) {
chan.put(val);
std::cout << "Produced " << val << std::endl;
val--;
}
}
void consumer_main(Channel<int>& chan) {
bool running = true;
while (running) {
auto val = chan.next();
if (!val) {
running = false;
continue;
}
std::cout << "Consumed " << *val << std::endl;
};
}
int main()
{
// You are responsible for ensuring the channel outlives both threads.
Channel<int> channel;
std::thread producer_thread(producer_main, std::ref(channel), 13);
std::thread consumer_thread(consumer_main, std::ref(channel));
producer_thread.join();
channel.close();
consumer_thread.join();
return 0;
}
Context: in one of my C++11 applications, serializing an object and publishing the resulting message is time-consuming. I therefore want to do it in a separate thread using the Intel TBB library (more specifically, a tbb::task_group).
Issue: the object to serialize is a struct in which some members are std::vector<std::unique_ptr<SomeObject>>, which makes it impossible to copy into the lambda executed by a task.
It looks approximately like this:
struct MockedDC {
MockedDC(int x, std::vector<std::unique_ptr<SomeObject>> v) : x(x),
v(std::move(v)) {};
int x;
std::vector<std::unique_ptr<SomeObject>> v;
};
The "solution" I found is to move-construct my instance on the heap and wrap it in a shared_ptr<MockedDC>, which is copyable. In the end, the function that invokes tbb::task_group::run looks like this:
// function called like this `executeInThread(taskGroup, std::move(mockedDC))`
void executeInThread(tbb::task_group& taskGroup, MockedDC mockedDC) {
const std::shared_ptr<MockedDC> sharedMockedDC(new MockedDC(std::move(mockedDC)));
auto f = [sharedMockedDC] {
const auto serialized(serializer(*sharedMockedDC)); // pass by reference
publish(serialized);
};
taskGroup.run(f);
};
It compiles and runs fine, but I can't put it under the kind of load it will see in real-life conditions,
so my question is: is it safe/sane to do this?
I found an alternative in another Stack Overflow question, but the implementation looks difficult to maintain given my C++ knowledge :) That's why I want to stick with the shared_ptr approach, as suggested elsewhere.
What I tried so far: I wrote some dummy code to test this, but I don't think it is enough to validate the approach. I also wanted to compile with some sanitizer flags, but TBB fails to link with a bunch of errors like undefined reference to __ubsan_handle_pointer_overflow.
Here is the dummy example, if that helps to answer (it compiles and runs without issues, except for some int overflow, but I guess that's not an issue):
#include <cstdio>
#include <iostream>
#include <memory>
#include <vector>
#include <numeric>
#include "tbb/task_scheduler_init.h"
#include "tbb/task_group.h"
struct MockedDC {
MockedDC(int seed, size_t baseLen) : seed(seed), baseLen(baseLen) {
this->a_MDC.reserve(baseLen);
for (size_t i = 0; i < baseLen; ++i)
this->a_MDC.emplace_back(new int((seed + i) / (seed + 1)));
};
int seed;
size_t baseLen;
std::vector<std::unique_ptr<int>> a_MDC;
};
void executeInThread(tbb::task_group& taskGroup, MockedDC mockedDC) {
const std::shared_ptr<MockedDC> sharedMockedDC(new MockedDC(std::move(mockedDC)));
auto f = [sharedMockedDC] {
std::cout <<
std::accumulate(sharedMockedDC->a_MDC.begin(), sharedMockedDC->a_MDC.end(), 0, [](int acc, const std::unique_ptr<int>& rhs) {
return acc + *rhs;
})
<< std::endl << std::flush;
};
taskGroup.run(f);
};
void triggerTest(tbb::task_group& taskGroup) {
for (size_t i = 0; i < 1000000; ++i) {
MockedDC mdc(i, 10000000);
executeInThread(taskGroup, std::move(mdc));
}
return ;
};
int main() {
tbb::task_scheduler_init tbbInit(tbb::task_scheduler_init::automatic);
//tbb::task_scheduler_init tbbInit(8);
tbb::task_group taskGroup;
triggerTest(taskGroup);
taskGroup.wait();
return (0);
};
PS: using the new C++14 capture by move doesn't work because of the TBB library :/
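The shared_ptr approach described above can be reduced to a small standard-library sketch. Here std::function is used only as a stand-in for an API that requires a copyable functor; Payload and make_job are hypothetical names, and nothing below is TBB-specific.

```cpp
#include <functional>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Move-only payload, mirroring the shape of MockedDC in the question.
struct Payload {
    std::vector<std::unique_ptr<int>> values;
};

// Wraps a move-only payload in a copyable callable.
std::function<int()> make_job(Payload payload) {
    // Move the payload to the heap once; the lambda copies only the pointer.
    std::shared_ptr<const Payload> shared =
        std::make_shared<Payload>(std::move(payload));
    return [shared] {
        return std::accumulate(
            shared->values.begin(), shared->values.end(), 0,
            [](int acc, const std::unique_ptr<int>& p) { return acc + *p; });
    };
}
```

The captured shared_ptr keeps the payload alive until the last copy of the callable is destroyed, and holding it as a pointer to const means the task can only read it.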
[It is not necessary to follow the links to understand the question].
I combined the implementation of the singleton pattern in this answer, together with the synchronized file writing of this other answer.
Then I wanted to see if the interface of SynchronizedFile could provide a variadic templated write method, but I couldn't figure out how to properly combine this with the std::lock_guard.
Below is a non-working example. In this case it doesn't work because (I think) the two threads manage to pump stuff into the buffer i_buf in a non-synchronized way, resulting in a garbled LOGFILE.txt.
If I put the std::lock_guard inside the general template of write then the program doesn't halt.
#include <iostream>
#include <mutex>
#include <sstream>
#include <fstream>
#include <string>
#include <memory>
#include <thread>
static const int N_LOOP_LENGTH{10};
// This class manages a log file and provides write method(s)
// that allow passing a variable number of parameters of different
// types to be written to the file in a line and separated by commas.
class SynchronizedFile {
public:
static SynchronizedFile& getInstance()
{
static SynchronizedFile instance;
return instance;
}
private:
std::ostringstream i_buf;
std::ofstream i_fout;
std::mutex _writerMutex;
SynchronizedFile () {
i_fout.open("LOGFILE.txt", std::ofstream::out);
}
public:
SynchronizedFile(SynchronizedFile const&) = delete;
void operator=(SynchronizedFile const&) = delete;
template<typename First, typename... Rest>
void write(First param1, Rest...param)
{
i_buf << param1 << ", ";
write(param...);
}
void write()
{
std::lock_guard<std::mutex> lock(_writerMutex);
i_fout << i_buf.str() << std::endl;
i_buf.str("");
i_buf.clear();
}
};
// This is just some class that is using the SynchronizedFile class
// to write stuff to the log file.
class Writer {
public:
Writer (SynchronizedFile& sf, const std::string& prefix)
: syncedFile(sf), prefix(prefix) {}
void someFunctionThatWritesToFile () {
syncedFile.write(prefix, "AAAAA", 4343, "BBBBB", 0.2345435, "GGGGGG");
}
private:
SynchronizedFile& syncedFile;
std::string prefix;
};
void thread_method()
{
SynchronizedFile &my_file1 = SynchronizedFile::getInstance();
Writer writer1(my_file1, "Writer 1:");
for (int i = 0; i < N_LOOP_LENGTH; ++ i)
writer1.someFunctionThatWritesToFile();
}
int main()
{
std::thread t(thread_method);
SynchronizedFile &my_file2 = SynchronizedFile::getInstance();
Writer writer2(my_file2, "Writer 2:");
for (int i = 0; i < N_LOOP_LENGTH; ++i)
writer2.someFunctionThatWritesToFile();
t.join();
std::cout << "Done" << std::endl;
return 0;
}
How could I successfully combine these three ideas?
The program deadlocks because write calls itself recursively while still holding the lock.
Either use a std::recursive_mutex or release the lock after writing your data out but before calling write.
Edit: Unlocking doesn't do the job; I didn't think this through...
Edit: Alternatively, lock once and defer to another private method to do the write.
template<typename... Args>
void write(Args&&... args)
{
std::unique_lock<std::mutex> lock(_writerMutex);
_write(std::forward<Args>(args)...);
}
template<typename First, typename... Rest>
void _write(First&& param1, Rest&&... param) // private method
{
i_buf << std::forward<First>(param1) << ", ";
_write(std::forward<Rest>(param)...);
}
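void _write() // private method, ends the recursion
{
i_fout << i_buf.str() << std::endl;
i_buf.str(""); // empty the buffer; clear() alone only resets the error flags
i_buf.clear();
}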
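The same lock-once-then-delegate idea can be shown as a self-contained class. This is only a sketch under assumptions: LineLogger and append are hypothetical names, the output goes to any std::ostream rather than a fixed file, and the line is built in a local buffer so that no shared buffer has to be reset.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical logger that writes comma-separated lines to any std::ostream.
class LineLogger {
public:
    explicit LineLogger(std::ostream& out) : out_(out) {}

    // Public entry point: takes the lock exactly once, then delegates.
    template <typename... Args>
    void write(Args&&... args) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::ostringstream line;  // local buffer, nothing shared to reset
        append(line, std::forward<Args>(args)...);
        out_ << line.str() << '\n';
    }

private:
    // Base case of the recursion: nothing left to append.
    static void append(std::ostringstream&) {}

    // Recursive helper: never touches the mutex, so it cannot deadlock.
    template <typename First, typename... Rest>
    static void append(std::ostringstream& line, First&& first, Rest&&... rest) {
        line << std::forward<First>(first) << ", ";
        append(line, std::forward<Rest>(rest)...);
    }

    std::ostream& out_;
    std::mutex mutex_;
};
```

Since only write locks and only append recurses, a plain std::mutex is enough and whole lines from different threads cannot interleave.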
I've tried to implement a very basic thread-local singleton class in C++: a template class that other classes then inherit from. The problem is that it almost always works, but every now and again (say, 1 run in 15) it fails with an error along the lines of:
*** glibc detected *** ./myExe: free(): invalid next size (fast): 0x00002b61a40008c0 ***
Please forgive the rather contrived example below, but it serves to demonstrate the problem.
#include <thread>
#include <atomic>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
using namespace std;
template<class T>
class ThreadLocalSingleton
{
public:
/// Return a reference to an instance of the object
static T& instance();
typedef unique_ptr<T> UPtr;
protected:
ThreadLocalSingleton() {}
ThreadLocalSingleton(ThreadLocalSingleton const&);
void operator=(ThreadLocalSingleton const&);
};
template<class T>
T& ThreadLocalSingleton<T>::instance()
{
thread_local T m_instance;
return m_instance;
}
// Create two atomic variables to keep track of the number of times the
// TLS class is created and accessed.
atomic<size_t> creationCount(0);
atomic<size_t> accessCount(0);
// Very simple class which derives from TLS
class MyClass : public ThreadLocalSingleton<MyClass>
{
friend class ThreadLocalSingleton<MyClass>;
public:
MyClass()
{
++creationCount;
}
string getType() const
{
++accessCount;
return "MyClass";
}
};
int main(int,char**)
{
vector<thread> threads;
vector<string> results;
threads.emplace_back([&]() { results.emplace_back(MyClass::instance().getType()); MyClass::instance().getType(); });
threads.emplace_back([&]() { results.emplace_back(MyClass::instance().getType()); MyClass::instance().getType(); });
threads.emplace_back([&]() { results.emplace_back(MyClass::instance().getType()); MyClass::instance().getType(); });
threads.emplace_back([&]() { results.emplace_back(MyClass::instance().getType()); MyClass::instance().getType(); });
for (auto& t : threads)
{
t.join();
}
// Expecting 4 creations and 8 accesses.
cout << "CreationCount: " << creationCount << " AccessCount: " << accessCount << endl;
}
I can replicate this on coliru, using the build command:
g++ -std=c++11 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
Many thanks!
Thanks to both molbdnilo and Damon, who quickly pointed out the obvious: vector::emplace_back isn't thread-safe, so there is no guarantee that this code works. I've replaced the main() function with the following (which also needs #include <mutex>), and it seems to be more reliable.
int main(int,char**)
{
vector<thread> threads;
vector<string> results;
auto addToResult = [&results](const string& val)
{
static mutex m_mutex;
unique_lock<mutex> lock(m_mutex);
results.emplace_back(val);
};
threads.emplace_back([&addToResult]() { addToResult(MyClass::instance().getType()); MyClass::instance().getType(); });
threads.emplace_back([&addToResult]() { addToResult(MyClass::instance().getType()); MyClass::instance().getType(); });
threads.emplace_back([&addToResult]() { addToResult(MyClass::instance().getType()); MyClass::instance().getType(); });
threads.emplace_back([&addToResult]() { addToResult(MyClass::instance().getType()); MyClass::instance().getType(); });
for (auto& t : threads)
{
t.join();
}
// Expecting 4 creations and 8 accesses.
cout << "CreationCount: " << creationCount << " AccessCount: " << accessCount << endl;
}
Thanks!
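The fix above comes down to guarding every access to the shared vector with one mutex, while the thread_local object itself needs no lock. A compact sketch of that split follows; PerThread, instance, created and collect are hypothetical names, not the classes from the question.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Counts how many per-thread instances get constructed.
std::atomic<std::size_t> created{0};

struct PerThread {
    PerThread() { ++created; }
    std::string type() const { return "PerThread"; }
};

// One instance per thread, constructed on first use in that thread.
PerThread& instance() {
    thread_local PerThread obj;
    return obj;
}

// Spawns n threads that each append one string to a shared vector.
std::vector<std::string> collect(std::size_t n) {
    std::vector<std::string> results;
    std::mutex results_mutex;  // guards every access to results
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&results, &results_mutex] {
            std::string value = instance().type();  // thread-local, no lock needed
            std::lock_guard<std::mutex> lock(results_mutex);
            results.push_back(std::move(value));
        });
    }
    for (auto& t : threads) t.join();
    return results;
}
```

Keeping the mutex next to the vector it protects, rather than as a static inside a lambda, makes it easier to see which data the lock covers.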