C++ Queuing function calls

I have a function that queues function callbacks to be executed in another thread.
void Queue(std::function<void()> callback)
{
    std::lock_guard<std::mutex> lock(queueMutex);
    queue.push_back(callback);
}
The queued functions are called using this function in the main thread:
void ProcessQueue()
{
    std::lock_guard<std::mutex> lock(queueMutex);
    if (!queue.empty())
    {
        for (auto& cb : queue)
        {
            cb();
        }
        queue.clear();
    }
}
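One detail worth flagging in ProcessQueue as written: the callbacks run while queueMutex is still held, so a callback that itself calls Queue would deadlock on the same mutex. A common variant (a sketch, reusing the same names and assuming queue is a std::vector of std::function<void()>) swaps the pending work out before running it:
void ProcessQueue()
{
    std::vector<std::function<void()>> pending;
    {
        std::lock_guard<std::mutex> lock(queueMutex);
        pending.swap(queue); // take the work, then release the lock
    }
    for (auto& cb : pending)
    {
        cb(); // callbacks may now safely call Queue() again
    }
}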
I am queuing these callbacks because they must be executed in the main thread. My question is whether it's safe (and appropriate) to chain multiple functions within a single Queue call, like this:
Queue([=]()
{
    FunctionA();
    FunctionB();
});
Or is it better to separate them like this?
Queue([=]()
{
    FunctionA();
});
Queue([=]()
{
    FunctionB();
});

I would choose the first option. The second version carries the additional cost of creating and invoking a second lambda, and since you capture by value with [=], each lambda copies everything it captures, which can be quite costly when capturing heavy objects.
std::function also affects performance. The second version constructs two std::function objects, each copying its lambda together with the captured state, so more memory is used; in addition, std::function invokes its target through type erasure (an indirect call, much like a virtual call), and that overhead is now paid twice. Considering these facts, in my opinion, the first version is better.
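If the captured objects really are heavy, the copy can be avoided altogether; here is a small sketch (heavyObj is a made-up name, and init-capture requires C++14):
Queue([obj = std::move(heavyObj)]() // move into the closure instead of copying
{
    FunctionA();
    FunctionB();
    // ... use obj here rather than a by-value copy of heavyObj ...
});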

Related

Execute lambda with CreateThread

Is there a better way to use CreateThread than creating a free function each time for the sole purpose of casting lpParameter?
Are there any modern alternatives to CreateThread for creating persistent threads?
Edit: Perhaps you should just use std::async(lambda). I imagine that it's just implemented with CreateThread. Maybe the answer to this question is looking up how std::async is implemented (assuming it's a library feature).
DWORD WINAPI MyThreadFunction(
    _In_ LPVOID lpParameter
)
{
    ((MyClass*)lpParameter)->RunLoop();
    return 0;
}
void MyClass::LaunchThread()
{
    CreateThread(
        NULL,             // default security attributes
        0,                // use default stack size
        MyThreadFunction, // thread function name
        this,             // argument to thread function
        0,                // use default creation flags
        NULL);            // returns the thread identifier
}
There are several mechanisms for achieving parallelism (std::async etc. as mentioned above).
But the modern one which is most similar to your original code with CreateThread is std::thread. It can be constructed with a global function, a lambda, or a class method (which seems the best fit for you):
m_thread = std::thread([this](){ RunLoop(); }); // pass a lambda
or
m_thread = std::thread(&MyClass::RunLoop, this); // pass a method
Note that a std::thread starts running as soon as it is constructed (the new thread may begin executing immediately). Also note that std::async does not guarantee that it will run on a separate thread, and even if it does, it could be a thread from a pool. The behaviour might not be the same as with your original CreateThread.
Here's a complete example of using std::thread (including cancellation):
#include <thread>
#include <chrono>
#include <atomic>
#include <iostream>

class MyClass
{
public:
    MyClass() {}
    ~MyClass() { EndThread(); }

    void LaunchThread()
    {
        EndThread(); // in case it was already running
        m_bThreadShouldExit = false;
        // Start the thread with a class method:
        m_thread = std::thread(&MyClass::RunLoop, this);
    }

    void EndThread()
    {
        // Signal the thread to exit, and wait for it:
        m_bThreadShouldExit = true;
        if (m_thread.joinable())
        {
            m_thread.join();
        }
    }

    void RunLoop()
    {
        std::cout << "RunLoop started" << std::endl;
        while (!m_bThreadShouldExit)
        {
            std::cout << "RunLoop doing something ..." << std::endl;
            std::this_thread::sleep_for(std::chrono::milliseconds(1000));
        }
        std::cout << "RunLoop ended" << std::endl;
    }

private:
    std::thread m_thread;
    std::atomic_bool m_bThreadShouldExit{ false };
};

int main()
{
    MyClass m;
    m.LaunchThread();
    std::this_thread::sleep_for(std::chrono::milliseconds(5000));
    m.EndThread();
}
Possible output:
RunLoop started
RunLoop doing something ...
RunLoop doing something ...
RunLoop doing something ...
RunLoop doing something ...
RunLoop doing something ...
RunLoop ended
std::async() and std::thread(<callable>, <args...>) are most likely implemented internally much as you just did. The one exception is that lambdas without captures can be implicitly converted to function pointers, which can be passed straight to the CreateThread function with a nullptr lpParameter.
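A minimal sketch of that capture-less case (a sketch only: the conversion to CreateThread's __stdcall function-pointer type is an MSVC extension, not something the standard guarantees):
#include <windows.h>

void LaunchSimpleThread()
{
    // The lambda captures nothing, so it converts to a plain function pointer
    // compatible with LPTHREAD_START_ROUTINE.
    HANDLE h = CreateThread(
        nullptr, 0,
        [](LPVOID) -> DWORD { /* thread work here */ return 0; },
        nullptr, // lpParameter unused: there are no captures to pass
        0, nullptr);
    if (h) CloseHandle(h); // or keep the handle to wait on the thread
}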
Lambdas with a capture list are pretty much syntactic sugar; internally they translate to something like this (very simplified):
struct <internal_lambda_name>
{
    <capture list...> fields...;
    void operator()(<arguments...>) { <code...>; }
};
So they translate to objects of struct type, which means they need somewhere to store all those captures; and to be executed on another thread via the CreateThread function, they also need some way of ensuring that the captured data stored in them stays alive for the duration of their execution.
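One way to meet both requirements is to heap-allocate the closure so it outlives the launching scope and have the new thread free it. This is only a sketch, not the answer's code: LaunchLambda is a made-up helper, and the capture-less trampoline's conversion to CreateThread's pointer type again leans on MSVC:
#include <windows.h>
#include <type_traits>
#include <utility>

template <typename F>
HANDLE LaunchLambda(F&& f)
{
    using Fn = std::decay_t<F>;
    Fn* closure = new Fn(std::forward<F>(f)); // keep the captures alive past this scope
    return CreateThread(
        nullptr, 0,
        [](LPVOID p) -> DWORD {
            Fn* fn = static_cast<Fn*>(p);
            (*fn)();   // run the captured work
            delete fn; // the new thread owns the closure and frees it
            return 0;
        },
        closure, 0, nullptr);
}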
I looked into the MSVC implementation of std::async: it is implemented using ::Concurrency::create_task, which straightforwardly accepts a callable object.
https://learn.microsoft.com/en-us/cpp/parallel/concrt/task-parallelism-concurrency-runtime
I also looked into their implementation of create_task
template<typename _Ty>
__declspec(noinline) // Ask for no inlining so that the _CAPTURE_CALLSTACK gives us the expected result
explicit task(_Ty _Param)
{
    task_options _TaskOptions;
    details::_ValidateTaskConstructorArgs<_ReturnType,_Ty>(_Param);
    _CreateImpl(_TaskOptions.get_cancellation_token()._GetImplValue(), _TaskOptions.get_scheduler());
    // Do not move the next line out of this function. It is important that _CAPTURE_CALLSTACK() evaluates to the call site of the task constructor.
    _SetTaskCreationCallstack(_CAPTURE_CALLSTACK());
    _TaskInitMaybeFunctor(_Param, decltype(details::_IsCallable(_Param,0))());
}
And so it turns out that launching a capturing lambda on a new thread is quite involved, and beyond the scope of this question.

C++ thread pool: alternative to std::function for passing functions/lambdas to threads?

I have a thread pool that I use to execute many tiny jobs (millions of jobs, dozens/hundreds of microseconds each). The jobs are passed in the form of either:
std::bind(&fn, arg1, arg2, arg3...)
or
[&](){fn(arg1, arg2, arg3...);}
with the thread pool taking them like this:
std::queue<std::function<void(void)>> queue;

void addJob(std::function<void(void)> fn)
{
    queue.emplace(std::move(fn));
}
Pretty standard stuff....except that I've noticed a bottleneck where if jobs execute in a fast enough time (less than a millisecond), the conversion from lambda/binder to std::function in the addJob function actually takes longer than execution of the jobs themselves. After doing some reading, std::function is notoriously slow and so my bottleneck isn't necessarily unexpected.
Is there a faster way of doing this type of thing? I've looked into drop-in std::function replacements but they either weren't compatible with my compiler or weren't faster. I've also looked into "fast delegates" by Don Clugston but they don't seem to allow the passing of arguments along with functions (maybe I don't understand them correctly?).
I'm compiling with VS2015u3, and the functions passed to the jobs are all static, with their arguments being either ints/floats or pointers to other objects.
Have a separate queue for each of the task types - you probably don't have tens of thousands of task types. Each of these can be e.g. a static member of your tasks. Then addJob() is actually the ctor of Task and it's perfectly type-safe.
Then define a compile-time list of your task types and visit it via template metaprogramming (a compile-time for_each). It'll be way faster, as you don't need any virtual call, function pointer, or std::function<> to achieve this.
This will only work if your task-list code sees all the Task classes (so you can't e.g. add a new descendant of Task to an already running executable by loading an image from disc; hope that's a non-issue).
#include <iterator>
#include <list>

template<typename D> // CRTP on D
class Task {
public:
    // you might want to static_assert at some point that D is in TaskTypeList
    Task() : it_(tasks_.end()) {} // call enqueue() in descendant
    ~Task() {
        // add your favorite lock here
        if (queued()) {
            tasks_.erase(it_);
        }
    }
    bool queued() const { return it_ != tasks_.end(); }
    static size_t ExecNext() {
        if (!tasks_.empty()) {
            // add your favorite lock here
            D* task = tasks_.front();
            tasks_.pop_front();
            // release lock
            (*task)();
            task->it_ = tasks_.end();
        }
        return tasks_.size();
    }
protected:
    void enqueue()
    {
        // add your favorite lock here
        tasks_.push_back(static_cast<D*>(this));
        it_ = std::prev(tasks_.end());
    }
private:
    typename std::list<D*>::iterator it_;
    static std::list<D*> tasks_; // you can have one per thread, too - then you don't need locking, but tasks are assigned to threads statically
};

template<typename D>
std::list<D*> Task<D>::tasks_;
struct MyTask : Task<MyTask> {
    MyTask() { enqueue(); } // call enqueue only when the class is ready
    void operator()() { /* do the task's work here */ }
    // ...
};
struct MyTask2; // etc.

template<typename...>
struct list_ {};

using TaskTypeList = list_<MyTask, MyTask2>;
void thread_process(list_<>) {}

template<typename TaskType, typename... TaskTypes>
void thread_process(list_<TaskType, TaskTypes...>)
{
    TaskType::ExecNext();
    thread_process(list_<TaskTypes...>());
}

void thread_process(void*)
{
    for (;;) {
        thread_process(TaskTypeList());
    }
}
There's a lot to tune in this code: different threads should start from different parts of the queue (or one would use a ring, or several queues with either static or dynamic assignment to threads); you'd send a thread to sleep when there are absolutely no tasks; one could have an enum for the tasks; etc.
Note that this can't be used with arbitrary lambdas: you need to list the task types. You need to 'communicate' the lambda's type out of the function where you declare it (e.g. by returning something like std::make_pair(retval, list_<decltype(lambda)>())), and sometimes that's not easy. However, you can always convert a lambda to a functor, which is straightforward - just ugly.
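That conversion is mechanical; here is a sketch with illustrative names (consume stands in for whatever work the lambda did):
void consume(int, int); // stand-in for the lambda body's work

// Instead of:  auto f = [a, b]{ consume(a, b); };
struct ConsumeTask {
    int a, b; // the former capture list, now named fields
    void operator()() const { consume(a, b); }
};
Such a struct can then derive from Task<ConsumeTask> and be listed in TaskTypeList.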

Implementing a simple, generic thread pool in C++11

I want to create a thread pool for experimental purposes (and for the fun factor). It should be able to process a wide variety of tasks (so I can possibly use it in later projects).
In my thread pool class I'm going to need some sort of task queue. Since the Standard Library has provided std::packaged_task since C++11, my queue will look like std::deque<std::packaged_task<?()>> task_queue, so the client can push std::packaged_tasks into the queue via some sort of public interface function (and then one of the threads in the pool will be notified with a condition variable to execute it, etc.).
My question is related to the template argument of the std::packaged_task<?()>s in the deque.
The function signature ?() should be able to deal with any type/number of parameters, because the client can do something like:
std::packaged_task<int()> t(std::bind(factorial, 342));
thread_pool.add_task(t);
So I don't have to deal with the type/number of parameters.
But what should the return value be? (hence the question mark)
If I make my whole thread pool class a template class, one instance of it will only be able to deal with tasks of a specific signature (like std::packaged_task<int()>), but I want one thread pool object to be able to deal with any kind of task.
If I go with std::packaged_task<void()> and the function invoked returns an integer, or anything at all, then that's undefined behaviour.
So the hard part is that packaged_task<R()> is move-only, otherwise you could just toss it into a std::function<void()>, and run those in your threads.
There are a few ways around this.
First, ridiculously, use a packaged_task<void()> to store a packaged_task<R()>. I'd advise against this, but it does work. ;) (what is the signature of operator() on packaged_task<R()>? What is the required signature for the objects you pass to packaged_task<void()>?)
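To spell the first option out (a sketch, assuming <future> is included; it compiles because packaged_task<R()>::operator() takes no arguments and returns void, which is exactly the signature packaged_task<void()> requires of its callable):
std::packaged_task<int()> inner([]{ return 7; });
auto fut = inner.get_future();                      // grab the future before moving
std::packaged_task<void()> outer(std::move(inner)); // inner is a valid void() callable
outer();                                            // runs inner(), fulfilling fut
// fut.get() == 7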
Second, wrap your packaged_task<R()> in a shared_ptr, capture that in a lambda with signature void(), store that in a std::function<void()>, and done. This has overhead costs, but probably less than the first solution.
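A minimal sketch of that second option (wrap is my name for the helper):
#include <functional>
#include <future>
#include <memory>

template <class R>
std::function<void()> wrap(std::packaged_task<R()>&& pt)
{
    // shared_ptr makes the closure copyable, which std::function requires
    auto sp = std::make_shared<std::packaged_task<R()>>(std::move(pt));
    return [sp]{ (*sp)(); };
}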
Finally, write your own move-only function wrapper. For the signature void() it is short:
#include <memory>
#include <type_traits>
#include <utility>

struct task {
    template<class F,
        class dF = std::decay_t<F>,
        class = decltype(std::declval<dF&>()())
    >
    task(F&& f):
        ptr(
            new dF(std::forward<F>(f)),
            [](void* ptr){ delete static_cast<dF*>(ptr); }
        ),
        invoke([](void* ptr){
            (*static_cast<dF*>(ptr))();
        })
    {}
    void operator()() const {
        invoke(ptr.get());
    }
    task(task&&) = default;
    task& operator=(task&&) = default;
    task(): ptr(nullptr, [](void*){}) {} // unique_ptr with a function-pointer deleter has no default ctor
    ~task() = default;
    explicit operator bool() const { return static_cast<bool>(ptr); }
private:
    std::unique_ptr<void, void(*)(void*)> ptr;
    void (*invoke)(void*) = nullptr;
};
and simple. The above can store packaged_task<R()> for any type R, and invoke them later.
This has relatively minimal overhead -- it should be cheaper than std::function, at least compared with the implementations I've seen -- except it does not do SBO (small buffer optimization), where small function objects are stored internally instead of on the heap.
You can improve the unique_ptr<> ptr container with a small buffer optimization if you want.
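A quick usage sketch of the task wrapper above (assuming <future> is included):
std::packaged_task<int()> pt([]{ return 42; });
auto fut = pt.get_future();
task t(std::move(pt)); // type-erased, move-only storage
t();                   // runs the stored packaged_task
// fut.get() == 42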
I happen to have an implementation which does exactly that. My way of doing things is to wrap the std::packaged_task objects in a struct which abstracts away the return type. The method which submits a task into the thread pool returns a future on the result.
This kind of works, but due to the memory allocations required for each task it is not suitable for tasks which are very short and very frequent (I tried to use it to parallelize chunks of a fluid simulation and the overhead was way too high, in the order of several milliseconds for 324 tasks).
The key part is this structure:
struct abstract_packaged_task
{
    template <typename R>
    abstract_packaged_task(std::packaged_task<R> &&task):
        m_task((void*)(new std::packaged_task<R>(std::move(task)))),
        m_call_exec([](abstract_packaged_task *instance){
            (*(std::packaged_task<R>*)instance->m_task)();
        }),
        m_call_delete([](abstract_packaged_task *instance){
            delete (std::packaged_task<R>*)(instance->m_task);
        })
    {
    }

    abstract_packaged_task(abstract_packaged_task &&other);
    ~abstract_packaged_task();

    void operator()();

    void *m_task;
    std::function<void(abstract_packaged_task*)> m_call_exec;
    std::function<void(abstract_packaged_task*)> m_call_delete;
};
As you can see, it hides away the type dependencies by using lambdas with std::function and a void*. If you know the maximum size of all possibly occurring std::packaged_task objects (I have not checked whether the size depends on R at all), you could try to optimize this further by removing the memory allocation.
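A hedged sketch of that allocation-free idea (inline_packaged_task and MAX_TASK_SIZE are made-up names, the size bound is an assumption you would have to verify, and the move constructor a real version needs is omitted for brevity):
#include <cstddef>
#include <future>
#include <new>

struct inline_packaged_task
{
    static constexpr std::size_t MAX_TASK_SIZE = 64; // assumed upper bound

    template <typename R>
    inline_packaged_task(std::packaged_task<R()> &&task)
    {
        static_assert(sizeof(std::packaged_task<R()>) <= MAX_TASK_SIZE,
                      "enlarge MAX_TASK_SIZE");
        // placement-new into the inline buffer: no heap allocation
        new (m_buf) std::packaged_task<R()>(std::move(task));
        m_call_exec = [](void *p){ (*static_cast<std::packaged_task<R()>*>(p))(); };
        m_call_delete = [](void *p){ static_cast<std::packaged_task<R()>*>(p)->~packaged_task(); };
    }
    inline_packaged_task(const inline_packaged_task&) = delete; // bytes can't be blindly copied
    ~inline_packaged_task() { m_call_delete(m_buf); }
    void operator()() { m_call_exec(m_buf); }

    alignas(std::max_align_t) unsigned char m_buf[MAX_TASK_SIZE];
    void (*m_call_exec)(void*);
    void (*m_call_delete)(void*);
};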
The submission method into the thread pool then does this:
template <typename R>
std::future<R> submit_task(std::packaged_task<R()> &&task)
{
    assert(m_workers.size() > 0);
    std::future<R> result = task.get_future();
    {
        std::unique_lock<std::mutex> lock(m_queue_mutex);
        m_task_queue.emplace_back(std::move(task));
    }
    m_queue_wakeup.notify_one();
    return result;
}
where m_task_queue is an std::deque of abstract_packaged_task structs. m_queue_wakeup is a std::condition_variable to wake a worker thread up to pick up the task. The worker threads implementation is as simple as:
void ThreadPool::worker_impl()
{
    std::unique_lock<std::mutex> lock(m_queue_mutex, std::defer_lock);
    while (!m_terminated) {
        lock.lock();
        while (m_task_queue.empty()) {
            m_queue_wakeup.wait(lock);
            if (m_terminated) {
                return;
            }
        }
        abstract_packaged_task task(std::move(m_task_queue.front()));
        m_task_queue.pop_front();
        lock.unlock();
        task();
    }
}
You can take a look at the full source code and the corresponding header on my github.

C++ return value on concurrent queue pushing functions

After receiving answers to a previous question on logging on a different thread, I am currently at the following bit of code (note: the concurrent_queue here is from PPL, but any other concurrent_queue should work):
class concurrentFuncQueue
{
private:
    typedef std::function<void()> LambdaFunction;
    mutable concurrency::concurrent_queue<LambdaFunction> functionQueue;
    mutable std::atomic<bool> endcond;
    LambdaFunction function;
    std::thread thd;
public:
    concurrentFuncQueue() : endcond(false), thd([=]{
        while (endcond != true)
        {
            if (functionQueue.try_pop(function))
            {
                function(); // note: I am popping a function and adding () to execute it
            }
        }
    }) {}
    ~concurrentFuncQueue() { functionQueue.push([=]{ endcond = true; }); thd.join(); }
    void pushFunction(LambdaFunction function) const { functionQueue.push(function); }
};
Basically, the functions I push (e.g. a logging function) are run sequentially on a different thread, so as to avoid performance issues on the main thread.
Current usage is along the following lines:
static concurrentFuncQueue Logger;
vector<char> outstring(256);
Logger.pushFunction([=]{ OutputDebugString(debugString.c_str()); });
Great so far. I can push functions on to a concurrent queue that will run my functions sequentially on a separate thread.
One thing I also need, but currently don't have, is return values, so that e.g. (pseudo-code):
int x = 3, y = 3;
auto intReturn = Logger.pushFunction([=]()->int { return x * y; });
will push x * y onto the concurrent queue, and after the pop and completion of the function (on the other thread), return the calculated value to the caller thread.
(I understand that I'll be blocking the caller thread until the pushed function is returned. That is exactly what I want)
I get the feeling that I might have to use something along the lines of std::promise, but sadly my current low understanding of them prevents me from formulating something codable.
Any ideas? Thoughts on the above C++ code and any other comments are also much welcome (please just ignore the code completely if you feel another implementation is more appropriate or solves the problem).
You should be able to use something along the lines of:
template<typename Foo>
std::future<typename std::result_of<Foo()>::type> pushFunction(Foo&& f) {
    using result_type = typename std::result_of<Foo()>::type; // change to typedef if `using` is not supported
    std::packaged_task<result_type()> t(std::forward<Foo>(f));
    auto ret_fut = t.get_future();
    functionQueue.push(std::move(t));
    return ret_fut;
}
For this to work you need to make your LambdaFunction a type-erased function handler.
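One hedged way to satisfy that requirement while keeping std::function is the shared_ptr trick from the thread-pool answer above (a sketch, assuming <future>, <memory>, and <type_traits> are included):
template <typename Foo>
std::future<typename std::result_of<Foo()>::type> pushFunction(Foo&& f) const
{
    using result_type = typename std::result_of<Foo()>::type;
    auto task = std::make_shared<std::packaged_task<result_type()>>(std::forward<Foo>(f));
    auto ret_fut = task->get_future();
    functionQueue.push([task]{ (*task)(); }); // the copyable lambda fits LambdaFunction
    return ret_fut;
}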

Lambdas and threads

I've recently started using lambdas an awful lot within threads, and want to make sure I'm not setting myself up for thread-safety issues/crashes later. My usual way of using them is:
class SomeClass {
    int someid;
    void NextCommand();
    std::function<void(int, int)> StoreNumbers;
    SomeClass(int id, std::function<void(int, int)> fn); // constructor sets someid and the StoreNumbers fn
};
// Called from multiple threads
static void read_callback(int fd, void* ptr)
{
    SomeClass* sc = static_cast<SomeClass*>(ptr);
    ...
    sc->StoreNumbers(someint, someotherint); // voila, thread-specific storage.
}
static DWORD WINAPI ThreadFn(LPVOID param)
{
    std::list<int> ints1;
    std::list<int> ints2;
    auto storenumbers = [&](int i, int i2) {
        // thread-specific lambda.
        ints1.push_back(i);
        ints2.push_back(i2);
    };
    SomeClass s(id, storenumbers);
    ...
    // set up something that eventually calls read_callback with s set as the ptr.
}
ThreadFn is used as the thread function for 30-40 threads.
Is this acceptable? I usually have a few of these thread-specific lambdas that operate on a bunch of thread-specific data.
Thank you!
There's no problem here. A data access from a lambda is no different from a data access through a named function, inline code, a traditional functor, one made with std::bind, or any other means. As long as that lambda is invoked from only one thread at a time, I don't see any evidence of thread-related problems.