Deadlock when adding job to thread pool - c++

I'm having a problem with my game freezing when adding a job to the thread pool. I've been going over my code but can't find the problem.
My thread pool is mostly standard and contains a list of jobs to perform. The worker threads fetch jobs from this list and perform them, then signal that they finished the job. The signal lets me wait for all jobs to be finished (not just started or removed from the job list) without joining the threads, since I want to reuse them next frame.
void ThreadPool::Add(std::function<void()> job) {
    {
        std::unique_lock<std::mutex> lock(mJobMutex);
        mJobs.push(job);
        ++mUnfinishedJobs;
    }
    mJobCondition.notify_one();
}
void Worker::Execute() {
    std::function<void()> job;
    while (true) {
        {
            std::unique_lock<std::mutex> lock(mThreadPool.mJobMutex);
            while (!mThreadPool.mStop && mThreadPool.mJobs.empty()) {
                // Wait for new job to become available.
                mThreadPool.mJobCondition.wait(lock);
            }
            if (mThreadPool.mStop)
                return;
            // Get next job.
            job = mThreadPool.mJobs.front();
            mThreadPool.mJobs.pop();
        }
        // Perform the job.
        job();
        // Signal that we finished the job.
        {
            std::unique_lock<std::mutex> lock(mThreadPool.mJobMutex);
            --mThreadPool.mUnfinishedJobs;
        }
        mThreadPool.mFinishedCondition.notify_all();
    }
}
Through some logging I managed to narrow it down to mJobCondition.notify_one() in ThreadPool::Add. I placed some logging before and after that statement, and it always hung there. Sure, notify_one can find no thread waiting for it, but in that case it should simply do nothing, so it seems very odd to me that it would freeze on that line.
And if the problem is that I'm locking incorrectly and the thread pool and the worker thread are accessing memory at the same time, shouldn't it crash and burn rather than freeze?
I'm on Windows using MinGW.
I also have a Wait and Stop method in the thread pool (which is what the mUnfinishedJobs variable is for) but I didn't include them since I know it's freezing when doing Add.
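For context, here is a minimal sketch of how Wait and Stop could fit together with Add and the worker loop. It is not the code from the question: SketchPool and its layout are hypothetical, the member names merely mirror the ones above, and it assumes that jobs still queued when Stop is called may be dropped.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool; illustrative only, not the asker's implementation.
class SketchPool {
public:
    explicit SketchPool(std::size_t threads) {
        for (std::size_t i = 0; i < threads; ++i)
            mWorkers.emplace_back([this] { Execute(); });
    }

    ~SketchPool() { Stop(); }

    void Add(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mJobMutex);
            mJobs.push(std::move(job));
            ++mUnfinishedJobs;
        }
        mJobCondition.notify_one();
    }

    // Block until every added job has finished running (not merely been dequeued).
    void Wait() {
        std::unique_lock<std::mutex> lock(mJobMutex);
        mFinishedCondition.wait(lock, [this] { return mUnfinishedJobs == 0; });
    }

    // Ask the workers to exit and join them; safe to call more than once.
    // Assumption: jobs still in the queue at this point are dropped.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mJobMutex);
            mStop = true;
        }
        mJobCondition.notify_all();
        for (std::thread& worker : mWorkers)
            if (worker.joinable()) worker.join();
    }

private:
    void Execute() {
        while (true) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mJobMutex);
                mJobCondition.wait(lock, [this] { return mStop || !mJobs.empty(); });
                if (mStop) return;
                job = std::move(mJobs.front());
                mJobs.pop();
            }
            job();  // run outside the lock so other workers can fetch jobs
            {
                std::lock_guard<std::mutex> lock(mJobMutex);
                assert(mUnfinishedJobs > 0);
                --mUnfinishedJobs;
            }
            mFinishedCondition.notify_all();
        }
    }

    std::mutex mJobMutex;
    std::condition_variable mJobCondition;
    std::condition_variable mFinishedCondition;
    std::queue<std::function<void()>> mJobs;
    std::size_t mUnfinishedJobs = 0;
    bool mStop = false;
    std::vector<std::thread> mWorkers;
};
```

In this sketch a frame would call Add for each job and then Wait; the threads stay alive for the next frame and are only joined by Stop or the destructor.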
Here's the full threading code if you need more context.
I know I could probably use some threading library that does thread pools for me, but I want to learn how it's done.

Related

Create threads dynamically depending on time needs of single tasks

Say I have a list of callable objects like
std::list<std::shared_ptr<Callable>> tasks;
and the task is to run them all in an infinite loop, say
void run_all(const bool& abort){
    while(true){
        for(const auto& ptr : tasks){
            if (abort) return;
            (*ptr)();
        }
    }
}
This is fine as long as every "task" finishes after a short time. Now I'd like to add the requirement that whenever a task needs more time than a specific threshold, a new thread should be created so that the other tasks do not have to wait for a specific long-running task.
The simplest solution regarding code complexity I can think of at the moment would be creating a thread for each task:
void run_all(const bool& abort){
    auto job = [&](std::shared_ptr<Callable> task){
        while (!abort){
            (*task)();
        }
    };
    std::list<std::thread> threads;
    for(auto& ptr : tasks){
        threads.emplace_back(job, ptr);
    }
    for(auto& t : threads){
        t.join();
    }
}
But this might create inappropriately many threads.
What is an appropriate way to run the tasks and create threads dynamically depending on how long a task needs to finish? Say we have got some
std::chrono::duration threshold;
and the goal is to run the first task and, if it finishes within threshold, continue with the next one afterwards, but to create a new thread that runs the rest of the tasks in parallel if the first task does not finish before threshold. The generalized goal is:
If no thread has finished a task and started another one within the period threshold, a new thread should be created so that other tasks, which may potentially run in a very short time, do not have to wait.
If there are more than 3 threads that finish at least one task per period threshold, one of them should be joined.
There may be tasks that themselves run ad infinitum. This should have no effect on the other tasks.
What could be an appropriate implementation satisfying these requirements or at least doing something related or at least a concept of an implementation?
Or is it completely fine to just create a bunch of threads? (I think about running such an application on a low performance machine like Raspberry Pi and a set of 50 to 300 tasks that should be treated.)
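One possible middle ground between a single loop and one thread per task is a fixed number of workers that take turns on the tasks. The sketch below is only a starting point under stated assumptions: run_all_bounded is a hypothetical helper, the tasks are plain std::function objects instead of Callable pointers, and abort is a std::atomic<bool> because a plain bool read from several threads would be a data race. It does not implement the threshold heuristic, and a task that never returns occupies one worker permanently.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: runs the tasks round-robin until `abort` is set,
// using at most `worker_count` threads. A task never runs on two threads at once.
void run_all_bounded(const std::vector<std::function<void()>>& tasks,
                     const std::atomic<bool>& abort,
                     std::size_t worker_count) {
    if (tasks.empty() || worker_count == 0) return;
    // With no more workers than tasks, the ready queue is never empty on pop.
    worker_count = std::min(worker_count, tasks.size());

    std::mutex queue_mutex;
    std::deque<std::size_t> ready;  // indices of tasks that are not running right now
    for (std::size_t i = 0; i < tasks.size(); ++i) ready.push_back(i);

    auto worker = [&] {
        while (!abort.load()) {
            std::size_t index;
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                assert(!ready.empty());
                index = ready.front();
                ready.pop_front();
            }
            tasks[index]();  // a long task blocks only this worker
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                ready.push_back(index);  // back of the line for the next round
            }
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < worker_count; ++i) threads.emplace_back(worker);
    for (std::thread& t : threads) t.join();
}
```

With, say, 4 workers and 50 to 300 tasks, up to three long-running tasks can be in flight while the remaining worker keeps cycling through the short ones. Growing and shrinking the worker count based on threshold would have to be layered on top of this.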

How to integrate Cap'n'Proto threads with non Cap'n'Proto threads?

How do I properly integrate Cap'n'Proto client usage with surrounding multi-threaded code? The Cap'n'Proto docs say that each Cap'n'Proto interface is single-threaded with a dedicated event loop. Additionally they recommend using Cap'n'Proto to communicate between threads. However, the docs don't seem to describe how non-Cap'n'Proto threads (e.g. the UI loop) could integrate with that. Even if I could integrate Cap'n'Proto event loops with the UI loop in some places, other models like thread pools (Android Binder, global libdispatch queues) seem more challenging.
I think the solution is to cache the thread executor for the client thread in a synchronized place where the non-capnp thread can access it.
I believe though that the calling thread always needs to be on its own event loop as well to marry them but I just want to make sure that's actually the case. My initial attempt to do that in a simple unit test is failing. I created a KjLooperEventPort class (following the structure for the node libuv adapter) to marry KJ & ALooper on Android.
Then my test code is:
TEST(KjLooper, CrossThreadPromise) {
    std::thread::id kjThreadId;
    ConditionVariable<const kj::Executor*> executorCv{nullptr};
    ConditionVariable<std::pair<bool, kj::Promise<void>>> looperThreadFinished{false, nullptr};
    std::thread looperThread([&] {
        auto looper = android::newLooper();
        android::KjLooperEventPort kjEventPort{looper};
        kj::WaitScope waitScope(kjEventPort.getKjLoop());
        auto finished = kj::newPromiseAndFulfiller<void>();
        looperThreadFinished.constructValueAndNotifyAll(true, kj::mv(finished.promise));
        executorCv.waitNotValue(nullptr);
        auto executor = executorCv.readCopy();
        kj::Promise<void> asyncPromise = executor->executeAsync([&] {
            ASSERT_EQ(std::this_thread::get_id(), kjThreadId);
        });
        asyncPromise = asyncPromise.then([tid = std::this_thread::get_id(), kjThreadId, &finished] {
            std::cerr << "Running promise completion on original thread\n";
            ASSERT_NE(tid, kjThreadId);
            ASSERT_EQ(std::this_thread::get_id(), tid);
            std::cerr << "Fulfilling\n";
            finished.fulfiller->fulfill();
            std::cerr << "Fulfilled\n";
        });
        asyncPromise.wait(waitScope);
    });
    std::thread kjThread([&] {
        kj::Promise<void> finished = kj::NEVER_DONE;
        looperThreadFinished.wait([&](auto& promise) {
            finished = kj::mv(promise.second);
            return promise.first;
        });
        auto ioContext = kj::setupAsyncIo();
        kjThreadId = std::this_thread::get_id();
        executorCv.setValueAndNotifyAll(&kj::getCurrentThreadExecutor());
        finished.wait(ioContext.waitScope);
    });
    looperThread.join();
    kjThread.join();
}
This crashes when fulfilling the promise back to the kj thread:
terminating with uncaught exception of type kj::ExceptionImpl: kj/async.c++:1269: failed: expected threadLocalEventLoop == &loop || threadLocalEventLoop == nullptr; Event armed from different thread than it was created in. You must use Executor to queue events cross-thread.
Most Cap'n Proto RPC and KJ Promise-related objects can only be accessed in the thread that created them. Resolving a promise cross-thread, for example, will fail, as you saw.
Some ways you could solve this include:
You can use kj::Executor to schedule code to run on a different thread's event loop. The calling thread does NOT need to be a KJ event loop thread if you use executeSync() -- however, this function blocks until the other thread has had a chance to wake up and execute the function. I'm not sure how well this will perform in practice; if it's a problem, there is probably room to extend the Executor interface to handle this use case more efficiently.
You can communicate between threads by passing messages over pipes or socketpairs (but sending big messages this way would involve a lot of unnecessary copying to/from the socket buffer).
You could signal another thread's event loop to wake up using a pipe, signal, or (on Linux) eventfd, then have it look for messages in a mutex-protected queue. (But kj::Executor mostly obsoletes this technique.)
It's possible, though not easy, to adapt KJ's event loop to run on top of other event loops, so that both can run in the same thread. For example, node-capnp adapts KJ to run on top of libuv.

asio::async_write incredibly difficult to synchronize on a high volume stream

I am currently using the Asio C++ library and wrote a client wrapper around it. My original approach was very basic and only needed to stream in a single direction. Requirements have changed and I've switched over to using all asynchronous calls. Most of the migration has been easy except for the asio::async_write(...). I have used a few different approaches and inevitably run into a deadlock with each one.
The application streams data at a high volume continuously. I have stayed away from strands because they do not block, which can lead to memory issues, especially when the server is under heavy load: jobs back up and the application's heap grows indefinitely.
So I created a blocking queue, only to find out the hard way that holding locks across callbacks and/or blocking events leads to unpredictable behavior.
The wrapper is a very large class, so I will try to explain my landscape in its current state and hopefully get some good suggestions:
I have an asio::steady_timer that runs on a fixed schedule to push a heartbeat message directly into the blocking queue.
A thread dedicated to reading events and pushing them to the blocking queue
A thread dedicated to consumption of the blocking queue
For example, in my queue I have a queue::block() and queue::unblock() that are just wrappers for the condition variable / mutex.
std::thread consumer([this]() {
    std::string message_buffer;
    while (queue.pop(message_buffer)) {
        queue.stage_block();
        asio::async_write(*socket, asio::buffer(message_buffer), std::bind(&networking::handle_write, this, std::placeholders::_1, std::placeholders::_2));
        queue.block();
    }
});
void networking::handle_write(const std::error_code& error, size_t bytes_transferred) {
    queue.unblock();
}
When the socket backs up and the server can no longer accept data because of the current load, the queue fills up and leads to a deadlock where handle_write(...) is never called.
The other approach eliminates the consumer thread entirely and relies on handle_write(...) to pop the queue. Like so:
void networking::write(const std::string& data) {
    if (!queue.closed()) {
        std::stringstream stream_buffer;
        stream_buffer << data << std::endl;
        spdlog::get("console")->debug("pushing to queue {}", queue.size());
        queue.push(stream_buffer.str());
        if (queue.size() == 1) {
            spdlog::get("console")->debug("handle_write: {}", stream_buffer.str());
            asio::async_write(*socket, asio::buffer(stream_buffer.str()), std::bind(&networking::handle_write, this, std::placeholders::_1, std::placeholders::_2));
        }
    }
}
void networking::handle_write(const std::error_code& error, size_t bytes_transferred) {
    std::string message;
    queue.pop(message);
    if (!queue.closed() && !queue.empty()) {
        std::string front = queue.front();
        asio::async_write(*socket, asio::buffer(queue.front()), std::bind(&networking::handle_write, this, std::placeholders::_1, std::placeholders::_2));
    }
}
This also resulted in a deadlock and obviously results in other race problems. When I disabled my heartbeat callback, I had absolutely no issues. However, the heartbeat is a requirement.
What am I doing wrong? What is a better approach?
It appears all my pain derived entirely from the heartbeat. Disabling the heartbeat in each variation of my asynchronous write operations seems to cure my problems, which led me to believe that this could be a result of using async_wait(...) on the asio::steady_timer.
Asio synchronizes its work internally and waits for jobs to complete before executing the next job. Using async_wait(...) to construct my heartbeat functionality was my design flaw, because the timer handler ran on the same thread that was supposed to run the other pending jobs. It created a deadlock with Asio when the heartbeat blocked in queue::push(...): while that thread was blocked, it could not run anything else. This would explain why the asio::async_write(...) completion handler never executed in my first example.
The solution was to put the heartbeat on its own thread and let it work independently from Asio. I am still using my blocking queue to synchronize calls to asio::async_write(...) but have modified my consumer thread to use std::future and std::promise. This synchronizes the callback with my consumer thread cleanly.
std::thread networking::heartbeat_worker() {
    return std::thread([&]() {
        while (socket_opened) {
            spdlog::get("console")->trace("heartbeat pending");
            write(heartbeat_message);
            spdlog::get("console")->trace("heartbeat sent");
            std::unique_lock<std::mutex> lock(mutex);
            socket_closed_event.wait_for(lock, std::chrono::milliseconds(heartbeat_interval), [&]() {
                return !socket_opened;
            });
        }
        spdlog::get("console")->trace("heartbeat thread exited gracefully");
    });
}
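The std::future / std::promise hand-off between the consumer thread and the write callback is not shown above, so here is a rough sketch of the idea using only the standard library. Everything in it is hypothetical: fake_async_write stands in for asio::async_write and completes on a helper thread, and send_all plays the role of the consumer loop. The important assumption is that the waiting thread is not the thread that runs the completion handlers; otherwise the wait would block the handler it is waiting for.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <system_error>
#include <thread>
#include <vector>

using WriteHandler = std::function<void(const std::error_code&, std::size_t)>;

// Hypothetical stand-in for asio::async_write: "completes" on another thread
// and then calls the handler. The thread is returned so the sketch can join it.
std::thread fake_async_write(const std::string& data, WriteHandler handler) {
    assert(handler);
    const std::size_t size = data.size();
    return std::thread([size, handler = std::move(handler)] {
        handler(std::error_code{}, size);
    });
}

// Consumer loop: the next write starts only after the previous handler has run.
// Returns the total number of bytes reported by the handlers.
std::size_t send_all(const std::vector<std::string>& messages) {
    std::size_t total = 0;
    for (const std::string& message : messages) {
        std::promise<std::size_t> done;
        std::future<std::size_t> result = done.get_future();
        // `message` and `done` outlive the write because we wait right below.
        std::thread io = fake_async_write(
            message, [&done](const std::error_code& error, std::size_t bytes) {
                done.set_value(error ? 0 : bytes);
            });
        total += result.get();  // blocks until the completion handler has run
        io.join();
    }
    return total;
}
```

Because each iteration waits for its own future, only one write is ever in flight and the message buffer stays valid until the handler has been called.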

Keeping two cross-communicating asio io_service objects busy

I am using boost::asio with multiple io_services to keep different forms of blocking I/O separate. E.g. I have one io_service for blocking file I/O, and another for long-running CPU-bound tasks (and this could be extended to a third for blocking network I/O, etc.). Generally speaking, I want to ensure that one form of blocking I/O cannot starve the others.
The problem I am having is that since tasks running in one io_service can post events to other io_service (e.g. a CPU-bound task may need to start a file I/O operation, or a completed file I/O operation may invoke a CPU-bound callback), I don't know how to keep both io_services running until they are both out of events.
Normally with a single I/O service, you do something like:
shared_ptr<asio::io_service> io_service (new asio::io_service);
shared_ptr<asio::io_service::work> work (
    new asio::io_service::work(*io_service));
// Create worker thread(s) that call io_service->run()
io_service->post(/* some event */);
work.reset();
// Join worker thread(s)
However if I simply do this for both io_services, the one into which I did not post an initial event finishes immediately. And even if I post initial events to both, if the initial event on io_service B finishes before the task on io_service A posts a new event to B, io_service B will finish prematurely.
How can I keep io_service B running while io_service A is still processing events (because one of the queued events in service A might post a new event to B), and vice-versa, while still ensuring that both io_services exit their run() methods if they are ever both out of events at the same time?
Figured out a way to do this, so documenting it for the record in case anyone else finds this question in a search:
Create each of the N cross-communicating io_services, create a work object for each of them, and then start their worker threads.
Create a "master" io_service object which will not run any worker threads.
Do not allow posting events directly to the services. Instead, create accessor functions to the io_services which will:
    Create a work object on the master io_service.
    Wrap the callback in a function that runs the real callback, then deletes the work.
    Post this wrapped callback instead.
In the main flow of execution, once all of the N io_services have started and you have posted work to at least one of them, call run() on the master io_service.
When the master io_service's run() method returns, delete all of the initial work on the N cross-communicating io_services, and join all worker threads.
Having the master io_service's thread own work on each of the other io_services ensures that they will not terminate until the master io_service runs out of work. Having each of the other io_services own work on the master io_service for every posted callback ensures that the master io_service will not run out of work until every one of the other io_services no longer has any posted callbacks left to process.
An example (could be encapsulated in a class):
#include <vector>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/function.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>

boost::shared_ptr<boost::asio::io_service> master_io_service;

void RunWorker(boost::shared_ptr<boost::asio::io_service> io_service) {
    io_service->run();
}

void RunCallbackAndDeleteWork(boost::function<void()> callback,
                              boost::asio::io_service::work* work) {
    callback();
    // Releasing the work lets the master io_service finish once no callbacks remain.
    delete work;
}

// All new posted callbacks must come through here, rather than being posted
// directly to the io_service object.
void PostToService(boost::shared_ptr<boost::asio::io_service> io_service,
                   boost::function<void()> callback) {
    io_service->post(boost::bind(
        &RunCallbackAndDeleteWork, callback,
        new boost::asio::io_service::work(*master_io_service)));
}

int main() {
    std::vector<boost::shared_ptr<boost::asio::io_service> > io_services;
    std::vector<boost::shared_ptr<boost::asio::io_service::work> > initial_work;
    // thread_group provides create_thread() and join_all().
    boost::thread_group worker_threads;
    master_io_service.reset(new boost::asio::io_service);
    const int kNumServices = X;           // placeholder
    const int kNumWorkersPerService = Y;  // placeholder
    for (int i = 0; i < kNumServices; ++i) {
        boost::shared_ptr<boost::asio::io_service> io_service(new boost::asio::io_service);
        io_services.push_back(io_service);
        // The shared_ptr constructor is explicit, so wrap the raw pointer by hand.
        initial_work.push_back(boost::shared_ptr<boost::asio::io_service::work>(
            new boost::asio::io_service::work(*io_service)));
        for (int j = 0; j < kNumWorkersPerService; ++j) {
            worker_threads.create_thread(boost::bind(&RunWorker, io_service));
        }
    }
    // Use PostToService to start initial task(s) on at least one of the services
    master_io_service->run();
    // At this point, there is no real work left in the services, only the work
    // objects in the initial_work vector.
    initial_work.clear();
    worker_threads.join_all();
    return 0;
}
The HTTP server example 2 does something similar that you may find useful. It uses the concept of an io_service pool that retains vectors of shared_ptr<boost::asio::io_service> and a shared_ptr<boost::asio::io_service::work> for each io_service. It uses a thread pool to run each service.
The example uses round-robin scheduling for doling out work to the I/O services; I don't think that will apply in your case, since you have specific tasks for io_service A and io_service B.

Are Quartz scheduler instances thread safe?

Can more than one thread safely call methods on an instance of Scheduler returned by the StdSchedulerFactory concurrently?
I had this problem so thought I'd look at the source code. Assuming you are using a standard configuration of Quartz (storing jobs and triggers in RAM instead of a persistent JobStore), then it appears that Quartz is thread safe.
Digging into the source, you will finally get to RAMJobStore, which stores all jobs and triggers in memory.
public void storeJobAndTrigger(SchedulingContext ctxt, JobDetail newJob,
        Trigger newTrigger) throws JobPersistenceException {
    storeJob(ctxt, newJob, false);
    storeTrigger(ctxt, newTrigger, false);
}
In each of the storeJob(..) and storeTrigger(..) methods, there are separate synchronized blocks with their own unique objects for storing jobs and triggers in a thread safe manner:
synchronized (jobLock) {
    if (!repl) {
        // get job group
        ...
    }
}
And synchronizing a trigger:
synchronized (triggerLock) {
    ...
    synchronized (pausedTriggerGroups) {
        ...
    }
}
So in short, it would appear that you can make thread safe calls to an instance of the Scheduler class.
This post on the Terracotta website confirms it.