Deadlock inside gRPC Read() function - c++

I am working on a C++ project, which uses Google Pub/Sub.
As there is no native support for Google Pub/Sub in C++, I am using it through gRPC. Thus, I have generated corresponding pubsub.grpc.pb.h, pubsub.grpc.pb.cc, pubsub.pb.h and pubsub.pb.cc files via protoc.
I wrote a lightweight wrapper class for subscription management. The class creates a new thread and starts listening for new messages. Here is the code (based on this question):
class Consumer
{
public:
Consumer();
~Consumer();
void startConsume();
// ...
std::string m_subscriptionName;
std::unique_ptr<std::thread> m_thread;
std::shared_ptr<grpc::Channel> m_channel;
std::unique_ptr<google::pubsub::v1::Subscriber::Stub> m_stub;
std::atomic<bool> m_runThread;
};
Consumer::Consumer()
{
m_channel = grpc::CreateChannel("pubsub.googleapis.com:443", grpc::GoogleDefaultCredentials());
m_stub = google::pubsub::v1::Subscriber::NewStub(m_channel);
m_subscriptionName = "something";
}
Consumer::~Consumer()
{
m_runThread = false;
if (m_thread && m_thread->joinable())
{
m_thread->join();
}
}
void Consumer::startConsume()
{
m_thread.reset(new std::thread([this]()
{
m_runThread = true;
while (m_runThread)
{
grpc::ClientContext context;
std::unique_ptr<grpc::ClientReaderWriter<google::pubsub::v1::StreamingPullRequest,
google::pubsub::v1::StreamingPullResponse>> stream(m_stub->StreamingPull(&context));
// send the initial message
google::pubsub::v1::StreamingPullRequest req;
req.set_subscription(m_subscriptionName);
req.set_stream_ack_deadline_seconds(10);
// if write passed successfully, start subscription
if (!stream->Write(req))
{
continue;
}
// receive messages
google::pubsub::v1::StreamingPullResponse response;
while (stream->Read(&response))
{
google::pubsub::v1::StreamingPullRequest ack_request;
for (const auto& message : response.received_messages())
{
// process messages ...
ack_request.add_ack_ids(message.ack_id());
}
stream->Write(ack_request);
}
}
}));
}
Several instances of the Consumer class are created within a process.
It seems to work fine. However, sometimes the program gets stuck on the stream->Read(&response) call. Debugging showed that the thread was stuck inside the Read() call: the stream does not read anything and does not return from the function either, even though the Pub/Sub buffer is not empty. After restarting the application, all messages are read successfully. It looks like a deadlock inside Read().
Is there anything that I am doing wrong? What can cause this behavior?

Related

C++ GRPC ClientAsyncReaderWriter: how to check if data is available for read?

I have a bidirectional streaming async gRPC client that uses ClientAsyncReaderWriter to communicate with the server. The RPC definition looks like:
rpc Process (stream Request) returns (stream Response)
For simplicity, Request and Response are byte arrays (byte[]). I send several chunks of data to the server, and when the server has accumulated enough data, it processes this data, sends back a response, and continues accumulating data for the next responses. After several responses, the server sends the final response and closes the connection.
For the async client I am using CompletionQueue. The code looks like:
...
CompletionQueue cq;
std::unique_ptr<Stub> stub;
grpc::ClientContext context;
std::unique_ptr<grpc::ClientAsyncReaderWriter<Request,Response>> responder = stub->AsyncProcess(&context, &cq, handler);
// thread for completion queue
std::thread t(
[]{
void *handler = nullptr;
bool ok = false;
while (cq_.Next(&handler, &ok)) {
if (can_read) {
// how do I know that data is available to read?
// Do read
} else {
// do write
...
Request request = prepare_request();
responder_->Write(request, handler);
}
}
}
);
...
// wait
What is the proper way to do async reading? Can I try to read if no data is available? Is it a blocking call?
Sequencing Read() calls
Can I try to read if no data is available?
Yep, and that is going to be the case more often than not. Read() will do nothing until data is available, and only then put the tag it was given into the completion queue (see below for details).
Is it a blocking call?
Nope. Read() and Write() return immediately. However, you can only have one of each in flight at any given moment. If you try to send a second one before the previous has completed, it (the second one) will fail.
What is the proper way to do async reading?
Each time a Read() is done, start a new one. For that, you need to be able to tell when a Read() is done. This is where tags come in!
When you call Read(&msg, tag) or Write(request, tag), you are telling gRPC to put tag in the completion queue associated with that responder once the operation has completed. gRPC doesn't care what the tag is; it just hands it back.
So the general strategy you will want to go for is:
As soon as you are ready to start receiving messages:
- Call responder->Read() once with some tag that you will recognize as a "read done".
Whenever cq_.Next() gives you back that tag, and ok == true:
- Consume the message.
- Queue up a new responder->Read() with that same tag.
Obviously, you'll also want to do something similar for your calls to Write().
But since you still want to be able to look up the handler instance from a given tag, you'll need a way to pack both a reference to the handler and information about which operation has finished into a single tag.
Completion queues
Look up the handler instance from a given tag? Why?
The true raison d'être of completion queues is unfortunately not evident from the examples. They allow multiple asynchronous RPCs to share the same thread. Unless your application only ever makes a single RPC call, the handling thread should not be associated with a specific responder. Instead, that thread should be a general-purpose worker that dispatches events to the correct handler based on the content of the tag.
The official examples tend to do that by using a pointer to the handler object as the tag. That works when there is a specific sequence of events to expect, since you can easily predict what a handler is reacting to. You often can't do that with async bidirectional streams, since any given completion event could be a Read() or a Write() finishing.
Example
Here's a general outline of what I personally consider to be a clean way to go about all that:
// Base class for async bidir RPCs handlers.
// This is so that the handling thread is not associated with a specific rpc method.
class RpcHandler {
// This will be used as the "tag" argument to the various grpc calls.
struct TagData {
enum class Type {
start_done,
read_done,
write_done,
// add more as needed...
};
RpcHandler* handler;
Type evt;
};
struct TagSet {
TagSet(RpcHandler* self)
: start_done{self, TagData::Type::start_done},
read_done{self, TagData::Type::read_done},
write_done{self, TagData::Type::write_done} {}
TagData start_done;
TagData read_done;
TagData write_done;
};
public:
RpcHandler() : tags(this) {}
virtual ~RpcHandler() = default;
// The actual tag objects we'll be passing
TagSet tags;
virtual void on_ready() = 0;
virtual void on_recv() = 0;
virtual void on_write_done() = 0;
static void handling_thread_main(grpc::CompletionQueue* cq) {
void* raw_tag = nullptr;
bool ok = false;
while (cq->Next(&raw_tag, &ok)) {
TagData* tag = reinterpret_cast<TagData*>(raw_tag);
if(!ok) {
// Handle error
}
else {
switch (tag->evt) {
case TagData::Type::start_done:
tag->handler->on_ready();
break;
case TagData::Type::read_done:
tag->handler->on_recv();
break;
case TagData::Type::write_done:
tag->handler->on_write_done();
break;
}
}
}
}
};
void do_something_with_response(Response const&);
class MyHandler final : public RpcHandler {
public:
using responder_ptr =
std::unique_ptr<grpc::ClientAsyncReaderWriter<Request, Response>>;
MyHandler(responder_ptr responder) : responder_(std::move(responder)) {
// This lock is needed because StartCall() can
// cause the handler thread to access the object.
std::lock_guard lock(mutex_);
responder_->StartCall(&tags.start_done);
}
~MyHandler() {
// TODO: finish/abort the streaming rpc as appropriate.
}
void send(const Request& msg) {
std::lock_guard lock(mutex_);
if (!sending_) {
sending_ = true;
responder_->Write(msg, &tags.write_done);
} else {
// TODO: add some form of synchronous wait, or outright failure
// if the queue starts to get too big.
queued_msgs_.push(msg);
}
}
private:
// When the rpc is ready, queue the first read
void on_ready() override {
std::lock_guard l(mutex_); // To synchronize with the constructor
responder_->Read(&incoming_, &tags.read_done);
};
// When a message arrives, use it, and start reading the next one
void on_recv() override {
// incoming_ never leaves the handling thread, so no need to lock
// ------ If handling is cheap and stays in the handling thread.
do_something_with_response(incoming_);
responder_->Read(&incoming_, &tags.read_done);
// ------ If handling responses is expensive or involves another thread.
// Response msg = std::move(incoming_);
// responder_->Read(&incoming_, &tags.read_done);
// do_something_with_response(msg);
};
// When a message has been sent, send the next one if there is any
void on_write_done() override {
std::lock_guard lock(mutex_);
if (!queued_msgs_.empty()) {
responder_->Write(queued_msgs_.front(), &tags.write_done);
queued_msgs_.pop();
} else {
sending_ = false;
}
};
responder_ptr responder_;
// Only ever touched by the handler thread post-construction.
Response incoming_;
bool sending_ = false;
std::queue<Request> queued_msgs_;
std::mutex mutex_; // grpc might be thread-safe, MyHandler isn't...
};
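int main() {
// Start the thread as soon as you have a completion queue.
auto cq = std::make_unique<grpc::CompletionQueue>();
std::thread t(RpcHandler::handling_thread_main, cq.get());
// Multiple concurrent RPCs sharing the same handling thread.
// serviceA, serviceB and the contexts are placeholders; note that a
// grpc::ClientContext must not be reused across RPCs, so each call gets its own.
MyHandler handler1(serviceA->MethodA(&context1, cq.get()));
MyHandler handler2(serviceA->MethodA(&context2, cq.get()));
MyHandlerB handler3(serviceA->MethodB(&context3, cq.get()));
MyHandlerC handler4(serviceB->MethodC(&context4, cq.get()));
// ... once all RPCs have finished, shut the queue down so that Next()
// returns false, then join the handling thread before leaving main().
cq->Shutdown();
t.join();
}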
If you have a keen eye, you will notice that the code above stores a bunch of redundant this pointers (one per event type) in the handler. It's generally not a big deal. It is possible to do without them via multiple inheritance and downcasting, but that is somewhat beyond the scope of this question.
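The dispatch idea in the outline above can be tried without a gRPC server. The following is a minimal sketch, not the gRPC API: FakeQueue, Tag, Handler and drain are hypothetical names, and FakeQueue only imitates the (tag, ok) pairs that CompletionQueue::Next() hands back.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for grpc::CompletionQueue: a FIFO of (tag, ok) pairs.
class FakeQueue {
public:
    void push(void* tag, bool ok) { events_.push({tag, ok}); }
    // Returns false when empty; the real Next() would block instead.
    bool next(void** tag, bool* ok) {
        if (events_.empty()) return false;
        *tag = events_.front().first;
        *ok = events_.front().second;
        events_.pop();
        return true;
    }
private:
    std::queue<std::pair<void*, bool>> events_;
};

enum class Op { read_done, write_done };

struct Handler;
// A tag packs "which handler" and "which operation" into one address.
struct Tag {
    Handler* handler;
    Op op;
};

struct Handler {
    Handler() = default;
    Handler(const Handler&) = delete;  // the tags point back at this object
    Handler& operator=(const Handler&) = delete;

    Tag read_tag{this, Op::read_done};
    Tag write_tag{this, Op::write_done};
    std::vector<std::string> log;  // records what the loop dispatched
};

// One loop serves every handler: the tag alone says who and what.
inline void drain(FakeQueue& cq) {
    void* raw = nullptr;
    bool ok = false;
    while (cq.next(&raw, &ok)) {
        Tag* tag = static_cast<Tag*>(raw);
        if (!ok) {
            tag->handler->log.push_back("failed");
            continue;
        }
        tag->handler->log.push_back(tag->op == Op::read_done ? "read" : "write");
    }
}
```

Pushing &a.write_tag and &b.read_tag for two handlers a and b and then calling drain(cq) routes each event to the right object, even though the loop knows nothing about either handler in advance.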

When is the earliest an async_write completion handler will complete?

Consider a Connection class in a boost::asio TCP server program that looks something like this.
#ifndef CONNECTION_HPP
#define CONNECTION_HPP
#include <iostream>
#include <boost/asio.hpp>
namespace Transmission
{
class Connection
{
public:
using SocketType = boost::asio::ip::tcp::socket;
explicit Connection(boost::asio::io_service& io_service)
: m_socket{io_service},
m_outputBuffer{},
m_writeBuffer{},
m_outputStream{&m_outputBuffer},
m_writeStream{&m_writeBuffer},
m_outputStreamPointer{&m_outputStream},
m_writeStreamPointer{&m_writeStream},
m_outputBufferPointer{&m_outputBuffer},
m_writeBufferPointer{&m_writeBuffer},
m_awaitingWrite{false},
m_pendingWrites{false}
{
}
template<typename T>
void write(const T& output)
{
*m_outputStreamPointer << output;
writeToSocket();
}
template<typename T>
std::ostream& operator<<(const T& output)
{
write(output);
m_pendingWrites = true;
return *m_outputStreamPointer;
}
std::ostream& getOutputStream()
{
writeToSocket();
m_pendingWrites = true;
return *m_outputStreamPointer;
}
void start()
{
write("Connection started");
}
SocketType& socket() { return m_socket; }
private:
void writeToSocket();
SocketType m_socket;
boost::asio::streambuf m_outputBuffer;
boost::asio::streambuf m_writeBuffer;
std::ostream m_outputStream;
std::ostream m_writeStream;
std::ostream* m_outputStreamPointer;
std::ostream* m_writeStreamPointer;
boost::asio::streambuf* m_outputBufferPointer;
boost::asio::streambuf* m_writeBufferPointer;
bool m_awaitingWrite;
bool m_pendingWrites;
};
}
#endif
Where writeToSocket is defined as follows:
#include "Connection.hpp"
using namespace Transmission;
void Connection::writeToSocket()
{
// If a write is currently happening...
if(m_awaitingWrite)
{
// Alert the async_write's completion handler
// that writeToSocket was called while async_write was busy
// and that there is more to be written to the socket.
m_pendingWrites = true;
return;
}
// Otherwise, notify subsequent calls to this function that we're writing
m_awaitingWrite = true;
// Swap the buffers and stream pointers, so that subsequent writeToSockets
// go into the clear/old/unused buffer
std::swap(m_outputBufferPointer, m_writeBufferPointer);
std::swap(m_outputStreamPointer, m_writeStreamPointer);
// Kick off your async write, sending the front buffer.
async_write(m_socket, *m_writeBufferPointer, [this](boost::system::error_code error, std::size_t){
// The write has completed
m_awaitingWrite = false;
// If there was an error, report it.
if(error)
{
std::cout << "Async write returned an error." << std::endl;
}
else if(m_pendingWrites) // If there are more pending writes
{
// Write them
writeToSocket();
m_pendingWrites = false;
}
});
}
In case it's not immediately obvious, the connection uses a double-buffering system to ensure that no buffer is being async_written and mutated at the same time.
The piece of code I have a question regarding is:
std::ostream& getOutputStream()
{
writeToSocket(); // Kicks off an async_write, returns immediately.
m_pendingWrites = true; // Tell async_write complete handler there's more to write
return *m_outputStreamPointer; // Return the output stream for the user to insert to.
}
such that a connection can be used like: myConnection.getOutputStream() << "Hello";
Specifically, this code relies on the assumption that async_write's completion handler will not be executed until after we return *m_outputStreamPointer. But can we safely make that assumption?
If, for instance, the async_write completion handler ran at the point shown below, nothing would be sent to the user:
std::ostream& getOutputStream()
{
writeToSocket(); // Kicks off an async_write, returns immediately.
// Async_write's completion handler executes.
m_pendingWrites = true; // Tell async_write complete handler there's more to write
// Completion handler already executed so m_pendingWrites = true does nothing.
return *m_outputStreamPointer; // Return the output stream for the user to insert to.
}
After looking at the documentation, I found this:
Regardless of whether the asynchronous operation completes immediately or not, the handler will not be invoked from within this function. Invocation of the handler will be performed in a manner equivalent to using boost::asio::io_service::post().
This likely accounts for the correct behavior, but I'm not sure exactly why. A quick search on boost::asio::io_service::post() didn't add much clarity.
The documentation you quote only says that the handler will not be invoked on the current thread before the initiating function returns, so in a single-threaded program you have your guarantee.
However, if multiple threads are running I/O tasks (io_context::run and friends, or implicitly via thread_pool), the same race is still there.
You can counteract this by posting all async tasks related to a connection on a strand (see Strands: Use Threads Without Explicit Locking), which is an executor that serializes all tasks posted on it (see the ordering guarantees in the docs).
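The single-threaded guarantee can be illustrated with a toy event loop. This is a sketch under stated assumptions: MiniLoop and Writer are hypothetical names, MiniLoop is not Asio, and it only imitates the "handler is posted, never invoked inline" rule from the quoted documentation.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical single-threaded loop standing in for io_service.
class MiniLoop {
public:
    // Like post(): the task is queued, never run from inside this call.
    void post(std::function<void()> fn) { tasks_.push(std::move(fn)); }
    void run() {
        while (!tasks_.empty()) {
            auto fn = std::move(tasks_.front());
            tasks_.pop();
            fn();
        }
    }
private:
    std::queue<std::function<void()>> tasks_;
};

struct Writer {
    MiniLoop& loop;
    bool pending = false;
    bool handler_saw_pending = false;

    // Imitates an async_write that completes at once: even then the
    // completion handler is only posted to the loop.
    void start_write() {
        loop.post([this] { handler_saw_pending = pending; });
    }

    // Same shape as getOutputStream(): start the write, then set the flag.
    void get_output_stream() {
        start_write();
        pending = true;  // the handler is still sitting in the queue here
    }
};
```

With one thread, the handler cannot run until get_output_stream() has returned and the loop gets control again, so it always observes pending == true. As soon as a second thread drains the same queue, the handler may run between start_write() and the assignment, which is the race a strand is meant to rule out.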

In a UWP application, future.wait() keeps waiting while trying to synchronize the response from async methods

I am developing a UWP application that loads a file from the application's local data when a button is clicked. For this, I need the StorageFolder object for the application's LocalFolder, obtained via StorageFolder::GetFolderFromPathAsync(); then I have to call GetFileAsync() to get the StorageFile object to read.
I have written templates that wait for the response from async methods such as GetFolderFromPathAsync() and GetFileAsync() before proceeding.
template <typename T>
T syncAsyncTask(concurrency::task<T> mainTask) {
std::shared_ptr<std::promise<T>> done = std::make_shared<std::promise<T>>();
auto future = done->get_future();
asyncTaskExceptionHandler<T>(mainTask, [&done](bool didFail, T result) {
done->set_value(didFail ? nullptr : result);
});
future.wait();
return future.get();
}
template <typename T, typename CallbackLambda>
void asyncTaskExceptionHandler(concurrency::task<T> mainTask, CallbackLambda&& onResult) {
auto t1 = mainTask.then([onResult = std::move(onResult)](concurrency::task<T> t) {
bool didFail = true;
T result;
try {
result = t.get();
didFail = false;
}
catch (concurrency::task_canceled&) {
OutputDebugStringA("Win10 call was canceled.");
}
catch (Platform::Exception^ e) {
OutputDebugStringA("Error during a Win10 call:");
}
catch (std::exception&) {
OutputDebugStringA("There was a C++ exception during a Win10 call.");
}
catch (...) {
OutputDebugStringA("There was a generic exception during a Win10 call.");
}
onResult(didFail, result);
});
}
Issue:
When I call syncAsyncTask() with any task to get its response, it keeps waiting at future.wait(), because mainTask never completes and the promise is never set.
See the code below:
void testStorage::MainPage::Btn_Click(Platform::Object^ sender, Windows::UI::Xaml::RoutedEventArgs^ e)
{
Windows::Storage::StorageFolder^ localFolder = Windows::Storage::ApplicationData::Current->LocalFolder;
auto task = concurrency::create_task(Windows::Storage::StorageFolder::GetFolderFromPathAsync(localFolder->Path));
auto folder = syncAsyncTask<Windows::Storage::StorageFolder^>(task);
printString(folder->Path);
}
void printString(Platform::String^ text) {
std::wstring fooW(text->Begin());
std::string fooA(fooW.begin(), fooW.end());
const char* charStr = fooA.c_str();
OutputDebugStringA(charStr);
}
Running environment :
VS2017
Tried with C++14 and C++17, facing same issue.
Windows 10 RS5 Build#17763
Has anyone ever faced this issue?
Please help!! Thanks in advance.
I was able to take the above code and create a simple application that reproduced this issue. Long story short, I was able to get future.wait() to return by telling the continuation in asyncTaskExceptionHandler to run on a background thread:
template <typename T, typename CallbackLambda>
void asyncTaskExceptionHandler(concurrency::task<T> mainTask, CallbackLambda&& onResult) {
// debug
printString(mainTask.is_apartment_aware().ToString());
auto t1 = mainTask.then([onResult = std::move(onResult)](concurrency::task<T> t) {
bool didFail = true;
T result;
try {
result = t.get();
didFail = false;
}
catch (concurrency::task_canceled&) {
OutputDebugStringA("Win10 call was canceled.");
}
catch (Platform::Exception^ e) {
OutputDebugStringA("Error during a Win10 call:");
}
catch (std::exception&) {
OutputDebugStringA("There was a C++ exception during a Win10 call.");
}
catch (...) {
OutputDebugStringA("There was a generic exception during a Win10 call.");
}
onResult(didFail, result);
// It works with this continuation context:
}, concurrency::task_continuation_context::use_arbitrary());
}
Assuming the code I used was correct, I believe the original version creates a deadlock. What it says is:
1) On the UI/STA thread, create an async operation from GetFolderFromPathAsync.
2) Pass this task to syncAsyncTask, which in turn passes it to asyncTaskExceptionHandler.
3) asyncTaskExceptionHandler adds a continuation to this task. By default, the continuation of such a task runs in the context that called then(). In this case, that is the UI/STA thread!
4) Once the continuation is scheduled, we return to syncAsyncTask, where future.wait() blocks until the promise value is set.
5) This prevents the UI thread from finishing syncAsyncTask, but it also prevents the continuation from running, since it is scheduled on the very thread that is blocked!
In other words, we are waiting on the UI thread for an operation that cannot begin until the UI thread is free, which is a deadlock.
By using concurrency::task_continuation_context::use_arbitrary() we tell the task that it's okay to use a background thread if necessary (which in this case it is) and everything completes as intended.
For documentation on this, as well as some example code illustrating async behavior, see the Creating Asynchronous Operations in C++ for UWP Apps documentation.
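The same deadlock shape can be reproduced with standard C++ only. This is a rough analogy rather than PPL or WinRT code: UiQueue is a hypothetical stand-in for the UI thread's dispatcher queue, and a plain std::thread plays the role of use_arbitrary().

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <queue>
#include <thread>

// Hypothetical stand-in for the UI thread's work queue.
using UiQueue = std::queue<std::function<void()>>;

// The continuation is queued for the "UI thread", which then blocks.
// Nothing pumps the queue while it waits, so the promise is never set.
inline bool blocks_when_continuation_needs_ui_thread() {
    UiQueue ui;
    std::promise<int> done;
    std::future<int> fut = done.get_future();
    ui.push([&done] { done.set_value(42); });
    // A plain wait() would hang here; the timed wait lets the sketch report it.
    return fut.wait_for(std::chrono::milliseconds(50)) ==
           std::future_status::timeout;
}

// The continuation runs on a background thread instead, so blocking
// the calling thread no longer prevents it from running.
inline int completes_on_background_thread() {
    std::promise<int> done;
    std::future<int> fut = done.get_future();
    std::thread worker([&done] { done.set_value(42); });
    int value = fut.get();
    worker.join();
    return value;
}
```

The first function times out for the same reason future.wait() never returns in the question: the only thread that could fulfil the promise is the one that is waiting for it.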

Accessing member variables within a boost::asio::spawned coroutine

I'm trying to add some async operations deep within an existing codebase, which is being called within a web server implemented using pion (which itself uses boost::asio).
The current code needs to continue operating in contexts where there is no io_service available, so I did the following, where Foo::bar is the main entry point of the existing codebase, and handleRequest is the pion request handler:
class Foo
{
public:
void bar(std::string input, boost::asio::io_service* io = NULL)
{
ioService = io;
if ( io == NULL )
{
barCommon(input);
}
else
{
boost::asio::spawn(*io, boost::bind(&Foo::barAsync, this, input, _1));
}
}
void barAsync(std::string input, boost::asio::yield_context yc)
{
barCommon(input, &yc);
}
void barCommon(std::string input, boost::asio::yield_context* yieldContext = NULL)
{
// Existing code here, with some operations performed async
// using ioService and yieldContext if they are not NULL.
}
private:
boost::asio::io_service* ioService;
// Other member variables, which cause a crash when accessed
};
void handleRequest(pion::http::request_ptr request, pion::tcp::connection_ptr connection)
{
Foo* foo = acquireFooPointer();
foo->bar(std::string(request->get_content()), &connection->get_io_service());
}
This seems to work insofar as it ends up running Foo::barCommon inside a coroutine, but the existing code crashes as soon as it tries to access Foo member variables. What am I missing here?
EDIT: Just to be clear, the pointer acquired in handleRequest is to a heap-allocated Foo object whose lifetime matches that of the server process.

threading-related active object design questions (c++ boost)

I would like some feedback regarding the IService class listed below. From what I know, this type of class is related to the "active-object" pattern. Please excuse/correct if I use any related terminology incorrectly. Basically the idea is that the classes using this active object class need to provide a start and a stop method which control some event loop. This event loop could be implemented with a while loop or with boost asio etc.
This class is responsible for starting a new thread in a non-blocking manner so that events can be handled in/by the new thread. It must also handle all clean-up related code. I first tried an OO approach in which subclasses were responsible for overriding methods to control the event loop but the cleanup was messy: in the destructor calling the stop method resulted in a pure virtual function call in cases where the calling class had not manually called the stop method. The templated solution seems to be a lot cleaner:
template <typename T>
class IService : private boost::noncopyable
{
typedef boost::shared_ptr<boost::thread> thread_ptr;
public:
IService()
{
}
~IService()
{
/// try stop the service in case it's running
stop();
}
void start()
{
boost::mutex::scoped_lock lock(m_threadMutex);
if (m_pServiceThread && m_pServiceThread->joinable())
{
// already running
return;
}
m_pServiceThread = thread_ptr(new boost::thread(boost::bind(&IService::main, this)));
// need to wait for thread to start: else if destructor is called before thread has started
// Wait for condition to be signaled and then
// try timed wait since the application could deadlock if the thread never starts?
//if (m_startCondition.timed_wait(m_threadMutex, boost::posix_time::milliseconds(getServiceTimeoutMs())))
//{
//}
m_startCondition.wait(m_threadMutex);
// notify main to continue: it's blocked on the same condition var
m_startCondition.notify_one();
}
void stop()
{
// trigger the stopping of the event loop
m_serviceObject.stop();
if (m_pServiceThread)
{
if (m_pServiceThread->joinable())
{
m_pServiceThread->join();
}
// the service is stopped so we can reset the thread
m_pServiceThread.reset();
}
}
private:
/// entry point of thread
void main()
{
boost::mutex::scoped_lock lock(m_threadMutex);
// notify main thread that it can continue
m_startCondition.notify_one();
// Try Dummy wait to allow 1st thread to resume???
m_startCondition.wait(m_threadMutex);
// call template implementation of event loop
m_serviceObject.start();
}
/// Service thread
thread_ptr m_pServiceThread;
/// Thread mutex
mutable boost::mutex m_threadMutex;
/// Condition for signaling start of thread
boost::condition m_startCondition;
/// T must satisfy the implicit service interface and provide a start and a stop method
T m_serviceObject;
};
The class could be used as follows:
class TestObject3
{
public:
TestObject3()
:m_work(m_ioService),
m_timer(m_ioService, boost::posix_time::milliseconds(200))
{
m_timer.async_wait(boost::bind(&TestObject3::doWork, this, boost::asio::placeholders::error));
}
void start()
{
// simple event loop
m_ioService.run();
}
void stop()
{
// signal end of event loop
m_ioService.stop();
}
void doWork(const boost::system::error_code& e)
{
// Do some work here
if (e != boost::asio::error::operation_aborted)
{
m_timer.expires_from_now( boost::posix_time::milliseconds(200) );
m_timer.async_wait(boost::bind(&TestObject3::doWork, this, boost::asio::placeholders::error));
}
}
private:
boost::asio::io_service m_ioService;
boost::asio::io_service::work m_work;
boost::asio::deadline_timer m_timer;
};
Now to my specific questions:
1) Is the use of the boost condition variable correct? It seems like a bit of a hack to me: I wanted to wait for the thread to be launched so I waited on the condition variable. Then once the new thread has launched in the main method, I again wait on the same condition variable to allow the initial thread to continue. Then once the start method of the initial thread is exited, the new thread can continue. Is this ok?
2) Are there any cases in which the thread would not get launched successfully by the OS? I remember reading somewhere that this can occur. If this is possible, I should rather do a timed wait on the condition variable (as is commented out in the start method)?
3) I am aware that the templated class might not implement the stop method "correctly", i.e. if the event loop fails to stop, the code will block on the join (either in stop or in the destructor), but I see no way around this. I guess it is up to the user of the class to make sure that the start and stop methods are implemented correctly?
4) I would appreciate hearing about any other design mistakes, possible improvements, etc.
Thanks!
Finally settled on the following:
1) After much testing, the use of the condition variable seems fine
2) This issue hasn't cropped up (yet)
3) The templated class implementation must meet the requirements; unit tests are used to test for correctness
4) Improvements:
- Added join with lock
- Catching exceptions in the spawned thread and rethrowing them in the main thread, to avoid crashes and to not lose exception info
- Using boost::system::error_code to communicate error codes back to the caller
- The implementation object is settable
Code:
template <typename T>
class IService : private boost::noncopyable
{
typedef boost::shared_ptr<boost::thread> thread_ptr;
typedef T ServiceImpl;
public:
typedef boost::shared_ptr<IService<T> > ptr;
IService()
:m_pServiceObject(&m_serviceObject)
{
}
~IService()
{
/// try stop the service in case it's running
if (m_pServiceThread && m_pServiceThread->joinable())
{
stop();
}
}
static ptr create()
{
return boost::make_shared<IService<T> >();
}
/// Accessor to service implementation. The handle can be used to configure the implementation object
ServiceImpl& get() { return m_serviceObject; }
/// Mutator to service implementation. The handle can be used to configure the implementation object
void set(ServiceImpl rServiceImpl)
{
// the implementation object cannot be modified once the thread has been created
assert(m_pServiceThread == 0);
m_serviceObject = rServiceImpl;
m_pServiceObject = &m_serviceObject;
}
void set(ServiceImpl* pServiceImpl)
{
// the implementation object cannot be modified once the thread has been created
assert(m_pServiceThread == 0);
// make sure service object is valid
if (pServiceImpl)
m_pServiceObject = pServiceImpl;
}
/// if the service implementation reports an error from the start or stop method call, it can be accessed via this method
/// NB: only the last error can be accessed
boost::system::error_code getServiceErrorCode() const { return m_ecService; }
/// The join method allows the caller to block until thread completion
void join()
{
// protect this method from being called twice (e.g. by user and by stop)
boost::mutex::scoped_lock lock(m_joinMutex);
if (m_pServiceThread && m_pServiceThread->joinable())
{
m_pServiceThread->join();
m_pServiceThread.reset();
}
}
/// This method launches the non-blocking service
boost::system::error_code start()
{
boost::mutex::scoped_lock lock(m_threadMutex);
if (m_pServiceThread && m_pServiceThread->joinable())
{
// already running
return boost::system::error_code(SHARED_INVALID_STATE, shared_category);
}
m_pServiceThread = thread_ptr(new boost::thread(boost::bind(&IService::main, this)));
// Wait for condition to be signaled
m_startCondition.wait(m_threadMutex);
// notify main to continue: it's blocked on the same condition var
m_startCondition.notify_one();
// No error
return boost::system::error_code();
}
/// This method stops the non-blocking service
boost::system::error_code stop()
{
// trigger the stopping of the event loop
//boost::system::error_code ec = m_serviceObject.stop();
assert(m_pServiceObject);
boost::system::error_code ec = m_pServiceObject->stop();
if (ec)
{
m_ecService = ec;
return ec;
}
// The service implementation can return an error code here for more information
// However it is the responsibility of the implementation to stop the service event loop (if running)
// Failure to do so, will result in a block
// If this occurs in practice, we may consider a timed join?
join();
// If exception was thrown in new thread, rethrow it.
// Should the template implementation class want to avoid this, it should catch the exception
// in its start method and then return an error code instead
if( m_exception )
boost::rethrow_exception(m_exception);
return ec;
}
private:
/// runs in its own thread
void main()
{
try
{
boost::mutex::scoped_lock lock(m_threadMutex);
// notify main thread that it can continue
m_startCondition.notify_one();
// Try Dummy wait to allow 1st thread to resume
m_startCondition.wait(m_threadMutex);
// call implementation of event loop
// This will block
// In scenarios where the service fails to start, the implementation can return an error code
m_ecService = m_pServiceObject->start();
m_exception = boost::exception_ptr();
}
catch (...)
{
m_exception = boost::current_exception();
}
}
/// Service thread
thread_ptr m_pServiceThread;
/// Thread mutex
mutable boost::mutex m_threadMutex;
/// Join mutex
mutable boost::mutex m_joinMutex;
/// Condition for signaling start of thread
boost::condition m_startCondition;
/// T must satisfy the implicit service interface and provide a start and a stop method
T m_serviceObject;
T* m_pServiceObject;
// Error code for service implementation errors
boost::system::error_code m_ecService;
// Exception ptr to transport exception across different threads
boost::exception_ptr m_exception;
};
Further feedback/criticism would of course be welcome.
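One point that may be worth revisiting: both waits on m_startCondition are unconditional, and condition variables are allowed to wake up spuriously, so the handshake can in principle misfire even if testing has not shown it. A common alternative is to guard the wait with a flag. The following is a minimal sketch using the std:: equivalents of the Boost types (an assumption, as are the names StartGate and launch_and_wait); it only covers the "wait until the thread has started" direction.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: lets a launcher wait until a worker reports in.
class StartGate {
public:
    // Called by the new thread once it is running.
    void signal_started() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            started_ = true;
        }
        cv_.notify_one();
    }

    // Called by the launching thread. The predicate makes the wait immune
    // to spurious wakeups and to a notification sent before the wait began.
    void wait_started() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return started_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool started_ = false;
};

// Launches a worker, waits for it to report in, then joins it.
inline bool launch_and_wait() {
    StartGate gate;
    bool ran = false;
    std::thread worker([&gate, &ran] {
        gate.signal_started();
        ran = true;  // the service's event loop would run here
    });
    gate.wait_started();
    worker.join();   // join orders the write to ran before the read below
    return ran;
}
```

Because the flag records that the signal happened, the worker does not need a second wait to give the launcher time to resume, which removes the "dummy wait" from main().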