Readable node stream to native c++ addon InputStream - c++

Conceptually what I'm trying to do is very simple. I have a Readable stream in node, and I'm passing that to a native c++ addon where I want to connect that to an IInputStream.
The native library that I'm using works like many c++ (or Java) streaming interfaces that I've seen. The library provides an IInputStream interface (technically an abstract class), which I inherit from and override the virtual functions. Looks like this:
class JsReadable2InputStream : public IInputStream {
public:
// Constructor takes a js v8 object, makes a stream out of it
JsReadable2InputStream(const v8::Local<v8::Object>& streamObj);
~JsReadable2InputStream();
/**
* Blocking read. Blocks until the requested amount of data has been read. However,
* if the stream reaches its end before the requested amount of bytes has been read
* it returns the number of bytes read thus far.
*
* @param begin memory into which read data is copied
* @param byteCount the requested number of bytes
* @return the number of bytes actually read. Is less than byteCount iff
* end of stream has been reached.
*/
virtual int read(char* begin, const int byteCount) override;
virtual int available() const override;
virtual bool isActive() const override;
virtual void close() override;
private:
Nan::Persistent<v8::Object> _stream;
bool _active;
JsEventLoopSync _evtLoop;
};
Of these functions, the important one here is read. The native library will call this function when it wants more data, and the function must block until it is able to return the requested data (or the stream ends). Here's my implementation of read:
int JsReadable2InputStream::read(char* begin, const int byteCount) {
if (!this->_active) { return 0; }
int read = -1;
while (read < 0 && this->_active) {
this->_evtLoop.invoke(
(voidLambda)[this,&read,begin,byteCount](){
v8::Local<v8::Object> stream = Nan::New(this->_stream);
const v8::Local<v8::Function> readFn = Nan::To<v8::Function>(Nan::Get(stream, JS_STR("read")).ToLocalChecked()).ToLocalChecked();
v8::Local<v8::Value> argv[] = { Nan::New<v8::Number>(byteCount) };
v8::Local<v8::Value> result = Nan::Call(readFn, stream, 1, argv).ToLocalChecked();
if (result->IsNull()) {
// Somewhat hacky/brittle way to check if stream has ended, but it's the only option
v8::Local<v8::Object> readableState = Nan::To<v8::Object>(Nan::Get(stream, JS_STR("_readableState")).ToLocalChecked()).ToLocalChecked();
if (Nan::To<bool>(Nan::Get(readableState, JS_STR("ended")).ToLocalChecked()).ToChecked()) {
// End of stream, all data has been read
this->_active = false;
read = 0;
return;
}
// Not enough data available, but stream is still open.
// Set a flag for the c++ thread to go to sleep
// This is the case that it gets stuck in
read = -1;
return;
}
v8::Local<v8::Object> bufferObj = Nan::To<v8::Object>(result).ToLocalChecked();
int len = Nan::To<int32_t>(Nan::Get(bufferObj, JS_STR("length")).ToLocalChecked()).ToChecked();
char* buffer = node::Buffer::Data(bufferObj);
if (len < byteCount) {
this->_active = false;
}
// copy the data out of the buffer
if (len > 0) {
std::memcpy(begin, buffer, len);
}
read = len;
}
);
if (read < 0) {
// Give js a chance to read more data
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
return read;
}
The idea is, the c++ code keeps a reference to the node stream object. When the native code wants to read, it has to synchronize with the node event loop, then attempt to invoke read on the node stream. If the node stream returns null, this indicates that the data isn't ready, so the native thread sleeps, giving the node event loop thread a chance to run and fill its buffers.
This solution works perfectly for a single stream, or even 2 or 3 streams running in parallel. Then for some reason when I hit the magical number of 4+ parallel streams, this totally deadlocks. None of the streams can successfully read any bytes at all. The above while loop runs infinitely, with the call into the node stream returning null every time.
It is behaving as though node is getting starved, and the streams never get a chance to populate with data. However, I've tried adjusting the sleep duration (to much larger values, and randomized values) and that had no effect. It is also clear that the event loop continues to run, since my lambda function continues to get executed there (I put some printfs inside to confirm this).
Just in case it might be relevant (I don't think it is), I'm also including my implementation of JsEventLoopSync. This uses libuv to schedule a lambda to be executed on the node event loop. It is designed such that only one can be scheduled at a time, and other invocations must wait until the first completes.
#include <nan.h>
#include <functional>
// simplified type declarations for the lambda functions
using voidLambda = std::function<void ()>;
// Synchronize with the node v8 event loop. Invokes a lambda function on the event loop, where access to js objects is safe.
// Blocks execution of the invoking thread until execution of the lambda completes.
class JsEventLoopSync {
public:
JsEventLoopSync() : _destroyed(false) {
// register on the default (same as node) event loop, so that we can execute callbacks in that context
// This takes a function pointer, which only works with a static function
this->_handles = new async_handles_t();
this->_handles->inst = this;
uv_async_init(uv_default_loop(), &this->_handles->async, JsEventLoopSync::_processUvCb);
// mechanism for passing this instance through to the native uv callback
this->_handles->async.data = this->_handles;
// mutex has to be initialized
uv_mutex_init(&this->_handles->mutex);
uv_cond_init(&this->_handles->cond);
}
~JsEventLoopSync() {
uv_mutex_lock(&this->_handles->mutex);
// prevent access to deleted instance by callback
this->_destroyed = true;
uv_mutex_unlock(&this->_handles->mutex);
// NOTE: Important, this->_handles must be a dynamically allocated pointer because uv_close() is
// async, and still has a reference to it. If it were statically allocated as a class member, this
// destructor would free the memory before uv_close was done with it (leading to asserts in libuv)
uv_close(reinterpret_cast<uv_handle_t*>(&this->_handles->async), JsEventLoopSync::_asyncClose);
}
// called from the native code to invoke the function
void invoke(const voidLambda& fn) {
if (v8::Isolate::GetCurrent() != NULL) {
// Already on the event loop, process now
return fn();
}
// Need to sync with the event loop
uv_mutex_lock(&this->_handles->mutex);
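// release the mutex before bailing out, otherwise later callers block forever
if (this->_destroyed) { uv_mutex_unlock(&this->_handles->mutex); return; }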
this->_fn = fn;
// this will invoke processUvCb, on the node event loop
uv_async_send(&this->_handles->async);
// wait for it to complete processing
uv_cond_wait(&this->_handles->cond, &this->_handles->mutex);
uv_mutex_unlock(&this->_handles->mutex);
}
private:
// pulls data out of uv's void* to call the instance method
static void _processUvCb(uv_async_t* handle) {
if (handle->data == NULL) { return; }
auto handles = static_cast<async_handles_t*>(handle->data);
handles->inst->_process();
}
inline static void _asyncClose(uv_handle_t* handle) {
auto handles = static_cast<async_handles_t*>(handle->data);
handle->data = NULL;
uv_mutex_destroy(&handles->mutex);
uv_cond_destroy(&handles->cond);
delete handles;
}
// Creates the js arguments (populated by invoking the lambda), then invokes the js function
// Invokes resultLambda on the result
// Must be run on the node event loop!
void _process() {
if (v8::Isolate::GetCurrent() == NULL) {
// This is unexpected!
throw std::logic_error("Unable to sync with node event loop for callback!");
}
uv_mutex_lock(&this->_handles->mutex);
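// release the mutex before bailing out, otherwise the waiting thread can never reacquire it
if (this->_destroyed) { uv_mutex_unlock(&this->_handles->mutex); return; }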
Nan::HandleScope scope; // looks unused, but this is very important
// invoke the lambda
this->_fn();
// signal that we're done
uv_cond_signal(&this->_handles->cond);
uv_mutex_unlock(&this->_handles->mutex);
}
typedef struct async_handles {
uv_mutex_t mutex;
uv_cond_t cond;
uv_async_t async;
JsEventLoopSync* inst;
} async_handles_t;
async_handles_t* _handles;
voidLambda _fn;
bool _destroyed;
};
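For reference, the hand-off that invoke() performs boils down to the pattern below, sketched with only the standard library instead of libuv. This is an illustration under assumptions, not the addon code: LoopSync, Task and the worker thread are hypothetical stand-ins for the node event loop, and the wait uses a predicate so that a spurious wakeup cannot end it early.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for the event loop: one worker thread that runs queued tasks.
class LoopSync {
public:
    LoopSync() : worker_([this] { run(); }) {}

    ~LoopSync() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        worker_.join();
    }

    // Runs fn on the worker thread and blocks the caller until fn has finished.
    void invoke(const std::function<void()>& fn) {
        bool done = false;
        std::unique_lock<std::mutex> lock(mutex_);
        tasks_.push(Task{&fn, &done});
        wake_.notify_one();
        // The predicate is re-checked on every wakeup, so spurious wakeups are harmless.
        finished_.wait(lock, [&done] { return done; });
    }

private:
    struct Task {
        const std::function<void()>* fn;
        bool* done;
    };

    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // stopping and nothing left to run
            Task task = tasks_.front();
            tasks_.pop();
            lock.unlock();
            (*task.fn)();                // run the callback without holding the mutex
            lock.lock();
            *task.done = true;           // written under the mutex, read under the mutex
            finished_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable finished_;
    std::queue<Task> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so everything above exists before it starts
};
```

A caller would use it as loop.invoke([&] { /* touch loop-owned state */ }); the call returns only after the lambda has run on the worker thread.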
So, what am I missing? Is there a better way to wait for the node thread to get a chance to run? Is there a totally different design pattern that would work better? Does node have some upper limit on the number of streams that it can process at once?

As it turns out, the problems that I was seeing were actually client-side limitations. Browsers (and seemingly also node) have a limit on the number of open TCP connections to the same origin. I worked around this by spawning multiple node processes to do my testing.
If anyone is trying to do something similar, the code I shared is totally viable. If I ever have some free time, I might make it into a library.

Related

C++ GRPC ClientAsyncReaderWriter: how to check if data is available for read?

I have a bidirectional streaming async gRPC client that uses ClientAsyncReaderWriter to communicate with the server. The RPC definition looks like:
rpc Process (stream Request) returns (stream Response)
For simplicity, Request and Response are byte arrays (byte[]). I send several chunks of data to the server, and when the server has accumulated enough data, it processes the data, sends back a response, and continues accumulating data for the next responses. After several responses, the server sends a final response and closes the connection.
For the async client I am using a CompletionQueue. The code looks like:
...
CompletionQueue cq;
std::unique_ptr<Stub> stub;
grpc::ClientContext context;
std::unique_ptr<grpc::ClientAsyncReaderWriter<Request,Response>> responder = stub->AsyncProcess(&context, &cq, handler);
// thread for completion queue
std::thread t(
[]{
void *handler = nullptr;
bool ok = false;
while (cq_.Next(&handler, &ok)) {
if (can_read) {
// how do you know that it is read data available
// Do read
} else {
// do write
...
Request request = prepare_request();
responder_->Write(request, handler);
}
}
}
);
...
// wait
What is the proper way to do async reading? Can I try to read if no data is available? Is it a blocking call?
Sequencing Read() calls
Can I try to read if no data is available?
Yep, and that's going to be the case more often than not. Read() will do nothing until data is available, and only then put its passed tag into the completion queue. (see below for details)
Is it a blocking call?
Nope. Read() and Write() return immediately. However, you can only have one of each in flight at any given moment. If you try to send a second one before the previous has completed, it (the second one) will fail.
What is the proper way to do async reading?
Each time a Read() is done, start a new one. For that, you need to be able to tell when a Read() is done. This is where tags come in!
When you call Read(&msg, tag) or Write(request, tag), you are telling grpc to put tag in the completion queue associated with that responder once that operation has completed. grpc doesn't care what the tag is, it just hands it off.
So the general strategy you will want to go for is:
As soon as you are ready to start receiving messages:
call responder->Read() once with some tag that you will recognize as a "read done".
Whenever cq_.Next() gives you back that tag, and ok == true:
consume the message
Queue up a new responder->Read() with that same tag.
Obviously, you'll also want to do something similar for your calls to Write().
But since you still want to be able to look up the handler instance from a given tag, you'll need a way to pack a reference to the handler, as well as information about which operation is being finished, in a single tag.
Completion queues
Look up the handler instance from a given tag? Why?
The true raison d'être of completion queues is unfortunately not evident from the examples. They allow multiple asynchronous RPCs to share the same thread. Unless your application only ever makes a single RPC call, the handling thread should not be associated with a specific responder. Instead, that thread should be a general-purpose worker that dispatches events to the correct handler based on the content of the tag.
The official examples tend to do that by using pointer to the handler object as the tag. That works when there's a specific sequence of events to expect since you can easily predict what a handler is reacting to. You often can't do that with async bidirectional streams, since any given completion event could be a Read() or a Write() finishing.
Example
Here's a general outline of what I personally consider to be a clean way to go about all that:
// Base class for async bidir RPCs handlers.
// This is so that the handling thread is not associated with a specific rpc method.
class RpcHandler {
// This will be used as the "tag" argument to the various grpc calls.
struct TagData {
enum class Type {
start_done,
read_done,
write_done,
// add more as needed...
};
RpcHandler* handler;
Type evt;
};
struct TagSet {
TagSet(RpcHandler* self)
: start_done{self, TagData::Type::start_done},
read_done{self, TagData::Type::read_done},
write_done{self, TagData::Type::write_done} {}
TagData start_done;
TagData read_done;
TagData write_done;
};
public:
RpcHandler() : tags(this) {}
virtual ~RpcHandler() = default;
// The actual tag objects we'll be passing
TagSet tags;
virtual void on_ready() = 0;
virtual void on_recv() = 0;
virtual void on_write_done() = 0;
static void handling_thread_main(grpc::CompletionQueue* cq) {
void* raw_tag = nullptr;
bool ok = false;
while (cq->Next(&raw_tag, &ok)) {
TagData* tag = reinterpret_cast<TagData*>(raw_tag);
if(!ok) {
// Handle error
}
else {
switch (tag->evt) {
case TagData::Type::start_done:
tag->handler->on_ready();
break;
case TagData::Type::read_done:
tag->handler->on_recv();
break;
case TagData::Type::write_done:
tag->handler->on_write_done();
break;
}
}
}
}
};
void do_something_with_response(Response const&);
class MyHandler final : public RpcHandler {
public:
using responder_ptr =
std::unique_ptr<grpc::ClientAsyncReaderWriter<Request, Response>>;
MyHandler(responder_ptr responder) : responder_(std::move(responder)) {
// This lock is needed because StartCall() can
// cause the handler thread to access the object.
std::lock_guard lock(mutex_);
responder_->StartCall(&tags.start_done);
}
~MyHandler() {
// TODO: finish/abort the streaming rpc as appropriate.
}
void send(const Request& msg) {
std::lock_guard lock(mutex_);
if (!sending_) {
sending_ = true;
responder_->Write(msg, &tags.write_done);
} else {
// TODO: add some form of synchronous wait, or outright failure
// if the queue starts to get too big.
queued_msgs_.push(msg);
}
}
private:
// When the rpc is ready, queue the first read
void on_ready() override {
std::lock_guard l(mutex_); // To synchronize with the constructor
responder_->Read(&incoming_, &tags.read_done);
};
// When a message arrives, use it, and start reading the next one
void on_recv() override {
// incoming_ never leaves the handling thread, so no need to lock
// ------ If handling is cheap and stays in the handling thread.
do_something_with_response(incoming_);
responder_->Read(&incoming_, &tags.read_done);
// ------ If handling responses is expensive or involves another thread.
// Response msg = std::move(incoming_);
// responder_->Read(&incoming_, &tags.read_done);
// do_something_with_response(msg);
};
// When a message has been sent, send the next one if there is any
void on_write_done() override {
std::lock_guard lock(mutex_);
if (!queued_msgs_.empty()) {
responder_->Write(queued_msgs_.front(), &tags.write_done);
queued_msgs_.pop();
} else {
sending_ = false;
}
};
responder_ptr responder_;
// Only ever touched by the handler thread post-construction.
Response incoming_;
bool sending_ = false;
std::queue<Request> queued_msgs_;
std::mutex mutex_; // grpc might be thread-safe, MyHandler isn't...
};
int main() {
// Start the thread as soon as you have a completion queue.
auto cq = std::make_unique<grpc::CompletionQueue>();
std::thread t(RpcHandler::handling_thread_main, cq.get());
// Multiple concurrent RPCs sharing the same handling thread:
MyHandler handler1(serviceA->MethodA(&context, cq.get()));
MyHandler handler2(serviceA->MethodA(&context, cq.get()));
MyHandlerB handler3(serviceA->MethodB(&context, cq.get()));
MyHandlerC handler4(serviceB->MethodC(&context, cq.get()));
}
If you have a keen eye, you will notice that the code above stores a bunch (one per event type) of redundant this pointers in the handler. It's generally not a big deal, and it is possible to do without them via multiple inheritance and downcasting, but that's starting to be somewhat beyond the scope of this question.
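For the curious, here is one possible reading of the multiple-inheritance idea, sketched with the standard library only (no grpc involved). Each event type becomes a small base class, the handler inherits from all of them, and the address of each base subobject serves as the tag, so no explicit this pointer has to be stored per event. The names Tag, ReadTag, WriteTag, EchoHandler and dispatch are hypothetical, and dispatch() stands in for the body of the completion-queue loop.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Common base: the only type the dispatching loop ever casts a void* back to.
struct Tag {
    virtual ~Tag() = default;
    virtual void complete(bool ok) = 0;
};

// One small base class per event type; each forwards to its own hook.
struct ReadTag : Tag {
    void complete(bool ok) final { on_read_done(ok); }
    virtual void on_read_done(bool ok) = 0;
};

struct WriteTag : Tag {
    void complete(bool ok) final { on_write_done(ok); }
    virtual void on_write_done(bool ok) = 0;
};

// The handler *is* both tags, so the tag addresses identify handler and event at once.
class EchoHandler : public ReadTag, public WriteTag {
public:
    // Go through the specific base first, then up to Tag, so the void*
    // round-trips through exactly the type that dispatch() casts back to.
    void* read_tag() { return static_cast<Tag*>(static_cast<ReadTag*>(this)); }
    void* write_tag() { return static_cast<Tag*>(static_cast<WriteTag*>(this)); }

    std::vector<std::string> log;

private:
    void on_read_done(bool ok) override { log.push_back(ok ? "read ok" : "read failed"); }
    void on_write_done(bool ok) override { log.push_back(ok ? "write ok" : "write failed"); }
};

// What the completion-queue loop would do with each (tag, ok) pair it receives.
inline void dispatch(void* raw_tag, bool ok) {
    static_cast<Tag*>(raw_tag)->complete(ok);
}
```

With this layout the loop no longer needs a switch on the event type; the virtual call picks the right hook.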

Thread stops running after random time at random position without any error

I have a thread that dumps images as raw data to disk. It works fine for a few minutes and then suddenly it just stops doing anything.
Through command line output I found that it stops at random positions within the loop.
The program doesn't crash within this thread (it crashes shortly after the thread stops running because my image buffer gets full), so no error/exception/anything from the thread.
Here's a sketch of my code:
class ImageWriter
{
public:
// constructor, destructor
void continueWriting();
private:
void writeImages();
std::thread m_WriterThread;
bool m_WriterThreadRunning;
std::mutex m_ThreadRunningMutex;
ImageManager * m_ImageManager;
};
void ImageWriter::continueWriting()
{
// whenever a new image is acquired, this function is called
// so if the thread has finished, it needs to be restarted
// this function is also used for the first start of writing
m_ThreadRunningMutex.lock();
if ( m_WriterThreadRunning )
{
m_ThreadRunningMutex.unlock();
}
else
{
m_ThreadRunningMutex.unlock();
if( m_WriterThread.joinable() )
{
m_WriterThread.join();
}
m_WriterThreadRunning = true;
m_WriterThread = std::thread( &ImageWriter::writeImages, this );
}
}
void ImageWriter::writeImages()
{
while ( true )
{
// MyImage is a struct that contains the image pointer and some metadata
std::shared_ptr< MyImage > imgPtr = m_ImageManager->getNextImage(m_uiCamId);
if( imgPtr == nullptr )
{
// this tells the ImageWriter that currently there are no further images queued
break;
}
// check whether the image is valid. If it's not, skip this image and continue with the next one
[...]
// create filename
std::stringstream cFileNameStr;
cFileNameStr << [...];
std::ofstream cRawFile( cFileNameStr.str().c_str(), std::ios::out | std::ios::binary );
unsigned char * ucDataPtr = imgPtr->cImgPtr;
if( cRawFile.is_open() )
{
// calculate file size
unsigned int uiFileSize = [...];
cRawFile.write(reinterpret_cast<char*>(ucDataPtr), uiFileSize);
cRawFile.close();
}
// dump some metadata into a singleton class for logging
[...]
}
m_ThreadRunningMutex.lock();
m_WriterThreadRunning = false;
m_ThreadRunningMutex.unlock();
}
ImageManager is a class that takes care of image acquisition and queues the acquired images. It also triggers continueWriting(). The continueWriting() mechanism is necessary, as images may be written faster than they are acquired.
Why does this thread stop running at random times at random positions and without any error?
Valgrind doesn't yield anything within my control.
I tried setting the thread's priority up, but that didn't make any difference.
I also tried another disk, but that didn't make any difference either.
I noticed you're immediately unlocking the mutex in both branches. Since all you're doing is reading a bool, you probably should avoid using locks entirely. Reading is not usually an operation that needs synchronization (unless it has side effects, such as reading a stream, or the location is deallocated, etc.)
Consider: you will never read a true value from that bool before it's true, and since all you do is read, you'll never run the risk of that function assigning an incorrect value to that bool. You don't assign a new value to the bool here until after you've already joined your thread.
I'd assume what's happening here is that your code locks the mutex, and another thread tries to write to it, but cannot since it's locked.
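If you do drop the mutex, making the flag a std::atomic<bool> keeps the accesses from both threads well-defined. Below is a minimal sketch of that idea, assuming continueWriting() is only ever called from one thread. Writer, wait() and batches() are made-up names, and the worker body is just a placeholder for writeImages().

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class Writer {
public:
    ~Writer() { wait(); }

    // Returns true if this call started a new worker thread.
    bool continueWriting() {
        // exchange() reads the old value and sets the flag in one atomic step.
        if (running_.exchange(true)) {
            return false;  // a worker is still running
        }
        if (thread_.joinable()) {
            thread_.join();  // reap the previous, already finished worker
        }
        thread_ = std::thread([this] {
            ++batches_;        // placeholder for the real writeImages() loop
            running_ = false;  // last thing the worker does
        });
        return true;
    }

    // Blocks until the current worker (if any) has finished.
    void wait() {
        if (thread_.joinable()) {
            thread_.join();
        }
    }

    int batches() const { return batches_; }

private:
    std::atomic<bool> running_{false};
    std::atomic<int> batches_{0};
    std::thread thread_;
};
```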

Delete an object after the callback is called C++

I create a new object and set a data and a callback something like this:
class DownloadData
{
std::function<void(int, bool)> m_callback;
int m_data;
public:
void sendHttpRequest()
{
// send request with data
}
private:
void getHttpResponse(int responseCode)
{
if (responseCode == 0)
{
// save data
m_callback(responseCode, true);
delete this;
return;
}
// some processing here
if (responseCode == 1 && some other condition here)
{
m_callback(responseCode, false);
delete this;
return;
}
}
};
Now the usage - I create a new object:
if (isNeededToDownloadTheFile)
{
DownloadData* p = new DownloadData(15, [](int, bool){});
p->sendHttpRequest();
}
But as you can see at https://isocpp.org/wiki/faq/freestore-mgmt#delete-this, having an object delete itself is highly undesirable. Is there a good design pattern or approach for this?
You could put them in a vector or list, have getHttpResponse() set a flag instead of delete this when it's completed, and then have another part of the code occasionally traverse the list looking for completed requests.
That would also allow you to implement a timeout. If the request hasn't returned in a day, it's probably not going to and you should delete that object.
If you want to put the delete out of that function, the only way is to store the object somehow. However, this raises the ownership questions: who is the owner of the asynchronous http request that's supposed to call a callback?
In this scenario, doing the GC's job actually makes the code pretty clear. However, if you wanted to make it better suited to C++, I'd probably settle on a promise-like interface, similar to std::async. That way the synchronous code path makes it much easier to store the promise objects.
You asked for a code example, so here goes.
The typical approach would look like this:
{
DownloadData* p = new DownloadData(15, [](auto data){
print(data);
});
p->sendHttpRequest();
}
Once the data is available, it can be printed. However, you can look at the problem "from the other end":
{
Future<MyData> f = DownloadData(15).getFuture();
// now you can either
// a) synchronously wait for the future
// b) return it for further processing
return f;
}
f will hold the actual value once the request actually processes. That way you can push it as if it was a regular value all the way up to the place where that value is actually needed, and wait for it there. Of course, if you consume it asynchronously, you might as well spawn another asynchronous action for that.
The implementation of the Future is something that's outside of the scope of this answer, I think, but then again numerous resources are available online. The concept of Promises and Futures isn't something specific to C++.
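As a rough illustration of the idea using the standard library's own std::future rather than a custom Future type: the sketch below fakes the HTTP request with a lambda, so downloadAsync and DownloadResult are hypothetical names and the response codes are made up. The point is only that nothing has to delete itself, because the shared state is released when the last future referring to it goes away.

```cpp
#include <cassert>
#include <future>

// Hypothetical result type; a real one would carry the downloaded data.
struct DownloadResult {
    int responseCode;
    bool ok;
};

// Starts the (faked) request on another thread and hands back a future.
// std::launch::async forces a separate thread instead of deferred execution.
inline std::future<DownloadResult> downloadAsync(int data) {
    return std::async(std::launch::async, [data] {
        // Stand-in for the real HTTP round trip: 15 "succeeds", anything else "fails".
        const bool ok = (data == 15);
        return DownloadResult{ok ? 0 : 1, ok};
    });
}
```

The caller can then keep the future wherever the value is eventually needed and call get() there, which blocks only if the result is not ready yet.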
If the caller keeps a reference to the downloading object then it can erase it when the download signals it has ended:
class DownloadData
{
// true until download stops (atomic to prevent race)
std::atomic_bool m_downloading;
int m_data;
std::function<void(int, bool)> m_callback;
public:
DownloadData(int data, std::function<void(int, bool)> callback)
: m_downloading(true), m_data(data), m_callback(callback) {}
void sendHttpRequest()
{
// send request with data
}
// called asynchronously to detect dead downloads
bool ended() const { return !m_downloading; }
private:
void getHttpResponse(int responseCode)
{
if (responseCode == 0)
{
// save data
m_callback(responseCode, true);
m_downloading = false; // signal end
return;
}
// some processing here
if(responseCode == 1)
{
m_callback(responseCode, false);
m_downloading = false; // signal end
return;
}
}
};
Then from the caller's side:
std::vector<std::unique_ptr<DownloadData>> downloads;
// ... other code ...
if (isNeededToDownloadTheFile)
{
// clean current downloads by deleting all those
// whose download is ended
downloads.erase(std::remove_if(downloads.begin(), downloads.end(),
[](std::unique_ptr<DownloadData> const& d)
{
return d->ended();
}), downloads.end());
// store this away to keep it alive until its download ends
downloads.push_back(std::make_unique<DownloadData>(15, [](int, bool){}));
downloads.back()->sendHttpRequest();
}
// ... etc ...

How to properly delete a pointer to callback function.

I have a MainProgram.exe which calls in to MyDll.dll and uses curl to receive data on a callback function.
I have wrapped curl in a function called CurlGetData which creates a curl instance and performs curl_easy_perform.
Here is my code:
//Interface class to derive from
class ICurlCallbackHandler
{
public:
virtual size_t CurlDataCallback( void* pData, size_t tSize ) = 0;
};
//Class that implements interface
class CurlCallbackHandler : public ICurlCallbackHandler
{
public:
bool m_exit = false;
virtual size_t CurlDataCallback( void* pData, size_t tSize ) override
{
if(m_exit)
return CURL_READFUNC_ABORT;
// do stuff with the curl data
return tSize;
}
};
CurlCallbackHandler *m_curlHandler;
//Create an instance of above class in my dll constructor
MyDll::MyDll()
{
m_curlHandler = new CurlCallbackHandler();
}
//Cleanup above class in my dll destructor
MyDll::~MyDll()
{
delete m_curlHandler;
m_curlHandler = nullptr;
}
//Function to start receiving data asynchronously
void MyDll::GetDataAsync()
{
std::async([=]
{
//This will receive data in a new thread and call CurlDataCallback above
//This basically calls easy_perform
CurlGetData(m_curlHandler);
});
}
//Will cause the curl callback to return CURL_READFUNC_ABORT
void MyDll::StopDataAsync()
{
m_curlHandler->m_exit = true;
}
The function GetDataAsync is called from my main program and it basically calls curl_easy_perform and uses the m_curlHandler as its callback function which calls back up into CurlDataCallback.
This all works fine but whenever my main program exits, it calls MyDll::StopDataAsync which stops the curl data callback and then the destructor of MyDll is called which cleans up the m_curlHandler.
But I find that at that moment curl has not yet finished with this call back and the program crashes as m_curlHandler has been deleted but the curl callback in the new async thread still is using it.
Sometimes it closes down fine but other times it crashes due to the curlcallback trying to access a pointer that has been deleted by the destructor.
How can I best clean up the m_curlHandler? I want to avoid putting in wait time-outs as this will affect the performance of my main program.
According to the C++ standard, the MyDll::GetDataAsync() function should not return immediately: the std::future returned by std::async is a temporary here, and its destructor blocks until the asynchronous thread has finished, which would effectively make the operation synchronous. However, I believe Microsoft intentionally violated this part of the std::async specification, so in practice it does return immediately, and it's possible for you to destroy the callback while the async thread is still using it (which is exactly the problem that would be avoided if the Microsoft implementation followed the standard!)
The solution is to keep hold of the std::future that std::async returns, and then block on that future (which ensures the async thread has finished) before destroying the callback.
class MyDll
{
std::future<void> m_future;
...
};
MyDll::~MyDll()
{
StopDataAsync();
if (m_future.valid()) // not valid if GetDataAsync() was never called
m_future.get(); // wait for async thread to exit.
delete m_curlHandler; // now it's safe to do this
}
//Function to start receiving data asynchronously
void MyDll::GetDataAsync()
{
m_future = std::async([=]
{
//This will receive data in a new thread and call CurlDataCallback above
//This basically calls easy_perform
CurlGetData(m_curlHandler);
});
}
N.B. your m_exit member should be std::atomic<bool> (or you should use a mutex to protect all reads and writes to it) otherwise your program has a data race and so has undefined behaviour.
I would also use std::unique_ptr<CurlCallbackHandler> for m_curlHandler.
I want to avoid putting in wait time-outs as this will affect the performance of my main program.
The solution above will cause your destructor to wait, but only for as long as it takes for the callback to notice that m_exit == true and cause the async thread to stop running. That means you only wait as long as necessary and no longer, unlike time-outs which would mean guessing how long is "long enough", and then probably adding a bit more to be safe.
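Putting those pieces together, here is a self-contained sketch that uses only the standard library. It is not the curl code: Handler, Receiver and onData() are hypothetical stand-ins, and the polling loop in start() merely plays the role of CurlGetData() calling back into the handler. It shows the atomic exit flag, the std::unique_ptr ownership and the destructor that blocks on the stored future.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <thread>

// Stand-in for the callback handler; onData() plays the role of CurlDataCallback().
struct Handler {
    std::atomic<bool> exit{false};
    std::atomic<int> chunks{0};

    // Returns false to tell the transfer loop to abort.
    bool onData() {
        if (exit) return false;
        ++chunks;
        return true;
    }
};

class Receiver {
public:
    Receiver() : handler_(std::make_unique<Handler>()) {}

    ~Receiver() {
        stop();
        // Wait only as long as the worker needs to notice the flag.
        if (future_.valid()) future_.wait();
        // handler_ is destroyed after this point, when nobody uses it anymore.
    }

    // Call at most once per Receiver in this sketch.
    void start() {
        Handler* h = handler_.get();
        future_ = std::async(std::launch::async, [h] {
            // Placeholder for the blocking transfer that keeps invoking the callback.
            while (h->onData()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    void stop() { handler_->exit = true; }
    int chunks() const { return handler_->chunks; }

private:
    std::unique_ptr<Handler> handler_;
    std::future<void> future_;
};
```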

Mutex Safety with Interrupts (Embedded Firmware)

Edit: @Mike pointed out that my try_lock function in the code below is unsafe and that accessor creation can produce a race condition as well. The suggestions (from everyone) have convinced me that I'm going down the wrong path.
Original Question
The requirements for locking on an embedded microcontroller are different enough from multithreading that I haven't been able to convert multithreading examples to my embedded applications. Typically I don't have an OS or threads of any kind, just main and whatever interrupt functions are called by the hardware periodically.
It's pretty common that I need to fill up a buffer from an interrupt, but process it in main. I've created the IrqMutex class below to try to safely implement this. Each person trying to access the buffer is assigned a unique id through IrqMutexAccessor, then they each can try_lock() and unlock(). The idea of a blocking lock() function doesn't work from interrupts because unless you allow the interrupt to complete, no other code can execute so the unlock() code never runs. I do however use a blocking lock from the main() code occasionally.
However, I know that the double-check lock doesn't work without C++11 memory barriers (which aren't available on many embedded platforms). Honestly despite reading quite a bit about it, I don't really understand how/why the memory access reordering can cause a problem. I think that the use of volatile sig_atomic_t (possibly combined with the use of unique IDs) makes this different from the double-check lock. But I'm hoping someone can: confirm that the following code is correct, explain why it isn't safe, or offer a better way to accomplish this.
class IrqMutex {
friend class IrqMutexAccessor;
private:
std::sig_atomic_t accessorIdEnum;
volatile std::sig_atomic_t owner;
protected:
std::sig_atomic_t nextAccessor(void) { return ++accessorIdEnum; }
bool have_lock(std::sig_atomic_t accessorId) {
return (owner == accessorId);
}
bool try_lock(std::sig_atomic_t accessorId) {
// Only try to get a lock, while it isn't already owned.
while (owner == SIG_ATOMIC_MIN) {
// <-- If an interrupt occurs here, both attempts can get a lock at the same time.
// Try to take ownership of this Mutex.
owner = accessorId; // SET
// Double check that we are the owner.
if (owner == accessorId) return true;
// Someone else must have taken ownership between CHECK and SET.
// If they released it after CHECK, we'll loop back and try again.
// Otherwise someone else has a lock and we have failed.
}
// This shouldn't happen unless they called try_lock on something they already owned.
if (owner == accessorId) return true;
// If someone else owns it, we failed.
return false;
}
bool unlock(std::sig_atomic_t accessorId) {
// Double check that the owner called this function (not strictly required)
if (owner == accessorId) {
owner = SIG_ATOMIC_MIN;
return true;
}
// We still return true if the mutex was unlocked anyway.
return (owner == SIG_ATOMIC_MIN);
}
public:
IrqMutex(void) : accessorIdEnum(SIG_ATOMIC_MIN), owner(SIG_ATOMIC_MIN) {}
};
// This class is used to manage our unique accessorId.
class IrqMutexAccessor {
friend class IrqMutex;
private:
IrqMutex& mutex;
const std::sig_atomic_t accessorId;
public:
IrqMutexAccessor(IrqMutex& m) : mutex(m), accessorId(m.nextAccessor()) {}
bool have_lock(void) { return mutex.have_lock(accessorId); }
bool try_lock(void) { return mutex.try_lock(accessorId); }
bool unlock(void) { return mutex.unlock(accessorId); }
};
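For comparison, on toolchains where C++11 atomics are available, the separate CHECK and SET steps in try_lock() can be folded into one indivisible operation with std::atomic_flag, which closes the window marked in the comment above. This is only a sketch under that assumption: IrqTryLock is a hypothetical name, it does not track owner IDs the way IrqMutex does, and it does not help on platforms that lack <atomic>.

```cpp
#include <atomic>
#include <cassert>

// Minimal non-blocking lock: no owner tracking, no blocking lock().
class IrqTryLock {
public:
    // test_and_set() atomically sets the flag and returns its previous value,
    // so a false result means this caller is the one that took the lock.
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    // Only the current holder should call this.
    void unlock() {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

An interrupt handler would call try_lock() and skip its update when that returns false, the same way __irqQuickFunction does with the accessor.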
Because there is only one processor and no threading, the mutex serves what I think is a subtly different purpose than usual. There are two main use cases I run into repeatedly.
The interrupt is a Producer: it takes ownership of a free buffer and loads it with a packet of data. The interrupt/Producer may keep its ownership lock for a long time, spanning multiple interrupt calls. The main function is the Consumer and takes ownership of a full buffer when it is ready to process it. The race condition is rare: if the interrupt/Producer finishes a packet and needs a new buffer while all buffers are full, it will try to take the oldest buffer (this is a dropped-packet event). If the main/Consumer started to read and process that oldest buffer at exactly the same time, they would trample all over each other.
The interrupt is just a quick change or increment of something (like a counter). However, if we want to reset the counter or jump to some new value from the main() code, we don't want to write to the counter while it is changing. Here main actually does a blocking loop to obtain a lock, although I think it's almost impossible to have to wait for more than two attempts. Once main has the lock, any calls to the counter interrupt will be skipped, but that's generally not a big deal for something like a counter. Then I update the counter value and unlock it so it can start incrementing again.
I realize these two samples are dumbed down a bit, but some version of these patterns occurs in many of the peripherals in every project I work on, and I'd like one piece of reusable code that can safely handle this across various embedded platforms. I included the C tag because all of this is directly convertible to C code, and on some embedded compilers that's all that is available. So I'm trying to find a general method that is guaranteed to work in both C and C++.
struct ExampleCounter {
volatile long long int value;
IrqMutex mutex;
} exampleCounter;
struct ExampleBuffer {
volatile char data[256];
volatile size_t index;
IrqMutex mutex; // One mutex per buffer.
} exampleBuffers[2];
const volatile char * const REGISTER;
// This accessor shouldn't be created in an interrupt or a race condition can occur.
static IrqMutexAccessor myMutex(exampleCounter.mutex);
void __irqQuickFunction(void) {
// Obtain a lock, add the data then unlock all within one function call.
if (myMutex.try_lock()) {
exampleCounter.value++;
myMutex.unlock();
} else {
// If we failed to obtain a lock, we skipped this update this one time.
}
}
// These accessors shouldn't be created in an interrupt or a race condition can occur.
static IrqMutexAccessor myMutexes[2] = {
IrqMutexAccessor(exampleBuffers[0].mutex),
IrqMutexAccessor(exampleBuffers[1].mutex)
};
void __irqLongFunction(void) {
static size_t bufferIndex = 0;
// Check if we have a lock.
if (!myMutexes[bufferIndex].have_lock() and !myMutexes[bufferIndex].try_lock()) {
// If we can't get a lock try the other buffer
bufferIndex = (bufferIndex + 1) % 2;
// One buffer should always be available so the next line should always be successful.
if (!myMutexes[bufferIndex].try_lock()) return;
}
// ... at this point we know we have a lock ...
// Get data from the hardware and modify the buffer here.
const char c = *REGISTER;
exampleBuffers[bufferIndex].data[exampleBuffers[bufferIndex].index++] = c;
// We may keep the lock for multiple function calls until the end of packet.
static const char END_PACKET_SIGNAL = '\0';
if (c == END_PACKET_SIGNAL) {
// Unlock this buffer so it can be read from main.
myMutexes[bufferIndex].unlock();
// Switch to the other buffer for next time.
bufferIndex = (bufferIndex + 1) % 2;
}
}
int main(void) {
while (true) {
// Mutex for counter
static IrqMutexAccessor myCounterMutex(exampleCounter.mutex);
// Change counter value
if (EVERY_ONCE_IN_A_WHILE) {
// Skip any updates that occur while we are updating the counter.
while(!myCounterMutex.try_lock()) {
// Wait for the interrupt to release its lock.
}
// Set the counter to a new value.
exampleCounter.value = 500;
// Updates will start again as soon as we unlock it.
myCounterMutex.unlock();
}
// Mutexes for __irqLongFunction.
static IrqMutexAccessor myBufferMutexes[2] = {
IrqMutexAccessor(exampleBuffers[0].mutex),
IrqMutexAccessor(exampleBuffers[1].mutex)
};
// Process buffers from __irqLongFunction.
for (size_t i = 0; i < 2; i++) {
// Obtain a lock so we can read the data.
if (!myBufferMutexes[i].try_lock()) continue;
// Check that the buffer isn't empty.
if (exampleBuffers[i].index == 0) {
myBufferMutexes[i].unlock(); // Don't forget to unlock.
continue;
}
// ... read and do something with the data here ...
exampleBuffers[i].index = 0;
myBufferMutexes[i].unlock();
}
}
}
Also note that I used volatile on any variable that is read or written by the interrupt routine (unless the variable is only accessed from the interrupt, like the static bufferIndex value in __irqLongFunction). I've read that mutexes remove some of the need for volatile in multithreaded code, but I don't think that applies here. Did I use the right amount of volatile? I used it on: ExampleBuffer.data[256], ExampleBuffer.index, and ExampleCounter.value.
I apologize for the long answer, but perhaps it is fitting for a long question.
To answer your first question, I would say that your implementation of IrqMutex is not safe. Let me try to explain where I see problems.
Function nextAccessor
std::sig_atomic_t nextAccessor(void) { return ++accessorIdEnum; }
This function has a race condition because the increment is not atomic, even though accessorIdEnum is a volatile std::sig_atomic_t. That type only guarantees that a single read or a single write is indivisible, while ++ involves three operations: reading the current value of accessorIdEnum, incrementing it, and writing the result back. If two IrqMutexAccessors are created at the same time (for example, one in main and one in an interrupt), it's possible that they both get the same ID.
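To make the three-step nature of the increment concrete, here is a minimal host-side sketch. It is not the original code: the "interrupt" is simulated by an ordinary function call placed between the read and the write, the counter starts at 0 rather than SIG_ATOMIC_MIN for readability, and the names racyNextAccessor, simulatedInterrupt and demonstrateDuplicateId are hypothetical.

```cpp
#include <cassert>
#include <csignal>

// Shared ID counter, playing the role of IrqMutex::accessorIdEnum.
static volatile std::sig_atomic_t accessorIdEnum = 0;

// ++accessorIdEnum spelled out as the three steps a compiler may emit.
// interruptHook stands in for an interrupt firing between read and write.
std::sig_atomic_t racyNextAccessor(void (*interruptHook)()) {
    std::sig_atomic_t tmp = accessorIdEnum;  // 1. read
    if (interruptHook) interruptHook();      //    (interrupt fires here)
    tmp = tmp + 1;                           // 2. increment
    accessorIdEnum = tmp;                    // 3. write back
    return tmp;
}

static std::sig_atomic_t idTakenInInterrupt = 0;

// Hypothetical interrupt handler that requests an accessor ID of its own.
void simulatedInterrupt() {
    idTakenInInterrupt = racyNextAccessor(nullptr);
}

// Returns true if main and the simulated interrupt received the same ID.
bool demonstrateDuplicateId() {
    accessorIdEnum = 0;
    std::sig_atomic_t idTakenInMain = racyNextAccessor(simulatedInterrupt);
    return idTakenInMain == idTakenInInterrupt;
}
```

In this interleaving both callers read the same starting value, so both walk away with the same ID and one increment is lost. On real hardware the window is only a few instructions wide, which is why the bug would show up rarely.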
Function try_lock
The try_lock function also has a race condition. One thread (e.g. main) could enter the while loop and, before it takes ownership, another thread (e.g. an interrupt) could also enter the while loop and take ownership of the lock (returning true). The first thread then continues on to owner = accessorId and thus "also" takes the lock. So two threads (or your main thread and an interrupt) can call try_lock on an unowned mutex at the same time and both get true.
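The underlying problem is that the check and the set are two separate steps. One way to see what an indivisible check-and-set looks like is std::atomic's compare_exchange_strong, sketched here. This is a swapped-in technique, not what the original code does, and it assumes a C++11 toolchain with a lock-free std::atomic<int> on the target, which many small embedded compilers don't provide. The names AtomicOwnerMutex and FREE are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for IrqMutex with an atomic owner field.
class AtomicOwnerMutex {
public:
    enum { FREE = -1 };  // sentinel meaning "nobody owns the lock"

    AtomicOwnerMutex() : owner(FREE) {}

    // Succeeds only if the mutex was free at the instant of the exchange.
    bool try_lock(int accessorId) {
        int expected = FREE;
        // Atomically: if (owner == expected) owner = accessorId;
        // otherwise owner is left alone and false is returned.
        return owner.compare_exchange_strong(expected, accessorId);
    }

    // Releases the mutex only if accessorId currently owns it.
    bool unlock(int accessorId) {
        int expected = accessorId;
        return owner.compare_exchange_strong(expected, FREE);
    }

private:
    std::atomic<int> owner;
};
```

Note that in this sketch a second try_lock by the current owner returns false, unlike the original, which returns true when the caller already holds the lock.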
Disabling interrupts by RAII
We can achieve some level of simplicity and encapsulation by using RAII for interrupt disabling, for example the following class:
class InterruptLock {
public:
InterruptLock() {
prevInterruptState = currentInterruptState();
disableInterrupts();
}
~InterruptLock() {
restoreInterrupts(prevInterruptState);
}
private:
int prevInterruptState; // Whatever type this should be for the platform
InterruptLock(const InterruptLock&); // Not copy-constructible
};
And I would recommend disabling interrupts to get the atomicity you need within the mutex implementation itself. For example something like:
bool try_lock(std::sig_atomic_t accessorId) {
InterruptLock lock;
if (owner == SIG_ATOMIC_MIN) {
owner = accessorId;
return true;
}
return false;
}
bool unlock(std::sig_atomic_t accessorId) {
InterruptLock lock;
if (owner == accessorId) {
owner = SIG_ATOMIC_MIN;
return true;
}
return false;
}
Depending on your platform, this might look different, but you get the idea.
As you said, this abstracts the disabling and enabling of interrupts away from general code and encapsulates it in this one class.
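The reason the constructor saves the previous state, rather than the destructor unconditionally re-enabling interrupts, shows up when locks nest (for example, a function that takes an InterruptLock calling another one that does the same). Here is a host-side sketch of that behavior. The three platform hooks are replaced by a plain global flag, so interruptsEnabled and the hook bodies are stand-ins and not any real platform's API.

```cpp
#include <cassert>

// Stand-in for the CPU's global interrupt-enable bit (assumption: one flag).
static bool interruptsEnabled = true;

// Stand-ins for the platform-specific hooks used by InterruptLock.
static int  currentInterruptState()      { return interruptsEnabled ? 1 : 0; }
static void disableInterrupts()          { interruptsEnabled = false; }
static void restoreInterrupts(int state) { interruptsEnabled = (state != 0); }

class InterruptLock {
public:
    InterruptLock() : prevInterruptState(currentInterruptState()) { disableInterrupts(); }
    ~InterruptLock() { restoreInterrupts(prevInterruptState); }
private:
    int prevInterruptState;
    InterruptLock(const InterruptLock&);             // not copyable
    InterruptLock& operator=(const InterruptLock&);  // not assignable
};

// Returns true if interrupts stay disabled until the outermost lock is gone.
bool nestedLocksRestoreCorrectly() {
    bool ok = interruptsEnabled;           // enabled before any lock
    {
        InterruptLock outer;
        ok = ok && !interruptsEnabled;     // disabled by outer
        {
            InterruptLock inner;
            ok = ok && !interruptsEnabled; // still disabled
        }
        ok = ok && !interruptsEnabled;     // inner must not re-enable them
    }
    return ok && interruptsEnabled;        // re-enabled only after outer ends
}
```

If the destructor simply enabled interrupts, the inner lock would switch them back on while the outer critical section was still running.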
Mutexes and Interrupts
Having said how I would consider implementing the mutex class, I would not actually use a mutex class for your use-cases. As you pointed out, mutexes don't really play well with interrupts, because an interrupt can't "block" on trying to acquire a mutex. For this reason, for code that directly exchanges data with an interrupt, I would instead strongly consider just directly disabling interrupts (for a very short time while the main "thread" touches the data).
So your counter might simply look like this:
volatile long long int exampleCounter;
void __irqQuickFunction(void) {
exampleCounter++;
}
...
// Change counter value
if (EVERY_ONCE_IN_A_WHILE) {
InterruptLock lock;
exampleCounter = 500;
}
In my mind, this is easier to read, easier to reason about, and won't "slip" when there's contention (i.e. miss timer beats).
Regarding the buffer use-case, I would strongly recommend against holding a lock for multiple interrupt cycles. A lock/mutex should be held for just the slightest moment required to "touch" a piece of memory - just long enough to read or write it. Get in, get out.
So this is how the buffering example might look:
struct ExampleBuffer {
char data[256];
} exampleBuffers[2];
ExampleBuffer* volatile bufferAwaitingConsumption = nullptr;
ExampleBuffer* volatile freeBuffer = &exampleBuffers[1];
const volatile char * const REGISTER;
void __irqLongFunction(void) {
static const char END_PACKET_SIGNAL = '\0';
static size_t index = 0;
static ExampleBuffer* receiveBuffer = &exampleBuffers[0];
// Get data from the hardware and modify the buffer here.
const char c = *REGISTER;
receiveBuffer->data[index++] = c;
// End of packet?
if (c == END_PACKET_SIGNAL) {
// Make the packet available to the consumer
bufferAwaitingConsumption = receiveBuffer;
// Move on to the next buffer
receiveBuffer = freeBuffer;
freeBuffer = nullptr;
index = 0;
}
}
int main(void) {
while (true) {
// Fetch packet from shared variable
ExampleBuffer* packet;
{
InterruptLock lock;
packet = bufferAwaitingConsumption;
bufferAwaitingConsumption = nullptr;
}
if (packet) {
// ... read and do something with the data here ...
// Once we're done with the buffer, we need to release it back to the producer
{
InterruptLock lock;
freeBuffer = packet;
}
}
}
}
This code is arguably easier to reason about, since there are only two memory locations shared between the interrupt and the main loop: one to pass packets from the interrupt to the main loop, and one to pass empty buffers back to the interrupt. We also only touch those variables under "lock", and only for the minimum time needed to "move" the value. (For simplicity, I've skipped the buffer overflow logic for when the main loop takes too long to free the buffer.)
It's true that in this case one may not even need the locks, since we're just reading and writing simple values, but the cost of disabling interrupts is small, and the risk of making a mistake otherwise is not worth it in my opinion.
Edit
As pointed out in the comments, the above solution was only meant to tackle the multithreading problem, and omitted overflow checking. Here is a more complete solution which should be robust under overflow conditions:
const size_t BUFFER_COUNT = 2;
struct ExampleBuffer {
char data[256];
ExampleBuffer* next;
} exampleBuffers[BUFFER_COUNT];
volatile size_t overflowCount = 0;
class BufferList {
public:
BufferList() : first(nullptr), last(nullptr) { }
// Atomic enqueue
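void enqueue(ExampleBuffer* buffer) {
InterruptLock lock;
// The new buffer becomes the tail, so it has no successor.
buffer->next = nullptr;
if (last)
last->next = buffer;
else
first = buffer;
// Always advance the tail pointer.
last = buffer;
}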
// Atomic dequeue (or returns null)
ExampleBuffer* dequeueOrNull() {
InterruptLock lock;
ExampleBuffer* result = first;
if (first) {
first = first->next;
if (!first)
last = nullptr;
}
return result;
}
private:
ExampleBuffer* first;
ExampleBuffer* last;
} freeBuffers, buffersAwaitingConsumption;
const volatile char * const REGISTER;
void __irqLongFunction(void) {
static const char END_PACKET_SIGNAL = '\0';
static size_t index = 0;
// Start without a buffer; main() puts every buffer on the free list.
static ExampleBuffer* receiveBuffer = nullptr;
// No receive buffer yet (startup, or recovery from overflow)?
if (!receiveBuffer) {
// Try get another free buffer
receiveBuffer = freeBuffers.dequeueOrNull();
// Still no buffer?
if (!receiveBuffer) {
overflowCount++;
return;
}
}
// Get data from the hardware and modify the buffer here.
const char c = *REGISTER;
if (index < sizeof(receiveBuffer->data))
receiveBuffer->data[index++] = c;
// End of packet? (Bytes that don't fit in the buffer were dropped above.)
if (c == END_PACKET_SIGNAL) {
// Make the packet available to the consumer
buffersAwaitingConsumption.enqueue(receiveBuffer);
// Move on to the next free buffer
receiveBuffer = freeBuffers.dequeueOrNull();
index = 0;
}
}
size_t getAndResetOverflowCount() {
InterruptLock lock;
size_t result = overflowCount;
overflowCount = 0;
return result;
}
int main(void) {
// All buffers are free at the start
for (size_t i = 0; i < BUFFER_COUNT; i++)
freeBuffers.enqueue(&exampleBuffers[i]);
while (true) {
// Fetch packet from shared variable
ExampleBuffer* packet = buffersAwaitingConsumption.dequeueOrNull();
if (packet) {
// ... read and do something with the data here ...
// Once we're done with the buffer, we need to release it back to the producer
freeBuffers.enqueue(packet);
}
size_t overflowBytes = getAndResetOverflowCount();
if (overflowBytes) {
// ...
}
}
}
The key changes:
If the interrupt runs out of free buffers, it will recover
If the interrupt receives data while it doesn't have a receive buffer, it will communicate that to the main thread via getAndResetOverflowCount
If you keep getting buffer overflows, you can simply increase the buffer count
I've encapsulated the multithreaded access into a queue class implemented as a linked list (BufferList), which supports atomic dequeue and enqueue. The previous example also used queues, but of length 0-1 (either an item is enqueued or it isn't), and so the implementation of the queue was just a single variable. In the case of running out of free buffers, the receive queue could have 2 items, so I upgraded it to a proper queue rather than adding more shared variables.
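As a sanity check of the queue idea, here is a small host-side sketch of an intrusive FIFO list along the lines of BufferList. InterruptLock is stubbed out as an empty class because a desktop build has no interrupts to disable, and Node and NodeList are hypothetical names. The sketch clears next and advances the tail pointer on every enqueue, which a linked-list queue needs in order to hold more than one item.

```cpp
#include <cassert>
#include <cstddef>

// Stub: on the target this would disable interrupts for its lifetime.
struct InterruptLock {
    InterruptLock() {}
    ~InterruptLock() {}
};

// Hypothetical list element; the link pointer lives inside the element.
struct Node {
    int id;
    Node* next;
};

class NodeList {
public:
    NodeList() : first(nullptr), last(nullptr) {}

    // Append to the tail.
    void enqueue(Node* node) {
        InterruptLock lock;
        node->next = nullptr;  // the new node is always the end of the list
        if (last)
            last->next = node;
        else
            first = node;
        last = node;
    }

    // Remove from the head, or return nullptr if the list is empty.
    Node* dequeueOrNull() {
        InterruptLock lock;
        Node* result = first;
        if (first) {
            first = first->next;
            if (!first)
                last = nullptr;
        }
        return result;
    }

private:
    Node* first;
    Node* last;
};
```

Because the link is stored in the element itself, enqueue and dequeue never allocate, so both are short enough to run with interrupts disabled.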
If the interrupt is the producer and mainline code is the consumer, surely it's as simple as disabling the interrupt for the duration of the consume operation?
That's how I used to do it in my embedded microcontroller days.