Asynchronous thread-safe logging in C++ (no mutex)

I'm looking for a way to do asynchronous, thread-safe logging in my C++ application.
I have already explored thread-safe logging solutions like log4cpp, log4cxx, Boost:log and rlog, but it seems that all of them use a mutex. And as far as I know, a mutex is a synchronous solution, which means that threads are blocked while they try to write their messages if another thread is already writing.
Do you know of a solution?

I think your statement is wrong: using a mutex is not necessarily equivalent to a synchronous solution. Yes, a mutex is a synchronization primitive, but it can be used for many different things. We can use a mutex in, for example, a producer-consumer queue while the logging itself still happens asynchronously.
Honestly, I haven't looked into the implementation of these logging libraries, but it should be feasible to write an asynchronous appender (for a log4j-like library) in case one is not provided: the logger writes to a producer-consumer queue, and a separate worker thread is responsible for writing to a file (or even delegating to another appender).
Edit:
I just had a brief look at log4cxx: it does provide an AsyncAppender which does what I suggested. It buffers the incoming logging events and delegates to the attached appenders asynchronously.
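For illustration, here is a minimal sketch of such an asynchronous appender in standard C++ (the class name and the std::thread wiring are mine, not taken from any of the libraries above): producers only hold a brief queue lock, while the actual file I/O happens on a dedicated worker thread.
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class AsyncLogger {
public:
    explicit AsyncLogger(const std::string& path)
        : out_(path), worker_([this] { drain(); }) {}

    ~AsyncLogger() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join(); // the worker flushes whatever is still queued
    }

    // Called by any thread: only the queue push happens under the lock,
    // so producers never wait on file I/O.
    void log(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(std::move(msg));
        }
        cv_.notify_one();
    }

private:
    void drain() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
            while (!queue_.empty()) {
                std::string msg = std::move(queue_.front());
                queue_.pop();
                lock.unlock();          // do the slow write without the lock
                out_ << msg << '\n';
                lock.lock();
            }
            if (done_) return;
        }
    }

    std::ofstream out_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
    bool done_ = false;
    std::thread worker_; // declared last so it starts after the other members
};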

I'd recommend avoiding the problem by using only one thread for logging. For passing the necessary data to log, you can use a lock-free FIFO queue (thread-safe as long as producer and consumer are strictly separated and only one thread has each role; therefore you will need one queue per producer).
An example of a fast lock-free queue is included below:
queue.h:
#ifndef QUEUE_H
#define QUEUE_H

template<typename T> class Queue
{
public:
    virtual ~Queue() {} // virtual destructor so derived queues can be deleted via Queue<T>*
    virtual void Enqueue(const T &element) = 0;
    virtual T Dequeue() = 0;
    virtual bool Empty() = 0;
};

#endif // QUEUE_H
hybridqueue.h:
#ifndef HYBRIDQUEUE_H
#define HYBRIDQUEUE_H

#include "queue.h"

template <typename T, int size> class HybridQueue : public Queue<T>
{
public:
    virtual bool Empty();
    virtual T Dequeue();
    virtual void Enqueue(const T& element);
    HybridQueue();
    virtual ~HybridQueue();
private:
    struct ItemList
    {
        int start;                          // next index to dequeue from this block
        T list[size];
        int end;                            // next free index in this block
        ItemList volatile * volatile next;
    };
    ItemList volatile * volatile start;     // consumer side: block currently being read
    char filler[256];                       // padding so producer and consumer pointers sit on separate cache lines
    ItemList volatile * volatile end;       // producer side: block currently being written
};
/**
 * Implementation
 */
#include <stdio.h> // for NULL

template <typename T, int size> bool HybridQueue<T, size>::Empty()
{
    return (this->start == this->end) && (this->start->start == this->start->end);
}

template <typename T, int size> T HybridQueue<T, size>::Dequeue()
{
    if(this->Empty())
    {
        return NULL; // note: assumes T is a pointer type (or otherwise constructible from NULL)
    }
    if(this->start->start >= size)
    {
        // the current block is exhausted: advance to the next block and free the old one
        ItemList volatile * volatile old;
        old = this->start;
        this->start = this->start->next;
        delete old;
    }
    T tmp;
    tmp = this->start->list[this->start->start];
    this->start->start++;
    return tmp;
}

template <typename T, int size> void HybridQueue<T, size>::Enqueue(const T& element)
{
    if(this->end->end >= size)
    {
        // the current block is full: link in a fresh block and write there
        this->end->next = new ItemList();
        this->end->next->start = 0;
        this->end->next->list[0] = element;
        this->end->next->end = 1;
        this->end = this->end->next;
    }
    else
    {
        this->end->list[this->end->end] = element;
        this->end->end++;
    }
}

template <typename T, int size> HybridQueue<T, size>::HybridQueue()
{
    this->start = this->end = new ItemList(); // value-initialized, so next == NULL
    this->start->start = this->start->end = 0;
}

template <typename T, int size> HybridQueue<T, size>::~HybridQueue()
{
    // free any blocks still chained together
    while(this->start != NULL)
    {
        ItemList volatile * volatile old = this->start;
        this->start = this->start->next;
        delete old;
    }
}

#endif // HYBRIDQUEUE_H
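To tie it back to the logging question, here is a rough sketch of how a single logger thread might drain one such queue per producer. All the names (LogQueue, g_queues, LogThreadLoop) are mine, and registration of the queues is assumed to happen before any producer starts logging:
#include <cstdio>
#include <string>
#include <vector>
#include "hybridqueue.h"

// One HybridQueue per producer thread; Dequeue() returns NULL when empty,
// so the element type must be a pointer.
typedef HybridQueue<std::string*, 64> LogQueue;
std::vector<LogQueue*> g_queues; // filled before any producer starts logging

void LogThreadLoop(FILE *logFile)
{
    for(;;)
    {
        for(std::size_t i = 0; i < g_queues.size(); ++i)
        {
            while(std::string *msg = g_queues[i]->Dequeue())
            {
                fprintf(logFile, "%s\n", msg->c_str());
                delete msg; // ownership passes from producer to logger
            }
        }
        // a real implementation would sleep or back off here instead of spinning
    }
}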

If I understand your question correctly, you are concerned about doing an I/O operation (probably a write to a file) inside the logger's critical section.
Boost:log lets you define a custom writer object. You can define operator() to perform asynchronous I/O, or to pass the message to your logging thread (which does the I/O).
http://www.torjo.com/log2/doc/html/workflow.html#workflow_2b

No libraries will do this as far as I know - it's too complex. You'll have to roll your own. Here's an idea I just had: create a per-thread log file, ensure that the first item in each entry is a timestamp, and then merge the logs after the run and sort them (by timestamp) to get a final log file.
You could use thread-local storage (say, a FILE handle; AFAIK it won't be possible to store a stream object in thread-local storage), look this handle up on each log call, and write to that thread's file.
All this complexity vs. locking the mutex? I don't know the performance requirements of your application, but if it is sensitive - why would you be logging (excessively)? Think of other ways to obtain the information you require without logging.
One other thing to consider is to hold the mutex for the least amount of time possible, i.e. construct your log entry first, and only acquire the lock just before writing to the file, as sketched below.
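A minimal sketch of that last point (the names are placeholders): format the entry first, then hold the lock only around the actual write.
#include <cstdio>
#include <mutex>
#include <sstream>
#include <string>

std::mutex g_logMutex;     // guards only the shared log file
FILE *g_logFile = stderr;  // placeholder sink for the sketch

void logLine(const char *tag, int value)
{
    // Build the whole entry outside the critical section...
    std::ostringstream oss;
    oss << tag << ": " << value << '\n';
    const std::string entry = oss.str();

    // ...and take the mutex only for the write itself.
    std::lock_guard<std::mutex> lock(g_logMutex);
    fwrite(entry.data(), 1, entry.size(), g_logFile);
}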

In a Windows program, we use a user-defined Windows message. First, memory is allocated for the log entry on the heap. Then PostMessage is called, with the pointer as the LPARAM and the record size as the WPARAM. PostMessage returns immediately; the receiver window later extracts the record, displays it, saves it to the log file, and deallocates it (since the call is asynchronous, the sender must not free the memory itself). This approach is thread-safe, and you don't have to use mutexes: concurrency is handled by the message-queue mechanism of Windows. Not very elegant, but it works.
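A hedged sketch of that scheme (the message id, window, and record type are all illustrative):
#include <windows.h>
#include <string>

static const UINT WM_LOGRECORD = WM_APP + 1; // assumed unused user-defined message

// Producer side (any thread): allocate the record and hand ownership to the log window.
void PostLogRecord(HWND hLogWnd, const std::string &text)
{
    std::string *record = new std::string(text);
    if (!PostMessage(hLogWnd, WM_LOGRECORD, (WPARAM)record->size(), (LPARAM)record))
        delete record; // the post failed, so reclaim the record ourselves
}

// Receiver side: the log window's procedure frees the record after writing it.
LRESULT CALLBACK LogWndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == WM_LOGRECORD) {
        std::string *record = reinterpret_cast<std::string*>(lParam);
        // ... display *record and append it to the log file ...
        delete record; // PostMessage is fire-and-forget, so the receiver deallocates
        return 0;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}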

Lock-free algorithms are not necessarily the fastest ones. Define your boundaries: how many threads are there for logging? How much will be written in a single log operation, at most?
I/O-bound operations are much, much slower than the thread context switches caused by blocking/waking threads. Using a lock-free/spin-lock algorithm with 10 writing threads will put a heavy load on the CPU.
In short: block the other threads while you are writing to the file.

Related

Lock-free multiple producer multiple consumer in C++

I have to program a multiple producer-consumer system in C++, but I'm lost trying to put together each part of the model (threads with their correct buffers). The basic functioning of the model is: I have an initial thread that executes a function. The results it returns need to be put into an undetermined number of buffers, because each element the function processes is different and needs to be treated in its own thread. Then, with the data stored in the buffers, another n threads need to get the data from these buffers to run another function on it, and those results need to be put into buffers again.
At the moment I have got this buffer structure created:
template <typename T>
class buffer {
public:
    buffer(int n);
    int bufSize() const noexcept;
    bool bufEmpty() const noexcept;
    bool full() const noexcept;
    ~buffer() = default;
    void put(const T & x, bool last) noexcept;
    std::pair<bool,T> get() noexcept;
private:
    int next_pos(int p) const noexcept;
private:
    struct item {
        bool last;
        T value;
    };
    const int size_;
    std::unique_ptr<item[]> buf_;
    alignas(64) std::atomic<int> nextRd_ {0};
    alignas(64) std::atomic<int> nextWrt_ {0};
};
I've also created a vector structure which stores a collection of buffers, in order to satisfy the need for an undetermined number of threads:
std::vector<std::unique_ptr<locked_buffer<std::pair<int, std::vector<std::vector<unsigned char>>>>>> v1;
for(int i = 0; i < n; i++){
    v1.push_back(std::unique_ptr<locked_buffer<std::pair<int, std::vector<std::vector<unsigned char>>>>>(
        new locked_buffer<std::pair<int, std::vector<std::vector<unsigned char>>>>(aux)));
}
Edit: [drawing of the intended thread/buffer pipeline omitted]
Without knowing more context, this looks like an application for a standard thread pool. You have different tasks that are enqueued into a synchronized queue (like the buffer class you have there). Each worker thread of the thread pool polls this queue and processes one task at a time (by executing a run() method, for example). They write the results back into another synchronized queue.
Each worker thread has its own thread-local pair of input and output buffers. These don't need synchronization because they are only accessed from within the owner thread itself.
Edit: Actually, I think this can be simplified a lot: just use a thread pool and one synchronized queue. The worker threads can enqueue new tasks directly into the queue. Each of your threads in the drawing would correspond to one type of task and implement a common Task interface.
You don't need multiple buffers. You can use polymorphism and put everything in one buffer.
Edit 2 - Explanation of thread pools:
A thread pool is just a concept; forget about the pooling aspect and use a fixed number of threads. The main idea is: instead of having several threads each with a specific function, have N threads that can process any kind of task, where N is typically the number of CPU cores.
You can transform this:
[diagram omitted: one dedicated thread per task type]
into:
[diagram omitted: N pooled worker threads sharing a single task queue]
The worker thread then does something like the following. Note that this is simplified, but you should get the idea:
void Thread::run(buffer<Task*>& queue) {
    while(true) {
        while(queue.isEmpty())
            waitUntilQueueHasElement(); // simplified: block until work arrives
        Task* task = queue.get();
        if(task)
            task->execute();
    }
}
And your tasks implement a common interface so you can put Task* pointers into a single queue:
struct Task {
    virtual ~Task() {}
    virtual void execute() = 0;
};

struct Task1 : public Task {
    virtual void execute() override {
        A();
        B1();
        C();
    }
};
...
Also, do yourself a favour and use typedefs ;)
`std::vector<std::unique_ptr<locked_buffer<std::pair<int, std::vector<std::vector<unsigned char>>>>>> v1;`
becomes
typedef std::vector<std::vector<unsigned char>> vector2D_uchar;
typedef std::pair<int, vector2D_uchar> int_vec_pair;
typedef std::unique_ptr<locked_buffer<int_vec_pair>> locked_buffer_ptr;
std::vector<locked_buffer_ptr> v1;
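For concreteness, here is a minimal, self-contained version of this idea using the standard library (all names are illustrative, and a production pool would need error handling):
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// N generic workers drain one synchronized queue of tasks.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto &w : workers_) w.join();
    }

    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task(); // run outside the lock so other workers can make progress
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool done_ = false;
};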

Synchronizing method calls on shared object from multiple threads

I am thinking about how to implement a class that will contain private data that will eventually be modified by multiple threads through method calls. For synchronization (using the Windows API), I am planning on using a CRITICAL_SECTION object since all the threads will be spawned from the same process.
Given the following design, I have a few questions.
template <typename T> class Shareable
{
private:
    const LPCRITICAL_SECTION sync; // Can be read and used by multiple threads
    T *data;
public:
    Shareable(LPCRITICAL_SECTION cs, unsigned elems) : sync{cs}, data{new T[elems]} { }
    ~Shareable() { delete[] data; }
    void sharedModify(unsigned index, T &datum) // <-- Can this be validly called by multiple
                                                // threads with synchronization being implicit?
    {
        EnterCriticalSection(sync);
        /*
            The critical section of code involving reads & writes to 'data'
        */
        LeaveCriticalSection(sync);
    }
};
// Somewhere else ...
DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
    Shareable<ActualType> *ptr = static_cast<Shareable<ActualType>*>(lpParameter);
    ActualType copyable = /* initialization */;
    ptr->sharedModify(validIndex, copyable); // <-- OK, synchronized?
    return 0;
}
The way I see it, the API calls will be conducted in the context of the current thread. That is, I assume this is the same as if I had acquired the critical section object from the pointer and called the API from within ThreadProc(). However, I am worried that if the object is created and placed in the main/initial thread, there will be something funky about the API calls.
1. When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
2. Should I instead get a pointer to the critical section object and use that instead?
3. Is there some other synchronization mechanism that is better suited to this scenario?
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
It's not implicit, it's explicit. There's only one CRITICAL_SECTION, and only one thread can hold it at a time.
Should I instead get a pointer to the critical section object and use that instead?
No. There's no reason to use a pointer here.
Is there some other synchronization mechanism that is better suited to this scenario?
It's hard to say without seeing more code, but this is definitely the "default" solution. It's like a singly-linked list -- you learn it first, it always works, but it's not always the best choice.
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
Implicit from the caller's perspective, yes.
Should I instead get a pointer to the critical section object and use that instead?
No. In fact, I would suggest giving the Shareable object ownership of its own critical section instead of accepting one from the outside (and embracing RAII concepts to write safer code), e.g.:
template <typename T>
class Shareable
{
private:
    CRITICAL_SECTION sync;
    std::vector<T> data;

    struct SyncLocker
    {
        CRITICAL_SECTION &sync;
        SyncLocker(CRITICAL_SECTION &cs) : sync(cs) { EnterCriticalSection(&sync); }
        ~SyncLocker() { LeaveCriticalSection(&sync); }
    };

public:
    Shareable(unsigned elems) : data(elems)
    {
        InitializeCriticalSection(&sync);
    }

    Shareable(const Shareable&) = delete;
    Shareable(Shareable&&) = delete;

    ~Shareable()
    {
        {
            SyncLocker lock(sync);
            data.clear();
        }
        DeleteCriticalSection(&sync);
    }

    void sharedModify(unsigned index, const T &datum)
    {
        SyncLocker lock(sync);
        data[index] = datum;
    }

    Shareable& operator=(const Shareable&) = delete;
    Shareable& operator=(Shareable&&) = delete;
};
Is there some other synchronization mechanism that is better suited to this scenario?
That depends. Will multiple threads be accessing the same index at the same time? If not, then there is not really a need for the critical section at all. One thread can safely access one index while another thread accesses a different index.
If multiple threads need to access the same index at the same time, a critical section might still not be the best choice. Locking the entire array might be a big bottleneck if you only need to lock portions of the array at a time. Things like the Interlocked API, or Slim Read/Write locks, might make more sense. It really depends on your thread designs and what you are actually trying to protect.
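For example, a Slim Reader/Writer lock lets many readers proceed concurrently while writers get exclusive access. A sketch (the array and the two functions are placeholders):
#include <windows.h>

SRWLOCK g_lock = SRWLOCK_INIT;
int g_data[64];

int readSlot(unsigned i)
{
    AcquireSRWLockShared(&g_lock); // many readers may hold the lock at once
    int v = g_data[i];
    ReleaseSRWLockShared(&g_lock);
    return v;
}

void writeSlot(unsigned i, int v)
{
    AcquireSRWLockExclusive(&g_lock); // a writer gets sole access
    g_data[i] = v;
    ReleaseSRWLockExclusive(&g_lock);
}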

C++ thread safe bound queue returning object for original thread to delete - 1 writer - 1 reader

The goal is to have a writer thread and a reader thread, but only the writer news and deletes the action objects. There is only one reader and one writer.
Something like:
template<typename T, std::size_t MAX>
class TSQ
{
public:
    // blocks if there are MAX items in queue
    // returns used object to be deleted, or 0 if none exists
    T * push(T * added); // added will be processed by reader

    // blocks if there are no objects in queue
    // returns item pushed from writer for deletion
    T * pop(T * used);   // used will be freed by writer
private:
    // stuff here
};
- or better, if the delete-and-return can be encapsulated:
template<typename T, std::size_t MAX>
class TSQ
{
public:
    // blocks if there are MAX items in queue
    void push(T * added); // added will be processed by reader

    // blocks if there are no objects in queue
    // returns item pushed from writer for deletion
    T& pop();
private:
    // stuff here
};
where the writer thread has a loop like:
my_object *action;
while (1) {
    // create action
    delete my_queue.push(action);
}
and the reader has a loop like:
my_object *action = 0;
while (1) {
    action = my_queue.pop(action);
    // do stuff with action
}
The reason to have the writer delete the action item is for performance
Is there an optimal way to do this?
Bonus points if MAX=0 is specialized to be unbounded (not required, just tidy)
I'm not looking for the full code, just the data structure and general approach
This is an instance of the producer-consumer problem. A popular way to solve it is to use a lock-free queue.
Also, the first practical change you might want to make is to add a sleep(0) to the production/consumption loops, so you give up your time slice every iteration and don't end up using 100% of a CPU core.
The most common solution to this problem is to pass values, not pointers.
You can pass shared_ptr through this queue; your queue then doesn't need to know how to free memory for you.
If you use something like Lamport's ring buffer for a single-producer/single-consumer blocking queue, it's natural to use std::vector under the hood, which will call destructors for every element automatically:
template<typename T, std::size_t MAX>
class TSQ
{
public:
    // blocks if there are MAX items in queue
    void push(T added); // added will be processed by reader

    // blocks if there are no objects in queue
    T pop();
private:
    std::vector<T> _content;
    size_t _push_index;
    size_t _pop_index;
    ...
};
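For what it's worth, here is a sketch of how that value-based interface could be filled in. It uses a mutex and two condition variables for the blocking behaviour rather than Lamport's lock-free technique, so treat it as an illustration of the interface, not of the lock-free queue itself:
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

template<typename T, std::size_t MAX>
class TSQ
{
public:
    TSQ() : _content(MAX + 1), _push_index(0), _pop_index(0) {}

    void push(T added) // blocks while the queue holds MAX items
    {
        std::unique_lock<std::mutex> lock(_m);
        _not_full.wait(lock, [this]{ return next(_push_index) != _pop_index; });
        _content[_push_index] = std::move(added);
        _push_index = next(_push_index);
        _not_empty.notify_one();
    }

    T pop() // blocks while the queue is empty
    {
        std::unique_lock<std::mutex> lock(_m);
        _not_empty.wait(lock, [this]{ return _pop_index != _push_index; });
        T out = std::move(_content[_pop_index]);
        _pop_index = next(_pop_index);
        _not_full.notify_one();
        return out;
    }

private:
    std::size_t next(std::size_t i) const { return (i + 1) % (MAX + 1); }

    std::vector<T> _content; // destroys any remaining elements automatically
    std::size_t _push_index, _pop_index;
    std::mutex _m;
    std::condition_variable _not_full, _not_empty;
};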

Is there an std or boost container that avoids race condition between its insert and find methods?

Each thread may insert an object into the container (at most once) using an 'insert' function.
Thereafter, the thread may try to access this object using a 'get' function.
Therefore, there is no race between 'insert' and 'get' when used by the same thread.
However, a different thread may try to insert its own object while another thread has called 'get'.
I need a container where this situation does not require any synchronization.
The number of threads may vary dramatically between executions.
// pseudocode sketch:
class Object;
template <typename T> class Container;
Container<Object> g_container;

void insert(int threadId)
{
    ScopedLock<Mutex> lock(insertMutex);
    Object obj;
    g_container[threadId] = obj;
}

Object get(int threadId)
{
    return g_container[threadId];
}
You can use a vector of container pointers. Each thread manages its own container instance and registers it with the array of container pointers.
template <typename T>
struct Container {
    Mutex insertMutex;
    ContainerType<T> container;
    int index = 0; // 0 means "not yet registered"

    void insert (T &obj) {
        ScopedLock<Mutex> lock(insertMutex);
        container.insert(obj);
    }

    T get () {
        return *container.begin();
    }

    void register_container () {
        if (index != 0) return; // already registered
        if (counter == MAX_THREADS) throw 0;
        index = ++counter;
        ScopedLock<Mutex> lock(registerMutex);
        containers.push_back(this);
    }

    static std::vector<Container *> containers;
    static Atomic<int> counter;
    static Mutex registerMutex;
};

template <typename T>
std::vector<Container<T> *> Container<T>::containers;

template <typename T>
Atomic<int> Container<T>::counter;

template <typename T>
Mutex Container<T>::registerMutex;
Now, you can iterate over the Container<T>::containers to access all containers.
The thing you're looking for is usually called a "lock-free data structure".
I understand why you think you need a lock-free container, but I can almost promise you that you don't. They are usually more trouble than they're worth. You should just have one mutex which governs all access to an ordinary container. Lock the mutex to insert in the container. Lock the mutex to remove from the container. And lock the mutex before reading (or iterating over) the container.
Here is some helpful discussion from Herb Sutter, chair of the C++ standards committee:
http://www.drdobbs.com/cpp/lock-free-code-a-false-sense-of-security/210600279
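Concretely, the one-mutex approach is just this (a sketch; Object and the map are placeholders for your own types):
#include <map>
#include <mutex>

struct Object { int value; }; // stand-in for the real object type

std::mutex g_containerMutex; // one mutex governs every access
std::map<int, Object> g_container;

void insert(int threadId, const Object &obj)
{
    std::lock_guard<std::mutex> lock(g_containerMutex);
    g_container[threadId] = obj;
}

Object get(int threadId)
{
    std::lock_guard<std::mutex> lock(g_containerMutex); // reads lock too
    return g_container[threadId];
}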
+1 to James Brock.
But it sounds more like you want one container for each thread, yet mutexed, where locking/unlocking the mutexes is cheap because 99.9% of the time the operation succeeds on the first try, while the "master" thread retains the ability to occasionally pause the other threads while it examines them.
If each thread has only one object, the containers go away: just lock the objects. Don't worry about the cost of locking unless it's a proven performance problem; in that case, the goal isn't to remove the locks but merely to reduce how often they are taken. So keep the objects locked and only unlock every hundredth time or so. Combine with a condition variable if necessary.

Problem with thread-safe queue?

I'm trying to write a thread-safe queue using pthreads in C++. My program works 93% of the time. The other 7% of the time it either spits out garbage, or seems to fall asleep. I'm wondering if there is some flaw in my queue where a context switch would break it?
// thread-safe queue
// inspired by http://msmvps.com/blogs/vandooren/archive/2007/01/05/creating-a-thread-safe-producer-consumer-queue-in-c-without-using-locks.aspx
// only works with one producer and one consumer
#include <pthread.h>
#include <exception>

template<class T>
class tsqueue
{
private:
    volatile int m_ReadIndex, m_WriteIndex;
    volatile T *m_Data;
    volatile bool m_Done;
    const int m_Size;
    pthread_mutex_t m_ReadMutex, m_WriteMutex;
    pthread_cond_t m_ReadCond, m_WriteCond;
public:
    tsqueue(const int &size);
    ~tsqueue();
    void push(const T &elem);
    T pop();
    void terminate();
    bool isDone() const;
};

template <class T>
tsqueue<T>::tsqueue(const int &size) : m_ReadIndex(0), m_WriteIndex(0), m_Size(size), m_Done(false) {
    m_Data = new T[size];
    pthread_mutex_init(&m_ReadMutex, NULL);
    pthread_mutex_init(&m_WriteMutex, NULL);
    pthread_cond_init(&m_WriteCond, NULL);
    pthread_cond_init(&m_WriteCond, NULL);
}

template <class T>
tsqueue<T>::~tsqueue() {
    delete[] m_Data;
    pthread_mutex_destroy(&m_ReadMutex);
    pthread_mutex_destroy(&m_WriteMutex);
    pthread_cond_destroy(&m_ReadCond);
    pthread_cond_destroy(&m_WriteCond);
}

template <class T>
void tsqueue<T>::push(const T &elem) {
    int next = (m_WriteIndex + 1) % m_Size;
    if(next == m_ReadIndex) {
        pthread_mutex_lock(&m_WriteMutex);
        pthread_cond_wait(&m_WriteCond, &m_WriteMutex);
        pthread_mutex_unlock(&m_WriteMutex);
    }
    m_Data[m_WriteIndex] = elem;
    m_WriteIndex = next;
    pthread_cond_signal(&m_ReadCond);
}

template <class T>
T tsqueue<T>::pop() {
    if(m_ReadIndex == m_WriteIndex) {
        pthread_mutex_lock(&m_ReadMutex);
        pthread_cond_wait(&m_ReadCond, &m_ReadMutex);
        pthread_mutex_unlock(&m_ReadMutex);
        if(m_Done && m_ReadIndex == m_WriteIndex) throw "queue empty and terminated";
    }
    int next = (m_ReadIndex + 1) % m_Size;
    T elem = m_Data[m_ReadIndex];
    m_ReadIndex = next;
    pthread_cond_signal(&m_WriteCond);
    return elem;
}

template <class T>
void tsqueue<T>::terminate() {
    m_Done = true;
    pthread_cond_signal(&m_ReadCond);
}

template <class T>
bool tsqueue<T>::isDone() const {
    return (m_Done && m_ReadIndex == m_WriteIndex);
}
This could be used like this:
// thread 1
while(cin.get(c)) {
    queue1.push(c);
}
queue1.terminate();

// thread 2
while(!queue1.isDone()) {
    try { c = queue1.pop(); }
    catch(char const* str) { break; }
    cout.put(c);
}
If anyone sees a problem with this, please say so :)
Yes, there are definitely problems here. All your accesses to queue member variables occur outside the mutexes. In fact, I'm not entirely sure what your mutexes are protecting, since they are just around a wait on a condition variable.
Also, it appears that your reader and writer will always operate in lock-step, never allowing the queue to grow beyond one element in size.
If this is your actual code, one problem right off the bat is that you're initializing m_WriteCond twice, and not initializing m_ReadCond at all.
You should treat this class as a monitor. You should have a "monitor lock" for each queue (a normal mutex). Whenever you enter a method that reads or writes any field in the queue, you should lock this mutex as soon as you enter it. This prevents more than one thread from interacting with the queue at a time. You should release the lock before you wait on a condition and when you leave a method so other threads may enter. Make sure to re-acquire the lock when you are done waiting on a condition.
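As a sketch of what pop() looks like in the monitor style (assuming the two mutexes are merged into a single m_Mutex member; the same pattern applies to push()):
template <class T>
T tsqueue<T>::pop() {
    pthread_mutex_lock(&m_Mutex); // enter the monitor
    while (m_ReadIndex == m_WriteIndex && !m_Done)
        pthread_cond_wait(&m_ReadCond, &m_Mutex); // atomically releases and re-acquires the lock
    if (m_Done && m_ReadIndex == m_WriteIndex) {
        pthread_mutex_unlock(&m_Mutex);
        throw "queue empty and terminated";
    }
    T elem = m_Data[m_ReadIndex];
    m_ReadIndex = (m_ReadIndex + 1) % m_Size;
    pthread_cond_signal(&m_WriteCond);
    pthread_mutex_unlock(&m_Mutex); // leave the monitor
    return elem;
}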
If you want anything with decent performance, I would strongly suggest dumping your R/W lock and just using a very simple spinlock. Or, if you really think you can get the performance you want with an R/W lock, I would roll your own based on this design (single-word R/W spinlock) from Joe Duffy.
It seems the problem is that you have a race condition where thread 2 CAN run before thread 1 ever does any cin.get(c). You need to make sure the data is initialized, and, when consuming, that you do something sensible if no data has been entered yet.
Maybe this is just me not seeing the rest of the code where this is handled, though.