Thread-safe log buffer in C++? - c++

I'm implementing my own logging system for performance purposes (and because I basically just need a buffer). What I currently have is something like this:
#include <sstream>
#include <string>
#include <unordered_map>
// category => messages
static std::unordered_map<std::string, std::ostringstream> log;
void dump_logs(); // declared here because main() calls it before its definition
int main() {
    while (true) {
        log["info"] << "Whatever";
        log["192.168.0.1"] << "This is a dynamic entry";
        dump_logs();
    }
}
void dump_logs() {
    // I do something like this for each category, but they have different save strategies
    if (log["info"].tellp() > 1000000) {
        // save the ostringstream to a file
        // clear the log
        log["info"].str("");
    }
}
It works perfectly. However, I've just added threads and I'm not sure if this code is thread-safe. Any tips?

You can make this thread safe by declaring your map thread_local. If you are going to use it across translation units then make it extern and define it in one translation unit, otherwise static is fine.
You will still need to synchronize writing the logs to disk. A mutex should fix that:
// category => messages (one per thread)
thread_local static std::unordered_map<std::string, std::ostringstream> log;
void dump_logs(); // declared here because main() calls it before its definition
int main() {
    while (true) {
        log["info"] << "Whatever";
        log["192.168.0.1"] << "This is a dynamic entry";
        dump_logs();
    }
}
void dump_logs() {
    static std::mutex mtx; // mutex shared between threads
    // I do something like this for each category, but they have different save strategies
    if (log["info"].tellp() > 1000000) {
        // now I need to care about threads
        // use { to create a lock that will release at the end }
        {
            std::lock_guard<std::mutex> lock(mtx); // synchronized access
            // save the ostringstream to a file
        }
        // clear the log (thread-local, so no lock is needed)
        log["info"].str("");
    }
}
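For reference, here is a slightly more general sketch of the same idea, with the sink passed in so that the flush step is easy to exercise. The names tl_log, sink_mutex, append and flush_if_large are made up for this illustration and are not part of the code above. One caveat under this design: whatever a thread has not flushed when it exits is lost unless you flush it explicitly first.

```cpp
#include <cstddef>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// One buffer map per thread, so appending needs no lock at all.
// (Hypothetical name; chosen to avoid clashing with ::log from <cmath>.)
thread_local std::unordered_map<std::string, std::ostringstream> tl_log;

// Guards the shared sink (file, socket, ...) that every thread flushes into.
std::mutex sink_mutex;

// Append a message to the calling thread's private buffer for `category`.
void append(const std::string& category, const std::string& message) {
    tl_log[category] << message;
}

// Flush the calling thread's buffer for `category` into `sink` once it holds
// at least `threshold` bytes. Returns true if a flush happened.
bool flush_if_large(const std::string& category, std::ostream& sink,
                    std::size_t threshold) {
    std::ostringstream& buf = tl_log[category];
    if (static_cast<std::size_t>(buf.tellp()) < threshold) {
        return false;
    }
    {
        // Only the shared sink needs synchronization.
        std::lock_guard<std::mutex> lock(sink_mutex);
        sink << buf.str();
    }
    buf.str("");  // reset the thread-local buffer; no lock required
    return true;
}
```

Each thread would call append() freely and flush_if_large() at whatever interval suits the category's save strategy.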

On a POSIX system, if you're always writing data to the end of the file, the fastest way for multiple threads to write data to a file is to use low-level C-style open() in append mode, and just call write(), because the POSIX standard for write() states:
On a regular file or other file capable of seeking, the actual writing
of data shall proceed from the position in the file indicated by the
file offset associated with fildes. Before successful return from
write(), the file offset shall be incremented by the number of bytes
actually written. On a regular file, if the position of the last byte
written is greater than or equal to the length of the file, the length
of the file shall be set to this position plus one.
...
If the O_APPEND flag of the file status flags is set, the file offset
shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
So, all write() calls from within a process to a file opened in append mode are atomic.
No mutexes are needed.
Almost. The only issue you have to be concerned with is this:
If write() is interrupted by a signal after it successfully writes
some data, it shall return the number of bytes written.
If you have enough control over your environment that you can be sure that your calls to write() will not be interrupted by a signal after only a portion of the data is written, this is the fastest way to write data to a file from multiple threads - you're using the OS-provided lock on the file descriptor that ensures adherence to the POSIX-specified behavior, and as long as you generate the data to be written without any locking, that file descriptor lock is the only one in the entire data path. And that lock will be in your data path no matter what you do in your code.
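A minimal sketch of that approach, assuming a POSIX system. The helper names open_append and append_record are invented for this example. The partial-write case is reported to the caller rather than retried, because a second write() for the remainder would no longer be one atomic append.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Open (or create) a log file so that every write() lands at the current end.
int open_append(const char* path) {
    return ::open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

// Write one complete record with a single write() call.
// Returns true only if the whole record went out in that one call.
bool append_record(int fd, const std::string& record) {
    ssize_t n;
    do {
        n = ::write(fd, record.data(), record.size());
        // -1 with EINTR means the call was interrupted before any byte was
        // written, so repeating it is safe.
    } while (n == -1 && errno == EINTR);
    // A short count means a signal arrived after part of the data was
    // written; the caller has to decide how to handle the remainder.
    return n == static_cast<ssize_t>(record.size());
}
```

Each thread builds its full record in its own buffer first and then calls append_record() on the shared descriptor.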

Related

Making processes wait until resource is loaded into shared memory using Boost.Interprocess

Suppose I have n processes with IDs 1 to n. I have a file with lots of data, where each process will only store a disjoint subset of the data. I would like to load and process the file using exactly one process, store the resulting data in a data structure allocated via Boost.Interprocess in shared memory, and then allow any (including the one who loaded the file) process to read from the data.
For this to work, I need to make use of some of the Boost.Interprocess synchronization constructs to ensure processes do not try to read the data before it has been loaded. However, I am struggling with this part and it is likely due to my lack of experience in this area. At the moment, I have process(1) loading the file into shared memory and I need a way to ensure any given process cannot read the file contents until the load is complete, even if the read might happen arbitrarily long after the loading occurs.
I wanted to try and use a combination of a mutex and condition variable using the notify_all call so that process(1) can signal to the other processes it is okay to read from the shared memory data, but this seems to have an issue in that process(1) might send a notify_all call before some process(i) has even tried to wait for the condition variable to signal it is okay to read the data.
Any ideas for how to approach this in a reliable manner?
Edit 1
Here is my attempt to clarify my dilemma and express more clearly what I have tried. I have some class that I allocate into a shared memory space using Boost.Interprocess that has a form similar to the below:
namespace bi = boost::interprocess;
class cache {
public:
cache() = default;
~cache() = default;
void set_process_id(std::size_t ID) { id = ID; }
void load_file(const std::string& filename) {
// designated process to load
// file has ID equal to 0
if( id == 0 ){
// lock using the mutex
bi::scoped_lock<bi::interprocess_mutex> lock(m);
// do work to process the file and
// place result in the data variable
// after processing file, notify all other
// processes that they can access the data
load_cond.notify_all();
}
}
void read_into(std::array<double, 100>& data_out) {
{ // wait to read data until load is complete
// lock using the mutex
bi::scoped_lock<bi::interprocess_mutex> lock(m);
load_cond.wait(lock);
}
data_out = data;
}
private:
size_t id;
std::array<double, 100> data;
bi::interprocess_mutex m;
bi::interprocess_condition load_cond;
};
The above is roughly what I had when I asked the question but did not sit well with me because if the read_into method was called after the designated process executes the notify_all call, then the read_into would be stuck. What I just did this morning that seems to fix this dilemma is change this class to the following:
namespace bi = boost::interprocess;
class cache {
public:
cache():load_is_complete(false){}
~cache() = default;
void set_process_id(std::size_t ID) { id = ID; }
void load_file(const std::string& filename) {
// designated process to load
// file has ID equal to 0
if( id == 0 ){
// lock using the mutex
bi::scoped_lock<bi::interprocess_mutex> lock(m);
// do work to process the file and
// place result in the data variable
// after processing file, notify all other
// processes that they can access the data
load_is_complete = true;
load_cond.notify_all();
}
}
void read_into(std::array<double, 100>& data_out) {
{ // wait to read data until load is complete
// lock using the mutex
bi::scoped_lock<bi::interprocess_mutex> lock(m);
if( not load_is_complete ){
load_cond.wait(lock);
}
}
data_out = data;
}
private:
size_t id;
std::array<double, 100> data;
bool load_is_complete;
bi::interprocess_mutex m;
bi::interprocess_condition load_cond;
};
Not sure if the above is the most elegant, but I believe it should ensure processes cannot access the data being stored in shared memory until it has completed loading, whether they get to the mutex m before the designated process or after the designated process has loaded the file contents. If there is a more elegant way, I would like to know.
The typical way is to use named interprocess mutexes. See e.g. the example(s) in Boris Schäling's "Boost" book, which is freely available, also online: https://theboostcpplibraries.com/boost.interprocess-synchronization
If your segment creation is already suitably synchronized, you can use "unnamed" interprocess mutexes inside your shared segment, which is usually more efficient and avoids polluting system namespaces with extraneous synchronization primitives.
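One detail worth adding to the flag-plus-condition approach from the question: condition variables are allowed to wake up spuriously, so the flag is normally re-checked in a loop (or through a predicate) rather than with a single if. The sketch below shows the pattern with std::mutex and std::condition_variable purely so that it compiles without Boost; that substitution is an assumption of this example. As far as I know, bi::interprocess_condition offers an equivalent wait(lock, pred) overload, but check the Boost documentation before relying on it. The class name cache_sketch is hypothetical.

```cpp
#include <array>
#include <condition_variable>
#include <mutex>

// In-process stand-in for the shared-memory cache from the question.
class cache_sketch {
public:
    // Called once by the designated loader.
    void load(const std::array<double, 100>& loaded) {
        {
            std::lock_guard<std::mutex> lock(m);
            data = loaded;
            load_is_complete = true;  // set the flag under the mutex
        }
        load_cond.notify_all();       // wake everyone already waiting
    }

    // Safe to call before or after load(): the predicate is checked first,
    // and re-checked after every wake-up, spurious or not.
    void read_into(std::array<double, 100>& data_out) {
        std::unique_lock<std::mutex> lock(m);
        load_cond.wait(lock, [this] { return load_is_complete; });
        data_out = data;
    }

private:
    std::array<double, 100> data{};
    bool load_is_complete = false;
    std::mutex m;
    std::condition_variable load_cond;
};
```

A reader that arrives after the load finds the flag already set and never blocks, which covers the "notify_all happened earlier" case from the question.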

Reading a variable from reader thread without holding up a writer thread

I've got two threads, a reader thread and a writer thread. The writer thread writes a string and the reader thread reads the string. The writer thread is extremely high speed and I do not want to hold the writer thread up. The reader thread is much slower (by a factor of a million or more) and it is not important if the string it reads is a couple of cycles behind. The only important thing for the reader thread is that the string is not in an undefined state when it reads it.
Is there a way to be thread safe for reading the string without holding up the writing thread?
I've also looked at making the variable atomic, but I read that this might be a performance bottleneck as well for the writing thread.
I'm not sure if it works, but I came up with an idea:
Assume you have two string buffers, Buffer_0 & Buffer_1, each can hold a single string of multiple characters of a predefined max length.
The writer thread alternates between two buffers, but it first checks a mutex. The writer doesn't block on the mutex, it just writes to the other buffer if the mutex is not available. This means that it stops alternating between two buffers and writes into the same buffer multiple times while the reader slowly reads the mutex protected buffer.
Buffer choice of the reader probably doesn't matter much. It can always try to read Buffer_0. It may simply block on the mutex and wait until the writer starts writing Buffer_1. While it reads from the Buffer_0, the writer always writes to Buffer_1 over and over as it fails to acquire the mutex.
Of course, checking the availability, acquiring and releasing of the mutex introduces some run-time cost. Maybe, using an atomic variable which indicates the buffer index that the writer is currently writing into, may work faster than using a mutex. But I'm not sure if it works.
Update: I realized that in the above scenario, Buffer_1 is mostly useless as the reader only reads from Buffer_0. If it's not acceptable for reader to block, it can alternate too and read Buffer_1 instead of waiting. Or the writer can just skip the whole writing operation (writing to Buffer_1) if it's unable to acquire the mutex.
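Here is a rough sketch of one way the two-buffer idea could look. It differs from the description above in that each buffer has its own mutex and an atomic index records which buffer was written last; those choices, and the name LatestString, are assumptions of this example. It also assumes exactly one writer thread and one reader thread.

```cpp
#include <atomic>
#include <mutex>
#include <string>

class LatestString {
public:
    // Writer side: never waits. Tries the slot that is not the latest first,
    // and falls back to the other one if the reader is holding it.
    // Returns false only if both try-locks fail (try_lock may fail
    // spuriously), in which case this value is simply dropped.
    bool publish(const std::string& value) {
        const int last = latest_.load();
        const int first = (last == 0) ? 1 : 0;
        for (int k = 0; k < 2; ++k) {
            const int i = (first + k) % 2;
            std::unique_lock<std::mutex> lock(mutex_[i], std::try_to_lock);
            if (lock.owns_lock()) {
                buffer_[i] = value;
                latest_.store(i);  // publish only after the string is complete
                return true;
            }
        }
        return false;
    }

    // Reader side: copies the most recently published slot. It may wait for
    // the duration of one string assignment if the writer is in that slot.
    // Returns false if nothing has been published yet.
    bool read(std::string& out) {
        const int i = latest_.load();
        if (i < 0) {
            return false;
        }
        std::lock_guard<std::mutex> lock(mutex_[i]);
        out = buffer_[i];
        return true;
    }

private:
    std::string buffer_[2];
    std::mutex mutex_[2];
    std::atomic<int> latest_{-1};  // -1 means "no value yet"
};
```

The reader always sees a complete string, though not necessarily the newest one, which matches the requirement in the question.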
Are you OK with the reader reading a recent value and not necessarily the most recent one? If so, you can use atomics:
#include <atomic>
#include <iostream>
#include <string>
#include <thread>
std::string spots[4];
std::atomic<int> canWrite{0};    // slot the writer may fill next; written only by the reader
std::atomic<int> readyIndex{-1}; // slot that is ready to be read; written only by the writer
void writer()
{
    while (true) // for demonstration, will be your real writer loop
    {
        const int slot = canWrite;
        if (readyIndex != slot)
        {
            spots[slot] = "foo"; // write here what the writer wants to write
            readyIndex = slot;   // marks that spot as ready
        }
    }
}
void reader()
{
    while (true) // for demonstration, will be your real reader loop
    {
        const int slot = canWrite;
        if (readyIndex == slot)
        {
            std::cout << spots[slot] << std::endl;
            canWrite = (slot + 1) % 4; // allow the writer to start writing at the next location
        }
    }
}
int main()
{
    std::thread t1(writer);
    std::thread r1(reader);
    t1.join();
    r1.join();
    return 0;
}
The reader only writes to canWrite, telling the writer where it can write. The writer only writes to readyIndex, telling the reader where it can read.
If the reader has not yet read the latest string, the writer just skips and goes its merry way.

Wrap Windows 8/WP8 StorageFile for synchronous C++ access

I need a C++ wrapper class which can read/write/seek data synchronously from a Windows 8/WP8 Storage file (http://msdn.microsoft.com/library/windows/apps/br227171):
class FileWrapper
{
public:
FileWrapper(StorageFile^ file); // IRandomAccessStream or IInputStream
// are fine as input arguments too
byte* readBytes(int bytesToRead, int &bytesGot);
bool writeBytes(byte* data, int size);
bool seek(int position);
}
The data should be read from the file on-the-fly. It should not be cached in memory and the storage file should not be copied into the app's directory where it would be accessible with standard fopen and ifstream functions.
I tried to figure out how to do this (including the Microsoft file access sample code: http://code.msdn.microsoft.com/windowsapps/File-access-sample-d723e597) but I am stuck with the asynchronous access of each operation. Does anyone have hints on how to achieve this? Or is there even built-in functionality?
Regards,
Typically you wrap the async operations with the create_task() method and you can wait for the task's execution to complete by calling task.get() to achieve synchronicity, but IIRC this doesn't work for file access because the operations might try to return on the same thread they were executed on and if you're blocking that thread - you end up with a deadlock. I don't have time to try this, but maybe if you start on another thread - you could wait on your thread for completion like this, though it might still deadlock:
auto createTaskTask = create_task([file]() // capture the StorageFile^ so the lambda can use it
{
    return create_task(FileIO::ReadTextAsync(file));
});
auto readFileTask = createTaskTask.get();
try
{
    String^ fileContent = readFileTask.get();
}
catch(Exception^ ex)
{
    ...
}

Strange problem using two threads and a boolean

(I hate having to put a title like this, but I just couldn't find anything better.)
I have two classes with two threads. The first one detects motion between two frames:
void Detector::run(){
isActive = true;
// will run forever
while (isActive){
//code to detect motion for every frame
//.........................................
if(isThereMotion)
{
if(number_of_sequence>0){
theRecorder.setRecording(true);
theRecorder.setup();
// cout << " motion was detected" << endl;
}
number_of_sequence++;
}
else
{
number_of_sequence = 0;
theRecorder.setRecording(false);
// cout << " there was no motion" << endl;
cvWaitKey (DELAY);
}
}
}
The second one records a video when started:
void Recorder::setup(){
if (!hasStarted){
this->start();
}
}
void Recorder::run(){
theVideoWriter.open(filename, CV_FOURCC('X','V','I','D'), 20, Size(1980,1080), true);
if (recording){
while(recording){
//++++++++++++++++++++++++++++++++++++++++++++++++
cout << recording << endl;
hasStarted=true;
webcamRecorder.read(matRecorder); // read a new frame from video
theVideoWriter.write(matRecorder); // write the frame into the file
}
}
else{
hasStarted=false;
cout << "no recording??" << endl;
changeFilemamePlusOne();
}
hasStarted=false;
cout << "finished recording" << endl;
theVideoWriter.release();
}
The boolean recording gets changed by the function:
void Recorder::setRecording(bool x){
recording = x;
}
The goal is to start the recording once motion was detected while preventing the program from starting the recording twice.
The really strange problem, which honestly doesn't make any sense in my head, is that the code will only work if I cout the boolean recording (marked with the "++++++"). Otherwise recording never changes to false and the code in the else statement never gets called.
Does anyone have an idea why this is happening? I'm still just beginning with C++, but this problem seems really strange to me.
I suppose your variables isThereMotion and recording are simple class members of type bool.
Concurrent access to these members isn't thread safe by default, and you'll face race conditions, and all kinds of weird behaviors.
I'd recommend declaring these member variables like this (as long as you can use C++11 or later):
class Detector {
// ...
std::atomic<bool> isThereMotion;
};
class Recorder {
// ...
std::atomic<bool> hasStarted;
};
etc.
The reason is that a plain bool gives the compiler no hint that another thread may change it. It is then free to read the value once, keep it in a register, and never look at memory again, which would explain why the loop only notices the change when the cout call is present. Unsynchronized concurrent access to a non-atomic variable is also a data race, which is undefined behaviour in C++11 and later. Using std::atomic<> makes every read and write of the variable indivisible and visible to the other thread.
In short: make everything that is meant to be accessed concurrently from different threads an atomic value, or use an appropriate synchronization mechanism like a std::mutex.
If you can't use C++11, you can perhaps work around this using boost::thread to keep your code portable.
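As a small illustration of the recommendation, here is a stripped-down recording loop driven by a std::atomic<bool>. The class RecorderSketch and the on_frame callback are invented for this example; the callback stands in for reading and writing one video frame. In the real program setRecording(false) would be called from the detector thread.

```cpp
#include <atomic>
#include <functional>

class RecorderSketch {
public:
    // May be called from any thread.
    void setRecording(bool on) { recording_.store(on); }
    bool isRecording() const { return recording_.load(); }

    // Processes frames until recording is switched off and returns how many
    // frames were handled. `on_frame` receives the index of each frame.
    int run(const std::function<void(int)>& on_frame) {
        int frames = 0;
        // The atomic load cannot be hoisted out of the loop, so a change
        // made by another thread is observed on the next iteration.
        while (recording_.load()) {
            on_frame(frames);
            ++frames;
        }
        return frames;
    }

private:
    std::atomic<bool> recording_{false};
};
```

With this in place the loop no longer depends on a side effect such as the cout call to notice that the flag has changed.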
NOTE:
As from your comments, your question seems to be specific for the Qt framework, there's a number of mechanisms you can use for synchronization as e.g. the mentioned QMutex.
Why volatile doesn't help in multithreaded environments?
volatile prevents the compiler from optimizing away actual read accesses based on assumptions about values set earlier in sequential code. It doesn't prevent threads from being interrupted while actually retrieving or writing values there.
volatile should be used for reading from addresses that can be changed independently of the sequential or threading execution model (e.g. bus-addressed peripheral HW registers, where the HW changes values actively, e.g. an FPGA reporting current data throughput at a register interface).
See more details about this misconception here:
Why is volatile not considered useful in multithreaded C or C++ programming?
You could use a pool of nodes with pointers to frame buffers as part of a linked-list FIFO messaging system, using a mutex and a semaphore to coordinate the threads. A message for each frame to be recorded would be sent to the recording thread (appended to its list and a semaphore released), otherwise the node would be returned (appended) back to the main thread's list.
Example code using Windows based synchronization to copy a file. The main thread reads into buffers, the spawned thread writes from buffers it receives. The setup code is lengthy, but the actual messaging functions and the two thread functions are simple and small.
mtcopy.zip
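This is not the Windows code from the archive above, just a simplified portable sketch of the messaging part. It uses std::mutex and std::condition_variable in place of the Windows mutex and semaphore, and it copies frames into a std::deque instead of recycling nodes from a pool; FrameBuffer and FrameQueue are names made up for the example.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

typedef std::vector<unsigned char> FrameBuffer;

// FIFO of frames handed from the detecting thread to the recording thread.
class FrameQueue {
public:
    // Producer side: append a frame and wake the consumer.
    void push(FrameBuffer frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            frames_.push_back(std::move(frame));
        }
        ready_.notify_one();
    }

    // Consumer side: block until a frame is available, then take the oldest.
    FrameBuffer pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !frames_.empty(); });
        FrameBuffer frame = std::move(frames_.front());
        frames_.pop_front();
        return frame;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return frames_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<FrameBuffer> frames_;
};
```

The detecting thread would push() only the frames that should be recorded, and the recording thread would loop on pop() and write each frame out.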
Could be a liveness issue. The compiler could be re-ordering instructions or hoisting isActive out of the loop. Try marking it as volatile.
From MSDN docs:
Objects that are declared as volatile are not used in certain optimizations because their values can change at any time. The system always reads the current value of a volatile object when it is requested, even if a previous instruction asked for a value from the same object. Also, the value of the object is written immediately on assignment.
Simple example:
#include <iostream>
using namespace std;
int main() {
bool active = true;
while(active) {
cout << "Still active" << endl;
}
}
Assemble it:
g++ -S test.cpp -o test1.a
Add volatile to active as in volatile bool active = true
Assemble it again g++ -S test.cpp -o test2.a and look at the difference diff test1.a test2.a
< testb $1, -5(%rbp)
---
> movb -5(%rbp), %al
> testb $1, %al
Notice the first one doesn't even bother to read the value of active before testing it since the loop body never modifies it. The second version does.

Implementing File class for both read and write operations on the file

I need to implement a class which holds a regular text file that will be valid for both read and write operations from multiple threads (say, "reader" threads and "writers").
I am working in Visual Studio 2010 and can use only the libraries that ship with it (VS 2010), so I chose to use the std::fstream class for the file operations and the CreateThread function & CRITICAL_SECTION object from the <windows.h> header.
I should start by saying that, for now, I am looking for a simple solution - just so it works....:)
My idea is as follows:
I created a File class that will hold the file and a "mutex" (CRITICAL_SECTION object) as private members.
In addition, this class (File class) provides a "public interface" to the "reader/writer" threads in order to perform a synchronized access to the file for both read and write operations.
See the header file of File class:
class File {
private:
std::fstream iofile;
int size;
CRITICAL_SECTION critical;
public:
File(std::string fileName = " ");
~File();
int getSize();
// the public interface:
void read();
void write(std::string str);
};
Also see the source file:
#include "File.h"
File :: File(std::string fileName)
{
// create & open file for read write and append
// and write the first line of the file
iofile.open(fileName, std::fstream::in | std::fstream::out | std::fstream::app); // **1)**
if(!iofile.is_open()) {
std::cout << "fileName: " << fileName << " failed to open! " << std::endl;
}
// initialize class member variables
this->size = 0;
InitializeCriticalSection(&critical);
}
File :: ~File()
{
DeleteCriticalSection(&critical);
iofile.close(); // **2)**
}
void File :: read()
{
// lock "mutex" and move the file pointer to beginning of file
EnterCriticalSection(&critical);
iofile.seekg(0, std::ios::beg);
// read it line by line
while (iofile)
{
std::string str;
getline(iofile, str);
std::cout << str << std::endl;
}
// unlock mutex
LeaveCriticalSection(&critical);
// move the file pointer back to the beginning of file
iofile.seekg(0, std::ios::beg); // **3)**
}
void File :: write(std::string str)
{
// lock "mutex"
EnterCriticalSection(&critical);
// move the file pointer to the end of file
// and write the string str into the end of the file
iofile.seekg(0, std::ios::end); // **4)**
iofile << str;
// unlock mutex
LeaveCriticalSection(&critical);
}
So my questions are (see the numbers regarding the questions within the code):
1) Do I need to specify anything else for the read and write operations I wish to perform ?
2) Anything else I need to add in the destructor?
3) What do I need to add here in order that EVERY read operation will occur necessarily from the beginning of the file ?
4) What do I need to modify/add here in order that each write will take place at the end of the file (meaning I wish to append the str string into the end of the file)?
5) Any further comments would be great: another way to implement it, pros & cons of my implementation, points to watch out for, etc.
Thanks a lot in advance,
Guy.
1) You must handle exceptions (and errors in general).
2) No, your destructor even has superfluous things like closing the underlying fstream, which the object takes care of itself in its destructor.
3) If you always want to start reading at the beginning of the file, just open it for reading and you automatically are at the beginning. Otherwise, you could seek to the beginning and start reading from there.
4) You already opened the file with ios::app, which causes every write operation to append to the end (including that it ignores seek operations that set the write position, IIRC).
5) There is a bunch that isn't going to work like you want it to...
Most importantly, you need to define how the class should behave, i.e. what the public interface is. This includes guarantees about the content of the file on disk. For example, after creating an object without passing a filename, what should it write to? Should that really be a file whose name is a single space? Further, what if a thread wants to write two buffers that each contain 100 chars? The only chance to not get interrupted is to first create a buffer combining the data, otherwise it could get interrupted by a different thread. It gets even more complicated concerning the guarantees that your class should fulfill while reading.
Why are you not using references when passing strings? Your tutorial should mention them.
You are invoking the code to enter and leave the critical section at the beginning and end of a function scope. This operation should be bound to the ctor and dtor of a class, check out the RAII idiom in C++.
When you are using a mutex, you should document what it is supposed to protect. In this case, I guess it's the iofile, right? You are accessing it outside the mutex-protected boundaries though...
What is getSize() supposed to do? What would a negative size indicate? In case you want to signal errors with that, that's what exceptions are for! Also, after opening an existing, possibly non-empty file, the size is zero, which sounds weird to me.
read() doesn't return any data, what is it supposed to do?
Using a while-loop to read something always has to have the form "while try-to-read { use data}", yours has the form "while success { try-to-read; use data; }", i.e. it will use data after failing to read it.
Streams have a state, and that state is sticky. Once the failbit is set, it remains set until you explicitly call clear().
BTW: This looks like logging code or a file-backed message queue. Both can be created in a thread-friendly way, but in order to make suggestions, you would have to tell us what you are actually trying to do. This is also what you should put into a comment section on top of your class, so that any reader can understand the intention (and, more important now, so that YOU make up your mind about what it's supposed to be).
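To make the RAII and read-loop points concrete, here is a small sketch. It is not a drop-in replacement for the File class: it uses std::mutex with std::lock_guard and a std::stringstream only so that it compiles anywhere, and both are stand-ins chosen for this example. On VS 2010, which has no <mutex>, the guard would be a tiny class of your own whose constructor calls EnterCriticalSection and whose destructor calls LeaveCriticalSection. The name LineStore is hypothetical.

```cpp
#include <mutex>
#include <sstream>
#include <string>
#include <vector>

class LineStore {
public:
    // Appends text. The guard releases the lock on every exit path,
    // including when an exception is thrown.
    void write(const std::string& text) {
        std::lock_guard<std::mutex> guard(mutex_);
        stream_.clear();   // stream state is sticky: drop any earlier eof/fail
        stream_ << text;   // a std::fstream opened with ios::app appends likewise
    }

    // Returns all lines, always starting from the beginning. Every access
    // to the stream, including the seek, happens while the lock is held.
    std::vector<std::string> read_all() {
        std::lock_guard<std::mutex> guard(mutex_);
        stream_.clear();
        stream_.seekg(0, std::ios::beg);
        std::vector<std::string> lines;
        std::string line;
        // "while try-to-read { use data }": the body runs only after a
        // successful getline, so no line is used after a failed read.
        while (std::getline(stream_, line)) {
            lines.push_back(line);
        }
        stream_.clear();   // reading to the end set eofbit and failbit
        return lines;
    }

private:
    std::mutex mutex_;          // protects stream_
    std::stringstream stream_;  // stand-in for the std::fstream member
};
```

Returning the lines instead of printing them inside read() also gives the function a result the caller can actually use.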