multi producer, single consumer queue with char[] - c++

My project's code base currently contains a multi-producer, single-consumer data writer. It uses 2 circular buffers and a couple of CRITICAL_SECTION locks to write variably sized streams of data passed in as char* with a byte count. It takes data from dozens of threads writing char* and writes it to a binary file.
I'm looking to improve this design, but all the implementations I can find online handle simple primitive types rather than char arrays. Can anyone suggest links for me to research, as my google-fu is pretty weak?
My test implementation is essentially:
Record(char* data, uint32_t byte_count)
{
    c_section.Lock(); // CCriticalSection, std::mutex, whatever
    recording_buffer.write(data, byte_count); // in-house circular buffer
    if (recording_buffer.size() > 0)
    {
        // read the entire size of the filled bytes of the buffer;
        // multiple entries are written to the file at once.
        write_to_file(, recording_buffer.size());
    }
}
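One possible direction, sketched under assumptions: producers append their bytes to a pending buffer under a short lock, and the single consumer swaps the whole buffer out, so file I/O never happens while the lock is held. The class name `RecordQueue` and its methods are hypothetical and not part of the code base described above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical helper (not from the post): many producers, one consumer.
class RecordQueue {
public:
    // Called from any producer thread: copy the bytes in under a short lock.
    void push(const char* data, std::uint32_t byte_count) {
        assert(data != nullptr || byte_count == 0);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.insert(pending_.end(), data, data + byte_count);
        }
        ready_.notify_one();
    }

    // Called from the single consumer thread: block until something is
    // pending, then take every pending byte in one batch.
    std::vector<char> pop_all() {
        std::vector<char> batch;
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !pending_.empty(); });
        batch.swap(pending_);  // O(1): producers continue with an empty buffer
        return batch;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::vector<char> pending_;
};
```

The consumer would call pop_all() in a loop and hand each batch to the file writer. Producers then only contend for the time it takes to copy their own bytes.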


What is the best way to achieve buffered read from pipe in C++?

I am monitoring the read end of a pipe in a select loop and reading from it with read when data is available. I am always expecting a certain data format through the pipe which (for the sake of the example) is of type struct packet_t. My initial code performed the following operations on the file descriptor once the select call signaled it:
int read_packet(int fd)
{
    struct packet_t pkt;
    ssize_t bytes_read = read(fd, &pkt, sizeof(pkt));
    if (bytes_read <= 0) {
        /* handle some sort of error */
        return -E_SOCK;
    } else if ((size_t)bytes_read < sizeof(pkt)) {
        /* return bad message format */
        return -E_BADMSG;
    }
    /* do something with the packet */
    return E_OK;
}
Obviously, the above fails to do what I intended, because a read call may return fewer bytes than sizeof(pkt) just because of some delay, with the rest of the bytes delivered later to complete the packet. However, my code assumes that if a complete packet is not received in the first call to read then it must be a transmission error, and all subsequent packets will be discarded as bad.
To fix this, I need to buffer the read, i.e. to read into a separate buffer and only interpret the buffer contents as a packet if the number of received bytes is at least sizeof(pkt).
My question is: is there a C++ class or technique to achieve buffered reads nicely? For example, I can construct a class mybuffer which holds the bytes already read and, when it hits a threshold, calls a callback allowing me to read the packet. However, I don't want to reinvent the wheel if there is one out there, so I am asking whether there is a solution either in the standard library (for example using iostream) or in other C++ libraries like Boost.
Thank you for your support!
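A minimal sketch of the buffering idea described in the question, assuming fixed-size packets. The struct below is only a stand-in for packet_t (its real layout is not shown in the post), and `PacketBuffer`, `feed` and `next` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for the post's packet_t; the real fields are an assumption.
struct packet_t {
    unsigned char type;
    int code;
};

// Hypothetical helper: accumulates raw bytes and hands out whole packets.
class PacketBuffer {
public:
    // Append the bytes returned by one read() call, however few they are.
    void feed(const char* data, std::size_t n) {
        assert(data != nullptr || n == 0);
        bytes_.insert(bytes_.end(), data, data + n);
    }

    // Copy the next complete packet into out.
    // Returns false when fewer than sizeof(packet_t) bytes are buffered.
    bool next(packet_t& out) {
        if (bytes_.size() < sizeof(packet_t)) return false;
        std::memcpy(&out, bytes_.data(), sizeof(packet_t));
        bytes_.erase(bytes_.begin(), bytes_.begin() + sizeof(packet_t));
        return true;
    }

private:
    std::vector<char> bytes_;
};
```

In the select loop, each readable event would call read() into a temporary array, pass the result to feed(), and then call next() repeatedly until it returns false. A short read then simply leaves the partial packet in the buffer until the rest arrives.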

C++ weird async behaviour

Note that I'm using boost async, due to the lack of threading classes support in MinGW.
So, I wanted to send a packet every 5 seconds and decided to use boost::async (std::async) for this purpose.
This is the function I use to send the packet (this is actually copying to the buffer and sending in the main application loop - nvm - it's working fine outside async method!)
m_sendBuf = new char[1024]; // allocate buffer

bool CNetwork::Send(const void* sourceBuffer, size_t size) {
    size_t bufDif = m_sendBufSize - m_sendInBufPos;
    if (size > bufDif) {
        return false;
    }
    memcpy(m_sendBuf + m_sendInBufPos, sourceBuffer, size);
    m_sendInBufPos += size;
    return true;
}
Packet sending code:
struct TestPacket {
    unsigned char type;
    int code;
};

void SendPacket() {
    TestPacket myPacket{};
    myPacket.type = 10;
    myPacket.code = 1234;
    Send(&myPacket, sizeof(myPacket)); // pass the object, not the type name
}
Async code:
void StartPacketSending() {
    StartPacketSending(); // Recursive endless call
}

boost::async(boost::launch::async, &StartPacketSending);
Alright. So the thing is, when I call SendPacket() from the async method, the packet received on the server side is malformed and the data is different from what I specified. This doesn't happen when it is called outside the async call.
What is going on here? I'm out of ideas.
I think I have my head wrapped around what you are doing here. You are loading all unsent data into a buffer in one thread and then flushing it in a different thread. Even though the packets aren't overlapping (assuming they are consumed quickly enough), you still need to synchronize all the shared data.
m_sendBuf, m_sendInBufPos, and m_sendBufSize are all being read from the main thread, likely while memcpy or your buffer size logic is running in the async thread. I suspect you will have to use a proper queue to get your program to work as intended in the long run, but try protecting those variables with a mutex.
Also, as other commenters have pointed out, the endless recursion is a problem of its own: C++ does not guarantee tail-call elimination, so each call may add a stack frame until the stack overflows. That probably does not contribute to your malformed packets, though.
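To illustrate the mutex suggestion, here is a rough sketch that uses std::mutex rather than the Boost equivalents. `SendBuffer`, `append` and `drain` are hypothetical names standing in for the buffer part of CNetwork, whose full definition is not shown in the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the send buffer inside the post's CNetwork.
class SendBuffer {
public:
    explicit SendBuffer(std::size_t capacity) : buf_(capacity), used_(0) {}

    // Called from the packet thread: same logic as Send(), but under a lock.
    bool append(const void* src, std::size_t size) {
        assert(src != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        if (size > buf_.size() - used_) return false;  // not enough room
        std::memcpy(buf_.data() + used_, src, size);
        used_ += size;
        return true;
    }

    // Called from the main loop: take everything queued so far and reset.
    std::vector<char> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<char> out(buf_.begin(), buf_.begin() + used_);
        used_ = 0;
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<char> buf_;
    std::size_t used_;
};
```

Because the position counter and the bytes are only ever touched while the same mutex is held, the main loop can no longer observe a half-copied packet. The periodic task itself would be better written as a plain loop that sends and then sleeps, instead of a function that calls itself.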

OS level file I/O locks on Qt platforms

Following from this question I have decided to see whether I could implement proper asynchronous file I/O using (multiple) QFiles. The idea is to use a "pool" of QFile objects operating on a single file and dispatch requests via QtConcurrent API to be executed with dedicated QFile object each. After the task would finish the result would be emitted (in case of reads) and QFile object returned to the pool. My initial tests seem to indicate that this is a valid approach and in fact does allow concurrent read/write operations (e.g. read while writing) and also that it can further help with performance (read can finish in between a write).
The obvious issue is reading and writing the same segment of the file. To see what happens, I used the approach described above to set up the situation and just let it write and read frantically over the same part of the file. To spot possible "corruption", each write increments a number at the beginning of the segment and the same number at the end of it. The idea is that if a read ever sees different numbers at the start and at the end, then in a real situation it could return corrupted data, because it read a partially written segment.
The reads and writes were overlapping a lot, so I knew they were happening asynchronously, and yet not a single time was the output "wrong". It basically means that the read never sees partially written data. At least on Windows. Using the QIODevice::Unbuffered flag did not change it.
I assume that some kind of locking is done at the OS level to prevent this (or possibly caching?); please correct me if this assumption is wrong. I base this on the fact that a read that started after a write started could finish before that write finished. Since I plan to deploy the application on other platforms as well, I was wondering whether I can count on this being the case for all platforms supported by Qt (mainly those based on POSIX, and Android), or whether I need to implement a locking mechanism myself for these situations, to defer reading from a segment that is being written to.
There's nothing in the implementation of QFile that guarantees atomicity of writes. So the idea of using multiple QFile objects to access the same sections of the same underlying file won't ever work right. Your tests on Windows are not indicative of there not being a problem, they are merely insufficient: had they been sufficient, they'd have produced the problem you're expecting.
For highly performant file access in small, possibly overlapping chunks, you have to:
Map the file to memory.
Serialize access to the memory, perhaps using multiple mutexes to improve concurrency.
Access memory concurrently, and don't hold the mutex while the data is paged in.
This is done by first prefetching - either reading from every page in the range of bytes to be accessed, and discarding the results, or using a platform-specific API. Then you lock the mutex and copy the data either out of the file or into it.
The OS does the rest.
class FileAccess : public QObject {
  Q_OBJECT
  QFile m_file;
  QMutex m_mutex;
  uchar * m_area = nullptr;
  void prefetch(qint64 pos, qint64 size);
public:
  FileAccess(const QString & name) : m_file{name} {}
  bool open() {
    if ( {
      m_area =, m_file.size());
      if (! m_area) m_file.close();
    }
    return m_area != nullptr;
  }
  void readReq(qint64 pos, qint64 size);
  Q_SIGNAL void readInd(const QByteArray & data, qint64 pos);
  void write(const QByteArray & data, qint64 pos);
};

void FileAccess::prefetch(qint64 pos, qint64 size) {
  const qint64 pageSize = 4096;
  const qint64 pageMask = ~(pageSize - 1); // clears the offset-within-page bits
  // Touch one byte in every page of [pos, pos + size) so that it gets paged in.
  for (qint64 offset = pos & pageMask; offset < pos + size; offset += pageSize) {
    volatile uchar * p = m_area+offset;
    (void)*p; // volatile read: forces the access, result is discarded
  }
}

void FileAccess::readReq(qint64 pos, qint64 size) {
  QByteArray result{size, Qt::Uninitialized};
  prefetch(pos, size);
  QMutexLocker lock{&m_mutex};
  memcpy(, m_area+pos, result.size());
  emit readInd(result, pos);
}

void FileAccess::write(const QByteArray & data, qint64 pos) {
  prefetch(pos, data.size());
  QMutexLocker lock{&m_mutex};
  memcpy(m_area+pos, data.constData(), data.size());
}
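The "multiple mutexes" idea mentioned in the list above could look roughly like the following, shown here on a plain in-memory region with standard C++ instead of Qt so that it stays self-contained. `StripedRegion` and the 4096-byte stripe size are assumptions for illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical sketch: one mutex per fixed-size stripe of a memory region,
// so accesses to different stripes do not block each other.
class StripedRegion {
public:
    static constexpr std::size_t kStripe = 4096;

    explicit StripedRegion(std::size_t size)
        : data_(size), locks_((size + kStripe - 1) / kStripe) {}

    void write(std::size_t pos, const char* src, std::size_t n) {
        assert(pos + n <= data_.size());
        locked(pos, n, [&] { std::memcpy(data_.data() + pos, src, n); });
    }

    void read(std::size_t pos, char* dst, std::size_t n) {
        assert(pos + n <= data_.size());
        locked(pos, n, [&] { std::memcpy(dst, data_.data() + pos, n); });
    }

private:
    // Run f while holding the mutex of every stripe that [pos, pos + n) touches.
    template <class F>
    void locked(std::size_t pos, std::size_t n, F f) {
        if (n == 0) return;
        const std::size_t first = pos / kStripe;
        const std::size_t last = (pos + n - 1) / kStripe;
        // Always lock in ascending stripe order so two callers cannot deadlock.
        for (std::size_t i = first; i <= last; ++i) locks_[i].lock();
        f();
        for (std::size_t i = first; i <= last; ++i) locks_[i].unlock();
    }

    std::vector<char> data_;
    std::vector<std::mutex> locks_;
};
```

Two accesses contend only when their byte ranges share a stripe. An access that spans several stripes holds all of their mutexes for the duration of the copy, so overlapping reads and writes are still fully serialized.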

Thread-safe log buffer in C++?

I'm implementing my own logging system for performance purposes (and because I basically just need a buffer). What I currently have is something like this:
// category => messages
static std::unordered_map<std::string, std::ostringstream> log;

int main() {
    while (true) {
        log["info"] << "Whatever";
        log[""] << "This is a dynamic entry";
    }
}

void dump_logs() {
    // I do something like this for each category, but they have different save strategies
    if (log["info"].tellp() > 1000000) {
        // save the ostringstream to a file
        // clear the log
    }
}
It works perfectly. However, I've just added threads and I'm not sure if this code is thread-safe. Any tips?
You can make this thread safe by declaring your map thread_local. If you are going to use it across translation units then make it extern and define it in one translation unit, otherwise static is fine.
You will still need to synchronize writing the logs to disk. A mutex should fix that:
// category => messages (one per thread)
thread_local static std::unordered_map<std::string, std::ostringstream> log;

int main() {
    while (true) {
        log["info"] << "Whatever";
        log[""] << "This is a dynamic entry";
    }
}

void dump_logs() {
    static std::mutex mtx; // mutex shared between threads
    // I do something like this for each category, but they have different save strategies
    if (log["info"].tellp() > 1000000) {
        // now I need to care about threads
        // use { to create a lock that will release at the end }
        {
            std::lock_guard<std::mutex> lock(mtx); // synchronized access
            // save the ostringstream to a file
        }
        // clear the log
    }
}
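For a version of the same idea that can be exercised without a file, the sketch below keeps one thread_local buffer per thread and only takes the mutex when flushing. The names `log_line`, `flush_thread_log` and `g_sink` are hypothetical, and the std::string sink merely stands in for the log file:

```cpp
#include <mutex>
#include <sstream>
#include <string>

std::mutex g_sink_mutex;   // guards g_sink only
std::string g_sink;        // stand-in for the log file (assumption)

// Each thread gets its own buffer, so appending needs no lock.
thread_local std::ostringstream t_buffer;

void log_line(const std::string& msg) {
    t_buffer << msg << '\n';
}

// Move this thread's buffered text into the shared sink.
void flush_thread_log() {
    std::string text = t_buffer.str();
    t_buffer.str("");   // drop the buffered characters
    t_buffer.clear();   // reset any error flags
    if (text.empty()) return;
    std::lock_guard<std::mutex> lock(g_sink_mutex);
    g_sink += text;     // the only step that touches shared state
}
```

One consequence of this design is that every thread has to call flush_thread_log() itself, including once before it exits, because no other thread can reach its thread_local buffer.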
On a POSIX system, if you're always writing data to the end of the file, the fastest way for multiple threads to write data to a file is to use low-level C-style open() in append mode, and just call write(), because the POSIX standard for write() states:
On a regular file or other file capable of seeking, the actual writing
of data shall proceed from the position in the file indicated by the
file offset associated with fildes. Before successful return from
write(), the file offset shall be incremented by the number of bytes
actually written. On a regular file, if the position of the last byte
written is greater than or equal to the length of the file, the length
of the file shall be set to this position plus one.
If the O_APPEND flag of the file status flags is set, the file offset
shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
So, all write() calls from within a process to a file opened in append mode are atomic.
No mutexes are needed.
Almost. The only issue you have to be concerned with is
If write() is interrupted by a signal after it successfully writes
some data, it shall return the number of bytes written.
If you have enough control over your environment that you can be sure that your calls to write() will not be interrupted by a signal after only a portion of the data is written, this is the fastest way to write data to a file from multiple threads - you're using the OS-provided lock on the file descriptor that ensures adherence to the POSIX-specified behavior, and as long as you generate the data to be written without any locking, that file descriptor lock is the only one in the entire data path. And that lock will be in your data path no matter what you do in your code.
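As a sketch of that approach on a POSIX system: the two helper names below are hypothetical, while open(), write(), O_APPEND and EINTR are the standard POSIX facilities discussed above.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open (or create) a file so that every write()
// is positioned at the current end of the file.
int open_for_append(const char* path) {
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

// Write one record with a single write() call.
// Returns true only if the whole record went out in that one call.
// A short write is reported instead of retried, because finishing it would
// take a second write() and another thread's record could land in between.
bool append_record(int fd, const char* data, std::size_t n) {
    ssize_t written;
    do {
        written = write(fd, data, n);
    } while (written < 0 && errno == EINTR);  // nothing was written: safe to retry
    return written >= 0 && static_cast<std::size_t>(written) == n;
}
```

Each thread formats its complete record into its own buffer first and then calls append_record() once, so no user-level mutex is involved. How to react to a false return (for example logging the truncated record) is left to the caller.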

Single Producer Multiple Consumer Circular Buffer

In my current application I am receiving spectral data through a spectrometer. This data is accumulated for one second and then put into a circular buffer. For now I have one consumer, who pops entries from the buffer and then saves everything to disk. OK, all of that stuff works. Now what I need to do is add another consumer, who, in parallel to the saving, does some processing with the spectra. So I have two consumers needing the exact same data (note: they only read and don't modify). But this doesn't work, because if one of the consumers pops an entry off the buffer it is gone, so the other would not receive it. I guess the simplest solution to this problem is to give every consumer its own circular buffer. Fine, but the only problem is: the data entries are big. One entry has a maximum size of around 80MB, so in order to save memory it would be great to not have the same data there twice. Is there any better solution?
Note: I am using a circular buffer so that the buffer's growth is bounded.
Keep two different tail pointers in your buffer, one for each consumer. When the producer is updating the queue, use the farthest tail pointer (the tail pointer which is lagging behind) to check if the buffer is full. Consumers can use their own tail pointers to check if the buffer is empty. This way we get a lockfree buffer, and there is no copying around of data.
See the implementation of disruptor exchange for a discussion about the performance improvement with this solution.
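A minimal sketch of the per-consumer tail idea, assuming exactly one producer thread and one thread per consumer id. `BroadcastRing` and its methods are hypothetical names, and for 80MB entries T would more likely be a handle to preallocated storage than the data itself:

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: one producer, kConsumers readers, one copy per entry.
template <class T, std::size_t N, std::size_t kConsumers = 2>
class BroadcastRing {
public:
    // Producer only. Fails when the slowest consumer is N entries behind.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t slowest = head;
        for (const auto& t : tails_)
            slowest = std::min(slowest, t.load(std::memory_order_acquire));
        if (head - slowest == N) return false;  // full for the lagging reader
        slots_[head % N] = value;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer `id` only. Returns its next unread entry, or nullptr if it
    // has caught up with the producer.
    const T* peek(std::size_t id) const {
        assert(id < kConsumers);
        const std::size_t tail = tails_[id].load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return nullptr;
        return &slots_[tail % N];
    }

    // Consumer `id` only. Marks the entry returned by peek() as finished.
    void advance(std::size_t id) {
        assert(id < kConsumers);
        const std::size_t tail = tails_[id].load(std::memory_order_relaxed);
        tails_[id].store(tail + 1, std::memory_order_release);
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};
    std::array<std::atomic<std::size_t>, kConsumers> tails_{};
};
```

A slot is only overwritten once every consumer has advanced past it, so both readers see the same single copy of each entry. The cost is that a stalled consumer eventually blocks the producer.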
I should hope you're receiving your data directly into the queue and not copying it around much....
Any valid solution that would keep a single copy of the data would have to sync all the consumers so that only when they're all done with an entry it can be popped.
You can keep your circular buffer. You only need a single remover to remove an entry when the readers are done with it. I strongly suggest this remover to be the writer of the data. This way it'd be the only guy with write access to the queue, and that simplifies things.
The remover can be fed by the consumers telling it what they are done with.
Consumers can share their read offsets with the remover. You can use atomic_store on the consumer side, and atomic_load on the remover side.
It should be something like this:
struct Consumer {
    std::atomic<long> offset{0}; // atomic so the remover can read it safely
    Consumer() {
    }
    void run() {
        for(;;) {
            entry& e = offset );
            process( e );
            atomic_store( &offset, offset + e.size() );
        }
    }
};

struct Remover {
    long remove_offset = 0;
    std::list<Consumer*> cons;
    void remove() {
        // find lowest read point
        long cons_offset = LONG_MAX;
        for( auto p : cons ) {
            cons_offset = std::min( cons_offset, atomic_load(&p->offset) );
        }
        // remove up to that point
        while( cons_offset > remove_offset ) {
            entry& e =;
            remove_offset += e.size();
            q.remove( e.size() );
        }
    }
};