OS-level file I/O locks on Qt platforms - C++

Following from this question I have decided to see whether I could implement proper asynchronous file I/O using (multiple) QFiles. The idea is to use a "pool" of QFile objects operating on a single file and to dispatch requests via the QtConcurrent API, each executed with its own dedicated QFile object. After a task finishes, the result is emitted (in the case of reads) and the QFile object is returned to the pool. My initial tests seem to indicate that this is a valid approach: it does allow concurrent read/write operations (e.g. reading while writing) and it can further help with performance (a read can finish in between writes).
The obvious issue is reading and writing the same segment of the file. To see what happens I used the above-mentioned approach to set up the situation and just let it write and read frantically over the same part of the file. To spot possible "corruption", each write increments a number at the beginning of the segment and the same number at its end. The idea is that if a read ever sees different numbers at the start and at the end, it has read partially written data, which in a real situation would mean reading corrupted data.
The reads and writes were overlapping a lot, so I knew they were happening asynchronously, and yet not once was the output "wrong". It would seem that a read never returns partially written data, at least on Windows. Using the QIODevice::Unbuffered flag did not change this.
I assume that some kind of locking is done at the OS level to prevent this (or possibly caching?); please correct me if this assumption is wrong. I base it on the fact that a read that started after a write began could still finish before the write finished. Since I plan to deploy the application on other platforms as well, I was wondering whether I can count on this behaviour on all platforms supported by Qt (mainly those based on POSIX, and Android), or whether I need to implement a locking mechanism myself for these situations, i.e. defer reading from a segment that is being written to.

There's nothing in the implementation of QFile that guarantees atomicity of writes. So the idea of using multiple QFile objects to access the same sections of the same underlying file won't ever work right. Your tests on Windows are not indicative of there being no problem; they are merely insufficient: had they been sufficient, they'd have produced the problem you're expecting.
For highly performant file access in small, possibly overlapping chunks, you have to:
Map the file to memory.
Serialize access to the memory, perhaps using multiple mutexes to improve concurrency.
Access memory concurrently, and don't hold the mutex while the data is paged in.
This is done by first prefetching - either reading from every page in the range of bytes to be accessed, and discarding the results, or using a platform-specific API. Then you lock the mutex and copy the data either out of the file or into it.
The OS does the rest.
#include <QFile>
#include <QMutex>
#include <QMutexLocker>
#include <QObject>
#include <QtConcurrent>
#include <cstring>

class FileAccess : public QObject {
    Q_OBJECT
    QFile m_file;
    QMutex m_mutex;
    uchar * m_area = nullptr;
    void prefetch(qint64 pos, qint64 size);
public:
    FileAccess(const QString & name) : m_file{name} {}
    bool open() {
        if (m_file.open(QIODevice::ReadWrite)) {
            m_area = m_file.map(0, m_file.size());
            if (!m_area) m_file.close();
        }
        return m_area != nullptr;
    }
    void readReq(qint64 pos, qint64 size);
    Q_SIGNAL void readInd(const QByteArray & data, qint64 pos);
    void write(const QByteArray & data, qint64 pos);
};

void FileAccess::prefetch(qint64 pos, qint64 size) {
    const qint64 pageSize = 4096;
    const qint64 pageMask = ~(pageSize - 1);
    // Touch one byte in each page of the range so the OS pages the data
    // in now, before we take the mutex.
    for (qint64 offset = pos & pageMask; offset < pos + size; offset += pageSize) {
        volatile uchar * p = m_area + offset;
        (void)*p;
    }
}

void FileAccess::readReq(qint64 pos, qint64 size) {
    QtConcurrent::run([=]{
        QByteArray result(int(size), Qt::Uninitialized);
        prefetch(pos, size);
        QMutexLocker lock{&m_mutex};
        memcpy(result.data(), m_area + pos, result.size());
        lock.unlock();
        emit readInd(result, pos);
    });
}

void FileAccess::write(const QByteArray & data, qint64 pos) {
    QtConcurrent::run([=]{
        prefetch(pos, data.size());
        QMutexLocker lock{&m_mutex};
        memcpy(m_area + pos, data.constData(), data.size());
    });
}
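For illustration, driving this class might look like the following sketch (the file name, lambda, and qDebug output are my own assumptions, not part of the answer):

// Hypothetical usage; assumes "data.bin" already exists and is at least
// 1 KiB, since QFile::map() needs an existing range to map.
FileAccess access{"data.bin"};
if (access.open()) {
    QObject::connect(&access, &FileAccess::readInd,
                     [](const QByteArray & data, qint64 pos) {
        qDebug() << "read" << data.size() << "bytes at" << pos;
    });
    access.write(QByteArray(1024, 'x'), 0); // dispatched to the thread pool
    access.readReq(0, 1024);                // may overlap with the write
}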

Related

Waiting Thread or Create New Thread?

I have a decision to make regarding the way I code something which runs on an embedded platform, and I am hoping there is a general rule of thumb that can be applied in this case. Coding both my ideas and then benchmarking would obviously be the best way to go, but getting any meaningful, or rather accurate, results out of this platform in my particular case would be quite tricky. I'm also sure there may be others with the same question on their respective platforms, so I decided to ask it here. Please be kind, as I'm not very familiar with the threading library, so constructive feedback would be useful.
I have many threads (well, about 10-20 at maximum) all wanting to write to this hardware device. So I decided on a simple ring buffer consisting of 2 buffers (primary/secondary) of 8k each. This way each incoming thread can be dealt with in a timely fashion. An arriving thread obtains a mutex, writes into the primary buffer, and then releases the mutex, ready for the next thread. When the primary buffer is full, new incoming threads switch to using the secondary buffer while the primary buffer is written out to the hardware device.
So the question really is... How best to write to the hardware device??? I'm thinking that there are two choices:
As soon as the buffer is full, create a new thread that does the write operation.
Signal a pre-created waiting worker-thread to do the write operation.
Both of the options seem to come with their respective pros/cons. Option 1 is the simplest to code, and there are a number of ways to do it, but its effectiveness depends on how expensive it is to create/start a thread. The thread would be created, it would perform the write operation and then it would die. Option 2, however, seems the most performant, but if you're going to have a reusable thread, you need a mutex and a couple of condition variables to control it: one to notify the thread that data is ready, and another to ask the thread to terminate when the program ends. Add to that a sprinkle of atomics for spurious wake-ups/missed notifications etc., and you've got quite an intricate solution to get right.
So what is the best method here? Are threads in general heavy to create/start or is this something that is completely platform dependent and benchmarking is the only way to know? Is there any benefit to using one method over the other that I've not thought about?
-- This is for the people not suffering from TL;DR syndrome --
I'm sure some of you have already wondered what happens if the secondary buffer becomes full before the write operation has finished? The answer in my case is fairly simple: this should never happen! Although the write operation is slow, it would never be slow enough such that the secondary buffer is filled before the write is complete. However, if someone is going to use this ring-buffer method, they must be prepared for this contingency. The way I thought about tackling this is to have a second mutex that is held during the write operation. This would mean that the thread that was due to write to the buffer would block until the write completed and the mutex was released.
Here's roughly what I ended up with after going with Option 2, but it seems awfully messy. I actually wanted to use promises/futures to avoid the spin-lock predicates on the condition variable, but couldn't think of a good way of moving a promise to an already created thread. Anyway, nice feedback is appreciated; as for bad feedback, well, I'm not overly familiar with the threading library.
class Bar
{
public:
    Bar(const size_t size) : buffer(new uint8_t[size]), buffer_size(size), used_size(0) {}
    const size_t GetRemainingBufferSize(void) const { return buffer_size - used_size; }
    const size_t GetUsedBufferSize(void) const { return used_size; }
    const uint8_t* GetBuffer(void) const { return buffer.get(); }
    const size_t GetBufferSize() const { return buffer_size; }
    void ResetBuffer(void) { used_size = 0; }
    void WriteIntoBuffer(const vector<uint8_t>& data)
    {
        std::copy(data.begin(), data.end(), buffer.get() + used_size);
        used_size += data.size();
    }
private:
    std::unique_ptr<uint8_t[]> buffer;
    size_t buffer_size;
    size_t used_size;
};

class Foo
{
public:
    Foo(const size_t buffer_size = 8192) :
        bar_buffers{ buffer_size, buffer_size },
        primary_buffer(&bar_buffers[0]), secondary_buffer(&bar_buffers[1]),
        write_predicate(false), quit_predicate(false), write_buffer(primary_buffer)
    {
        foo_thread = std::thread(&Foo::WriteHWThread, this);
    }
    ~Foo()
    {
        quit_predicate = true;
        begin_write.notify_one();
        if (foo_thread.joinable())
            foo_thread.join();
    }
    Foo(const Foo&) = delete;
    Foo& operator=(const Foo&) = delete;
    void WriteData(const std::vector<uint8_t>& data)
    {
        if (std::lock_guard<std::mutex> foo_lk(foo_lock); primary_buffer->GetRemainingBufferSize() < data.size())
        {
            std::unique_lock<std::mutex> write_lk(write_lock);
            write_buffer = primary_buffer;
            write_lk.unlock();
            std::swap(primary_buffer, secondary_buffer);
            primary_buffer->ResetBuffer();
            write_predicate = true;
            begin_write.notify_one();
        }
        primary_buffer->WriteIntoBuffer(data);
    }
    void WriteHWThread(void)
    {
        do
        {
            std::unique_lock<std::mutex> write_lk(write_lock);
            begin_write.wait(write_lk, [&]() -> bool { return write_predicate.load() || quit_predicate.load(); });
            write_predicate = false;
            if (write_buffer.load()->GetUsedBufferSize())
            {
                // <<< WRITE TO DEDICATED HARDWARE >>>
            }
            write_lk.unlock();
        } while (!quit_predicate);
    }
private:
    Bar bar_buffers[2];
    Bar* primary_buffer, *secondary_buffer;
    std::atomic<bool> write_predicate, quit_predicate;
    std::atomic<Bar*> write_buffer;
    std::mutex foo_lock, write_lock;
    std::thread foo_thread;
    std::condition_variable begin_write;
};
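For reference, stripped of the double-buffer bookkeeping, the condition-variable machinery of Option 2 boils down to a loop like this generic sketch (illustrative only, not code from the question):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool work_ready = false; // set by producers while holding m
bool quit = false;       // set once at shutdown while holding m

void worker()
{
    std::unique_lock<std::mutex> lk(m);
    for (;;) {
        // wait() releases m while blocked; the predicate guards against
        // spurious wake-ups and notifications sent before we started waiting
        cv.wait(lk, [] { return work_ready || quit; });
        if (quit) break;
        work_ready = false;
        lk.unlock();
        // ... perform the slow write to the hardware device ...
        lk.lock();
    }
}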

multi producer, single consumer queue with char[]

My project's code base currently contains a multi-producer, single-consumer data writer. It uses 2 circular buffers and a couple of CRITICAL_SECTION locks to write variably sized streams of data passed in as char* with a byte count. It takes data from dozens+ of threads writing char* and writes it to a binary file.
I'm looking to improve this design, but it seems that all the implementations written up on the net involve simple primitive types as opposed to writing char arrays. Can anyone suggest links for me to research, as my google-fu is pretty weak?
My test implementation is essentially:
void Record(char* data, uint32_t byte_count)
{
    c_section.Lock(); // CCriticalSection, std::mutex, whatever
    recording_buffer.write(data, byte_count); // in-house circular buffer
    c_section.Unlock();
}

void FileThread()
{
    while (run_thread)
    {
        if (recording_buffer.size() > 0)
        {
            c_section.Lock();
            // read the entire size of the filled bytes of the buffer;
            // multiple entries are written to the file at once
            write_to_file(recording_buffer.read(recording_buffer.size()), recording_buffer.size());
            c_section.Unlock();
        }
        std::this_thread::sleep_for(10ms);
    }
}
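One common way to adapt the primitive-type designs found online to char arrays is to frame each message with a length prefix, so a byte-oriented ring buffer can carry variable-sized entries; a minimal sketch of the idea (illustrative only, using std::vector in place of the in-house circular buffer):

#include <cstdint>
#include <mutex>
#include <vector>

std::mutex buf_mutex;
std::vector<uint8_t> recording; // stand-in for the circular buffer

// Producers: frame each message as [uint32_t length][payload bytes]
void Record(const char* data, uint32_t byte_count)
{
    std::lock_guard<std::mutex> lock(buf_mutex);
    const uint8_t* len = reinterpret_cast<const uint8_t*>(&byte_count);
    recording.insert(recording.end(), len, len + sizeof byte_count);
    recording.insert(recording.end(), data, data + byte_count);
}

// Consumer: swap the filled buffer out under the lock, then write it to
// the file outside the lock so producers aren't blocked by disk I/O
void Drain(std::vector<uint8_t>& out)
{
    out.clear();
    std::lock_guard<std::mutex> lock(buf_mutex);
    out.swap(recording);
}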

Concurrently processing data. What do I need to watch out for?

I have a routine that is meant to load and parse data from a file. There is a possibility that the data from the same file might need to be retrieved from two places at once, i.e. during a background caching process and from a user request.
Specifically I am using the C++11 thread and mutex libraries. We compile with Visual C++ 11 (aka 2012), so we are limited by whatever it lacks.
My naive implementation went something like this:
map<wstring, weak_ptr<DataStruct>> data_cache;
mutex data_cache_mutex;

shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
    auto data_ptr = make_shared<DataStruct>();
    /* Parses and processes the data, may take a while */
    return data_ptr;
}

shared_ptr<DataStruct> CreateStructFromData(wstring file_path) {
    lock_guard<mutex> lock(data_cache_mutex);
    auto cache_iter = data_cache.find(file_path);
    if (cache_iter != end(data_cache)) {
        auto data_ptr = cache_iter->second.lock();
        if (data_ptr)
            return data_ptr;
        // reference died, remove it
        data_cache.erase(cache_iter);
    }
    auto data_ptr = ParseDataFile(file_path);
    if (data_ptr)
        data_cache.emplace(make_pair(file_path, data_ptr));
    return data_ptr;
}
My goals were two-fold:
Allow multiple threads to load separate files concurrently
Ensure that a file is only processed once
The problem with my current approach is that it doesn't allow concurrent parsing of multiple files at all. If I understand correctly what will happen, each thread is going to hit the lock and they will end up processing linearly, one thread at a time. The order in which the threads pass through the lock may change from run to run, but the end result is the same.
One solution I've considered was to create a second map:
map<wstring, mutex> data_parsing_mutex;

shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
    lock_guard<mutex> lock(data_parsing_mutex[file_path]);
    /* etc. */
    data_parsing_mutex.erase(file_path);
}
But now I have to be concerned with how data_parsing_mutex is being updated. So I guess I need another mutex?
map<wstring, mutex> data_parsing_mutex;
mutex data_parsing_mutex_mutex;

shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
    unique_lock<mutex> super_lock(data_parsing_mutex_mutex);
    lock_guard<mutex> lock(data_parsing_mutex[file_path]);
    super_lock.unlock();
    /* etc. */
    super_lock.lock();
    data_parsing_mutex.erase(file_path);
}
In fact, looking at this, it's not necessarily going to avoid double-processing a file if the background process hasn't completed it by the time the user requests it, unless I check the cache yet again.
But by now my spidey senses are saying "There must be a better way." Is there? Would futures, promises, or atomics help me at all here?
From what you described, it sounds like you're trying to do a form of lazy initialization of the DataStruct using a thread pool, along with a reference counted cache. std::async should be able to provide a lot of the dispatch and synchronization necessary for something like this.
Using std::async, the code would look something like this...
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <string>
using namespace std;

// DataStruct is the parsed representation from the question.
map<wstring, weak_ptr<DataStruct>> cache;
map<wstring, shared_future<shared_ptr<DataStruct>>> pending;
mutex cache_mutex, pending_mutex;

shared_ptr<DataStruct> ParseDataFromFile(wstring file) {
    auto data_ptr = make_shared<DataStruct>();
    /* Parses and processes the data, may take a while */
    return data_ptr;
}

shared_ptr<DataStruct> CreateStructFromData(wstring file) {
    shared_future<shared_ptr<DataStruct>> pf;
    shared_ptr<DataStruct> ce;
    {
        lock_guard<mutex> lock(cache_mutex);
        auto ci = cache.find(file);
        if (ci != cache.end())
            if (auto sp = ci->second.lock())
                return sp;
    }
    {
        lock_guard<mutex> lock(pending_mutex);
        auto fi = pending.find(file);
        if (fi == pending.end()) {
            // first requester: start the parse and publish the future
            pf = async(ParseDataFromFile, file).share();
            pending.insert(fi, make_pair(file, pf));
        } else {
            // someone else is already parsing this file; share their result
            pf = fi->second;
        }
    }
    pf.wait();
    ce = pf.get();
    {
        lock_guard<mutex> lock(cache_mutex);
        cache[file] = ce; // insert or refresh the weak reference
    }
    {
        lock_guard<mutex> lock(pending_mutex);
        auto pi = pending.find(file);
        if (pi != pending.end())
            pending.erase(pi);
    }
    return ce;
}
This can probably be optimized a bit, but the general idea should be the same.
On a typical computer there is little point in trying to load files concurrently, since disk access will be the bottleneck. Instead, it's better to have a single thread load files (or use asynchronous I/O) and dish out the parsing to a thread pool. Then store the results in a shared container.
Regarding preventing double work, you should consider if this is really necessary. If you are only doing this out of premature optimization, you'd probably make users happier by focussing on making the program responsive, rather than efficient. That is, make sure the user gets what they ask for quickly, even if it means doing double work.
OTOH, if there is a technical reason for not parsing a file twice, you can keep track of the status of each file (loading, parsing, parsed) in the shared container.
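To illustrate that last suggestion, the shared container entry might look like this hypothetical sketch (the names are mine, not from the answer; DataStruct is the type from the question):

#include <map>
#include <memory>
#include <mutex>
#include <string>

enum class FileStatus { Loading, Parsing, Parsed };

struct FileEntry {
    FileStatus status;                // Loading -> Parsing -> Parsed
    std::shared_ptr<DataStruct> data; // valid once status == Parsed
};

std::map<std::wstring, FileEntry> file_table; // the shared container
std::mutex file_table_mutex;                  // guards file_table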

Qt threads for a desktop interface

I am developing a Qt interface for a 3D printer. When I click the Print button (the printer starts printing), the interface crashes. I am using this code:
*future = QtConcurrent::run(Imprimir, filename.toUtf8().data());
What can I use to solve it? What types of threads can I use?
I need to use the interface while the printer is printing (it may take several minutes).
Thank you in advance.
Edit:
Imprimir function:
int Imprimir(char *fich)
{
    char *aux = new char;
    FILE *f;
    f = fopen(fich, "r");
    while (!feof(f)) {
        fgets(aux, 200, f);
        Enviar(aux);
        while (!seguir_imprimiendo);
    }
    Sleep(7000);
    return 0;
}
You're making life harder than necessary by not using QFile. When you use QFile, you don't have to deal with silly things like passing C-string filenames around. You're likely to get that wrong, since who's to guarantee that the platform expects them to be encoded in UTF-8? The whole point of Qt is that it helps you avoid such issues. They are taken care of, and the code is tested on multiple platforms to ensure that the behavior is correct in each case.
By not using QByteArray and QFile, you're liable to commit silly mistakes like your classic C bug of allocating a single-character buffer and then pretending that it's 200 characters long.
I see no reason to sleep in that method. It also makes no sense to wait for the continue flag seguir_imprimiendo to change, since Enviar runs in the same thread. It should block until the data is sent.
I presume that you've made Enviar run its code through QtConcurrent::run, too. This is unnecessary and leads to a deadlock. Think of what happens if a free thread can never be available while Imprimir is running. It's valid for the pool Imprimir runs on to be limited to just one thread. You can't simply pretend that it can't happen.
bool Imprimir(const QString & fileName)
{
    QFile src(fileName);
    if (!src.open(QIODevice::ReadOnly)) return false;
    QByteArray chunk;
    do {
        chunk.resize(4096);
        qint64 read = src.read(chunk.data(), chunk.size());
        if (read < 0) return false;
        if (read == 0) break; // we're done
        chunk.resize(read);
        if (!Enviar(chunk)) return false;
    } while (!src.atEnd());
    return true;
}

bool Enviar(const QByteArray & data)
{
    ...
    return true; // if successful
}
Assuming there's no problem with Imprimir, the issue is probably with filename.toUtf8().data(). toUtf8() returns a temporary QByteArray that is destroyed at the end of the statement, so the pointer obtained from data() is already dangling by the time the concurrent function runs, and any code accessing it will crash.
You should change the Imprimir function to accept a QString parameter instead of char* to be safe.
If you can't change the Imprimir function (because it's in another library, for example), then you will have to wrap it in your own function which accepts a QString. If you're using C++11, you can use a lambda expression to do the job:
QtConcurrent::run([](QString filename) {
    Imprimir(filename.toUtf8().data());
}, filename);
If not, you will have to write a separate ImprimirWrapper(QString filename) function and invoke it using QtConcurrent::run.
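For completeness, such a wrapper might look like the following sketch (assuming Imprimir keeps its original int Imprimir(char*) signature; the body is my own illustration):

int ImprimirWrapper(QString filename)
{
    // The named QByteArray outlives the call, so the char* stays
    // valid for the whole duration of Imprimir.
    QByteArray encoded = filename.toUtf8();
    return Imprimir(encoded.data());
}

// Invoked the same way as before:
QtConcurrent::run(ImprimirWrapper, filename);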

Concurrent log file access in C/C++

I am creating a multi-threaded program, and several threads may need to call a global function
writeLog(const char* pMsg);
and writeLog will be implemented something like this:
void writeLog(const char* pMsg)
{
    CRITICAL_SECTION cs;
    // initialize critical section
    ...
    EnterCriticalSection(&cs);
    // g_pLogFilePath is a global variable.
    FILE *file;
    if (0 != fopen_s(&file, g_pLogFilePath, "r+"))
        return;
    fprintf(file, pMsg);
    fclose(file);
    LeaveCriticalSection(&cs);
}
My questions are:
1) Is using a critical section the best way to do concurrent logging?
2) Since I will write logs in many places in the threads, and each log write involves opening/closing the file, will the I/O impact performance significantly?
Thanks!
The best way to do concurrent logging is to use one of the existing logging libraries for C++.
They have many features you would probably like to use (different appenders, formatting, concurrency, etc.).
If you still want to have your own solution, you could have something like this:
a simple singleton that is initialized once and keeps the state (file handle and mutex)
class Log
{
public:
    // Singleton
    static Log & getLog()
    {
        static Log theLog;
        return theLog;
    }
    void log(const std::string & message)
    {
        // synchronous writing here
    }
private:
    // Hidden ctor
    Log()
    {
        // open file ONCE here
    }
    // Synchronisation primitive - instance variable
    // CRITICAL_SECTION or Boost mutex (preferable)
    CRITICAL_SECTION cs_;
    // File handle: FILE * or std::ofstream
    FILE * handler_;
};
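Any thread can then log through the single instance, for example: Log::getLog().log("message");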
To answer your questions:
Yes, a critical section is indeed needed for concurrent logging.
Yes, logging may indeed affect performance.
As mentioned in the comments, the object used to "protect" the critical section must be accessible by all threads, such as a global variable or singleton.
Regarding the logging performance, IO can be costly. One common approach is to have a logging object that buffers the messages to be logged, and only writes when the buffer is full. This will help with the performance. Additionally, consider having several log levels: DEBUG, INFO, WARNING, ERROR.
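A minimal sketch of that buffering idea (illustrative only; the class name and flush threshold are my own assumptions):

#include <cstddef>
#include <cstdio>
#include <mutex>
#include <string>

class BufferedLog
{
public:
    explicit BufferedLog(FILE * handle) : handle_(handle) {}
    ~BufferedLog()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        flushLocked(); // don't lose the tail on shutdown
    }
    void log(const std::string & message)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        buffer_ += message;
        buffer_ += '\n';
        if (buffer_.size() >= kFlushThreshold)
            flushLocked(); // only hit the disk when the buffer fills up
    }
private:
    void flushLocked() // caller must hold mutex_
    {
        std::fwrite(buffer_.data(), 1, buffer_.size(), handle_);
        buffer_.clear();
    }
    static const std::size_t kFlushThreshold = 8192;
    std::mutex mutex_;
    std::string buffer_;
    FILE * handle_; // opened once, as in the singleton above
};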
A CS is a reasonable way to protect the logging, yes. To avoid inflicting the open/write/close upon every call from every thread, it's common to queue the string off (if it is not already malloced/newed, you may need to copy it) to a separate log thread. Blocking disk delays are then buffered away from the logging calls. Any lazy-writing and similar optimizations can be implemented in the log thread.
Alternatively, as suggested by the other posters, just use a logging framework that has all this stuff already implemented.
I was writing an answer, then a circuit breaker tripped. Since my answer is still in draft I may as well continue. Much the same as the answer that provides a singleton class, but I do it a little more C-like. This is all in a separate source file (Logging.cpp for example).
static CRITICAL_SECTION csLogMutex;
static FILE *fpFile = NULL;
static bool bInit = false;

bool InitLog( const char *filename )
{
    if( bInit ) return false;
    bInit = true;
    fpFile = fopen( filename, "at" );
    InitializeCriticalSection(&csLogMutex);
    return fpFile != NULL;
}

void ShutdownLog()
{
    if( !bInit ) return;
    if( fpFile ) fclose(fpFile);
    DeleteCriticalSection(&csLogMutex);
    fpFile = NULL;
    bInit = false;
}
Those are called in your application entry/exit... As for logging, I prefer to use variable argument lists so I can do printf-style logging.
void writeLog(const char* pMsg, ...)
{
    if( fpFile == NULL ) return;
    EnterCriticalSection(&csLogMutex);
    // You can write a timestamp into the file here if you like.
    va_list ap;
    va_start(ap, pMsg);
    vfprintf( fpFile, pMsg, ap );
    fprintf( fpFile, "\n" ); // I hate supplying newlines to log functions!
    va_end(ap);
    LeaveCriticalSection(&csLogMutex);
}
If you plan to do logging within DLLs, you can't use this static approach. Instead you'll need to open the file with _fsopen and deny read/write sharing.
You may wish to periodically call fflush too, if you expect your application to crash. Or you'll have to call it every time if you want to externally monitor the log in realtime.
Yes, there's a performance impact from critical sections but it's nothing compared to the performance cost of writing to files. You can enter critical sections many thousands of times per second without a worry.