I need a C++ wrapper class which can read/write/seek data synchronously from a Windows 8/WP8 Storage file (http://msdn.microsoft.com/library/windows/apps/br227171):
class FileWrapper
{
public:
FileWrapper(StorageFile^ file); // IRandomAccessStream or IInputStream
// are fine as input arguments too
byte* readBytes(int bytesToRead, int &bytesGot);
bool writeBytes(byte* data, int size);
bool seek(int position);
};
The data should be read from the file on the fly. It should not be cached in memory, and the storage file should not be copied into the app's directory, where it would be accessible with the standard fopen and ifstream functions.
I tried to figure out how to do this (including the Microsoft file access sample code: http://code.msdn.microsoft.com/windowsapps/File-access-sample-d723e597) but I am stuck on the asynchronous nature of each operation. Does anyone have hints on how to achieve this? Or is there even built-in functionality?
Typically you wrap the async operations with create_task() and wait for the task's execution to complete by calling task.get(), which gives you synchronous behavior. IIRC this doesn't work for file access, though: the operation may try to return on the same thread it was started on, and if you're blocking that thread, you end up with a deadlock. I don't have time to try this, but maybe if you start on another thread you could wait there for completion like this, though it might still deadlock:
auto createTaskTask = create_task([file]()
{
    return create_task(FileIO::ReadTextAsync(file));
});
auto readFileTask = createTaskTask.get();
try
{
String^ fileContent = readFileTask.get();
}
catch(Exception^ ex)
{
...
}
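If that workaround holds up, the same blocking pattern can back the synchronous readBytes from the question. Below is a minimal, untested sketch using DataReader (readBlocking is my own helper name), assuming it is only ever called from a background thread so the blocking .get() cannot deadlock the UI thread:
#include <ppltasks.h>
using namespace concurrency;
using namespace Windows::Storage::Streams;

// Blocking read of up to `count` bytes from an IRandomAccessStream.
// Must run on a background (non-UI) thread: .get() blocks until the
// async operation completes.
Platform::Array<byte>^ readBlocking(IRandomAccessStream^ stream, unsigned int count)
{
    auto reader = ref new DataReader(stream);
    unsigned int got = create_task(reader->LoadAsync(count)).get();
    auto bytes = ref new Platform::Array<byte>(got);
    reader->ReadBytes(bytes);
    reader->DetachStream(); // keep the stream usable for later reads/seeks
    return bytes;
}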
Suppose I have n processes with IDs 1 to n. I have a file with lots of data, where each process will only store a disjoint subset of the data. I would like to load and process the file using exactly one process, store the resulting data in a data structure allocated via Boost.Interprocess in shared memory, and then allow any (including the one who loaded the file) process to read from the data.
For this to work, I need to make use of some of the Boost.Interprocess synchronization constructs located here to ensure processes do not try to read the data before it has been loaded. However, I am struggling with this part and it is likely due to my lack of experience in this area. At the moment, I have process(1) loading the file into shared memory and I need a way to ensure any given process cannot read the file contents until the load is complete, even if the read might happen arbitrarily long after the loading occurs.
I wanted to try a combination of a mutex and a condition variable, using notify_all so that process(1) can signal to the other processes that it is okay to read from the shared memory. But this approach has an issue: process(1) might call notify_all before some process(i) has even begun waiting on the condition variable.
Any ideas for how to approach this in a reliable manner?
Edit 1
Here is my attempt to clarify my dilemma and express more clearly what I have tried. I have some class that I allocate into a shared memory space using Boost.Interprocess that has a form similar to the below:
namespace bi = boost::interprocess;
class cache {
public:
cache() = default;
~cache() = default;
void set_process_id(std::size_t ID) { id = ID; }
void load_file(const std::string& filename) {
// designated process to load
// file has ID equal to 0
if( id == 0 ){
// lock using the mutex
bi::scoped_lock<bi::interprocess_mutex> lock(m);
// do work to process the file and
// place result in the data variable
// after processing file, notify all other
// processes that they can access the data
load_cond.notify_all();
}
}
void read_into(std::array<double, 100>& data_out) {
{ // wait to read data until load is complete
// lock using the mutex
bi::scoped_lock<bi::interprocess_mutex> lock(m);
load_cond.wait(lock);
}
data_out = data;
}
private:
size_t id;
std::array<double, 100> data;
bi::interprocess_mutex m;
bi::interprocess_condition load_cond;
};
The above is roughly what I had when I asked the question, but it did not sit well with me: if read_into is called after the designated process has already executed notify_all, read_into blocks forever. What I just did this morning, which seems to fix this dilemma, is change the class to the following:
namespace bi = boost::interprocess;
class cache {
public:
cache():load_is_complete(false){}
~cache() = default;
void set_process_id(std::size_t ID) { id = ID; }
void load_file(const std::string& filename) {
// designated process to load
// file has ID equal to 0
if( id == 0 ){
// lock using the mutex
bi::scoped_lock<bi::interprocess_mutex> lock(m);
// do work to process the file and
// place result in the data variable
// after processing file, notify all other
// processes that they can access the data
load_is_complete = true;
load_cond.notify_all();
}
}
void read_into(std::array<double, 100>& data_out) {
{ // wait to read data until load is complete
// lock using the mutex
bi::scoped_lock<bi::interprocess_mutex> lock(m);
// loop rather than a plain if, so a spurious
// wakeup re-checks the flag before reading
while( !load_is_complete ){
load_cond.wait(lock);
}
}
data_out = data;
}
private:
size_t id;
std::array<double, 100> data;
bool load_is_complete;
bi::interprocess_mutex m;
bi::interprocess_condition load_cond;
};
Not sure if the above is the most elegant, but I believe it ensures that processes cannot access the data stored in shared memory until it has finished loading, whether they reach the mutex m before the designated process or after it has loaded the file contents. If there is a more elegant way, I would like to know.
The typical way is to use named interprocess mutexes. See e.g. the example(s) in Boris Schäling's "Boost" book, which is freely available, also online: https://theboostcpplibraries.com/boost.interprocess-synchronization
If your segment creation is already suitably synchronized, you can use "unnamed" interprocess mutexes inside your shared segment, which is usually more efficient and avoids polluting system namespaces with extraneous synchronization primitives.
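For illustration, a minimal sketch of the unnamed-mutex approach; the segment and object names ("demo_segment", "state") are placeholders:
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

namespace bi = boost::interprocess;

struct shared_state {
    bi::interprocess_mutex mutex; // unnamed, lives inside the segment itself
    int value = 0;
};

int main() {
    // Named construction inside a managed segment is synchronized by
    // Boost.Interprocess, so the state is created exactly once.
    bi::managed_shared_memory segment(bi::open_or_create, "demo_segment", 65536);
    shared_state* state = segment.find_or_construct<shared_state>("state")();

    bi::scoped_lock<bi::interprocess_mutex> lock(state->mutex);
    state->value += 1;
}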
I have created a class for the serial link with a read function.
I use boost::asio::read for reading data from the serial link, but the read function waits indefinitely until a byte has been received.
I want to create a thread that stops the read function if a maximum wait time has passed (because then there is presumably a malfunction in the system).
Is it possible to exit a function in C++ from another function? Or cancel the read function call from the other function?
std::string SerialLink::read(const int maxTime) {
std::string data;
std::vector<uint8_t> buf;
const int readSize = 1;
try {
buf.resize(readSize);
//boost::asio::read waits until a byte has been received
boost::asio::read(port_, boost::asio::buffer(buf, readSize));
data = buf.front();
}
catch (const std::exception & e) {
std::cerr << "SerialLink ERROR: " << e.what() << "\n";
return ""; // can't return -1 from a std::string function; signal the error with an empty string
}
return data;
}
void threadTime() {
//This function will keep track of the time and if maxTime has passed, the read function/function call must be cancelled and return -1 if possible
}
How about you do your reading in a thread (pthread_t thread_read;), then launch the timer in another thread (pthread_t thread_timer;).
After the desired period, you cancel the reading thread (pthread_cancel(thread_read);), as in the sketch below.
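Roughly, that two-thread arrangement could look like this (POSIX). Beware that pthread_cancel does not portably unwind C++ stack objects, so keep the cancellable thread's code C-like:
#include <pthread.h>
#include <unistd.h>

static void* readerThread(void* arg) {
    int fd = *static_cast<int*>(arg);
    char byte = 0;
    ::read(fd, &byte, 1); // blocks; read() is a cancellation point
    return nullptr;
}

// In the timer thread / caller:
// pthread_t thread_read;
// pthread_create(&thread_read, nullptr, readerThread, &fd);
// /* wait until maxTime has passed */
// pthread_cancel(thread_read); // interrupts the blocked read
// pthread_join(thread_read, nullptr);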
If port_ is an ordinary file descriptor and you have POSIX available, you might first call select or poll on it (the latter is a little easier to use); both provide a timeout facility.
Device- and OS-specific (you'd have to read the documentation), but ioctl might even allow you to fetch how much data is available...
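A sketch of the poll route, assuming you can get at the port's file descriptor (e.g. port_.native_handle() on a boost::asio::serial_port):
#include <poll.h>
#include <unistd.h>
#include <string>

// Wait up to maxTimeMs for the descriptor to become readable, then
// read a single byte; returns an empty string on timeout or error.
std::string readWithTimeout(int fd, int maxTimeMs) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    int ready = poll(&pfd, 1, maxTimeMs); // timeout in milliseconds
    if (ready <= 0)
        return ""; // 0 = timed out, -1 = error

    char byte = 0;
    ssize_t n = ::read(fd, &byte, 1);
    return n == 1 ? std::string(1, byte) : "";
}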
Is it possible to exit a function F in C++ from another function G?
No, but you could consider, in the body of G (called from F), throwing some exception and catching that exception in F, within the same thread.
Or cancel the read function call
This is operating system specific. On Linux, you might use non-blocking IO (and use poll(2) to detect when input is available, e.g. in your event loop). You could also use asynchronous IO. See aio_read(3) and aio_cancel(3).
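A rough sketch of the POSIX AIO route, as an outline rather than production code:
#include <aio.h>

// Start an asynchronous read, wait up to timeout_ms for it to finish,
// and cancel it (best effort) if the deadline passes.
bool readWithDeadline(int fd, char* buf, size_t n, int timeout_ms) {
    aiocb cb{};
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = n;
    if (aio_read(&cb) != 0)
        return false;

    timespec ts{ timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };
    const aiocb* list[1] = { &cb };
    if (aio_suspend(list, 1, &ts) != 0) { // timed out or interrupted
        aio_cancel(fd, &cb);              // may already have completed
        return false;
    }
    return aio_error(&cb) == 0 && aio_return(&cb) > 0;
}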
I'm implementing my own logging system for performance purposes (and because I basically just need a buffer). What I currently have is something like this:
// category => messages
static std::unordered_map<std::string, std::ostringstream> log;
int main() {
while (true) {
log["info"] << "Whatever";
log["192.168.0.1"] << "This is a dynamic entry";
dump_logs();
}
}
void dump_logs() {
// i do something like this for each category, but they have different save strategies
if (log["info"].tellp() > 1000000) {
// save the ostringstream to a file
// clear the log
log["info"].str("")
}
}
It works perfectly. However, I've just added threads and I'm not sure if this code is thread-safe. Any tips?
You can make this thread safe by declaring your map thread_local. If you are going to use it across translation units then make it extern and define it in one translation unit, otherwise static is fine.
You will still need to synchronize writing the logs to disk. A mutex should fix that:
#include <mutex>
#include <sstream>
#include <string>
#include <unordered_map>

// category => messages (one per thread)
thread_local static std::unordered_map<std::string, std::ostringstream> log;
int main() {
while (true) {
log["info"] << "Whatever";
log["192.168.0.1"] << "This is a dynamic entry";
dump_logs();
}
}
void dump_logs() {
static std::mutex mtx; // mutex shared between threads
// i do something like this for each category, but they have different save strategies
if (log["info"].tellp() > 1000000) {
// now I need to care about threads
// use { to create a lock that will release at the end }
{
std::lock_guard<std::mutex> lock(mtx); // synchronized access
// save the ostringstream to a file
}
// clear the log
log["info"].str("");
}
}
On a POSIX system, if you're always writing to the end of the file, the fastest way for multiple threads to write data to a file is to open it with the low-level C-style open() in append mode and just call write(), because the POSIX standard for write() states:
On a regular file or other file capable of seeking, the actual writing
of data shall proceed from the position in the file indicated by the
file offset associated with fildes. Before successful return from
write(), the file offset shall be incremented by the number of bytes
actually written. On a regular file, if the position of the last byte
written is greater than or equal to the length of the file, the length
of the file shall be set to this position plus one.
...
If the O_APPEND flag of the file status flags is set, the file offset
shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
So, all write() calls from within a process to a file opened in append mode are atomic.
No mutexes are needed.
Almost. The only issue you have to be concerned with is
If write() is interrupted by a signal after it successfully writes
some data, it shall return the number of bytes written.
If you have enough control over your environment that you can be sure your calls to write() will not be interrupted by a signal after only a portion of the data has been written, this is the fastest way to write data to a file from multiple threads. You're using the OS-provided lock on the file descriptor that ensures adherence to the POSIX-specified behavior, and as long as you generate the data to be written without any locking, that file descriptor lock is the only one in the entire data path; it will be there no matter what you do in your code.
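A concrete sketch of that pattern (the helper names are mine):
#include <fcntl.h>
#include <unistd.h>
#include <string>

// One descriptor shared by all threads, opened in append mode.
int openLog(const char* path) {
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

// Emit each log line with a single write() call so the O_APPEND
// guarantee applies to the whole line.
void logLine(int fd, std::string line) {
    line += '\n';
    // A partial write is possible if a signal interrupts write(),
    // as noted above.
    (void)write(fd, line.data(), line.size());
}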
I have a routine that is meant to load and parse data from a file. There is a possibility that the data from the same file might need to be retrieved from two places at once, i.e. during a background caching process and from a user request.
Specifically I am using C++11 thread and mutex libraries. We compile with Visual C++ 11 (aka 2012), so are limited by whatever it lacks.
My naive implementation went something like this:
map<wstring,weak_ptr<DataStruct>> data_cache;
mutex data_cache_mutex;
shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
auto data_ptr = make_shared<DataStruct>();
/* Parses and processes the data, may take a while */
return data_ptr;
}
shared_ptr<DataStruct> CreateStructFromData(wstring file_path) {
lock_guard<mutex> lock(data_cache_mutex);
auto cache_iter = data_cache.find(file_path);
if (cache_iter != end(data_cache)) {
auto data_ptr = cache_iter->second.lock();
if (data_ptr)
return data_ptr;
// reference died, remove it
data_cache.erase(cache_iter);
}
auto data_ptr = ParseDataFile(file_path);
if (data_ptr)
data_cache.emplace(make_pair(file_path, data_ptr));
return data_ptr;
}
My goals were two-fold:
Allow multiple threads to load separate files concurrently
Ensure that a file is only processed once
The problem with my current approach is that it doesn't allow concurrent parsing of multiple files at all. If I understand correctly what will happen, the threads will each hit the lock and end up processing linearly, one at a time. The order in which the threads first pass through the lock may change from run to run, but the end result is the same.
One solution I've considered was to create a second map:
map<wstring,mutex> data_parsing_mutex;
shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
lock_guard<mutex> lock(data_parsing_mutex[file_path]);
/* etc. */
data_parsing_mutex.erase(file_path);
}
But now I have to be concerned with how data_parsing_mutex is being updated. So I guess I need another mutex?
map<wstring,mutex> data_parsing_mutex;
mutex data_parsing_mutex_mutex;
shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
unique_lock<mutex> super_lock(data_parsing_mutex_mutex);
lock_guard<mutex> lock(data_parsing_mutex[file_path]);
super_lock.unlock();
/* etc. */
super_lock.lock();
data_parsing_mutex.erase(file_path);
}
In fact, looking at this, it's not necessarily going to avoid double-processing a file if the background process hasn't completed it when the user requests it, unless I check the cache yet again.
But by now my spidey senses are saying There must be a better way. Is there? Would futures, promises, or atomics help me at all here?
From what you described, it sounds like you're trying to do a form of lazy initialization of the DataStruct using a thread pool, along with a reference counted cache. std::async should be able to provide a lot of the dispatch and synchronization necessary for something like this.
Using std::async, the code would look something like this...
map<wstring,weak_ptr<DataStruct>> cache;
map<wstring,shared_future<shared_ptr<DataStruct>>> pending;
mutex cache_mutex, pending_mutex;
shared_ptr<DataStruct> ParseDataFromFile(wstring file) {
auto data_ptr = make_shared<DataStruct>();
/* Parses and processes the data, may take a while */
return data_ptr;
}
shared_ptr<DataStruct> CreateStructFromData(wstring file) {
shared_future<shared_ptr<DataStruct>> pf;
shared_ptr<DataStruct> ce;
{
lock_guard<mutex> lock(cache_mutex); // named, so it lives until end of scope
auto ci = cache.find(file);
if (!(ci == cache.end() || ci->second.expired()))
return ci->second.lock();
}
{
lock_guard<mutex> lock(pending_mutex);
auto fi = pending.find(file);
if (fi == pending.end()) { // no parse already in flight for this file
pf = async(launch::async, ParseDataFromFile, file).share();
pending.insert(fi, make_pair(file, pf));
} else {
pf = fi->second;
}
}
pf.wait();
ce = pf.get();
{
lock_guard<mutex> lock(cache_mutex);
auto ci = cache.find(file);
if (ci == cache.end() || ci->second.expired())
cache.insert(ci, make_pair(file, ce));
}
{
lock_guard<mutex> lock(pending_mutex);
auto pi = pending.find(file);
if (pi != pending.end())
pending.erase(pi);
}
return ce;
}
This can probably be optimized a bit, but the general idea should be the same.
On a typical computer there is little point in trying to load files concurrently, since disk access will be the bottleneck. Instead, it's better to have a single thread load files (or use asynchronous I/O) and dish out the parsing to a thread pool. Then store the results in a shared container.
Regarding preventing double work, you should consider whether it is really necessary. If you are only doing this as a premature optimization, you'd probably make users happier by focusing on making the program responsive rather than efficient. That is, make sure the user gets what they ask for quickly, even if it means doing double work.
OTOH, if there is a technical reason for not parsing a file twice, you can keep track of the status of each file (loading, parsing, parsed) in the shared container.
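One way to keep that per-file status in the shared container is a small mutex-guarded map; a rough sketch with illustrative names:
#include <map>
#include <mutex>
#include <string>

enum class FileState { Loading, Parsing, Parsed };

struct SharedResults {
    std::mutex m;
    std::map<std::wstring, FileState> status;

    // Returns true if the caller should start work on this file;
    // false means another thread has already claimed it.
    bool tryClaim(const std::wstring& file) {
        std::lock_guard<std::mutex> lock(m);
        return status.emplace(file, FileState::Loading).second;
    }
};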
I am creating a multi-threaded program, and several threads may need to call a global function
writeLog(const char* pMsg);
and writeLog will be implemented something like this:
void writeLog(const char* pMsg)
{
CRITICAL_SECTION cs;
// initialize critical section
...
EnterCriticalSection(&cs);
// g_pLogFilePath is a global variable.
FILE *file;
if (0!=fopen_s(&file, g_pLogFilePath, "r+"))
return;
fprintf(file, pMsg);
fclose(file);
LeaveCriticalSection(&cs);
}
My questions are:
1) Is using a critical section like this the best way to do concurrent logging?
2) Since I will write logs in many places in the threads, and since each log write involves opening and closing the file, will the I/O impact performance significantly?
Thanks!
The best way to do concurrent logging is to use one of the existing logging libraries for C++.
They have many features you would probably like to use (different appenders, formatting, concurrency, etc.).
If you still want to have your own solution, you could have something like this:
a simple singleton that is initialized once and keeps the state (file handle and mutex):
class Log
{
public:
// Singleton
static Log & getLog()
{
static Log theLog;
return theLog;
}
void log(const std::string & message)
{
// synchronous writing here
}
private:
// Hidden ctor
Log()
{
// open file ONCE here
}
// Synchronisation primitive - instance variable
// CRITICAL_SECTION or Boost mutex (preferable)
CRITICAL_SECTION cs_;
// File handle: FILE * or std::ofstream
FILE * handler_;
};
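The body of log (elided above) would then take the critical section around the write; a possible sketch, assuming cs_ and handler_ were initialized in the constructor:
void log(const std::string & message)
{
    EnterCriticalSection(&cs_);
    fprintf(handler_, "%s\n", message.c_str());
    LeaveCriticalSection(&cs_);
}
Callers then simply write Log::getLog().log("something happened");.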
To answer your questions:
Yes, a critical section is indeed needed for concurrent logging.
Yes, logging may indeed affect performance.
As mentioned in the comments, the synchronization object used to "protect" the critical section must be accessible by all threads, for example as a global variable or inside a singleton.
Regarding logging performance, I/O can be costly. One common approach is to have a logging object that buffers the messages to be logged and only writes them out when the buffer is full (see the sketch below); this helps performance. Additionally, consider having several log levels: DEBUG, INFO, WARNING, ERROR.
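A bare-bones sketch of such a buffering logger (the threshold and names are arbitrary):
#include <cstdio>
#include <mutex>
#include <string>

class BufferedLog {
public:
    explicit BufferedLog(FILE* f) : file_(f) {}

    void log(const std::string& msg) {
        std::lock_guard<std::mutex> lock(m_);
        buffer_ += msg;
        buffer_ += '\n';
        if (buffer_.size() >= kFlushThreshold)
            flushLocked(); // one big write instead of many small ones
    }

private:
    void flushLocked() {
        std::fwrite(buffer_.data(), 1, buffer_.size(), file_);
        buffer_.clear();
    }

    static constexpr std::size_t kFlushThreshold = 64 * 1024;
    FILE* file_;
    std::string buffer_;
    std::mutex m_;
};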
A CS is a reasonable way to protect the logging, yes. To avoid inflicting the open/write/close on every call from every thread, it's common to queue the string off to a separate log thread (if it isn't already malloced/newed, you may need to copy it). Blocking disk delays are then buffered away from the logging calls, and any lazy-writing optimizations can be implemented in the log thread, as in the sketch below.
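A minimal sketch of that queue-plus-log-thread arrangement (illustrative names, no error handling):
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class AsyncLog {
public:
    explicit AsyncLog(const std::string& path)
        : file_(path, std::ios::app), worker_([this] { run(); }) {}

    ~AsyncLog() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }

    // Copies the message into the queue and returns immediately.
    void log(std::string msg) {
        { std::lock_guard<std::mutex> lk(m_); queue_.push(std::move(msg)); }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        while (!done_ || !queue_.empty()) {
            cv_.wait(lk, [this] { return done_ || !queue_.empty(); });
            while (!queue_.empty()) {
                std::string msg = std::move(queue_.front());
                queue_.pop();
                lk.unlock();            // do the blocking write without the lock
                file_ << msg << '\n';
                lk.lock();
            }
        }
    }

    std::ofstream file_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
    bool done_ = false;
    std::thread worker_; // declared last so it starts after the other members
};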
Alternatively, as suggested by the other posters, just use a logging framework that has all this stuff already implemented.
I was writing an answer when a circuit breaker tripped. Since my answer is still in draft, I may as well continue. This is much the same as the answer that provides a singleton class, but I do it a little more C-like. It all lives in a separate source file (Logging.cpp, for example).
static CRITICAL_SECTION csLogMutex;
static FILE *fpFile = NULL;
static bool bInit = false;
bool InitLog( const char *filename )
{
if( bInit ) return false;
bInit = true;
fpFile = fopen( filename, "at" );
InitializeCriticalSection(&csLogMutex);
return fpFile != NULL;
}
void ShutdownLog()
{
if( !bInit ) return;
if( fpFile ) fclose(fpFile);
DeleteCriticalSection(&csLogMutex);
fpFile = NULL;
bInit = false;
}
Those are called in your application entry/exit... As for logging, I prefer to use variable argument lists so I can do printf-style logging.
void writeLog(const char* pMsg, ...)
{
if( fpFile == NULL ) return;
EnterCriticalSection(&csLogMutex);
// You can write a timestamp into the file here if you like.
va_list ap;
va_start(ap, pMsg);
vfprintf( fpFile, pMsg, ap );
fprintf( fpFile, "\n" ); // I hate supplying newlines to log functions!
va_end(ap);
LeaveCriticalSection(&csLogMutex);
}
If you plan to do logging within DLLs, you can't use this static approach. Instead you'll need to open the file with _fsopen and deny read/write sharing.
You may also wish to call fflush periodically if you expect your application to crash, or on every write if you want to monitor the log externally in real time.
Yes, there's a performance impact from critical sections but it's nothing compared to the performance cost of writing to files. You can enter critical sections many thousands of times per second without a worry.