Asynchronously writing to a file in c++ unix


I have some long loop that I need to write some data to a file on every iteration. The problem is that writing to a file can be slow, so I would like to reduce the time this takes by doing the writing asynchronously.
Does anyone know a good way to do this? Should I be creating a thread that consumes whatever is put into its buffer by writing it out (in this case, a single producer, single consumer)?
I am interested mostly in solutions that don't involve anything but the standard library (C++11).

Before going into asynchronous writing, if you are using IOStreams you might want to try to avoid flushing the stream accidentally, e.g., by not using std::endl but rather using '\n' instead. Since writing to IOStreams is buffered this can improve performance quite a bit.
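As a minimal sketch of that first suggestion (the function name write_lines and the file path are made up for illustration), each line is terminated with '\n' and the stream is left to flush on its own schedule:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes every line followed by '\n' and returns how
// many lines were written. Nothing here forces a flush per line; the stream
// flushes when its internal buffer fills up and when `out` is destroyed.
std::size_t write_lines(const std::string& path,
                        const std::vector<std::string>& lines) {
    std::ofstream out(path);
    std::size_t written = 0;
    for (const std::string& line : lines) {
        if (!(out << line << '\n')) break;  // stop on the first write error
        ++written;
    }
    return written;
}
```

Replacing '\n' with std::endl in the loop would produce the same file contents but request a flush on every iteration, which is the accidental cost described above.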
If that's not sufficient, the next question is how the data is written. If there is a lot of formatting going on, there is a chance that the actual formatting takes most of the time. You might be able to push the formatting off into a separate thread but that's quite different from merely passing off writing a couple of bytes to another thread: you'd need to pass on a suitable data structure holding the data to be formatted. What is suitable depends on what you are actually writing, though.
Finally, if writing the buffers to a file is really the bottleneck and you want to stick with the standard C++ library, it may be reasonable to have a writer thread which listens on a queue filled with buffers from a suitable stream buffer and writes the buffers to an std::ofstream. The producer interface would be an std::ostream which sends off buffers, probably of fixed size, either when the buffer is full or when the stream is flushed (for which I'd use std::flush explicitly), to a queue on which the other thread listens. Below is a quick implementation of that idea using only standard library facilities:
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <streambuf>
#include <string>
#include <thread>
#include <vector>
struct async_buf
: std::streambuf
{
std::ofstream out;
std::mutex mutex;
std::condition_variable condition;
std::queue<std::vector<char>> queue;
std::vector<char> buffer;
bool done;
std::thread thread;
void worker() {
bool local_done(false);
std::vector<char> buf;
while (!local_done) {
{
std::unique_lock<std::mutex> guard(this->mutex);
this->condition.wait(guard,
[this](){ return !this->queue.empty()
|| this->done; });
if (!this->queue.empty()) {
buf.swap(queue.front());
queue.pop();
}
local_done = this->queue.empty() && this->done;
}
if (!buf.empty()) {
out.write(buf.data(), std::streamsize(buf.size()));
buf.clear();
}
}
out.flush();
}
public:
async_buf(std::string const& name)
: out(name)
, buffer(128)
, done(false)
, thread(&async_buf::worker, this) {
this->setp(this->buffer.data(),
this->buffer.data() + this->buffer.size() - 1);
}
~async_buf() {
{
// Set the flag while holding the mutex so the worker cannot miss it.
std::lock_guard<std::mutex> guard(this->mutex);
this->done = true;
}
this->condition.notify_one();
this->thread.join();
}
int overflow(int c) {
if (c != std::char_traits<char>::eof()) {
*this->pptr() = std::char_traits<char>::to_char_type(c);
this->pbump(1);
}
return this->sync() != -1
? std::char_traits<char>::not_eof(c): std::char_traits<char>::eof();
}
int sync() {
if (this->pbase() != this->pptr()) {
this->buffer.resize(std::size_t(this->pptr() - this->pbase()));
{
std::unique_lock<std::mutex> guard(this->mutex);
this->queue.push(std::move(this->buffer));
}
this->condition.notify_one();
this->buffer = std::vector<char>(128);
this->setp(this->buffer.data(),
this->buffer.data() + this->buffer.size() - 1);
}
return 0;
}
};
int main()
{
async_buf sbuf("async.out");
std::ostream astream(&sbuf);
std::ifstream in("async_stream.cpp");
for (std::string line; std::getline(in, line); ) {
astream << line << '\n' << std::flush;
}
}

Search the web for "double buffering."
In general, one thread will write to one or more buffers. Another thread reads from the buffers, "chasing" the writing thread.
This may not make your program more efficient. Efficiency with files is achieved by writing in huge blocks so that the drive doesn't get a chance to spin down. One write of many bytes is more efficient than many writes of a few bytes.
This could be achieved by having the writing thread only write when the buffer content has exceeded some threshold like 1k.
Also research the topic of "spooling" or "print spooling".
You'll need to use C++11 since previous versions don't have threading support in the standard library. I don't know why you limit yourself, since Boost has some good stuff in it.
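The double-buffering idea with a size threshold could be sketched roughly as follows. This is only an illustration using standard C++11 facilities; the class name DoubleBufferedWriter, its members, and the default threshold of 1024 bytes are assumptions, not something taken from the answer.

```cpp
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical sketch: the producer appends to a front buffer; a writer
// thread swaps it for an empty back buffer once it reaches `threshold`
// bytes and writes the whole block with a single call.
class DoubleBufferedWriter {
public:
    explicit DoubleBufferedWriter(const std::string& path,
                                  std::size_t threshold = 1024)
        : out_(path, std::ios::binary), threshold_(threshold),
          worker_(&DoubleBufferedWriter::run, this) {}

    ~DoubleBufferedWriter() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
        worker_.join();  // the worker writes any remaining data before exiting
    }

    void append(const std::string& data) {
        std::lock_guard<std::mutex> lock(mutex_);
        front_ += data;
        if (front_.size() >= threshold_) ready_.notify_one();
    }

private:
    void run() {
        std::string back;
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            ready_.wait(lock, [this] {
                return done_ || front_.size() >= threshold_;
            });
            back.swap(front_);           // take the filled buffer
            const bool finished = done_;
            lock.unlock();               // write without blocking the producer
            out_.write(back.data(), static_cast<std::streamsize>(back.size()));
            back.clear();
            if (finished) return;
            lock.lock();
        }
    }

    std::ofstream out_;
    std::size_t threshold_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::string front_;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

The producer only pays for a string append under a short lock; whether that is actually faster than a plain buffered std::ofstream depends on the workload and should be measured.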

Related

`boost::asio::io_context` with `boost::process::async_pipe`: is there a way to run it reliably?

With the Windows API, we can use pipes to forward a process's output and error streams, so we can read the process output without any temporary files. Instead of this:
std::system("my_command.exe > out.tmp");
we can work much faster and without the risk of leaving behind forgotten temporary files (after a system crash, for example).
Linux has similar functionality. However, implementing OS-specific code for each OS is a time-consuming and complex task, so using a portable solution looks like a good idea.
boost::process claims to be such a solution. However, it is fundamentally unreliable. See the following sample program:
#include <fstream>
#include <iostream>
#include <memory>
#include <vector>
#include <boost/asio.hpp>
#include <boost/process.hpp>
void ReadPipe(boost::process::async_pipe& pipe, char* output_buffer, size_t output_buffer_size, boost::asio::io_context::strand& executor, std::ofstream& output_saver)
{
namespace io = boost::asio;
using namespace std;
io::async_read(
pipe,
io::buffer(
output_buffer,
output_buffer_size
),
io::bind_executor(executor, [&pipe, output_buffer, output_buffer_size, &executor, &output_saver](const boost::system::error_code& error, std::size_t bytes_transferred) mutable
{
// Save transferred data
if (bytes_transferred)
output_saver.write(output_buffer, bytes_transferred);
// Handle error
if (error)
{
if (error.value() == boost::asio::error::basic_errors::broken_pipe)
cout << "Child standard output is broken, so the process is most probably exited." << endl;
else
cout << "Child standard output read error occurred. " << boost::system::system_error(error).what() << endl;
}
else
{
//this_thread::sleep_for(chrono::milliseconds(50));
ReadPipe(pipe, output_buffer, output_buffer_size, executor, output_saver);
}
})
);
}
int main(void)
{
namespace io = boost::asio;
namespace bp = boost::process;
using namespace std;
// Initialize
io::io_context asio_context;
io::io_context::strand executor(asio_context);
bp::async_pipe process_out(asio_context);
char buffer[65535];
constexpr const size_t buffer_size = sizeof(buffer);
ofstream output_saver(R"__(c:\screen.png)__", ios_base::out | ios_base::binary | ios_base::trunc);
// Schedule to read standard output
ReadPipe(process_out, buffer, buffer_size, executor, output_saver);
// Run child
bp::child process(
bp::search_path("adb"),
bp::args({ "exec-out", "screencap", "-p" }),
bp::std_in.close(),
bp::std_out > process_out,
bp::std_err > process_out
);
asio_context.run();
process.wait();
output_saver.close();
// Finish
return 0;
}
This code works nicely: it runs ADB, takes a screenshot of the Android device and saves it through an asynchronous pipe, so no temporary files are involved. This specific example saves the screenshot to a file, but in a real application you could keep the data in memory, then load and parse it.
I use ADB in my sample because this tool is a good example of data that is generated comparatively slowly and sent via USB or Wi-Fi (so also slowly), and the data size is comparatively big (for a full HD device with a complex image the PNG file will be 1M+).
When I uncomment the following line:
this_thread::sleep_for(chrono::milliseconds(50));
The pipe reading operation becomes completely unreliable. The program reads only part of the data (of unpredictable size).
So even a delay as short as 50 milliseconds makes the Boost implementation of the asynchronous pipe fail.
This is not a normal situation. What if CPU usage is near 100% (i.e. we are on a heavily loaded server)? What if the thread runs other ASIO jobs that may execute for 50 milliseconds or less? So this is just an easily reproducible demonstration of a fundamental Boost ASIO bug: the asynchronous pipe cannot tolerate any delay once you have started reading it; you have to call async_read again immediately after receiving data, otherwise you risk losing data.
In practice, when I use the same ASIO context to run multiple jobs (not only the one async_read that reads the process's standard output), async_pipe fails in 50% of attempts to read 1M of data or more.
Does anyone know a workaround that makes async_pipe reliable, so that it does not break the connection when the ASIO context runs async_read with the very small delays required to run other jobs?

Is there a way to wait for a file to be written to in pure c++? [duplicate]

This question already has answers here:
How do I make my program watch for file modification in C++?
(6 answers)
Closed 4 years ago.
I am looking to write some code to monitor a file. When it gets written to I would like to read the new lines and act upon them.
So I found this thread: how-to-read-a-growing-text-file-in-c and it shows me how to do this.
However, it's a bit of a "polling" approach. Here is the code snippet for convenience. Note: this is not my work (it's the answer from the link):
#include <iostream>
#include <string>
#include <fstream>
int main()
{
std::ifstream ifs("test.log");
if (ifs.is_open())
{
std::string line;
while (true)
{
while (std::getline(ifs, line)) std::cout << line << "\n";
if (!ifs.eof()) break; // Ensure end of read was EOF.
ifs.clear();
// You may want a sleep in here to avoid
// being a CPU hog.
}
}
return 0;
}
You can see there is the comment: You may want a sleep in here to avoid being a CPU hog.
Is there a way (and there might not be) to wait for the file to be written to, such that some event/condition triggers our thread to wake up? I am thinking along the lines of a select()-like function... But I would really like it to be pure C++.
Failing that - is there a non-pure c++ way (for me I require it to work for Linux OS and possibly windows as well)?
I have not written any code yet because I am not even sure where the best place to start is.

You just need to add a sleep call that works on both Windows and Linux, for example std::this_thread::sleep_for(std::chrono::milliseconds(500));
in your code. It comes from the standard library, so you can use it on Linux or Windows.
#include <chrono>
#include <thread>
#include <iostream>
#include <string>
#include <fstream>
int main()
{
std::ifstream ifs("test.log");
if (ifs.is_open())
{
std::string line;
while (true)
{
while (std::getline(ifs, line)) std::cout << line << "\n";
if (!ifs.eof()) break; // Ensure end of read was EOF.
ifs.clear();
// You may want a sleep in here to avoid
// being a CPU hog.
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
return 0;
}
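For the Linux-only route the question also asks about, the kernel's inotify interface can block until the file is modified, so no sleep interval is needed. The following is a rough sketch rather than a portable solution; the names FileWatcher and append_line are invented for this example, and Windows would need a different mechanism.

```cpp
#include <poll.h>
#include <sys/inotify.h>
#include <unistd.h>

#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around an inotify watch for a single file (Linux only).
class FileWatcher {
public:
    explicit FileWatcher(const std::string& path)
        : fd_(::inotify_init1(IN_NONBLOCK)) {
        if (fd_ < 0) throw std::runtime_error("inotify_init1 failed");
        if (::inotify_add_watch(fd_, path.c_str(), IN_MODIFY) < 0) {
            ::close(fd_);
            throw std::runtime_error("inotify_add_watch failed");
        }
    }
    ~FileWatcher() { ::close(fd_); }
    FileWatcher(const FileWatcher&) = delete;
    FileWatcher& operator=(const FileWatcher&) = delete;

    // Blocks until the file is modified or timeout_ms elapses.
    // Returns true if at least one modification event arrived.
    bool wait(int timeout_ms) {
        pollfd pfd{fd_, POLLIN, 0};
        if (::poll(&pfd, 1, timeout_ms) <= 0) return false;
        alignas(inotify_event) char events[4096];
        // Drain all pending events; the descriptor is non-blocking, so
        // read() returns -1 once the queue is empty.
        while (::read(fd_, events, sizeof events) > 0) {}
        return true;
    }

private:
    int fd_;
};

// Stand-in for "some other code writes to the log", used for demonstration.
void append_line(const std::string& path, const std::string& line) {
    std::ofstream out(path, std::ios::app);
    out << line << '\n';
}
```

In the polling loop above, a call such as watcher.wait(-1) (a negative timeout makes poll wait indefinitely) could take the place of the sleep_for line, after which the getline loop picks up whatever was appended.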

How to open file in exclusive mode in C++

I am implementing a file system in C++. Up to now I was using fstream, but I realized that it is impossible to open a file in exclusive mode with it. Since there are many threads, I want to allow multiple reads, and when opening a file for writing I want to open it in exclusive mode.
What is the best way to do it? I think Boost offers some features. Is there any other possibility? I would also like to see a simple example. If it is not easy or good to do in C++, I could write it in C as well.
I am using Windows.

On many operating systems, it's simply impossible, so C++
doesn't support it. You'll have to write your own streambuf.
If the only platform you're worried about is Windows, you can
possibly use the exclusive mode for opening that it offers.
More likely, however, you would want to use some sort of file
locking, which is more precise, and is available on most, if not
all platforms (but not portably—you'll need LockFileEx
under Windows, fcntl under Unix).
Under Posix, you could also use pthread_rwlock. Butenhof
gives an implementation of this using classical mutex and
condition variables, which are present in C++11, so you could
actually implement a portable version (provided all of the
readers and writers are in the same process—the Posix
requests will work across process boundaries, but this is not
true for the C++ threading primitives).
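A reader-writer lock built from a C++11 mutex and condition variable, in the spirit of what the answer describes, might look like the sketch below. It is not Butenhof's implementation; the class name ReadWriteLock and its layout are assumptions, it only coordinates threads inside one process, and it does not protect writers from starvation by a steady stream of readers.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: any number of readers, or exactly one writer.
class ReadWriteLock {
public:
    void lock_shared() {
        std::unique_lock<std::mutex> guard(mutex_);
        changed_.wait(guard, [this] { return !writer_; });
        ++readers_;
    }
    bool try_lock_shared() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (writer_) return false;
        ++readers_;
        return true;
    }
    void unlock_shared() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (--readers_ == 0) changed_.notify_all();  // a writer may proceed
    }

    void lock() {
        std::unique_lock<std::mutex> guard(mutex_);
        changed_.wait(guard, [this] { return !writer_ && readers_ == 0; });
        writer_ = true;
    }
    bool try_lock() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (writer_ || readers_ != 0) return false;
        writer_ = true;
        return true;
    }
    void unlock() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            writer_ = false;
        }
        changed_.notify_all();  // wake waiting readers and writers
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    int readers_ = 0;
    bool writer_ = false;
};
```

Because the member names match the standard lockable requirements, std::lock_guard<ReadWriteLock> works for writers. From C++17 on, std::shared_mutex provides the same semantics in the standard library.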

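If your app only runs on Windows, the Win32 API function CreateFile() is the natural choice.
For example:
// FILE_SHARE_READ lets other handles read while this one writes; pass 0 as the share mode for fully exclusive access.
HANDLE hFile = ::CreateFileW(lpszFileFullPathName, GENERIC_WRITE, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);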

If you are open to using Boost, I would suggest the file_lock class from Boost.Interprocess. This means you need to keep the filename of each file you open, because fstream does not store it for you.
It has two modes: lock(), which you can use for writing (only one such lock can be held at a time, and a sharable lock blocks it too), and lock_sharable(), which you can use for reading (any number of threads can hold such a lock).
Note that managing both reads and writes this way eventually gets complicated. If there is always someone reading, the sharable lock may never be released, and the exclusive lock never gets a chance to be taken.
// add the lock in your class
#include <boost/interprocess/sync/file_lock.hpp>
class my_files
{
...
private:
...
boost::interprocess::file_lock m_lock;
};
Now when you want to access a file, you can lock it one way or the other. If the thread is in charge of when they do that, you could add functions for the user to have access to the lock. If your implementation of the read and write functions in my_files are in charge, you want to get a stack based object that locks and unlocks for you (RAII):
class safe_exclusive_lock
{
public:
safe_exclusive_lock(file_lock & lock)
: m_lock_ref(lock)
{
m_lock_ref.lock();
}
~safe_exclusive_lock()
{
m_lock_ref.unlock();
}
private:
file_lock & m_lock_ref;
};
Now you can safely lock the file (i.e. you lock, do things that may throw, you always unlock before exiting your current {}-block):
ssize_t my_files::read(char *buf, size_t len)
{
safe_exclusive_lock guard(m_lock);
...your read code here...
return len;
} // <- here we get the unlock()
ssize_t my_files::write(char const *buf, size_t len)
{
safe_exclusive_lock guard(m_lock);
...your write code here...
return len;
} // <- here we get the unlock()
The file_lock uses a file, so the fstream file must already exist when the file_lock is created. If the fstream file may not be created in your constructor, you will probably want to turn the m_lock variable into a unique pointer:
private:
std::unique_ptr<file_lock> m_lock;
And when you reference it, you now need an asterisk:
safe_exclusive_lock guard(*m_lock);
Note that for safety you should check whether the pointer is actually set. If it is not, the file is not open yet, so I would suggest you throw:
if(m_lock)
{
safe_exclusive_lock guard(*m_lock);
...do work here...
}
else
{
throw file_not_open();
}
// here the lock was released so you cannot touch the file anymore
In the open, you create the lock:
bool open(std::string const & filename)
{
m_stream.open(...);
...make sure it worked...
m_lock.reset(new file_lock(filename.c_str())); // file_lock takes a C string
// TODO: you may want a try/catch around the m_lock and
// close the m_stream if it fails or use a local
// variable and swap() on success...
return true;
}
And do not forget to release the lock object in your close:
void close()
{
m_lock.reset();
}

Well you can manually prevent yourself from opening a file if it has been opened in write mode already. Just keep track internally of which files you've opened in write mode.
Perhaps you could hash the filename and store it in a table upon open with write access. This would allow fast lookup to see if a file has been opened or not.

You could rename the file, update it under the new name, and rename it back. I've done it, but it's a little heavy.
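A rough sketch of that rename approach using only the standard library follows. The function name update_while_renamed and the ".locked" suffix are invented for illustration. The idea is that a second caller's rename fails because the original name no longer exists; this does not affect anyone who already has the file open, and the exact behavior of std::rename differs between platforms.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: hides the file under a temporary name, runs `update`
// on it, then restores the original name. Returns false if the file could
// not be claimed (for example because another caller has renamed it already)
// or could not be renamed back.
template <typename Update>
bool update_while_renamed(const std::string& path, Update update) {
    const std::string hidden = path + ".locked";  // assumed naming scheme
    if (std::rename(path.c_str(), hidden.c_str()) != 0) return false;
    update(hidden);  // note: if this throws, the file stays under `hidden`
    return std::rename(hidden.c_str(), path.c_str()) == 0;
}
```

A caller would pass a callable that opens the hidden path itself, for instance a lambda that appends to it with an std::ofstream in std::ios::app mode.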

Since C++17 there are two options:
In C++23 by using the openmode std::ios::noreplace.
In C++17 by using the std::fopen mode x (exclusive).
Note: The x mode was added to C in C11.
C++23 and later:
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>
int main() {
std::ofstream ofs("the_file", std::ios::noreplace);
if (ofs) {
std::cout << "success\n";
} else {
std::cerr << "Error: " << std::strerror(errno) << '\n';
}
}
C++17 and later:
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <iostream>
#include <memory>
struct FILE_closer {
void operator()(std::FILE* fp) const { std::fclose(fp); }
};
// you may want overloads for `std::filesystem::path`, `std::string` etc too:
std::ofstream open_exclusively(const char* filename) {
bool excl = [filename] {
std::unique_ptr<std::FILE, FILE_closer> fp(std::fopen(filename, "wx"));
return !!fp;
}();
auto saveerr = errno;
std::ofstream stream;
if (excl) {
stream.open(filename);
} else {
stream.setstate(std::ios::failbit);
errno = saveerr;
}
return stream;
}
int main() {
std::ofstream ofs = open_exclusively("the_file");
if (ofs) {
std::cout << "success\n";
} else {
std::cout << "Error: " << std::strerror(errno) << '\n';
}
}

Boost ASIO - How to write a console server 2

I'm trying to write a game server to run on Ubuntu Server (No GUI), and I'm having problems right at step 1. I'm new to C++, so please bear with me.
I need to be able to type commands to the server at any given point while it continues running. Since cin is a blocking input, that won't fly. I've dug around and it seems the way to go is to use Boost's ASIO library.
This answer comes incredibly close to fulfilling my needs, but I still need to know two more things:
1: The "command" passed from input seems to be limited to 1 char at a time. I need MUCH more than single key inputs, eg "shutdown", "say 'Hello World!'", "listPlayers -online", etc. I tried adapting the code to use string, instead of char:
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/shared_ptr.hpp>
#include <iostream>
#include <string>
using namespace boost::asio;
class Input : public boost::enable_shared_from_this<Input>
{
public:
typedef boost::shared_ptr<Input> Ptr;
public:
static void create(
io_service& io_service
)
{
Ptr input(
new Input( io_service )
);
input->read();
}
private:
explicit Input(
io_service& io_service
) :
_input( io_service )
{
_input.assign( STDIN_FILENO );
}
void read()
{
async_read(
_input,
boost::asio::buffer( &_command, sizeof(_command) ),
boost::bind(
&Input::read_handler,
shared_from_this(),
placeholders::error,
placeholders::bytes_transferred
));
}
void read_handler(
const boost::system::error_code& error,
size_t bytes_transferred
)
{
if ( error ) {
std::cerr << "read error: " << boost::system::system_error(error).what() << std::endl;
return;
}
if ( _command.compare( "\n" ) != 0 ) {
std::cout << "command: " << _command << std::endl;
}
this->read();
}
private:
posix::stream_descriptor _input;
std::string _command;
};
int main()
{
io_service io_service;
Input::create( io_service );
io_service.run();
}
However, this causes a segmentation error after a few characters of input, and pressing enter after entering any input no longer causes "command: " to appear. Is there a way to have this setup use string? I'm sure appending them to a separate string one character at a time will work, but I'd like to think this setup would work natively with entire strings.
2: (Edited for clarification) I need this non-blocking input to work in tandem with the rest of my server code. The question is: where does that code go? I call your attention to the main() function from above, modified to use a while loop, and call a mainLoop function:
bool loopControl = true;
int main()
{
io_service io_service;
Input::create( io_service );
// This loops continually until the server is commanded to shut down
while( loopControl )
{
io_service.run(); // Handles async input
mainLoop(); // Where my actual program resides
}
}
Even if everything else worked, control still won't ever reach mainLoop() under normal circumstances. In other words, io_service.run() is still blocking, defeating the entire purpose. This obviously isn't the correct way to implement io_service and/or mainLoop(); so what is?
My apologies if this has been done thousands of times, but apparently I'm not Googling the right phrases to bring up the results I'm looking for.

boost::asio::buffer does not directly support creating a mutable buffer from an std::string, mainly because strings are not guaranteed to be contiguous in memory pre-C++11.
The way you are calling it (the (void*, size_t) overload), you let the read overwrite the internals of std::string, which leads to your segfault. You should probably use one of the other overloads in this list: http://www.boost.org/doc/libs/1_50_0/doc/html/boost_asio/reference/buffer.html - most likely the one for std::vector<char>, since you can easily copy that into a string when your read returns.
The next problem is that you need to know beforehand how many chars you want to read, since your strings are of variable length. For that, you need to async_read the length separately before you read the actual contents. Then you resize the buffer (as I said, most likely std::vector<char>) and schedule a read of that length. Note that the sender can send both together; this is only complicated for reading from a stream... To summarize:
async_read your string length into some integer of fixed length
Resize the buffer for the content read appropriately
async_read your contents
As for your second question, it is not really clear what you want, but you might want to look into io_service::poll() if you want to do your own stuff while asio is running.

boost::asio::buffer( &_command, sizeof(_command) ) means that you want to overwrite the first 4 bytes (or whatever sizeof(string) is) of the _command object, which is obviously not what you want. If you need an auto-resizing input buffer, use asio::streambuf instead.
io_service::run blocks the calling thread, so your mainLoop won't run. You can either execute io_service::run in a separate thread, or poll io_service manually, interleaving calls to run_one/poll_one (see the reference) with iterations of your own application loop.

Output stream locking for multiprocess synchronisation?

With multiple processes all writing to the same output stream (e.g. with std::cout), is there a way to lock the stream so that, once a process starts writing its own message, it can write it through to the end (e.g. up to std::endl)?
I need a portable way of doing it.

It's not clear if it would fit the parameters of your situation, but you could potentially funnel all data to a separate worker process that aggregates the data (with its own internal locking) before dumping them to stdout.

You are out of luck. You will have to use whatever your target OS provides. This means using global/system-wide mutexes or lockf()-like functions. You could use some third-party library to satisfy the portability requirement, like Boost.Interprocess.

If you are on a UNIX like OS, then you may be able to mimic the behavior you want with a stringstream adapter. This may not be the best way to accomplish it, but the idea is to trigger a single write call whenever std::endl is encountered.
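// Assume fd is in blocking mode
#include <ostream>
#include <sstream>
#include <string>
#include <unistd.h>
class fdostream : public std::ostringstream {
typedef std::ostream & (*manip_t) (std::ostream &);
struct fdbuf : public std::stringbuf {
int fd_;
fdbuf (int fd) : fd_(fd) {}
int sync () {
const std::string s = str(); // copy once instead of calling str() twice
str(std::string());
if (s.empty()) return 0; // nothing buffered is not an error
return (::write(fd_, s.data(), s.size()) > 0) ? 0 : -1;
}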
} buf_;
std::ostream & os () { return *this; }
public:
fdostream (int fd) : buf_(fd) { os().rdbuf(&buf_); }
};
fdostream my_cout(1);
my_cout << "Hello," << " world!" << std::endl;
This should achieve the effect of synchronized writes, at the cost of buffering input into a stringstream and then clearing the internal string after each flush.
For greater portability, you could modify the code to use fwrite, and specify unbuffered writes with setvbuf. But, the atomicity of fwrite would depend on the C implementation of the library function.
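The fwrite variant mentioned above could be sketched like this. The class name FileMessageBuf is made up, and the sketch assumes setvbuf is called before any other operation on the FILE*, as the C standard requires. As noted, whether a single fwrite reaches the file atomically depends on the C library and the operating system.

```cpp
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stream buffer: collects a whole message in memory and hands
// it to the C library with one fwrite call whenever the stream is flushed.
class FileMessageBuf : public std::stringbuf {
public:
    explicit FileMessageBuf(std::FILE* fp) : fp_(fp) {
        // Unbuffered: each fwrite is passed straight to the OS.
        std::setvbuf(fp_, nullptr, _IONBF, 0);
    }

protected:
    int sync() override {
        const std::string msg = str();
        str(std::string());          // reset the in-memory buffer
        if (msg.empty()) return 0;   // nothing to write
        const std::size_t n = std::fwrite(msg.data(), 1, msg.size(), fp_);
        return n == msg.size() ? 0 : -1;
    }

private:
    std::FILE* fp_;
};
```

Usage mirrors the fd-based version: construct a FileMessageBuf on an open FILE*, wrap it in an std::ostream, and end each message with std::endl so that sync() runs once per message.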
