I have an std::istream to work with. Is it possible to somehow pass it on to multiple readers which will potentially seek to and read from different positions?
If not, what if I restrict it to the case of an std::ifstream?
You have mostly answered your own question. If it is a file stream (ifstream), you get random access (read only; you can set the open mode), so there should be no problem with multiple threads accessing the same file as long as each thread opens its own ifstream. What the C++ standard does not give you is a thread-safety guarantee for a single ifstream object shared between threads.
For a generic istream (a socket, cin), calling get() consumes input from the stream. I haven't seen any documentation saying istream is thread-safe. The peek() method does not consume input, but it can still change the internal state of the istream (it may refill the buffer or set eofbit). If multiple threads call seekg() on the same istream, you have a data race and the behavior is undefined; the C++ language does not promise any internal lock. seekg() essentially moves a position pointer into the stream's internal buffer, and that state is shared by every thread using the object.
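Here is a small single-threaded sketch of the "one ifstream per reader" idea. Two std::ifstream objects opened on the same file each keep their own read position, so seeking one does not disturb the other. The helper read_at is a hypothetical name. The file is assumed to be opened in binary mode, so offsets are byte offsets.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: read up to `count` bytes at `offset` from `in`.
// Each std::ifstream owns its own std::filebuf, so this moves only the
// get position of the stream passed in, never that of another stream
// opened on the same file.
inline std::string read_at(std::ifstream& in, std::streamoff offset, std::size_t count) {
    std::string buf(count, '\0');
    in.clear();                        // reset eof/fail left by an earlier short read
    in.seekg(offset);                  // reposition only this stream's cursor
    in.read(&buf[0], static_cast<std::streamsize>(count));
    buf.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was read
    return buf;
}
```

Opening `a` and `b` on the same file and calling `read_at(a, 2, 3)` and then `read_at(b, 7, 3)` leaves `a` positioned right after the bytes it read.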
I would suggest having one thread read the istream into some buffer (constructed objects, or simply raw memory) as the producer. Multiple threads can then consume the result as consumers. This is typical producer/consumer synchronization, and any multithreading textbook will teach you how to do it.
I have multiple threads, and I want each of them to process a part of my file. Can I have a single ifstream object for that and make the threads concurrently read different parts? The parts are non-overlapping, so the same line will not be processed by two threads. If so, how do I get multiple cursors?
A single std::ifstream is associated with exactly one cursor (there's a seekg and tellg method associated with the std::ifstream directly).
If you want the same std::ifstream object to be shared across multiple threads, you'll need some sort of synchronization mechanism between the threads, which might defeat the purpose (in each thread, you'll have to lock, seek, read, and unlock every time).
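To make the cost concrete, here is a sketch of a shared std::ifstream protected by a std::mutex. SharedReader and its read_at method are hypothetical names. Every seek-plus-read pair runs under one lock, and that is exactly the serialization that can defeat the purpose of using threads.

```cpp
#include <cstddef>
#include <fstream>
#include <mutex>
#include <string>

// Hypothetical wrapper around one std::ifstream shared by many threads.
// The stream has a single get position, so seek and read must happen
// under the same lock; otherwise another thread could move the position
// between the two calls.
class SharedReader {
public:
    explicit SharedReader(const std::string& path) : in_(path, std::ios::binary) {}

    bool is_open() const { return in_.is_open(); }

    // Reads up to `count` bytes at `offset`; safe to call from several threads.
    std::string read_at(std::streamoff offset, std::size_t count) {
        std::string buf(count, '\0');
        std::lock_guard<std::mutex> lock(mutex_);             // lock
        in_.clear();                                          // drop eof from a prior short read
        in_.seekg(offset);                                    // seek
        in_.read(&buf[0], static_cast<std::streamsize>(count));  // read
        buf.resize(static_cast<std::size_t>(in_.gcount()));
        return buf;                                           // unlock when `lock` is destroyed
    }

private:
    std::ifstream in_;
    std::mutex mutex_;
};
```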
To solve your problem, you can open one std::ifstream to the same file per thread. In each thread, you'd seek to whatever position you want to start reading from. This would only require you to be able to "easily" compute the seek position for each thread though (Note: this is a pretty strong requirement).
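Here is a sketch of the per-thread approach. It assumes the work can be split into fixed-size byte ranges, and it counts occurrences of one byte as the workload. Each thread opens its own std::ifstream, seeks to the start of its range, and reads only that range. The file size comes from opening a stream at the end (std::ios::ate) and calling tellg(). count_byte_parallel is a hypothetical helper. Splitting on line boundaries, as the question asks, would need an extra step that moves each boundary forward to the next '\n'.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: count `target` bytes in `path` using `workers`
// threads, each with its own std::ifstream (and therefore its own cursor).
inline std::uint64_t count_byte_parallel(const std::string& path, char target,
                                         unsigned workers) {
    std::streamoff size = 0;
    {
        std::ifstream probe(path, std::ios::binary | std::ios::ate);
        if (!probe) return 0;
        size = probe.tellg();          // opened at the end, so this is the size
    }
    if (size <= 0) return 0;
    if (workers == 0) workers = 1;
    const std::streamoff chunk = size / static_cast<std::streamoff>(workers);

    std::vector<std::uint64_t> counts(workers, 0);
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < workers; ++i) {
        const std::streamoff begin = static_cast<std::streamoff>(i) * chunk;
        const std::streamoff end = (i + 1 == workers) ? size : begin + chunk;
        threads.emplace_back([&path, &counts, target, i, begin, end] {
            std::ifstream in(path, std::ios::binary);  // private stream per thread
            in.seekg(begin);                           // jump to this thread's range
            std::string buf(static_cast<std::size_t>(end - begin), '\0');
            in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
            const auto got = static_cast<std::ptrdiff_t>(in.gcount());
            counts[i] = static_cast<std::uint64_t>(
                std::count(buf.begin(), buf.begin() + got, target));
        });
    }
    for (auto& t : threads) t.join();

    std::uint64_t total = 0;
    for (std::uint64_t c : counts) total += c;
    return total;
}
```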
C++ file streams are not guaranteed to be thread safe (see e.g. this answer).
The typical solution is to open separate streams on the same file anyway; each instance comes with its own "cursor". However, you need to make sure the file can be opened for shared access, and concurrency behavior becomes platform-specific.
For ifstream (i.e. only reading from the file), the concurrency issues are usually tame. If someone else modifies the file, the two streams might see different content, but you do get some kind of eventual consistency.
Reads and writes are usually not atomic, i.e. you might read only part of a write. Writes might not even execute in the order they are issued (see write combining).
Looking at the FILE struct, it seems there is a pointer inside FILE, char* curp, pointing to the current position, which may mean that each FILE object tracks its own position in the file (note that the layout of FILE is implementation-defined).
This being C, I don't know how ifstream works or whether it uses a FILE object or is built like one. It might not help you at all, but I thought this little piece of information was interesting to share and might help someone.
In C++, I know I can read or write a file using system functions like read or write, and I can also do that with fstream's help.
Now I'm implementing disk management, which is a component of a DBMS. For simplicity, I only use the disk manager to manage the space of a Unix file.
All I know is that fstream wraps system functions like read or write and provides some buffering.
However, I was wondering whether this will affect atomicity and synchronization.
My question is which way should I use and why?
No, particularly not with Unix. A DBM is going to want contiguous files. That means either a Unix variant that supports them or creating a disk partition.
You're also going to want to handle the buffering yourself rather than rely on the C++ library's buffering.
I could go on, but streams are for, well, streams of data, not for secure, reliable, structured data.
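If you do handle the buffering yourself, the usual building block on POSIX systems is positional I/O. The sketch below reads and writes fixed-size pages with pread and pwrite. Both calls take an explicit offset and leave the descriptor's file offset untouched, so concurrent callers do not compete over a shared cursor. It assumes a POSIX system (this is not standard C++) and a 4096-byte page. kPageSize, read_page, and write_page are hypothetical names, and EINTR is not retried.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <vector>

// Assumed page size for this sketch.
constexpr std::size_t kPageSize = 4096;

// Writes exactly one page at `page_id`; returns false on any error.
inline bool write_page(int fd, std::size_t page_id, const std::vector<char>& page) {
    if (page.size() != kPageSize) return false;
    const off_t offset = static_cast<off_t>(page_id * kPageSize);
    std::size_t done = 0;
    while (done < kPageSize) {         // pwrite may write fewer bytes than asked
        const ssize_t n = ::pwrite(fd, page.data() + done, kPageSize - done,
                                   offset + static_cast<off_t>(done));
        if (n <= 0) return false;      // errors (including EINTR) reported as failure
        done += static_cast<std::size_t>(n);
    }
    return true;
}

// Reads one page at `page_id`; returns false on error or if the file is too short.
inline bool read_page(int fd, std::size_t page_id, std::vector<char>& page) {
    page.assign(kPageSize, '\0');
    const off_t offset = static_cast<off_t>(page_id * kPageSize);
    std::size_t done = 0;
    while (done < kPageSize) {
        const ssize_t n = ::pread(fd, page.data() + done, kPageSize - done,
                                  offset + static_cast<off_t>(done));
        if (n <= 0) return false;      // 0 means end of file
        done += static_cast<std::size_t>(n);
    }
    return true;
}
```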
The following information about synchronization and thread safety of fstream can be found in the ISO C++ standard:
27.2.3 Thread safety [iostreams.threadsafety]
Concurrent access to a stream object (27.8, 27.9), stream buffer object (27.6), or C Library stream (27.9.2) by multiple threads may result in a data race (1.10) unless otherwise specified (27.4). [ Note: Data races result in undefined behavior (1.10). —end note ]
If one thread makes a library call a that writes a value to a stream and, as a result, another thread reads this value from the stream through a library call b such that this does not result in a data race, then a’s write synchronizes with b’s read.
C/C++ file I/O operations are not synchronized for you by default. So if you plan to use fstream or the open/write/read system calls, you will have to implement the synchronization yourself. You can use std::mutex, available since C++11, to synchronize your file I/O.
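Here is a sketch of that advice. Several threads append lines to one std::ofstream, and a std::mutex makes each "write line plus newline" a single critical section, so lines from different threads never interleave inside the stream object. LockedWriter and write_from_threads are hypothetical names, not standard types.

```cpp
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical wrapper: every access to the shared ofstream goes through
// the same mutex.
class LockedWriter {
public:
    explicit LockedWriter(const std::string& path) : out_(path) {}

    void write_line(const std::string& line) {
        std::lock_guard<std::mutex> lock(mutex_);
        out_ << line << '\n';          // whole line written while holding the lock
    }

    void flush() {
        std::lock_guard<std::mutex> lock(mutex_);
        out_.flush();
    }

private:
    std::ofstream out_;
    std::mutex mutex_;
};

// Usage sketch: n_threads threads each append lines_per_thread lines.
inline void write_from_threads(LockedWriter& w, int n_threads, int lines_per_thread) {
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&w, t, lines_per_thread] {
            for (int i = 0; i < lines_per_thread; ++i)
                w.write_line("thread " + std::to_string(t));
        });
    }
    for (auto& th : threads) th.join();
}
```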
I was reading the cplusplus.com tutorial on I/O. At the end, it says fstream buffers are synchronized with the file on disk:
Explicitly, with manipulators: When certain manipulators are used on streams, an explicit synchronization takes place. These manipulators are: flush and endl.
and
Explicitly, with member function sync(): Calling stream's member function sync(), which takes no parameters, causes an immediate synchronization. This function returns an int value equal to -1 if the stream has no associated buffer or in case of failure. Otherwise (if the stream buffer was successfully synchronized) it returns 0.
in addition to a few other implicit cases (such as destruction and stream.close()).
What is the difference between calling fstream::flush() and fstream::sync()? And what about endl?
In my code, I've always used flush().
Documentation on std::flush():
Flush stream buffer
Synchronizes the buffer associated with the stream to its controlled output sequence. This effectively means that all unwritten characters in the buffer are written to its controlled output sequence as soon as possible ("flushed").
Documentation on std::streambuf::sync():
Synchronize input buffer with source of characters
It is called to synchronize the stream buffer with the controlled sequence (like the file in the case of file streams). The public member function pubsync calls this protected member function to perform this action.
Forgive me if this is a newbie question; I am a noob.
basic_ostream::flush
This is a non-virtual function which writes uncommitted changes held in the stream buffer to the underlying output sequence. In case of error, it sets an error flag (badbit) on the stream object instead of returning an error code, because its return value is a reference to the stream itself, to allow chaining.
basic_filebuf::sync
This is a virtual function which writes all pending changes to the underlying file and returns an error code to signal success or failure.
endl
This, when applied to an ostream, writes an '\n' to the stream and then calls flush on that stream.
So, essentially: flush is a general function available on any output stream, whereas basic_filebuf::sync is the file buffer's override of the virtual streambuf::sync. flush is non-virtual, whereas sync is virtual. This changes how they can be used via pointers (to the base class) in the case of inheritance. Furthermore, they differ in how they report errors.
sync is a member of input streams; for file streams, unread characters may be discarded from the buffer (the exact effect on the input side is implementation-defined). flush is a member of output streams, and buffered output is passed down to the kernel.
C++ I/O involves a cooperation between a number of classes: stream, buffer, locale and locale::facet-s.
In particular, sync exists on both the stream (basic_istream::sync) and the stream buffer (basic_streambuf::sync, reached through pubsync), while flush exists on the output stream, so beware which documentation you are referring to, since they do different things.
On streams flush tells the stream to tell the buffer (note the redirection) to flush its content onto the destination. This makes sure that no "pending write" remains.
std::endl, when applied to thestream with <<, is no more than a
thestream.put('\n'); thestream.flush();
Still on streams, sync tells the stream to tell the buffer to flush the content (for output) and read (for input) as much as it can to refill the buffer.
Note that, inside buffers, sync can also be called internally by overflow or underflow to handle the "buffer full" (for output) and "buffer empty" (for input) situations.
So I sense that sync is much more of an "internal" function used in stream-to-buffer communication and in buffer implementations (where it is virtual and overridden in different buffer types), while flush is much more an interface between the stream and the client program.
endl ... is just a shortcut.
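The stream/buffer layering described above can be seen with a tiny custom output buffer that overrides the virtual sync(). In this sketch (CountingBuf is a hypothetical name), the client only calls flush() or uses std::endl on the std::ostream, and each of those reaches the buffer's sync() through pubsync(). To keep the example short, the buffer keeps its own pending string instead of a put area.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical unbuffered streambuf that records when sync() is called.
class CountingBuf : public std::streambuf {
public:
    std::string committed;   // data that has "reached the device"
    int sync_calls = 0;

protected:
    // No put area is set up, so every character written arrives here.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            pending_ += traits_type::to_char_type(ch);
        return traits_type::not_eof(ch);
    }

    // Reached via pubsync(): move pending characters to the "device".
    int sync() override {
        ++sync_calls;
        committed += pending_;
        pending_.clear();
        return 0;
    }

private:
    std::string pending_;
};
```

With `std::ostream os(&buf);`, writing `os << "abc"` leaves `committed` empty until `os.flush()` or `os << std::endl` triggers `sync()`.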
I've understood it to be as follows:
flush() will get the data out of the library buffers into the OS's write buffers and will eventually result in a full synchronization (the data is fully written out), but it's definitely up to the OS when the sync will be complete.
sync() will, to the extent possible in a given OS, attempt to force full synchronization, but the OS involved may or may not facilitate this.
So flush() is: get the data out of the buffer and in line to be written.
sync() is: if possible, force the data to be definitively written out, now.
That's been my understanding of this, but as I think about it, I can't remember how I came to this understanding, so I'm curious to hear from others, too.
What is the difference between calling fstream::flush() and fstream::sync()?
There is none: Both essentially call rdbuf()->pubsync() which then calls std::streambuf::sync(), see links at https://en.cppreference.com/w/cpp/io/basic_fstream
After constructing and checking the sentry object, calls rdbuf()->pubsync()
and https://en.cppreference.com/w/cpp/io/basic_streambuf/pubsync
Calls sync() of the most derived class
The only difference is where the functions are defined: sync is inherited from istream while flush is inherited from ostream (and fstream inherits from both). And of course the return values are different: sync returns an int (0 for ok, -1 for failure) while flush returns a reference to the stream object. But you likely don't care about those anyway.
The naming difference between input and output streams is this: for input, sync "synchronizes" the internal buffer with the input stream (here a file) in case that changed, while for output, flush "flushes" pending changes from the internal buffer to the output stream (again, here a file). In other words, "sync from" and "flush to" made more sense naming-wise. But for an iostream, both end up doing the same thing.
And for completeness, (almost) from Emilio's answer:
std::endl, when applied to thestream with <<, is no more than a
thestream.put('\n').flush();
So it appends a newline and then calls the streams flush function which then eventually calls the buffers sync function (through pubsync).
Just a shortcut to basically use line buffering, i.e. write (up to) the end of that newline, then flush what was written.
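Here is a sketch that exercises all three on one std::fstream. flush() returns the stream, sync() returns an int, and std::endl writes '\n' and then flushes. After each step, a separate std::ifstream reads the file to show that the pending output has been handed to the file. read_whole_file and demo_flush_sync_endl are hypothetical helpers, and the caller chooses the file name.

```cpp
#include <fstream>
#include <iterator>
#include <ostream>
#include <string>

// Reads the file through a separate stream, i.e. what the file currently holds.
inline std::string read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}

// Returns true if each step made the pending output visible to another reader.
inline bool demo_flush_sync_endl(const std::string& path) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::trunc);
    if (!f) return false;

    f << "a";
    std::ostream& self = f.flush();    // from ostream: returns the stream itself
    if (&self != &f || read_whole_file(path) != "a") return false;

    f << "b";
    const int rc = f.sync();           // from istream: returns 0 on success, -1 on failure
    if (rc != 0 || read_whole_file(path) != "ab") return false;

    f << "c" << std::endl;             // put('\n'), then flush()
    return read_whole_file(path) == "abc\n";
}
```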
I am trying to learn C++ fstream, ifstream, and ofstream. Halfway through my project, I learnt that if we access the same file with an ofstream and an ifstream for reading and writing, it's better to close one stream before using the other.
Like this:
ofstream write_stream(...);
ifstream read_stream(...);
// Using read_stream and write_stream interchangeably
read_stream.read(...);
write_stream.write(...);
read_stream.close();
write_stream.close();
In the above case, I guess both streams use the same pointer in the file, so I need to be aware of the pointer movement, and I have to seek every time I try to read() or write().
I guess I am right so far.
To avoid any more confusion, I have decided to use this format:
fstream read_write_stream("File.bin", ios::in | ios::out | ios::binary);
read_write_stream.seekp(pos);
read_write_stream.seekg(pos);
read_write_stream.tellp();
read_write_stream.tellg();
read_write_stream.read(...);
read_write_stream.write(...);
read_write_stream.close();
Is there any bug-inducing behavior that I should be aware of in the above program? Please advise.
Though I don't know whether the standard explicitly covers this case, I don't think a C++ compiler can promise you what will happen when you use several streams to change the same external resource (a file in this case). Besides the fstream internal implementation, it depends on the OS and the hardware you're writing to. If I'm correct, that leaves the outcome of these operations unspecified by default.
If you use two different streams, each will most likely manage its own put and get pointers, and each will have its own buffer. That means that if you don't use flush(), you won't be able to determine the order in which the operations are applied to the file.
If you use one stream to both read and write, I think the behavior will be predictable and easier to understand.
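Here is a sketch of the single-stream approach. The C++ standard says a std::basic_filebuf maintains one joint file position for input and output, so seekg() and seekp() move the same position. Reading and writing are also subject to the same restrictions as C FILE streams: seek (or flush) when switching from writing to reading, and seek when switching from reading to writing. overwrite_then_read is a hypothetical helper, and the file is assumed to exist already.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: overwrite `data` at `pos`, then read it back
// through the same std::fstream.
inline std::string overwrite_then_read(const std::string& path,
                                       std::streamoff pos,
                                       const std::string& data) {
    // in|out without trunc: the file must already exist.
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return {};

    f.seekp(pos);                      // position the (joint) file pointer for writing
    f.write(data.data(), static_cast<std::streamsize>(data.size()));

    f.seekg(pos);                      // seek before switching from output to input
    std::string back(data.size(), '\0');
    f.read(&back[0], static_cast<std::streamsize>(back.size()));
    back.resize(static_cast<std::size_t>(f.gcount()));
    return back;
}
```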
I have threads in my program, and I want to put a character into a stream in one thread and read it in another. But after std::cin.putback(), I need to type something on the keyboard to "wake up" std::cin in main. Can I make it read automatically?
That's not how streams work. The std::cin reads data that come from outside your program to standard input and the putback only allows keeping a character that you actually just read back to the buffer for re-parsing next time you invoke operator>> (or get or getline or other read method).
If you want to communicate between threads, you should use a message queue from some threading library, e.g. Boost provides a decent portable one.
It is not possible to use streams, at least those provided by the standard library, because stringstream is not thread-safe and ifstream/ofstream can't be created from a raw file handle, so you can't combine them with the POSIX pipe function. It would be possible to wrap a message queue in a stream (and Boost gives you enough tools to do it), but the raw message queue API will probably be more suitable.