Is std::iostream non-blocking?

According to the Boost reference for Boost.Iostreams (in section 3.6, at the very bottom):
http://www.boost.org/doc/libs/1_64_0/libs/iostreams/doc/index.html
Although the Boost.Iostreams Filter and Device concepts can accommodate non-blocking i/o, the C++ standard library stream and stream buffer interfaces cannot, since they lack a means to distinguish between temporary and permanent failures to satisfy a read or write request.
However, the function std::istream::readsome appears to be non-blocking, in that any available characters are returned immediately, with no wait beyond the cost of copying them from memory. My understanding is that:
std::istream::read will block until EOF or until the requested number of characters has been read.
std::istream::readsome will return immediately with characters copied from the internal buffer.

I agree with you that readsome is not a blocking operation. However, as specified, it is wholly inadequate as an interface for performing what is usually called "non-blocking I/O".
First, there is no guarantee that readsome will ever return new data, even if it is available. So to guarantee you actually make progress, you must use one of the blocking interfaces eventually.
Second, there is no way to know when readsome will return data. There is no way to "poll" the stream, or to get a "notification" or "event" or "callback". A usable non-blocking interface needs at least one of these.
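To make those two limitations concrete, here is a minimal sketch (buffer contents illustrative) of the only portable pattern: poll with readsome, then eventually fall back to a blocking read to guarantee progress:
#include <iostream>
#include <sstream>

int main() {
    std::istringstream in("hello world");
    char buf[64];
    // readsome only copies characters already sitting in the stream
    // buffer (rdbuf()->in_avail()); it never triggers a new read from
    // the underlying source, so 0 means "nothing buffered right now",
    // not "end of input".
    std::streamsize n = in.readsome(buf, sizeof buf);
    if (n == 0) {
        // To guarantee progress, eventually you must block:
        in.read(buf, sizeof buf);
        n = in.gcount();
    }
    std::cout << "got " << n << " characters\n";
}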
In short, readsome appears to be a half-baked and under-specified attempt to provide a non-blocking interface to I/O streams. But I have never seen it used in production code, and I would not expect to.
I think the Boost documentation overstates the argument, because as you observe, readsome is certainly capable of distinguishing temporary from permanent failure. But their conclusion is still correct for the reasons above.

When I looked into non-blocking portability, I didn't find anything in the C++ standard library that reliably does what you expect readsome to do.
If your goal is portability, my interpretation was that the section that mattered most was this:
http://en.cppreference.com/w/cpp/io/basic_istream/readsome
For example, when used with std::ifstream, some library implementations fill the underlying filebuf with data as soon as the file is opened (and readsome() on such implementations reads data, potentially, but not necessarily, the entire file), while other implementations only read from file when an actual input operation is requested (and readsome() issued after file opening never extracts any characters).
This says that different implementations of the iostream interface are allowed to do their work lazily, and readsome() doesn't guarantee that the work even gets kicked off.
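A small sketch that makes the divergence observable (file name is illustrative); depending on the standard library, the first readsome may return data or nothing at all:
#include <fstream>
#include <iostream>

int main() {
    std::ifstream in("data.txt", std::ios::binary);
    char buf[256];
    // Implementation-defined: returns up to 256 characters if the
    // filebuf was filled eagerly when the file was opened, or 0 if the
    // implementation reads lazily on the first real input operation.
    std::streamsize n = in.readsome(buf, sizeof buf);
    std::cout << "first readsome extracted " << n << " characters\n";
}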
However, I think your interpretation that readsome is guaranteed not to block is true.


CUDA default streams and CUDA_API_PER_THREAD_DEFAULT_STREAM

The documentation here tries to explain how default streams are handled.
Given code like this (ignoring allocation errors):
char *ptr;
char source[1000000];
cudaMalloc((void**)&ptr, 1000000);
cudaMemcpyAsync(ptr, source, 1000000, cudaMemcpyHostToDevice);
myKernel<<<1000, 1000>>>(ptr);
Is there a risk that myKernel will start before cudaMemcpyAsync finishes copying? I think "No" because this is a "Legacy default stream" as described in the documentation.
However, if I compile with CUDA_API_PER_THREAD_DEFAULT_STREAM what happens? The text for "Per-thread default stream" says:
The per-thread default stream is an implicit stream local to both the thread and the CUcontext, and which does not synchronize with other streams (just like explicitly created streams). The per-thread default stream is not a non-blocking stream and will synchronize with the legacy default stream if both are used in a program.
I think this might also be OK as both cudaMemcpyAsync and myKernel are effectively using CU_STREAM_PER_THREAD; am I correct?
The reason I ask is that I have a really weird intermittent CUDA error 77 in a kernel that I can only explain by a cudaMemcpyAsync not finishing before calling myKernel, which would mean that I am not understanding the documentation. The real code is too involved and too proprietary to make an MCVE, though.
Is there a risk that myKernel will start before cudaMemcpyAsync finishes copying? I think "No" because this is a "Legacy default stream" as described in the documentation.
No, that can't happen because, as you note, the legacy default stream (stream 0) is blocking under all circumstances.
However, if I compile with CUDA_API_PER_THREAD_DEFAULT_STREAM what happens?
Almost nothing changes. The per-thread default stream isn't blocking, so other streams and other threads using their default streams could operate concurrently within the context. Both operations are, however, still in the same stream and are sequential with respect to one another. The only way overlap could occur between the two operations would be if source was a non-pageable memory allocation which permitted overlap between the transfer and the kernel execution. Otherwise, they will run sequentially because of the ordering property of the stream and the restrictions imposed by the host source memory.
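If it's useful, a minimal self-contained sketch (with a trivial stand-in for myKernel) that can be compiled with and without nvcc --default-stream per-thread; in both cases the copy and the kernel are issued to the same stream, so the kernel cannot start until the copy completes:
#include <cuda_runtime.h>

__global__ void myKernel(char *p) {
    // 1000 blocks x 1000 threads covers exactly the 1000000 bytes.
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    p[i] = 0;
}

int main() {
    const size_t N = 1000000;
    char *ptr;
    static char source[N];  // pageable host memory, zero-initialized
    cudaMalloc((void**)&ptr, N);
    cudaMemcpyAsync(ptr, source, N, cudaMemcpyHostToDevice);
    myKernel<<<1000, 1000>>>(ptr);
    // Any asynchronous error (e.g. from the kernel) surfaces here:
    cudaError_t err = cudaDeviceSynchronize();
    return err == cudaSuccess ? 0 : 1;
}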
If you are having a real problem with suspected unexpected overlap of operations, you should be able to confirm this by profiling.

Relying on network I/O to provide cross-thread synchronization in C++

Can external I/O be relied upon as a form of cross-thread synchronization?
To be specific, consider the pseudocode below, which assumes the existence of network/socket functions:
int a;          // Globally accessible data.
socket s1, s2;  // Platform-specific.

int main() {
    // Set up + connect two sockets to (the same) remote machine.
    s1 = ...;
    s2 = ...;
    std::thread t1{thread1}, t2{thread2};
    t1.join();
    t2.join();
}

void thread1() {
    a = 42;
    send(s1, "foo");
}

void thread2() {
    recv(s2); // Blocking receive (error handling omitted).
    f(a);     // Use a, should be 42.
}
We assume that the remote machine only sends data to s2 upon receiving the "foo" from s1. If this assumption fails, then certainly undefined behavior will result. But if it holds (and no other external failure occurs like network data corruption, etc.), does this program produce defined behavior?
"Never", "unspecified (depends on implementation)", "depends on the guarantees provided by the implementation of send/recv" are example answers of the sort I'm expecting, preferably with justification from the C++ standard (or other relevant standards, such as POSIX for sockets/networking).
If "never", then changing a to be a std::atomic<int> initialized to a definite value (say 0) would avoid undefined behaviour, but then is the value guaranteed to be read as 42 in thread2 or could a stale value be read? Do POSIX sockets provide a further guarantee that ensures a stale value will not be read?
If "depends", do POSIX sockets provide the relevant guarantee to make it defined behavior? (How about if s1 and s2 were the same socket instead of two separate sockets?)
For reference, the standard I/O library has a clause which seems to provide an analogous guarantee when working with iostreams (27.2.3¶2 in N4604):
If one thread makes a library call a that writes a value to a stream and, as a result, another thread reads this value from the stream through a library call b such that this does not result in a data race, then a’s write synchronizes with b’s read.
So is it a matter of the underlying network library/functions being used providing a similar guarantee?
In practical terms, it seems the compiler can't reorder accesses to the global a with respect to the send and recv functions (as they could use a in principle). However, the thread running thread2 could still read a stale value of a unless the send/recv pair itself provides some kind of memory barrier / synchronization guarantee.
Short answer: No, there is no generic guarantee that a will be updated. My suggestion would be to send the value of a along with "foo", e.g. "foo, 42" or something like it. That is guaranteed to work, and is probably not significant overhead. [There may of course be other reasons why that doesn't work well.]
Long rambling stuff that doesn't really answer the problem:
Global data is not guaranteed to be "visible" immediately in different cores of multicore processors without further operations. Yes, most modern processors are "coherent", but not every model of every brand is guaranteed to be. So if thread2 runs on a core that has already cached a copy of a, it cannot be guaranteed that the value of a is 42 at the point when you call f.
The C++ standard guarantees that the global variable is reloaded after the opaque function call, so the compiler is not allowed to transform the code into:
tmp = a;
recv(...);
f(tmp);
but as I said above, cache operations may be needed to guarantee that all processors see the same value at the same time. If send and recv take long enough, or touch enough memory [there is no direct measure of how long or how much], you may see the correct value most or even all of the time, but there is no guarantee that ordinary variables are ACTUALLY made visible outside the thread that last wrote them.
std::atomic will help on some types of processors, but there is no guarantee that this is "visible" in a second thread or on a second processor core at any reasonable time after it was changed.
The only practical solution is some kind of "repeat until I see it change" code. This may require two values: one that acts as a counter, and one that holds the actual value, so you can distinguish "a is now 42" from "I've set a again, and it's 42 this time too". If a represents, for example, the number of data items available in a buffer, it is probably "it changed value" that matters, and checking "is this the same as last time" is enough. The std::atomic operations come with ordering guarantees, which allow you to ensure that "if I update this field, the other field is guaranteed to appear at the same time or before it". So you can use them to publish a pair of items: a counter indicating the "version number" of the current data, and "the new value is X".
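A minimal sketch of that counter-plus-payload pattern (assuming a single writer; all names are illustrative), using release/acquire ordering so that once the reader sees the new counter, the matching payload is also visible:
#include <atomic>

std::atomic<unsigned> version{0};
int payload = 0; // plain data, published via `version`

void writer(int v) {
    payload = v;                                     // write the data first
    version.fetch_add(1, std::memory_order_release); // then publish it
}

int reader() {
    static unsigned last = 0;
    unsigned now;
    // "Repeat until I see it change": spin until the version moves on.
    while ((now = version.load(std::memory_order_acquire)) == last) { /* spin */ }
    last = now;
    return payload; // the matching payload is guaranteed to be visible
}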
Of course, if you KNOW what processor architectures your code will run on, you can plausibly make more advanced guesses as to what the behaviour will be. For example all x86 and many ARM processors use the cache-interface to implement atomic updates on a variable, so by doing an atomic update on one core, you can know that "no other processor will have a stale value of this". But there are processors available that do not have this implementation detail, and where an update, even with an atomic instruction, will not be updated on other cores or in other threads until "some time in the future, uncertain when".
In general, no, external I/O can't be relied upon for cross-thread synchronization.
The question is out of scope for the C++ standard itself, as it involves the behavior of external/OS library functions. So whether the program has undefined behavior depends on any synchronization guarantees provided by the network I/O functions. In the absence of such guarantees, it is indeed undefined behavior. Switching to (initialized) atomics to avoid undefined behavior still wouldn't guarantee that the "correct" up-to-date value will be read. Ensuring that within the realm of the C++ standard would require some kind of locking (e.g. a spinlock or mutex), even though it seems like waiting shouldn't be required given the real-time ordering of the situation.
In general, the notion of "real-time" synchronization (involving visibility rather than merely ordering) required to avoid having to potentially wait after the recv returns before loading a isn't supported by the C++ standard. At a lower level, this notion does exist however, and would typically be implemented through inter-processor interrupts, e.g. FlushProcessWriteBuffers on Windows, or sys_membarrier on x86 Linux. This would be inserted after the store to a before send in thread1. No synchronization or barrier would be required in thread2. (It also seems like a simple SFENCE in thread1 might suffice on x86 due to its strong memory model, at least in the absence of non-temporal loads/stores.)
A compiler barrier shouldn't be needed in either thread for the reasons outlined in the question (call to an external function send, which for all the compiler knows could be acquiring an internal mutex to synchronize with the other call to recv).
Insidious problems of the sort described in section 4.3 of Hans Boehm's paper "Threads Cannot be Implemented as a Library" should not be a concern as the C++ compiler is thread-aware (and in particular the opaque functions send and recv could contain synchronization operations), so transformations introducing writes to a after the send in thread1 are not permissible under the memory model.
This leaves the open question of whether the POSIX network functions provide the necessary guarantees. I highly doubt it, as on some of the architectures with weak memory models, they are highly non-trivial and/or expensive to provide (requiring a process-wide mutex or IPI as mentioned earlier). On x86 specifically, it's almost certain that accessing a shared resource like a socket will entail an SFENCE or MFENCE (or even a LOCK-prefixed instruction) somewhere along the line, which should be sufficient, but this is unlikely to be enshrined in a standard anywhere. Edit: In fact, I think even the INT to switch to kernel mode entails a drain of the store buffer (the best reference I have to hand is this forum post).
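To make the placement concrete, here is a sketch of thread1 with the barrier discussed above; send_foo is a hypothetical wrapper for send(s1, "foo"), and std::atomic_thread_fence is only a portable stand-in that does not by itself provide the IPI-style "real-time" guarantee:
#include <atomic>

extern void send_foo(); // hypothetical wrapper around send(s1, "foo")

int a;

void thread1() {
    a = 42;
    // The heavyweight barrier discussed above would go here:
    // FlushProcessWriteBuffers() on Windows, or the membarrier syscall
    // on Linux. A portable stand-in with weaker semantics:
    std::atomic_thread_fence(std::memory_order_seq_cst);
    send_foo();
}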

Can I use fstream in C++ to read or write a file when implementing the disk management component of a DBMS?

In C++, I know I can read and write a file using system calls like read and write, and I can also do it with fstream's help.
Now I'm implementing a disk manager, which is a component of a DBMS. For simplicity, the disk manager only manages the space of a single Unix file.
All I know is that fstream wraps system calls like read and write and adds some buffering.
However, I was wondering whether this affects atomicity and synchronization.
My question is which way should I use and why?
No. Particularly not with Unix. A DBMS is going to want contiguous files. That means either a Unix variant that supports them or creating a disk partition.
You're also going to want to handle the buffering yourself, not rely on the C++ library's buffering.
I could go on, but streams are for streams of data, not for secure, reliable structured data.
The following information about the synchronization and thread safety of fstream can be found in the ISO C++ standard.
27.2.3 Thread safety [iostreams.threadsafety]
Concurrent access to a stream object (27.8, 27.9), stream buffer object (27.6), or C Library stream (27.9.2) by multiple threads may result in a data race (1.10) unless otherwise specified (27.4). [ Note: Data races result in undefined behavior (1.10). —end note ]
If one thread makes a library call a that writes a value to a stream and, as a result, another thread reads this value from the stream through a library call b such that this does not result in a data race, then a’s write synchronizes with b’s read.
C/C++ file I/O operations are not thread safe by default. So whether you use fstream or the open/write/read system calls, you will have to provide synchronization yourself in your implementation. You can use the std::mutex facility introduced in C++11 to synchronize your file I/O.
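As an illustration, a minimal sketch of a mutex-guarded wrapper (the class name and page interface are made up for the example, not a standard API):
#include <fstream>
#include <mutex>
#include <string>

// All reads and writes on the single backing file go through one
// fstream guarded by a mutex.
class DiskManager {
    std::fstream file_;
    std::mutex mtx_;
public:
    explicit DiskManager(const std::string &path)
        : file_(path, std::ios::in | std::ios::out | std::ios::binary) {}

    void write_page(std::streamoff pos, const char *data, std::size_t len) {
        std::lock_guard<std::mutex> lock(mtx_);
        file_.seekp(pos);
        file_.write(data, len);
        file_.flush(); // push the buffered data to the OS
    }

    void read_page(std::streamoff pos, char *data, std::size_t len) {
        std::lock_guard<std::mutex> lock(mtx_);
        file_.seekg(pos);
        file_.read(data, len);
    }
};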

Is it safe for more than one goroutine to print to stdout?

I have multiple goroutines in my program, each of which makes calls to fmt.Println without any explicit synchronization. Is this safe (i.e., will each line appear separately without data corruption), or do I need to create another goroutine with synchronization specifically to handle printing?
No, it's not safe, even though you may sometimes observe no trouble. IIRC, the fmt package tries to be on the safe side, so some intermixing may occur, but hopefully no process crash.
This is an instance of a more universal Go documentation rule: Things are not safe for concurrent access unless specified otherwise or where obvious from context.
One can have a safe version of a nice subset of fmt.Print* functionality using the log package with some small initial setup.
Everything fmt does eventually calls w.Write(), as can be seen here. Since fmt itself does no locking, everything depends on the implementation of Write(), and as there is no locking there either (for Stdout at least), there is no guarantee your output will not be mixed.
I'd recommend using a global log routine.
Furthermore, if you simply want to log data, use the log package, which locks access to the output properly.
See the implementation for reference.
The common functions (e.g. fmt.Println) are not safe. However, there are ones that are.
log.Logger is "goroutine safe": https://golang.org/pkg/log/#Logger
Something like this will create a stdout logger that can be used safely from any goroutine:
logger := log.New(os.Stdout, "", 0)
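For completeness, a runnable sketch of that (the goroutine count and message are illustrative); the Logger serializes each call internally, so lines from different goroutines won't interleave:
package main

import (
	"log"
	"os"
	"sync"
)

func main() {
	// A Logger guards its Output method with a mutex, so concurrent
	// Println calls each produce a whole, separate line.
	logger := log.New(os.Stdout, "", 0)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			logger.Println("hello from goroutine", id)
		}(i)
	}
	wg.Wait()
}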

Is there any bug-inducing behavior when reading and writing using the same fstream object in C++?

I am trying to learn C++ fstream, ifstream, and ofstream. Halfway through my project I learnt that if we are accessing the same file with an ofstream and an ifstream for reading and writing, it's better to close one stream before using the other.
Like
ofstream write_stream(...);
ifstream read_stream(....);
// Accessing pointers using both read_stream, write_stream interchangeably
read_stream.read(....);
write_stream.write(....);
read_stream.close();
write_stream.close();
///
In the above case, I guess both streams use the same position in the file, so I need to be aware of the pointer movement, and I have to seek each and every time I try to read() or write().
I guess I am right so far.
To avoid any more confusion, I have decided to use this format
fstream read_write_stream("File.bin",ios::in|ios::out|ios::binary);
read_write_stream.seekp(pos);
read_write_stream.seekg(pos);
read_write_stream.tellp();
read_write_stream.tellg();
read_write_stream.read(...);
read_write_stream.write(...);
read_write_stream.close();
Is there any bug-inducing behavior that I should be aware of in the above program? Please advise.
Though I don't know whether the standard explicitly addresses this case, I don't think a C++ compiler can promise you what will happen when you use several streams to change the same external resource (a file, in this case). In addition to the fstream internal implementation, it depends on the OS and the hardware you're writing to. If I'm correct, that makes these operations, by default, undefined behavior.
If you use two different streams, most likely each will manage its own put and get pointers, and each will have its own buffering, which means that if you don't use flush(), you won't be able to determine the order in which the operations are applied to the file.
If you use one stream to both read and write, I think the behavior will be predictable and easier to understand.
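If it helps, a minimal runnable sketch of the single-stream approach (file name and record size are illustrative). One caveat worth knowing: a filebuf follows the C stdio rules for update mode, so it's safest to seek when switching between reading and writing:
#include <fstream>
#include <iostream>

int main() {
    // Open for both reading and writing; with std::ios::in set (and no
    // std::ios::trunc), the file must already exist.
    std::fstream fs("File.bin", std::ios::in | std::ios::out | std::ios::binary);
    if (!fs) { std::cerr << "open failed\n"; return 1; }

    char record[16] = {};
    fs.seekp(0);                     // position the put pointer
    fs.write(record, sizeof record); // write one record

    fs.seekg(0);                     // seek when switching from writing to reading
    fs.read(record, sizeof record);  // read it back
    std::cout << "read " << fs.gcount() << " bytes\n";
}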