C++ how to flush std::stringbuf? - c++

I need to put the standard output of a process (binary data) into a string buffer and consume it in another thread.
Here is the producer:
while (ReadFile(ffmpeg_OUT_Rd, cbBuffer, sizeof(cbBuffer), &byteRead, NULL)) {
    tByte += byteRead; // total bytes
    sb->sputn(cbBuffer, byteRead);
}
m_bIsFinished = true;
printf("%d bytes are generated.\n", tByte);
Here is the consumer:
while (!PCS_DISPATCHER_INSTANCE->IsFinished())
    Sleep(200);
Sleep(5000);
Mystringbuf* sb = PCS_DISPATCHER_INSTANCE->sb;
printf("Avail: %d\n", sb->in_avail());
It turns out that the consumer cannot get all the bytes produced by the producer
(tByte != sb->in_avail()).
Is this some kind of internal buffering problem? If so, how do I force the stringbuf to flush its internal buffer?

A streambuf has nothing like flush: writes are done directly into the buffer. There is a pubsync() member that could help if you were using a derived class such as a filebuf, but that does not apply to your case.
Your issue most likely comes from a data race on sputn() or in_avail(). Either protect access to the streambuf with a mutex, or use an atomic. If m_bIsFinished is not atomic, then depending on your implementation of IsFinished(), synchronisation between the threads might not be guaranteed (for example, the producer could write to memory while the consumer still sees an outdated value from the CPU cache), which could lead to exactly such a data race.
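For illustration, a minimal sketch of that kind of protection, assuming the two threads share a plain std::stringbuf (the names below are illustrative, not the Mystringbuf/PCS_DISPATCHER_INSTANCE types from the question):
#include <atomic>
#include <mutex>
#include <sstream>

std::stringbuf sb;                  // shared buffer filled by the producer
std::mutex sb_mutex;                // protects every access to sb
std::atomic<bool> finished{false};  // makes the "producer is done" flag visible to the consumer

// producer side: called for every chunk read from the pipe
void produce(const char* data, std::streamsize n) {
    std::lock_guard<std::mutex> lock(sb_mutex);
    sb.sputn(data, n);
}

// consumer side: how many bytes can currently be read
std::streamsize available() {
    std::lock_guard<std::mutex> lock(sb_mutex);
    return sb.in_avail();
}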
Edit:
If you experience the issue within a single thread, thus eliminating any potential race condition, it may come from the implementation of streambuf. I was able to reproduce this with MSVC13 in a single-threaded application:
tracing showed that the number of bytes read was accurate, but the in_avail() result was always smaller than or equal to tByte throughout the loop;
when reading the streambuf, the correct total number of bytes was read (thus more than indicated by in_avail()).
This behaviour is compliant. According to the C++ standard, in_avail() shall return egptr() - gptr() if a read position is available, and otherwise showmanyc(). The latter is defined as returning an estimate of the number of characters available in the sequence. The only guarantee given is that you can read at least in_avail() bytes without encountering EOF.
Workaround: use sb->pubseekoff(0, ios::end) - sb->pubseekoff(0, ios::beg) to count the number of bytes available, and make sure you reposition with sb->pubseekoff(0, ios::beg) before you read your streambuf.
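A small sketch of that workaround with a plain std::stringbuf (the payload here is just a stand-in for the producer's data):
#include <iostream>
#include <sstream>

int main() {
    std::stringbuf sb;
    sb.sputn("binary payload", 14);   // stands in for the producer's sputn() calls

    // count the bytes actually stored, independently of in_avail()
    std::streamoff end = sb.pubseekoff(0, std::ios::end, std::ios::in);
    std::streamoff beg = sb.pubseekoff(0, std::ios::beg, std::ios::in);
    std::streamoff total = end - beg;

    // the second pubseekoff already repositioned the get pointer to the
    // beginning, so the buffer can now be read from the start
    std::cout << total << " bytes available\n";
}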

Related

Why does pread not guarantee that it reads all the specified bytes?

I program in C++ and noticed that pread behaves in an interesting way.
pread() returns the number of bytes read, and that number can be different from the number of bytes I asked it to read.
Why does pread not guarantee that it reads all the specified bytes?
Where does this limitation come from?
Why does pread not guarantee that it reads all the specified bytes?
Because it is designed like that.
As mentioned in the man page:
Note that it is not an error for a successful call to transfer fewer
bytes than requested (see read(2) and write(2)).
So you simply call the function again in such a case (see the sketch below).
this may happen for example because fewer bytes are actually available
right now (maybe because we were close to end-of-file, or because we
are reading from a pipe, or from a terminal), or because read() was
interrupted by a signal. On error, -1 is returned, and errno is set
appropriately. In this case it is left unspecified whether the file
position (if any) changes.
from https://linux.die.net/man/2/read
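As a hedged sketch of that retry loop (the helper name pread_full is made up for illustration), reading exactly count bytes at a given offset unless EOF or a real error intervenes:
#include <errno.h>
#include <unistd.h>

// Keep calling pread() until `count` bytes have been read, EOF is hit, or an error occurs.
// Returns the number of bytes actually read, or -1 on error.
ssize_t pread_full(int fd, void* buf, size_t count, off_t offset) {
    size_t done = 0;
    while (done < count) {
        ssize_t n = pread(fd, static_cast<char*>(buf) + done,
                          count - done, offset + static_cast<off_t>(done));
        if (n == 0)                 // end of file reached before `count` bytes
            break;
        if (n == -1) {
            if (errno == EINTR)     // interrupted by a signal: just retry
                continue;
            return -1;              // real error
        }
        done += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(done);
}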

Is reading from an anonymous pipe atomic, in the sense of atomic content?

I am writing a process on Linux with two threads. They communicate using an anonymous pipe, created with the pipe() call.
One end is copying a C structure into the pipe:
struct EventStruct e;
[...]
ssize_t n = write(pipefd[1], &e, sizeof(e));
The other end reads it from the pipe:
struct EventStruct e;
ssize_t n = read(pipefd[0], &e, sizeof(e));
if (n != -1 && n != 0 && n < sizeof(e))
{
    // Is a partial read possible here??
}
Can partial reads occur with the anonymous pipe?
The man page (man 7 pipe) stipulates that any write under PIPE_BUF bytes is atomic. But what they mean is atomic with respect to other writer threads or processes. I am not concerned with multiple-writer issues: I have only one writer thread and only one reader thread.
As a side note, my structure is 56 bytes long, well below the PIPE_BUF size, which is at least 4096 bytes on Linux (and looks even higher on recent kernels).
Put differently: on the reading end, do I have to deal with partial reads and buffer the data until I have received a full structure instance?
As long as you are dealing with fixed size units, there isn't a problem. If you write a unit of N bytes on the pipe and the reader requests a unit of N bytes from the pipe, then there will be no issue. If you can't read all the data in one fell swoop (you don't know the size until after you've read its length, for example), then life gets trickier. However, as shown, you should be fine.
That said, you should still detect short reads. There's a catastrophe pending if you get a short read but assume it is full length. However, you should not expect to actually see short reads here, so getting test coverage for that path will be a problem. I'd simply test n < (ssize_t)sizeof(e) and treat anything detected as an error or EOF. Note the cast; otherwise the signed value will be converted to unsigned and -1 won't be spotted properly.
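A minimal sketch of that check on the reading end, assuming the EventStruct and pipe from the question (the helper name read_event is made up):
#include <unistd.h>

struct EventStruct { /* fields as in the question */ };

// Returns true only if a complete EventStruct was read from the pipe's read end.
bool read_event(int read_fd, EventStruct& e) {
    ssize_t n = read(read_fd, &e, sizeof(e));
    if (n < (ssize_t)sizeof(e)) {
        // n == -1: error (check errno); n == 0: writer closed the pipe;
        // 0 < n < sizeof(e): short read, not expected for one writer sending
        // fixed-size units below PIPE_BUF, but still treated as a failure
        return false;
    }
    return true;
}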
For specification, you'll need to read the POSIX specifications for:
read()
write()
pipe()
and possibly trace links from those pages. For example, for write(), the specification says:
Write requests to a pipe or FIFO shall be handled in the same way as a regular file with the following exceptions:
There is no file offset associated with a pipe, hence each write request shall append to the end of the pipe.
Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set.
Or from the specification of read():
Upon successful completion, where nbyte is greater than 0, read() shall mark for update the last data access timestamp of the file, and shall return the number of bytes read. This number shall never be greater than nbyte. The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading. For example, a read() from a file associated with a terminal may return one typed line of data.
So, the write() will write atomic units; the read() will only read atomic units because that's what was written. There won't be a problem, which is what I said at the start.

What does `POLLOUT` event in `poll` Linux function mean?

From the Linux documentation, POLLOUT means "Normal data may be written without blocking." Well, but this explanation is ambiguous.
How much data is it possible to write without blocking after poll reported this event? 1 byte? 2 bytes? A gigabyte?
After a POLLOUT event on a blocking socket, how do I check how much data I can send to the socket without blocking?
The poll system call only tells you that something has happened on the file descriptor (the underlying device); it does not tell you how much space is available for reading or writing. To know exactly how many bytes have actually been read or written, you have to look at the return value of the read() or write() call itself.
Thus, poll() is mainly used by applications that must handle multiple input or output streams without getting stuck on any one of them. You can't achieve that with plain write() or read(), since they can't monitor multiple descriptors at the same time within one thread.
By the way, for a device driver, the underlying implementation of poll usually looks like this (code from LDD3):
static unsigned int scull_p_poll(struct file *filp, poll_table *wait)
{
    poll_wait(filp, &dev->inq, wait);
    poll_wait(filp, &dev->outq, wait);
    ...........
    if (spacefree(dev))
        mask |= POLLOUT | POLLWRNORM; /* writable */
    up(&dev->sem);
    return mask;
}
If poll() sets the POLLOUT flag then at least one byte may be written without blocking. You may then find that a write() operation performs only a partial write, so indicated by returning a short count. You must always be prepared for partial reads and writes when multiplexing I/O via poll() and/or select().
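As a minimal sketch of what that looks like in practice, assuming a non-blocking descriptor (the helper name write_all_nonblocking is made up): wait for POLLOUT, write what you can, and loop on the remainder.
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <unistd.h>

// Write `len` bytes to a non-blocking descriptor, using poll() to wait for writability.
// Returns true once everything has been written, false on error.
bool write_all_nonblocking(int fd, const char* buf, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLOUT;
        if (poll(&pfd, 1, -1) == -1) {
            if (errno == EINTR)
                continue;                   // interrupted: poll again
            return false;
        }
        if (pfd.revents & POLLOUT) {
            ssize_t n = write(fd, buf + sent, len - sent);
            if (n > 0)
                sent += static_cast<size_t>(n);   // may well be a partial write
            else if (n == -1 && errno != EAGAIN && errno != EINTR)
                return false;                     // real error
        }
    }
    return true;
}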

Qt QIODevice::write / QTcpSocket::write and bytes written

We are quite confused about the behavior of QIODevice::write in general and the QTcpSocket implementation specifically. There is a similar question already, but the answer is not really satisfactory. The main confusion stems from the bytesWritten signal and the waitForBytesWritten method mentioned there. Those two seem to indicate the bytes that were written from the buffer employed by the QIODevice to the actual underlying device (there must be such a buffer, otherwise the method would not make much sense). The question then is whether the number returned by QIODevice::write corresponds to this number, or whether it indicates the number of bytes that were stored in the internal buffer rather than the bytes written to the underlying device. If the returned number indicated the bytes written to the internal buffer, we would need to employ a pattern like the following to ensure all our data is written:
void writeAll(QIODevice& device, const QByteArray& data) {
    qint64 written = 0;
    while (written < data.size()) {
        qint64 n = device.write(data.constData() + written, data.size() - written);
        if (n == -1)
            break; // error handling omitted
        written += n;
    }
}
However, this will insert duplicate data if the return value of QIODevice::write corresponds with the meaning of the bytesWritten signal. The documentation is very confusing about this, as both places use the word device, even though it seems logical, and is the general understanding, that one of them actually means written to the internal buffer rather than to the device.
So to summarize, the question is: is the number returned by QIODevice::write the number of bytes written to the underlying device, and hence is it safe to call QIODevice::write without checking the returned number of bytes, since everything is stored in the internal buffer? Or does it indicate how many bytes it could store internally, so that a pattern like the writeAll above has to be employed to safely write all data to the device?
(UPDATE: Looking at the source, the QTcpSocket::write implementation actually will never return fewer bytes than one wanted to write, so the writeAll above is not needed. However, that is specific to the socket and this Qt version; the documentation is still confusing...)
QTcpSocket is a buffered QAbstractSocket. An internal buffer is allocated inside QAbstractSocket, and data is copied in that buffer. The return value of write is the size of the data passed to write().
waitForBytesWritten waits until the data in the internal buffer of QAbstractSocket is written to the native socket.
That previous question answers your question, as does the QIODevice::write(const char * data, qint64 maxSize) documentation:
Writes at most maxSize bytes of data from data to the device. Returns the number of bytes that were actually written, or -1 if an error occurred.
This can (and will in real life) return less than what you requested, and it's up to you to call write again with the remainder.
As for waitForBytesWritten:
For buffered devices, this function waits until a payload of buffered written data has been written to the device...
It applies only to buffered devices. Not all devices are buffered. If they are, and you wrote less than what the buffer can hold, write can return successfully before the device has finished sending all the data.
Devices are not necessarily buffered.
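A small sketch of how this is typically used with a connected QTcpSocket (the helper name sendBlocking is made up): write() copies the data into the internal buffer, and bytesToWrite()/waitForBytesWritten() track when it actually reaches the native socket.
#include <QByteArray>
#include <QTcpSocket>

// Queue `data` on the socket, then block until it has been handed over to the OS socket.
// Returns false if write() reports an error or the wait times out.
bool sendBlocking(QTcpSocket& socket, const QByteArray& data, int timeoutMs = 30000) {
    qint64 written = socket.write(data);      // copied into QAbstractSocket's internal buffer
    if (written == -1)
        return false;
    while (socket.bytesToWrite() > 0) {       // data still sitting in the internal buffer
        if (!socket.waitForBytesWritten(timeoutMs))
            return false;                     // timeout or socket error
    }
    return true;
}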

C++ reading buffer size

Suppose that this file is two and a half blocks long, with a block size of 1024.
const int aBlock = 1024;
char* buffer = new char[aBlock];
while (!myFile.eof()) {
    myFile.read(buffer, aBlock);
    // do more stuff
}
The third time it reads, it is going to fill only half of the buffer, leaving the other half with invalid data. Is there a way to know how many bytes it actually wrote to the buffer?
istream::gcount returns the number of bytes read by the previous read.
Your code is both overly complicated and error-prone.
Reading in a loop and checking only for eof is a logic error since this will result in an infinite loop if there is an error while reading (for whatever reason).
Instead, you need to check all fail states of the stream, which can be done by simply testing the istream object itself.
Since this is already returned by the read function, you can (and, indeed, should) structure any reader loop like this:
while (myFile.read(buffer, aBlock))
    process(buffer, aBlock);
process(buffer, myFile.gcount());
This is at the same time shorter, doesn’t hide bugs and is more readable since the check-stream-state-in-loop is an established C++ idiom.
You could also look at istream::readsome, which actually returns the amount of bytes read.
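Putting the pieces together, a minimal self-contained sketch of that idiom (the file name data.bin is made up):
#include <fstream>
#include <iostream>

int main() {
    const std::streamsize aBlock = 1024;
    char buffer[aBlock];
    std::ifstream myFile("data.bin", std::ios::binary);

    // Full blocks: read() succeeded, so gcount() == aBlock here.
    while (myFile.read(buffer, aBlock))
        std::cout << "read " << myFile.gcount() << " bytes\n";

    // Final, possibly partial block: read() failed (EOF reached), but gcount()
    // still reports how many bytes landed in the buffer.
    std::cout << "last block: " << myFile.gcount() << " bytes\n";
}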