Why does pread not guarantee that it reads all the specified bytes? - c++

I program in C++ and found that pread behaves in an interesting way.
pread() returns the number of bytes read, and that number can differ from the number of bytes requested.
Why does pread not guarantee that it reads all the specified bytes?
Where does this limitation come from?

Why does pread not guarantee that it reads all the specified bytes?
Because it is designed like that.
As the man page for pread mentions:
Note that it is not an error for a successful call to transfer fewer
bytes than requested (see read(2) and write(2)).
So in that case you simply call the function again for the remaining bytes, at the correspondingly advanced offset.

this may happen for example because fewer bytes are actually available
right now (maybe because we were close to end-of-file, or because we
are reading from a pipe, or from a terminal), or because read() was
interrupted by a signal. On error, -1 is returned, and errno is set
appropriately. In this case it is left unspecified whether the file
position (if any) changes.
from https://linux.die.net/man/2/read
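The "call it again" advice can be sketched as a small loop. This is only an illustration for POSIX systems: pread_full and demo_pread_full are made-up names, not library functions, and the choice to retry on EINTR is an assumption about what the caller wants.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not a library function): repeat pread() until
// `count` bytes are read, end-of-file is hit, or a real error occurs.
// Returns the bytes read (short only at end-of-file) or -1 with errno set.
ssize_t pread_full(int fd, void* buf, std::size_t count, off_t offset)
{
    assert(buf != nullptr || count == 0);
    char* out = static_cast<char*>(buf);
    std::size_t done = 0;
    while (done < count) {
        ssize_t n = pread(fd, out + done, count - done,
                          offset + static_cast<off_t>(done));
        if (n > 0)
            done += static_cast<std::size_t>(n);  // partial progress, keep going
        else if (n == 0)
            break;                                // end-of-file
        else if (errno != EINTR)
            return -1;                            // real error
        // n == -1 with EINTR: nothing was read, so just retry
    }
    return static_cast<ssize_t>(done);
}

// Usage sketch: read from a temporary file that holds "abcdefg".
bool demo_pread_full()
{
    std::FILE* f = std::tmpfile();
    if (!f) return false;
    bool ok = std::fputs("abcdefg", f) >= 0 && std::fflush(f) == 0;
    char buf[16] = {};
    ok = ok && pread_full(fileno(f), buf, 4, 2) == 4
            && buf[0] == 'c' && buf[3] == 'f';
    ok = ok && pread_full(fileno(f), buf, sizeof buf, 0) == 7;  // short at EOF
    std::fclose(f);
    return ok;
}
```

With such a wrapper, a result smaller than count means end-of-file was reached, and -1 means an error; a short count from a single pread() call is hidden from the caller.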

Related

What will happen if the given offset in fseek goes beyond the last character

I'm currently using c++ and trying to write a file using fseek() in order to write on the given offset calculated from other methods. Just wondering what will happen if the given offset will make the FILE pointer go beyond the last character in the file.
Example:
In a file with "abcdefg" as the contents, what will fseek(someFILEpointer, 20, SEEK_SET) return?
From cppreference:
POSIX allows seeking beyond the existing end of file. If an output is performed after this seek, any read from the gap will return zero bytes. Where supported by the filesystem, this creates a sparse file.
It sounds like it should return a non-error status, but subsequent reads may fail. Subsequent writes may succeed, but the exact behavior may depend on the underlying filesystem.
The C standard leaves it implementation-defined whether such a call to fseek succeeds or not. If the file position cannot be set in the manner indicated, fseek will return an error indication.
From the C standard:
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END. (§7.21.9.2/3)
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
So in neither case are you guaranteed to be able to call fseek with a non-zero offset and whence set to SEEK_END.
Posix does allow the call (quotes from the description of fseek):
The fseek() function shall allow the file-position indicator to be set beyond the end of existing data in the file. If data is later written at this point, subsequent reads of data in the gap shall return bytes with the value 0 until data is actually written into the gap.
(Posix leaves it up to the implementation whether the bytes with value 0 are actually stored, or are implicit. Most Unix file systems implement sparse files which can optimize this case by not storing the zeros on persistent storage, but this is not possible on a FAT filesystem, for example.)
Even Posix only makes this guarantee for regular files:
The behavior of fseek() on devices which are incapable of seeking is implementation-defined. The value of the file offset associated with such a device is undefined.
So the call may fail, but that is not undefined behaviour. If the repositioning is not possible, fseek will return a nonzero value; in the case of Posix implementations, the nonzero value will be -1 and errno will be set to a value which might help clarify the cause of the failure.
On Linux (and Unix in general), the call would succeed: fseek returns 0 (it is the underlying lseek that returns the new offset measured from the beginning of the file), but the file won't increase in size until you write something at that offset.
The unwritten part will read back as zeros, but depending on the OS and file system, some of the zeros might not occupy space on the hard drive.
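To see the behaviour described above for yourself, something like the following sketch could be used. It assumes a POSIX system and a regular temporary file; seek_past_end_size is a made-up name, and the "abcdefg" content and offset 20 are taken from the question.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Sketch for a POSIX system: write "abcdefg", seek to offset 20 (past the
// end), write one byte there, then read the whole file back.
// Returns the resulting file size, or -1 on failure or if the gap did not
// read back as zero bytes.
long seek_past_end_size()
{
    std::FILE* f = std::tmpfile();          // binary update stream
    if (!f) return -1;

    unsigned char buf[64];
    std::memset(buf, 0xFF, sizeof buf);     // so zero bytes must come from the file

    long size = -1;
    if (std::fputs("abcdefg", f) >= 0 &&
        std::fseek(f, 20, SEEK_SET) == 0 && // succeeds; the file is still 7 bytes long
        std::fputc('X', f) != EOF &&        // this write is what extends the file
        std::fseek(f, 0, SEEK_SET) == 0) {
        const std::size_t n = std::fread(buf, 1, sizeof buf, f);
        bool gap_is_zero = (n == 21);
        for (std::size_t i = 7; gap_is_zero && i < 20; ++i)
            gap_is_zero = (buf[i] == 0);
        if (gap_is_zero && buf[20] == 'X')
            size = static_cast<long>(n);
    }
    std::fclose(f);
    return size;
}
```

On a POSIX system this should report 21 bytes: the 7 original characters, 13 zero bytes in the gap, and the byte written at offset 20.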

Is reading from an anonymous pipe atomic, in the sense of atomic content?

I am writing a process on Linux with two threads. They communicate using an anonymous pipe, created with the pipe() call.
One end is copying a C structure into the pipe:
struct EventStruct e;
[...]
ssize_t n = write(pipefd[1], &e, sizeof(e));
The other end reads it from the pipe:
struct EventStruct e;
ssize_t n = read(pipefd[0], &e, sizeof(e));
if(n != -1 && n != 0 && n < sizeof(e))
{
    // Is a partial read possible here??
}
Can partial reads occur with the anonymous pipe?
The man page (man 7 pipe) stipulates that any write of at most PIPE_BUF bytes is atomic. But what it means is atomic with regard to other writer threads... I am not concerned with multiple-writer issues. I have only one writer thread and only one reader thread.
As a side note, my structure is 56 bytes long, well below the PIPE_BUF size, which is at least 4096 bytes on Linux. It looks like it's even higher on the most recent kernels.
Put another way: on the reading end, do I have to deal with partial reads and store them until I have received a full structure instance?
As long as you are dealing with fixed size units, there isn't a problem. If you write a unit of N bytes on the pipe and the reader requests a unit of N bytes from the pipe, then there will be no issue. If you can't read all the data in one fell swoop (you don't know the size until after you've read its length, for example), then life gets trickier. However, as shown, you should be fine.
That said, you should still detect short reads. There's a catastrophe pending if you get a short read but assume it is full length. However, you should not expect to detect short reads — code coverage will be a problem. I'd simply test n < (ssize_t)sizeof(e) and anything detected is an error or EOF. Note the cast; otherwise, the signed value will be converted to unsigned and -1 won't be spotted properly.
For specification, you'll need to read the POSIX specifications for:
read()
write()
pipe()
and possibly trace links from those pages. For example, for write(), the specification says:
Write requests to a pipe or FIFO shall be handled in the same way as a regular file with the following exceptions:
There is no file offset associated with a pipe, hence each write request shall append to the end of the pipe.
Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set.
Or from the specification of read():
Upon successful completion, where nbyte is greater than 0, read() shall mark for update the last data access timestamp of the file, and shall return the number of bytes read. This number shall never be greater than nbyte. The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading. For example, a read() from a file associated with a terminal may return one typed line of data.
So, the write() will write atomic units; the read() will only read atomic units because that's what was written. There won't be a problem, which is what I said at the start.
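If you would rather not rely on that reasoning, a more defensive alternative to the simple n < (ssize_t)sizeof(e) check is to accumulate short reads until a whole record has arrived. The sketch below is only an illustration: read_exact and demo_pipe_roundtrip are made-up names, and the fields of EventStruct are invented because the question does not show them.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Stand-in for the question's struct; its real fields are not shown there.
struct EventStruct {
    int  id;
    char payload[52];
};

// Hypothetical helper: read exactly `count` bytes, accumulating short reads.
// Returns true on success, false on error or if EOF arrives first.
bool read_exact(int fd, void* buf, std::size_t count)
{
    char* out = static_cast<char*>(buf);
    std::size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, out + done, count - done);
        if (n > 0)
            done += static_cast<std::size_t>(n);
        else if (n == 0)
            return false;             // writer closed the pipe mid-record
        else if (errno != EINTR)
            return false;             // real error
    }
    return true;
}

// Usage sketch: one record written to and read back from an anonymous pipe.
bool demo_pipe_roundtrip()
{
    int pipefd[2];
    if (pipe(pipefd) == -1) return false;

    EventStruct sent = {};
    sent.id = 42;
    EventStruct received = {};
    bool ok = write(pipefd[1], &sent, sizeof sent)
                  == static_cast<ssize_t>(sizeof sent)
           && read_exact(pipefd[0], &received, sizeof received)
           && received.id == 42;
    close(pipefd[0]);
    close(pipefd[1]);
    return ok;
}
```

With a single writer sending fixed-size records, the loop body should normally run once per record; the extra iterations only matter if a short read ever does occur.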

When std::fprintf(stderr,...) fails, does the operation have no effect, or can it write a little before failing?

I have a C++11 program that uses std::fprintf to write log and debug info to stderr. I know fprintf can fail and return a negative value, but I can't find whether the operation is atomic (if it fails, it has no effect) or whether it can write part of the text and then fail (or have any other side effect).
The function that uses fprintf looks like this:
void writeToConsole (std::string const &message)
{
    std::fprintf(stderr, "%s\n", message.c_str());
}
I am developing using Clang and GCC on Linux (for now), but my question is more about the standard, so...
Question:
If std::fprintf fails, is it still possible that some characters have been written to stderr? Is this behaviour specified by the C/C++ standard, or is it implementation-defined?
Even more, if std::fprintf fails, should I abort the program, or can I continue execution silently without side effects (other than being unable to write to stderr)?
Keep in mind that the printf family of functions (almost always) eventually turns into a write(2) function call (or other such low-level, OS/implementation-provided equivalent). This function can be partially successful. If at least one byte is written, the function succeeds (if no error from the underlying destination can be detected - for example, interruption by a signal handler) and it will return the number of bytes actually written:
The number of bytes written may be less than count if, for example, there is insufficient space on the underlying physical medium, or the RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)), or the call was interrupted by a signal handler after having written less than count bytes. [...]
For information about partial writes, see write(2), for example at http://man7.org/linux/man-pages/man2/write.2.html. Values for errno or other effects of a partial write may be dependent on the output medium (or what the file descriptor represents - memory mapped file, regular file, etc.), as well as the specific reason for failure. For example, ENOMEM, EIO, EDQUOT are all possibilities.
Also see the linked man page for additional information about atomicity with regard to multiple threads.
Your other question:
Even more, if std::fprintf fails, should I abort the program, or can I continue execution silently without side effects (other than being unable to write to stderr)?
This really depends on your program.
For fprintf the C++11 standard falls back to C99 since it is part of the C standard library and the C99 draft standard says the following:
The fprintf function returns the number of characters transmitted, or
a negative value if an output or encoding error occurred.
but does not actually specify whether an error means that no characters were transmitted, so that ends up depending on the implementation.
For POSIX compliant systems, which in this case should cover Linux, the reference for fprintf says:
Upon successful completion, the fprintf() and printf() functions shall return the number of bytes transmitted.
[...]
If an output error was encountered, these functions shall return a negative value.
There are several errors listed that could lead to partial output such as:
[ENOMEM]
Insufficient storage space is available.
Whether an error indicates that you should exit your application depends on your application. Do you have an alternate logging mechanism other than stderr? Does your application have legal requirements that mandate that everything is logged, or are the logs purely informational?
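Whatever you decide, the first step is to actually look at the return value. A minimal sketch of that, with made-up names (writeLine, demo_writeLine) and an explicit fflush() added on the assumption that you want buffered errors reported at the call site:

```cpp
#include <cstdio>
#include <string>

// Hypothetical variant of the question's writeToConsole(): reports whether
// the message reached the stream without an error being detected.
// A `false` result does NOT imply that nothing was written.
bool writeLine(std::FILE* stream, std::string const& message)
{
    if (std::fprintf(stream, "%s\n", message.c_str()) < 0)
        return false;
    // A buffered stream may only hit the error when its buffer is flushed.
    return std::fflush(stream) == 0;
}

// Usage sketch: write to a temporary file and check what arrived.
bool demo_writeLine()
{
    std::FILE* f = std::tmpfile();
    if (!f) return false;
    bool ok = writeLine(f, "hello") && std::ftell(f) == 6;  // "hello\n"
    std::fclose(f);
    return ok;
}
```

The caller can then choose between aborting, switching to another log channel, or carrying on, depending on how important the log is.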

C++ how to flush std::stringbuf?

I need to put the standard output of a process (binary data) to a string buffer and consume it in another thread.
Here is the producer:
while (ReadFile(ffmpeg_OUT_Rd, cbBuffer, sizeof(cbBuffer), &byteRead, NULL)) {
    tByte += byteRead; // total bytes
    sb->sputn(cbBuffer, byteRead);
}
m_bIsFinished = true;
printf("%d bytes are generated.\n", tByte);
Here is the consumer:
while (!PCS_DISPATCHER_INSTANCE->IsFinished())
    Sleep(200);
Sleep(5000);
Mystringbuf* sb = PCS_DISPATCHER_INSTANCE->sb;
printf("Avail: %d\n", sb->in_avail());
It turns out that the consumer cannot get all the bytes produced by the producer.
(tByte != sb->in_avail())
Is it some kind of internal buffering problem? If so, how do I force the stringbuf to flush its internal buffer?
A streambuf has nothing like flush: writes are done directly into the buffer. There is a pubsync() member that could help if you were using a derived class such as a filebuf. But this does not apply to your case.
Your issue certainly comes from a data race on sputn() or in_avail(). Either protect access to the streambuf with a mutex, or synchronise through an atomic. If m_bIsFinished is not an atomic, and depending on your implementation of IsFinished(), the synchronisation between the threads might not be guaranteed (for example, the producer could write to memory while the consumer still obtains an outdated value from the CPU memory cache), which could lead to such a data race.
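As a rough sketch of that advice (SharedBuffer and its member names are made up for illustration; they are not the asker's Mystringbuf or dispatcher classes), the buffer and the flag could be wrapped like this:

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical wrapper: every access to the stringbuf goes through one
// mutex, and the "finished" flag is an atomic.
class SharedBuffer {
public:
    void append(const char* data, std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        buf_.sputn(data, static_cast<std::streamsize>(n));
    }
    std::string snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return buf_.str();   // copy of everything written so far
    }
    void markFinished()     { finished_.store(true, std::memory_order_release); }
    bool isFinished() const { return finished_.load(std::memory_order_acquire); }
private:
    mutable std::mutex mutex_;
    std::stringbuf buf_;
    std::atomic<bool> finished_{false};
};

// Single-threaded usage sketch; in the real program append()/markFinished()
// would run in the producer thread and isFinished()/snapshot() in the consumer.
std::size_t demo_shared_buffer()
{
    SharedBuffer sb;
    sb.append("abc", 3);
    sb.append("\0de", 3);    // binary data: the embedded NUL byte is kept
    sb.markFinished();
    return sb.isFinished() ? sb.snapshot().size() : 0;
}
```

The release/acquire pair on the flag is what lets the consumer rely on seeing every append() that happened before markFinished().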
Edit:
If you experience the issue within a single thread, thus eliminating any potential race condition, it may come from the implementation of streambuf. I was able to reproduce this with MSVC13 in a single-threaded application:
tracing showed that the number of bytes read was accurate, but the in_avail() result was always smaller than or equal to tByte through the whole loop.
when reading the streambuf, the correct total number of bytes was read (thus more than indicated by in_avail()).
This behaviour is compliant. According to the C++ standard, in_avail() shall return egptr() - gptr() if a read position is available, and otherwise showmanyc(). The latter is defined as returning an estimate of the number of characters available in the sequence. The only guarantee given is that you can read at least in_avail() bytes without encountering EOF.
Workaround: use sb->pubseekoff(0,ios::end) - sb->pubseekoff(0,ios::beg); to count the number of bytes available, and make sure you're repositioned at sb->pubseekoff(0,ios::beg) before you read your streambuf.
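The workaround could look roughly like the sketch below. bytes_available is a made-up name, and passing std::ios_base::in as the third argument is a small variation on the one-liner above, added so that only the read position is moved.

```cpp
#include <ios>
#include <sstream>

// Sketch of the workaround: measure the content by seeking the read
// position to the end and then back to the beginning. Passing
// std::ios_base::in (an addition to the one-liner above) moves only the
// read position, so later writes still append where they left off.
std::streamoff bytes_available(std::stringbuf& sb)
{
    const std::streamoff end =
        sb.pubseekoff(0, std::ios_base::end, std::ios_base::in);
    const std::streamoff beg =
        sb.pubseekoff(0, std::ios_base::beg, std::ios_base::in);
    return end - beg;   // the read position is now back at the beginning
}

// Usage sketch: ten characters written, none read yet.
std::streamoff demo_bytes_available()
{
    std::stringbuf sb;
    sb.sputn("0123456789", 10);
    return bytes_available(sb);
}
```

Note that this counts everything stored in the buffer, including bytes that were already read, because it rewinds to the beginning.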

Are there cases where fseek/ftell can give the wrong file size?

In C or C++, the following can be used to return a file size:
const unsigned long long at_beg = (unsigned long long) ftell(filePtr);
fseek(filePtr, 0, SEEK_END);
const unsigned long long at_end = (unsigned long long) ftell(filePtr);
const unsigned long long length_in_bytes = at_end - at_beg;
fprintf(stdout, "file size: %llu\n", length_in_bytes);
Are there development environments, compilers, or OSes which can return the wrong file size from this code, based on padding or other information that is situation-specific? Were there changes in the C or C++ specification around 1999 which would have led to this code no longer working in certain cases?
For this question, please assume I am adding large file support by compiling with the flags -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1. Thanks.
It won't work on unseekable files like /proc/cpuinfo, /dev/stdin, or /dev/tty, or on pipes obtained with popen.
And it won't work if the file is being written by another process at the same time.
Using the POSIX stat function is probably more efficient and more reliable. Of course, this function might not be available on non-POSIX systems.
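A minimal sketch of that approach, assuming a POSIX system. Since the question already has a FILE*, it uses fstat(), the descriptor-based sibling of stat(); file_size_of and demo_file_size_of are made-up names.

```cpp
#include <cstdio>
#include <sys/stat.h>
#include <sys/types.h>

// Sketch for POSIX systems: ask the filesystem for the size instead of seeking.
// Returns the size in bytes, or -1 if fstat() fails or the stream does not
// refer to a regular file.
off_t file_size_of(std::FILE* fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;              // errno says why
    if (!S_ISREG(st.st_mode))
        return -1;              // st_size is not meaningful for pipes, ttys, ...
    return st.st_size;
}

// Usage sketch: a temporary file holding seven bytes.
bool demo_file_size_of()
{
    std::FILE* f = std::tmpfile();
    if (!f) return false;
    bool ok = std::fputs("abcdefg", f) >= 0
           && std::fflush(f) == 0          // unflushed stdio data is not counted by fstat()
           && file_size_of(f) == 7;
    std::fclose(f);
    return ok;
}
```

Unlike the seek-based method, this does not disturb the stream's file position, and it reports a clear failure for unseekable things like pipes and terminals.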
The fseek and ftell functions are both defined by the ISO C language standard.
The following is from latest public draft of the 2011 C standard, but the 1990, 1999, and 2011 ISO C standards are all very similar in this area, if not identical.
7.21.9.4:
The ftell function obtains the current value of the file position
indicator for the stream pointed to by stream. For a binary stream,
the value is the number of characters from the beginning of the file.
For a text stream, its file position indicator contains unspecified
information, usable by the fseek function for returning the file
position indicator for the stream to its position at the time of the
ftell call; the difference between two such return values is not
necessarily a meaningful measure of the number of characters written
or read.
7.21.9.2:
The fseek function sets the file position indicator for the stream
pointed to by stream. If a read or write error occurs, the error
indicator for the stream is set and fseek fails.
For a binary stream, the new position, measured in characters from the
beginning of the file, is obtained by adding offset to the
position specified by whence. The specified position is the
beginning of the file if whence is SEEK_SET, the current value
of the file position indicator if SEEK_CUR, or end-of-file if
SEEK_END. A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
For a text stream, either offset shall be zero, or offset
shall be a value returned by an earlier successful call to the
ftell function on a stream associated with the same file and whence shall be SEEK_SET.
Violating any of the "shall" clauses makes your program's behavior undefined.
So if the file was opened in binary mode, ftell gives you the number of characters from the beginning of the file -- but an fseek relative to the end of the file (SEEK_END) is not necessarily meaningful. This accommodates systems that store binary files in whole blocks and don't keep track of how much was written to the final block.
If the file was opened in text mode, you can seek to the beginning or end of the file with an offset of 0, or you can seek to a position given by an earlier call to ftell; fseek with any other arguments has undefined behavior. This accommodates systems where the number of characters read from a text file doesn't necessarily correspond to the number of bytes in the file. For example, on Windows reading a CR-LF pair ("\r\n") reads only one character, but advances 2 bytes in the file.
In practice, on Unix-like systems text and binary modes behave the same way, and the fseek/ftell method will work. I suspect it will work on Windows (my guess is that ftell will give the byte offset, which may not be the same as the number of times you could call getchar() in text mode).
Note also that ftell() returns a result of type long. On systems where long is 32 bits, this method can't work for files that are 2 GiB or larger.
Since the fseek/ftell method is system-specific anyway, you might be better off using some system-specific method to get the size of a file, such as stat() on Unix-like systems.
On the other hand, fseek and ftell are likely to work as you expect on most systems you're likely to encounter. I'm sure there are systems where it won't work; sorry, but I don't have specifics.
If working on Linux and Windows is good enough, and you're not concerned with large files, then the fseek/ftell method is probably ok. Otherwise, you should consider using a system-specific method to determine the size of a file.
And keep in mind that anything that tells you the size of a file can only tell you its size at that moment. The file's size could change before you access it.
1) Superficially, your code looks "OK" - I don't see any problem with it.
2) No - there isn't any "C or C++ specification" that would affect fseek. There is a Posix specification:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html
3) If you want "file size", my first choice would probably be "stat()". Here's the Posix specification:
http://pubs.opengroup.org/onlinepubs/007904975/functions/stat.html
4) If something's "going wrong" with your method, then my first guess would be "large file support".
For example, many OS's had parallel "fseek()" and "fseek64()" APIs.
POSIX defines the return value from ftell as "measured in bytes from the beginning of the file". Your at_beg will always be zero (assuming this is a newly opened file).
So, assuming that:
the file is seekable
there are no concurrency issues to be concerned about
the file size is representable in the data type used by the fseek/ftell variant you choose
then your code should work on any POSIX-compliant system.
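Under those assumptions, the question's technique with error checks added might look like the sketch below. It is POSIX-only and uses fseeko()/ftello() so that offsets have type off_t rather than long; file_size_by_seeking and the demo function are made-up names.

```cpp
#include <cstdio>
#include <sys/types.h>

// Sketch for POSIX systems: the seek-to-end technique with error checks.
// Returns the size in bytes or -1 on failure; on success the original
// file position is restored.
off_t file_size_by_seeking(std::FILE* fp)
{
    const off_t start = ftello(fp);
    if (start == -1)
        return -1;                         // e.g. the stream is not seekable
    if (fseeko(fp, 0, SEEK_END) != 0)
        return -1;
    const off_t end = ftello(fp);
    if (fseeko(fp, start, SEEK_SET) != 0)  // put the position back
        return -1;
    return end;                            // still -1 if the second ftello() failed
}

// Usage sketch: seven bytes written, position moved to offset 3 beforehand.
bool demo_file_size_by_seeking()
{
    std::FILE* f = std::tmpfile();
    if (!f) return false;
    bool ok = std::fputs("abcdefg", f) >= 0
           && fseeko(f, 3, SEEK_SET) == 0
           && file_size_by_seeking(f) == 7
           && ftello(f) == 3;              // position was restored
    std::fclose(f);
    return ok;
}
```

Combined with -D_FILE_OFFSET_BITS=64 from the question, off_t is 64 bits wide, which covers the "representable in the data type" condition for large files.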