ifstream operator >> and error handling - c++

I want to use ifstream to read data from a named pipe. I would like to use its operator>> to read formatted data (typically, an int).
However, I am a bit confused about the way error handling works.
Imagine I want to read an int but only 3 bytes are available. Error bits would be set, but what will happen to these 3 bytes? Will they "disappear", or will they be put back into the stream for later extraction?
Thanks,

As has been pointed out, you can't read binary data over an istream.
But concerning the number of available bytes issue (since you'll
probably want to use basic_ios<char> and streambuf for your binary
streams): istream and ostream use a streambuf for the actual
sourcing and sinking of the bytes. And streambufs normally buffer: the
procedure is: if a byte is in the buffer, return it, otherwise, try to
reload the buffer, waiting until the reloading has finished, or
definitively failed. In case of definitive failure, the streambuf
returns end of file, and that terminates the input; istream will
memorize the end of file, and not attempt any more input. So if the
type you are reading needs four bytes, it will request four bytes from
the streambuf, and will normally wait until those four bytes are
there. No error will be set (because there isn't an error); you will
simply not return from the operator>> until those four bytes arrive.
If you implement your own binary streams, I would strongly recommend
using the same pattern; it will allow direct use of already existing
standard components like std::ios_base and (perhaps) std::filebuf,
and will provide other programmers with an idiom they are familiar with.
If the blocking is a problem, the simplest solution is just to run the
input in a separate thread, communicating via a message queue or
something similar. (Boost has support for asynchronous IO. This avoids
threads, but is globally much more complicated, and doesn't work well
with the classical stream idiom.)
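To make the state handling described above concrete, here is a minimal sketch. It uses a std::istringstream as a stand-in for the pipe, which is an assumption for illustration only: with a real pipe, the extraction would block until more bytes arrive or the writer closes its end, instead of seeing end of file right away. The names Step and extract_ints are made up for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Stream state recorded after one attempted extraction.
struct Step {
    bool ok;    // the extraction itself succeeded
    bool eof;   // eofbit is set afterwards
    bool fail;  // failbit is set afterwards
};

// Tries to extract `attempts` ints from `text` with operator>> and records
// the stream state after each attempt. (Hypothetical helper.)
std::vector<Step> extract_ints(const std::string& text, int attempts) {
    assert(attempts >= 0);
    std::istringstream in(text);  // stand-in for the real input source
    std::vector<Step> steps;
    for (int i = 0; i < attempts; ++i) {
        int value = 0;
        bool ok = static_cast<bool>(in >> value);
        steps.push_back({ok, in.eof(), in.fail()});
    }
    return steps;
}
```

With the text "12 34" and three attempts, the first extraction succeeds with no flags set. The second also succeeds, but sets eofbit, because the parser ran into the end of the input while looking for more digits. The third fails immediately with failbit: the stream has memorized the end of file and does not attempt any more input until clear() is called.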

Related

What does `POLLOUT` event in `poll` Linux function mean?

From the Linux documentation, POLLOUT means "Normal data may be written without blocking". But this explanation is ambiguous.
How much data is it possible to write without blocking after poll has reported this event? 1 byte? 2 bytes? A gigabyte?
After a POLLOUT event on a blocking socket, how can I check how much data I can send to the socket without blocking?
The poll system call only tells you that something has happened on the file descriptor (the physical device); it doesn't tell you how much space is available to read or write. To find out exactly how many bytes can be read or written, you must call read() or write() and check the return value, which is the number of bytes actually read or written.
Thus, poll() is mainly used by applications that must handle multiple input or output streams without getting stuck on any one of them. You can't rely on blocking write() or read() calls alone in this case, since they can't monitor multiple descriptors at the same time within one thread.
BTW, in a device driver, the underlying implementation of poll usually looks like this (code from ldd3):
static unsigned int scull_p_poll(struct file *filp, poll_table *wait)
{
    poll_wait(filp, &dev->inq, wait);
    poll_wait(filp, &dev->outq, wait);
    ...........
    if (spacefree(dev))
        mask |= POLLOUT | POLLWRNORM; /* writable */
    up(&dev->sem);
    return mask;
}
If poll() sets the POLLOUT flag then at least one byte may be written without blocking. You may then find that a write() operation performs only a partial write, so indicated by returning a short count. You must always be prepared for partial reads and writes when multiplexing I/O via poll() and/or select().
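A loop that copes with partial writes could look roughly like the sketch below. It is only an illustration: write_all is a made-up name, and the code assumes a POSIX system. To be sure that write() itself never blocks, the descriptor would additionally have to be put into non-blocking mode (O_NONBLOCK); on a blocking descriptor, write() may still block if you ask for more than currently fits.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <unistd.h>

// Writes all `len` bytes of `data` to `fd`, waiting for POLLOUT before each
// attempt and continuing after short writes. Returns false on error.
// (Hypothetical helper, not part of any library.)
bool write_all(int fd, const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t done = 0;
    while (done < len) {
        pollfd p{};
        p.fd = fd;
        p.events = POLLOUT;
        if (poll(&p, 1, -1) < 0) {
            if (errno == EINTR) continue;   // interrupted by a signal: retry
            return false;
        }
        if (p.revents & (POLLERR | POLLHUP | POLLNVAL)) return false;

        // POLLOUT only promises that *some* data can be written.
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue;                   // nothing written this time: poll again
            return false;
        }
        done += static_cast<std::size_t>(n);  // may be fewer than requested
    }
    return true;
}
```

The return value of write() is the only reliable indication of how much was accepted, so the loop advances by that amount and goes back to poll() for the rest.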

What's the difference between read() and getc()

I have two code segments:
while((n=read(0,buf,BUFFSIZE))>0)
    if(write(1,buf,n)!=n)
        err_sys("write error");
while((c=getc(stdin))!=EOF)
    if(putc(c,stdout)==EOF)
        err_sys("write error");
Some things I have read online confuse me. I know that standard I/O buffers automatically, but I have passed a buf to read(), so read() is also doing buffering, right? And it seems that getc() reads data char by char; how much data will the buffer hold before all of it is sent out?
Thanks
While both functions can be used to read from a file, they are very different. First of all, on many systems read is a lower-level function, and may even be a system call directly into the OS. The read function also isn't standard C or C++; it's part of e.g. POSIX. It can also read arbitrarily sized blocks, not only one byte at a time. There's no buffering (except maybe at the OS/kernel level), and it doesn't distinguish between "binary" and "text" data. And on POSIX systems, where read is a system call, it can be used to read from all kinds of devices and not only files.
The getc function is a higher-level function. It usually uses buffered input (so input is read in blocks into a buffer, sometimes by using read, and the getc function gets its characters from that buffer). It also only returns a single character at a time. It's also part of the C and C++ specifications as part of the standard library. Also, there may be conversions between the data read and the data returned by the function, depending on whether the file was opened in text or binary mode.
Another difference is that read is also always a function, while getc might be a preprocessor macro.
Comparing read and getc doesn't really make much sense; comparing read with fread would make more sense.
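As a rough illustration of the two layers, here is a sketch that counts the bytes of an input both ways. It assumes a POSIX system; count_with_read and count_with_getc are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <unistd.h>

// Counts bytes by calling read() directly. `buf` is just the destination the
// caller supplies: each call asks the OS for up to sizeof buf bytes and may
// return fewer. Nothing is kept between calls.
std::size_t count_with_read(int fd) {
    assert(fd >= 0);
    char buf[4096];
    std::size_t total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += static_cast<std::size_t>(n);
    return total;
}

// Counts bytes one at a time with getc(). The FILE object typically refills
// its own internal buffer in larger blocks, so one getc() call is usually
// not one system call.
std::size_t count_with_getc(std::FILE* fp) {
    assert(fp != nullptr);
    std::size_t total = 0;
    int c;  // int rather than char, so EOF can be told apart from data
    while ((c = std::getc(fp)) != EOF)
        ++total;
    return total;
}
```

Both functions return the same count for the same input; the difference is who owns the buffer. With read() the buffer is yours and only holds what the last call returned, whereas with getc() the buffer belongs to the FILE object and its size is chosen by the library (and can be changed with setvbuf).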

C++ reading buffer size

Suppose that this file is 2 and 1/2 blocks long, with block size of 1024.
const int aBlock = 1024;
char* buffer = new char[aBlock];
while (!myFile.eof()) {
    myFile.read(buffer, aBlock);
    //do more stuff
}
The third time it reads, it is going to fill only half of the buffer, leaving the other half with invalid data. Is there a way to know how many bytes it actually wrote to the buffer?
istream::gcount returns the number of bytes read by the previous read.
Your code is both overly complicated and error-prone.
Reading in a loop and checking only for eof is a logic error since this will result in an infinite loop if there is an error while reading (for whatever reason).
Instead, you need to check all fail states of the stream, which can be done by simply testing the istream object itself.
Since the stream is already returned by the read function, you can (and, indeed, should) structure any reader loop like this:
while (myFile.read(buffer, aBlock))
    process(buffer, aBlock);
process(buffer, myFile.gcount());
This is at the same time shorter, doesn’t hide bugs and is more readable since the check-stream-state-in-loop is an established C++ idiom.
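You could also look at istream::readsome, which returns the number of bytes read directly. Note that it only extracts characters that are immediately available in the buffer, so it may return fewer bytes than requested, possibly zero.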
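For the 2 and 1/2 block file from the question, the read/gcount pattern can be tried out with something like the following sketch. It substitutes an in-memory std::istringstream for the file (an assumption made so the example runs on its own), and read_block_sizes and demo_sizes are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads `in` in blocks of `block_size` bytes and returns how many bytes each
// read actually delivered. (Hypothetical helper standing in for process().)
std::vector<std::size_t> read_block_sizes(std::istream& in, std::size_t block_size) {
    assert(block_size > 0);
    std::vector<char> buffer(block_size);
    std::vector<std::size_t> sizes;
    // The loop body only runs for reads that filled the whole buffer.
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())))
        sizes.push_back(buffer.size());
    // The last read failed; gcount() says how much of the buffer is valid.
    if (in.gcount() > 0)
        sizes.push_back(static_cast<std::size_t>(in.gcount()));
    return sizes;
}

// Example input: 2.5 blocks of 1024 bytes, held in memory instead of a file.
std::vector<std::size_t> demo_sizes() {
    std::istringstream in(std::string(2560, 'x'));
    return read_block_sizes(in, 1024);
}
```

Here demo_sizes() yields three entries: 1024, 1024 and 512. The gcount() > 0 check is an addition of this sketch; it skips the final call when the input length is an exact multiple of the block size.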

Marshall multiple protobuf to file

Background:
I'm using Google's protobuf, and I would like to read/write several gigabytes of protobuf marshalled data to a file using C++. As it's recommended to keep the size of each protobuf object under 1MB, I figured a binary stream (illustrated below) written to a file would work. Each offset contains the number of bytes to the next offset until the end of the file is reached. This way, each protobuf can stay under 1MB, and I can glob them together to my heart's content.
[int32 offset]
[protobuf blob 1]
[int32 offset]
[protobuf blob 2]
...
[eof]
I have an implementation that works on Github:
src/glob.hpp
src/glob.cpp
test/readglob.cpp
test/writeglob.cpp
But I feel I have written some poor code, and would appreciate some advice on how to improve it. Thus,
Questions:
I'm using reinterpret_cast<char*> to read/write the 32 bit integers to and from the binary fstream. Since I'm using protobuf, I'm making the assumption that all machines are little-endian. I also assert that an int is indeed 4 bytes. Is there a better way to read/write a 32 bit integer to a binary fstream given these two limiting assumptions?
In reading from fstream, I create a temporary fixed-length char buffer, so that I can then pass this fixed-length buffer to the protobuf library to decode using ParseFromArray, as ParseFromIstream will consume the entire stream. I'd really prefer just to tell the library to read at most the next N bytes from fstream, but there doesn't seem to be that functionality in protobuf. What would be the most idiomatic way to pass a function at most N bytes of an fstream? Or is my design sufficiently upside down that I should consider a different approach entirely?
Edit:
@codymanix: I'm casting to char since istream::read requires a char array if I'm not mistaken. I'm also not using the extraction operator >> since I read it was poor form to use with binary streams. Or is this last piece of advice bogus?
@Martin York: Removed new/delete in favor of std::vector<char>. glob.cpp is now updated. Thanks!
Don't use new[]/delete[].
Instead, use a std::vector, as deallocation is guaranteed in the event of exceptions.
Don't assume that reading will return all the bytes you requested.
Check with gcount() to make sure that you got what you asked for.
Rather than having Glob implement both input and output depending on a switch in the constructor, implement two specialized classes, like ifstream/ofstream. This will simplify both the interface and the usage.
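Putting those points together, a length-prefixed record format could be sketched as below. This is not the code from the linked repository: all function names are invented here, the prefix is assumed to be an unsigned 32-bit length, and the payload is treated as opaque bytes so the example does not depend on protobuf. The bytes of the length are assembled with shifts, so neither reinterpret_cast nor an assumption about host endianness is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes `v` as 4 bytes, least significant byte first, on any host.
void write_u32_le(std::ostream& out, std::uint32_t v) {
    char b[4];
    for (int i = 0; i < 4; ++i)
        b[i] = static_cast<char>((v >> (8 * i)) & 0xFF);
    out.write(b, 4);
}

// Reads a little-endian 32-bit value; false if 4 bytes were not available.
bool read_u32_le(std::istream& in, std::uint32_t& v) {
    unsigned char b[4];
    in.read(reinterpret_cast<char*>(b), 4);
    if (in.gcount() != 4) return false;
    v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(b[i]) << (8 * i);
    return true;
}

// Writes one record: 32-bit length followed by the payload bytes.
void write_record(std::ostream& out, const std::string& blob) {
    assert(blob.size() <= 0xFFFFFFFFu);
    write_u32_le(out, static_cast<std::uint32_t>(blob.size()));
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
}

// Reads one record into `blob`; false at end of input or on a short read.
bool read_record(std::istream& in, std::vector<char>& blob) {
    std::uint32_t len = 0;
    if (!read_u32_le(in, len)) return false;
    blob.resize(len);  // std::vector frees its memory even if an exception is thrown
    in.read(blob.data(), static_cast<std::streamsize>(len));
    return static_cast<std::uint32_t>(in.gcount()) == len;  // verify with gcount()
}
```

After a successful read_record, the vector's data() and size() are the kind of buffer and length that the question passes to ParseFromArray. Splitting the reader and writer into separate functions (or classes) mirrors the ifstream/ofstream suggestion above.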

c++ file bad bit

When I run this code, the open, seekg and tellg operations all succeed.
But when I read from the file, it fails: the eof, bad and fail bits are 0 1 1.
What can cause a file to go bad?
Thanks
int readriblock(int blockid, char* buffer)
{
    ifstream rifile("./ri/reverseindex.bin", ios::in|ios::binary);
    rifile.seekg(blockid * RI_BLOCK_SIZE, ios::beg);
    if(!rifile.good()){ cout<<"block not exist"<<endl; return -1;}
    cout<<rifile.tellg()<<endl;
    rifile.read(buffer, RI_BLOCK_SIZE);
    cout<<rifile.eof()<<rifile.bad()<<rifile.fail()<<endl;
    if(!rifile.good()){ cout<<"error reading block "<<blockid<<endl; return -1;}
    rifile.close();
    return 0;
}
Quoting the Apache C++ Standard Library User's Guide:
The flag std::ios_base::badbit indicates problems with the underlying stream buffer. These problems could be:
Memory shortage. There is no memory available to create the buffer, or the buffer has size 0 for other reasons (such as being provided from outside the stream), or the stream cannot allocate memory for its own internal data, as with std::ios_base::iword() and std::ios_base::pword().
The underlying stream buffer throws an exception. The stream buffer might lose its integrity, as in memory shortage, or code conversion failure, or an unrecoverable read error from the external device. The stream buffer can indicate this loss of integrity by throwing an exception, which is caught by the stream and results in setting the badbit in the stream's state.
That doesn't tell you what the problem is, but it might give you a place to start.
Keep in mind the EOF bit is generally not set until a read is attempted and fails. (In other words, checking rifile.good after calling seekg may not accomplish anything.)
As Andrey suggested, checking errno (or using an OS-specific API) might let you get at the underlying problem. This answer has example code for doing that.
Side note: Because rifile is a local object, you don't need to close it once you're finished. Understanding that is important for understanding RAII, a key technique in C++.
Try good old errno. It should show the real reason for the error. Unfortunately, there is no C++-ish way to do it.
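A sketch of what checking errno might look like is below. Treat it as an assumption-laden illustration: the C++ standard does not promise that stream operations set errno, although common implementations that sit on top of the C or POSIX file APIs usually leave a meaningful value there. The function name read_block_or_error is made up for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>

// Tries to read `size` bytes from `path` into `buffer`. Returns an empty
// string on success, otherwise a description of what went wrong.
// (Hypothetical helper; errno reporting is implementation-dependent.)
std::string read_block_or_error(const std::string& path, char* buffer,
                                std::streamsize size) {
    assert(buffer != nullptr && size >= 0);
    errno = 0;  // clear stale values so we only see errors from this call
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file.is_open())
        return "open failed: " + std::string(std::strerror(errno));

    errno = 0;
    file.read(buffer, size);
    if (file.bad())  // problem in the underlying stream buffer
        return "read failed: " + std::string(std::strerror(errno));
    if (file.gcount() != size)  // fewer bytes than requested, e.g. end of file
        return "short read";
    return std::string();  // file is closed automatically when it goes out of scope
}
```

Resetting errno to 0 before each operation matters, because successful calls are not required to clear it. The local ifstream is never closed explicitly, in line with the RAII note above.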