std::ios::failure (ios::badbit) problem with fstream.write() - c++

I've been trying to debug my code for the past few hours and I couldn't figure out the problem. I eventually set my filestream to throw exceptions on failbit, and I found that my filestream was setting the failbit for some reason. I have absolutely no idea why the failbit is being set, because all I'm doing is writing 2048-byte chunks of data to the stream until suddenly it fails (at the same spot each time).
I would like to show you my code to see if anyone can see a problem and possibly see what might cause a std::ios::failure to be thrown:
bool abstractBlock::encryptBlockRC4(char* key)
{//This encryption method can be chunked :)
getStream().seekg(0,std::ios::end);
int sLen = int(getStream().tellg())-this->headerSize;
seekg(0);//Seek to beginning of Data
seekp(0);
char* encryptionChunkBuffer = new char[2048]; //2KB chunk buffer
for (int chunkIterator =0; chunkIterator<sLen; chunkIterator+=2048)
{
if (chunkIterator+2048<=sLen)
{
getStream().read(encryptionChunkBuffer,2048);
char* encryptedData = EnDeCrypt(encryptionChunkBuffer,2048,key);
getStream().write(encryptedData,2048);
free(encryptedData);
}else{
int restLen = sLen-chunkIterator;
getStream().read(encryptionChunkBuffer,restLen);
char* encryptedData = EnDeCrypt(encryptionChunkBuffer,restLen,key);
getStream().write(encryptedData,restLen);
delete encryptedData;
}
}
delete [] encryptionChunkBuffer;
dataFlags |= DATA_ENCRYPTED_RC4; // Set the "encrypted (rc4)" bit
seekp(0); //Seek to beginning of Data
seekg(0); //Seek to beginning of Data
return true;
}
The above code is essentially encrypting a file using 2048 chunks. It basically reads 2048 bytes, encrypts it and then writes it back to the stream (overwrites the "unencrypted" data that was previously there).
getStream() is simply returning the fstream handle to the file thats being operated on.
The error always occurs when chunkIterator==86116352 on the line getStream().write(encryptedData,2048);
I know my code may be hard to decode, but maybe you can tell me some possible things that might trigger a failbit? Currently, I think the problem lies in the fact that I am reading from and writing to the same stream, but as I mentioned, any ideas about what can cause a failbit would help me investigate the problem further.

You must seekp between changing from reads to writes (and vice versa, using seekg). It appears your implementation allows you to avoid this some of the time. (You're probably running into implicit flushes or other buffer manipulation which hide the problem sometimes.)
C++03, §27.5.1p1, Stream buffer requirements
The controlled sequences can impose limitations on how the program can read characters from a
sequence, write characters to a sequence, put characters back into an input sequence, or alter the stream
position.
This just generally states these aspects are controlled by the specific stream buffer.
C++03, §27.8.1.1p2, Class template basic_filebuf
The restrictions on reading and writing a sequence controlled by an object of class
basic_filebuf<charT,traits> are the same as for reading and writing with the Standard C library
FILEs.
fstream, ifstream, and ofstream use filebufs.
C99, §7.19.5.3p6, The fopen function
When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.
You may need to look up these calls to translate to iostreams terminology, but it is fairly straightforward.
You sometimes free the result of EnDeCrypt and sometimes delete it (but with single-object delete and not the array form); this most likely doesn't contribute to the problem you see, but it's either an error on your part or, less likely, on the part of the designer of EnDeCrypt.
You're using:
char* encryptionChunkBuffer = new char[2048]; //2KB chunk buffer
//...
getStream().read(encryptionChunkBuffer,2048);
//...
delete[] encryptionChunkBuffer;
But it would be better and easier to use:
vector<char> encryptionChunkBuffer (2048); //2KB chunk buffer
//...
getStream().read(&encryptionChunkBuffer[0], encryptionChunkBuffer.size());
//...
// no delete
If you don't want to type encryptionChunkBuffer.size() twice, then use a local constant for it.

Related

why stream write file, file size increase by 4k each time? [duplicate]

In the case of a buffered stream, my book says it waits until the buffer is full to write back to the monitor. For example:
cout << "hi";
What do they mean by "the buffer is full"?
cerr << "hi";
My book says that everything sent to cerr is written to the standard error device immediately. What does that mean?
char *ch;
cin >> ch; // I typed "hello world";
In this example ch will be assigned "hello" and "world" will be ignored. Does that mean it is still in the buffer and will affect the results of future input statements?
Your book doesn't seem very helpful.
1) The output streams send their bytes to a std::streambuf, which may
contain a buffer; the std::filebuf (derived from streambuf) used by
an std::ofstream will generally be buffered. That means that when
you output a character, it isn't necessarily output immediately; it will
be written to a buffer, and output to the OS only when the buffer is
full, or you explicitly request it in some way, generally by calling
flush() on the stream (directly, or indirectly, by using std::endl).
This can vary, however; output to std::cout is synchronized with
stdout, and most implementations will more or less follow the rules of
stdout for std::cout, changing the buffering strategy if the output
is going to an interactive device.
At any rate, if you're unsure, and you want to be sure that the output
really does leave your program, just add a call to flush.
2) Your book is wrong here.
One of the buffering strategies is unitbuf; this is a flag in the
std::ostream which you can set or reset (std::ios_base::setf() and
std::ios_base::unsetf()—std::ios_base is a base class of
std::ostream, so you can call these functions on an std::ostream
object). When unitbuf is set, std::ostream adds a call to flush()
to the end of every output function, so when you write:
std::cerr << "hello, world";
the stream will be flushed after all of the characters in the string
are output, provided unitbuf is set. On start-up, unitbuf is set
for std::cerr; by default, it is not set on any other file. But you
are free to set or unset it as you wish. I would recommend against
unsetting it on std::cerr, but if std::cout is outputting to an
interactive device, it makes a lot of sense to set it there.
Note that all that is in question here is the buffer in the streambuf.
Typically, the OS also buffers. All flushing the buffer does is
transfer the characters to the OS; this fact means that you cannot use
ofstream directly when transactional integrity is required.
3) When you input to a string or a character buffer using >>, the
std::istream first skips leading white space, and then inputs up to
but not including the next white space. In the formal terms of the
standard, it "extracts" the characters from the stream, so that they
will not be seen again (unless you seek, if the stream supports it).
The next input will pick up wherever the previous one left off. Whether
the following characters are in a buffer, or still on disk, is really
irrelevant.
Note that the buffering of input is somewhat complex, in that it occurs
at several different levels, and at the OS level, it takes different
forms depending on the device. Typically, the OS will buffer a file by
sectors, often reading several sectors in advance. The OS will always
return as many characters as were demanded, unless it encounters end of
file. Most OSs will buffer a keyboard by line: not returning from a
read request until a complete line has been entered, and never returning
characters beyond the end of the current line in a read request.
In the same manner as std::ostream uses a streambuf for output,
std::istream uses one to get each individual character. In the case
of std::cin, it will normally be a filebuf; when the istream
requests a character, the filebuf will return one from its buffer if
it has one; if it doesn't, it will attempt to refill the buffer,
requesting e.g. 512 (or whatever its buffer size is) characters from the
OS. Which will respond according to its buffering policy for the
device, as described above.
At any rate, if std::cin is connected to the keyboard, and you've
typed "hello world", all of the characters you've typed will be read
by the stream eventually. (But if you're using >>, there'll be a lot
of whitespace that you won't see.)
Streams in C++ are buffered to increase efficiency; file and console I/O is very slow in comparison to memory operations.
To combat this, C++ streams have a buffer (a bank of memory) that holds everything to be written to the file or output device; when it is full, it is flushed to the file. The inverse is true for input: more data is fetched when the buffer is depleted.
This is very important for streams, because the following
std::cout << 1 << "hello" << ' ' << "world\n";
would otherwise be four separate writes to a file, which is inefficient.
However, std::cout, std::cin, and std::cerr effectively have this buffering turned off by default, to ensure that they can be used in conjunction with std::printf and std::puts etc...
To re-enable it (which I recommend doing):
std::ios_base::sync_with_stdio(false);
But don't use C style console output whilst it is set false or bad things may happen.
You can check out the differences yourself with a small app.
#include <iostream>
int main() {
    std::cout << "Start";
    //std::cout << "Start" << std::endl; // line buffered, endl will flush.
    double j = 2;
    for (int i = 0; i < 100000000; i++) {
        j = i / (i + 1);
    }
    std::cout << j;
    return 0;
}
Try out the difference between the two "Start" statements, then change to cerr. The difference you notice is due to buffering.
The for-statement takes about 2 seconds on my rig, you might need to tweak the i < condition on yours.
1) what do they mean by "the buffer is full".
With buffered output there's a region of memory, called a buffer, where the stuff you write out is stored before it is actually written to the output. When you say cout << "hi" the string is probably only copied into that buffer and not written out. cout usually waits until that memory has been filled up before it actually starts writing things out.
The reason for this is because usually the process of starting to actually write data is slow, and so if you do that for every character you get terrible performance. A buffer is used so that the program only has to do that infrequently and you get much better performance.
2) It said in my book that everything sent to cerr is written to the standard error device immediately. Does this mean it sends 'h' and then 'i'...?
It just means that no buffer is used. cerr might still send 'h' and 'i' at the same time since it already has both of them.
3) In this example ch will be assigned "hello" and "world" will be ignored. Does that mean it is still in the buffer and will affect the results of future statements?
This doesn't really have anything to do with buffering. The operator >> for char* is defined to read until it sees whitespace, so it stops at the space between "hello" and "world". But yes, the next time you read you will get "world".
(Although, if that code isn't just a paraphrase of the actual code, it has undefined behavior, because you're reading the text through an uninitialized pointer. Instead you should do:
std::string s;
cin >> s;
)
Each call to write to the terminal is slow, so to avoid doing slow things often the data is stored in memory until either a certain amount of data has been entered or the buffer is flushed manually with fflush or std::endl. The result of this is sometimes that text might not be written to the terminal at the moment you expect it to.
Since the timing of error messages is more critical than normal output, the performance hit is ignored and the data is not buffered. However, since a string is passed in a single piece of data, it is written in one call (inside a loop somewhere).
"world" would still be in the buffer, but it's quite easy to prove this yourself by trying it in a three-line program. However, your example will fail, since you are attempting to write into unallocated memory. You should be taking input into a std::string instead.

How does this one stream command read in an entire file in c++?

Given this:
auto f = std::ifstream{file};
if (f) {
std::stringstream stream;
stream << f.rdbuf();
return stream.str();
}
return std::string{};
I don't see why it works.
I don't know what type f is, because it says auto, but apparently you can check that for non-zero-ness.
But when the file is large, like 2 gig, the delay in running happens in
this line:
stream << f.rdbuf();
The documentation says rdbuf() gets you the pointer to the ifstream's internal buffer. So in order for it to read the entire file, the buffer would have to be sized to the file and load it all in one shot. But by the time the stream << happens, rdbuf() has to already be set, or it won't be able to return a pointer.
I would expect the constructor to do that in this case, but it's obviously lazy loaded, because reading in the entire file on construction would be bad for every other use case, and the delay is in the stream << command.
Any thoughts? All other stack overflow references to reading in a file to a string always loop in some way.
If there's some buffer involved, which obviously there is, how big can it get? What if it were 1 byte? It would surely be slow.
Adorable c++ is very opaque, bad for programmers who have to know what's going on under the covers.
It's a function of how operator<< is defined on ostreams when the argument is a streambuf. As long as the streambuf isn't a null pointer, it extracts characters from the input sequence controlled by the streambuf and inserts them into *this until one of the following conditions are met (see operator<< overload note #9):
end-of-file occurs on the input sequence;
inserting in the output sequence fails (in which case the character to be inserted is not extracted);
an exception occurs (in which case the exception is caught).
Basically, the ostream (which stringstream inherits from) knows how to exercise a streambuf to pull all the data from the file it's associated with. It's an idiomatic, but as you note, not intuitive, way to slurp a whole file. The streambuf isn't actually buffering all the data here (as you note, reading the whole file into the buffer would be bad in the general case), it's just that it has the necessary connections to adjust the buffered window as an ostream asks for more (and more, and more) data.
if (f) works because ifstream has an overload of operator bool that is implicitly invoked when the "truthiness" of the ifstream is tested; it tells you whether the file is in a failure state.
To answer your first question first:
f is of the type that's assigned to it, an std::ifstream, but that's a rather silly way to write it. One would usually write std::ifstream f {...}. A stream has an overloaded operator bool () which gives you !fail().
As for the second question: What .rdbuf() returns is a streambuf object. This object doesn't contain the whole file contents when it is returned. Instead, it provides an interface to access data, and this interface is used by the stringstream stream.
auto f = std::ifstream{file};
Type of f is std::ifstream.
stream << f.rdbuf();
std::ifstream maintains a buffer, which you can get via f.rdbuf(), and it does not load the entire file in one shot. The loading happens when the above command is called: stringstream will extract data from that buffer, and the ifstream will keep loading from the file as the buffer runs out of data.
You can set the buffer yourself by calling pubsetbuf on the stream's filebuf.

Reading and writing the same file simultaneously with c++

I'm trying to read and write a file as I loop through its lines. At each line, I will do an evaluation to determine if I want to write it into the file or skip it and move onto the next line. This is a basically a skeleton of what I have so far.
void readFile(char* fileName)
{
    const int MAX_BUFFER = 1024;
    char line[MAX_BUFFER];
    fstream file("test.file", ios::in | ios::out);
    if (file.is_open())
    {
        while (file.getline(line, MAX_BUFFER))
        {
            //evaluation
            file.seekg(file.tellp());
            file << line;
            file.seekp(file.tellg());
        }
    }
}
As I'm reading in the lines, I seem to be having issues with the starting index of the string copied into the line variable. For example, I may be expecting the string in the line variable to be "000/123/FH/" but it actually goes in as "123/FH/". I suspect that I have an issue with file.seekg(file.tellp()) and file.seekp(file.tellg()) but I am not sure what it is.
It is not clear from your code [1] and problem description what is in the file and why you expect "000/123/FH/", but I can state that the getline function performs buffered input, and you don't have code to access the buffer. In general, it is not recommended to mix buffered and unbuffered i/o on the same stream, because doing so requires deep knowledge of the buffer mechanism and then relies on that mechanism not changing as libraries are upgraded.
You appear to want to do byte or character[2] level manipulation. For small files, you should read the entire file into memory, manipulate it, and then overwrite the original, requiring an open, read, close, open, write, close sequence. For large files you will need to use fread and/or some of the other lower level C library functions.
The best way to do this, since you are using C++, is to create your own class that handles reading up to and including a line separator [3] into one of the off-the-shelf circular buffers (those using malloc, or a plug-in allocator as in the case of STL-like containers), or a circular buffer you develop as a template over a statically allocated array of bytes (if you want high speed and low resource utilization). In the latter case, the buffer size will need to be at least as large as the longest line. [4]
Either way, you would want to add to the class to open the file in binary mode and expose the desired methods to do the line level manipulations to an arbitrary line. Some say (and I personally agree) that taking advantage of Bjarne Stroustrup's class encapsulation in C++ is that classes are easier to test carefully. Such a line manipulation class would encapsulate the random access C functions and unbuffered i/o and leave open the opportunity to maximize speed, while allowing for plug-and-play usage in systems and applications.
Notes
[1] The seeking of the current position is just testing the functions and does not yet, in the current state of the code, re-position the current file pointer.
[2] Note that there is a difference between character and byte level manipulations in today's computing environment where utf-8 or some other unicode standard is now more common than ASCII in many domains, especially that of the web.
[3] Note that line separators are dependent on the operating system, its version, and sometimes settings.
[4] The advantage of circular buffers in terms of speed is that you can read more than one line using fread at a time and use fast iteration to find the next end of line.
Taking inspiration from Douglas Daseeco's response, I resolved my issue by simply reading the existing file, writing its lines into a new file, then renaming the new file to overwrite the original file. Below is a skeleton of my solution.
char line[1024];
bool keep = false; // result of the evaluation below
ifstream inFile("test.file");
ofstream outFile("testOut.file");
if (inFile.is_open() && outFile.is_open())
{
    while (inFile.getline(line, 1024))
    {
        // do some evaluation, setting 'keep'
        if (keep)
        {
            outFile << line;
            outFile << "\n";
        }
    }
    inFile.close();
    outFile.close();
    rename("testOut.file", "test.file");
}
Since you are reading and writing the same file, you might end up with duplicate lines in the file.
You could find this very useful. Imagine your first time reaching the while loop: starting from the beginning of the file, you do file.getline(line, MAX_BUFFER). Now the get pointer (for reading) has moved past the line just read (up to MAX_BUFFER - 1 characters plus the delimiter) from the beginning of the file (your starting point).
After you've determined you want to write back to the file, seekp() lets you specify the location to write to relative to a reference point. Syntax: file.seekp(num_bytes, ref); where ref is one of ios::beg (beginning), ios::end, or ios::cur (current position in file).
As in your code, after reading, find a way to use the number of characters read to refer to a location with respect to a reference point.
while (file.good())
{
    file.getline(line, MAX_BUFFER);
    ...
    if (/* for some reason you want to write back */)
    {
        // set put-pointer to the location for writing
        file.seekp(num_bytes, ref); // ref: ios::beg, ios::cur or ios::end
        file << line;
    }
    // set get-pointer to the desired location for the next read
    file.seekg(num_bytes, ref);
}

How does the buffer know how many characters to transfer from the external file during a flush operation?

Say I have an input operation:
file >> x;
If the internal buffer of file is empty, underflow() will be called to import characters from the external device into the internal buffer of file. It is implementation-defined whether the buffer will be partially or completely filled after this operation. Taking that into account, if x is a string and I am expecting an input value of a certain length, is the buffer within its rights to transfer fewer characters than that? Can this happen?
There is no real constraint on how many characters underflow() makes available. The only real constraint is that a stream which hasn't reached EOF needs to make at least one character available. With respect specifically to std::filebuf (or std::basic_filebuf<...>), the stream may be unbuffered (if setbuf(0, 0) was called), in which case it would, indeed, make individual characters available. Otherwise, the stream will try to fill its internal buffer and rely on the operating system to have the underlying read return a suitable number of bytes if only a few are available yet.
I'm not sure I quite understand your question: the operation file >> x will return once x is completely read, which can happen when the stream indicated by file has reached its end or when a whitespace character is found (and if by "string" you mean char*, a non-zero value stored in file.width() is also taken into account). With respect to the underlying stream buffer, clearly x may require multiple reads of the underlying representation, i.e., it is unpredictable how many calls to underflow() are made. Given that the file's internal buffer probably matches the disc's block size, I would expect at most one call to underflow() for "normal" strings. However, if the file being read is huge and doesn't contain any whitespace, many calls to underflow() may be made. Given that the stream needs to find whitespace, it has no way to predict how many characters are needed in the first place.

Overloading operator>> to a char buffer in C++ - can I tell the stream length?

I'm on a custom C++ crash course. I've known the basics for many years, but I'm currently trying to refresh my memory and learn more. To that end, as my second task (after writing a stack class based on linked lists), I'm writing my own string class.
It's gone pretty smoothly until now; I want to overload operator>> that I can do stuff like cin >> my_string;.
The problem is that I don't know how to read the istream properly (or perhaps the problem is that I don't know streams...). I tried a while (!stream.eof()) loop that .read()s 128 bytes at a time, but as one might expect, it stops only on EOF. I want it to read to a newline, like you get with cin >> to a std::string.
My string class has an alloc(size_t new_size) function that (re)allocates memory, and an append(const char *) function that does that part, but I obviously need to know the amount of memory to allocate before I can write to the buffer.
Any advice on how to implement this? I tried getting the istream length with seekg() and tellg(), to no avail (it returns -1), and as I said looping until EOF (doesn't stop reading at a newline) reading one chunk at a time.
To read characters from the stream until the end of line use a loop.
char c;
while (istr.get(c) && c != '\n')
{
    // Append 'c' to the end of your string.
}
// If you want to put the '\n' back onto the stream,
// use istr.putback(c) here (note that unget() takes no argument).
// But I think it's safe to say that dropping the '\n' is fine.
If you run out of room reallocate your buffer with a bigger size.
Copy the data across and continue. No need to be fancy for a learning project.
You can use std::cin.getline(buffer, buffer_size);
Then you will need to check the bad, eof and fail flags:
std::cin.bad(), std::cin.eof(), std::cin.fail()
Unless bad or eof were set, the fail flag being set usually indicates that the buffer was filled before a newline was found, so you should reallocate your buffer and continue reading into the new buffer after calling std::cin.clear()
A side note: in the standard library, operator>> on an istream is overloaded either as a member to provide this kind of functionality, or (as for char*) as a global function. It would probably be wiser to provide such a global overload for your string class rather than overloading the operator as a member.
Check Jerry Coffin's answer to this question.
The first method he used is very simple (just a helper class) and allow you to write your input in a std::vector<std::string> where each element of the vector represents a line of the original input.
That really makes things easy when it comes to processing afterwards!