Reading and writing the same file simultaneously with C++

I'm trying to read and write a file as I loop through its lines. At each line, I will do an evaluation to determine if I want to write it into the file or skip it and move on to the next line. This is basically a skeleton of what I have so far.
#include <fstream>
using namespace std;

const int MAX_BUFFER = 1024;

void readFile(char* fileName)
{
    char line[MAX_BUFFER];
    fstream file("test.file", ios::in | ios::out);
    if (file.is_open())
    {
        while (file.getline(line, MAX_BUFFER))
        {
            // evaluation
            file.seekg(file.tellp());
            file << line;
            file.seekp(file.tellg());
        }
    }
}
As I'm reading in the lines, I seem to be having issues with the starting index of the string copied into the line variable. For example, I may be expecting the string in the line variable to be "000/123/FH/" but it actually goes in as "123/FH/". I suspect that I have an issue with file.seekg(file.tellp()) and file.seekp(file.tellg()) but I am not sure what it is.

It is not clear from your code [1] and problem description what is in the file and why you expect "000/123/FH/", but I can state that the getline function performs buffered input and you don't have code to access the buffer. In general, it is not recommended to mix buffered and unbuffered I/O, because doing so requires deep knowledge of the buffer mechanism and then relies on that mechanism not changing as libraries are upgraded.
You appear to want to do byte or character[2] level manipulation. For small files, you should read the entire file into memory, manipulate it, and then overwrite the original, requiring an open, read, close, open, write, close sequence. For large files you will need to use fread and/or some of the other lower level C library functions.
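For the small-file case, a minimal sketch of that open, read, close, open, write, close sequence might look like the following; shouldKeep() is a hypothetical stand-in for whatever evaluation you need:

#include <fstream>
#include <string>
#include <vector>

// Hypothetical evaluation: keep every non-empty line.
static bool shouldKeep(const std::string& line)
{
    return !line.empty();
}

void filterFile(const char* fileName)
{
    std::vector<std::string> kept;
    {
        std::ifstream in(fileName);
        std::string line;
        while (std::getline(in, line))
            if (shouldKeep(line))
                kept.push_back(line);
    } // in is closed here
    std::ofstream out(fileName, std::ios::trunc); // reopen and overwrite
    for (const std::string& l : kept)
        out << l << '\n';
}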
The best way to do this, since you are using C++, is to create your own class that handles reading up to and including a line separator [3] into one of the off-the-shelf circular buffers (those that use malloc or a plug-in allocator, as in the case of STL-like containers) or a circular buffer you develop as a template over a statically allocated array of bytes (if you want high speed and low resource utilization). In the latter case, the buffer will need to be at least as large as the longest line. [4]
Either way, you would want the class to open the file in binary mode and expose the desired methods to do line-level manipulations at an arbitrary line. Some say (and I personally agree) that one benefit of Bjarne Stroustrup's class encapsulation in C++ is that classes are easier to test carefully. Such a line-manipulation class would encapsulate the random-access C functions and unbuffered I/O, leave open the opportunity to maximize speed, and allow for plug-and-play usage in systems and applications. (A sketch of such a reader appears after the notes below.)
Notes
[1] Seeking to the position the stream is already at exercises the functions but does not, in the current state of the code, actually re-position the file pointer.
[2] Note that there is a difference between character and byte level manipulations in today's computing environment where utf-8 or some other unicode standard is now more common than ASCII in many domains, especially that of the web.
[3] Note that line separators are dependent on the operating system, its version, and sometimes settings.
[4] The advantage of circular buffers in terms of speed is that you can read more than one line using fread at a time and use fast iteration to find the next end of line.
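To illustrate [4] and the class described above, here is a minimal sketch of such a line reader over a statically sized buffer; it shifts leftover bytes to the front rather than wrapping a true circular buffer, and every name in it is illustrative rather than prescriptive:

#include <cstdio>
#include <cstring>

class BlockLineReader {
public:
    explicit BlockLineReader(const char* path)
        : f_(std::fopen(path, "rb")), len_(0) {}
    ~BlockLineReader() { if (f_) std::fclose(f_); }

    // Copies the next line (without the separator) into out.
    // Returns false when no data remains. Lines longer than the
    // internal buffer are split in this simplified sketch.
    bool next(char* out, std::size_t outSize) {
        if (!f_ || outSize == 0) return false;
        for (;;) {
            // Fast scan for a separator in what we already have.
            char* nl = static_cast<char*>(std::memchr(buf_, '\n', len_));
            if (nl) {
                std::size_t n = static_cast<std::size_t>(nl - buf_);
                if (n >= outSize) n = outSize - 1;
                std::memcpy(out, buf_, n);
                out[n] = '\0';
                std::size_t consumed = static_cast<std::size_t>(nl - buf_) + 1;
                len_ -= consumed;
                std::memmove(buf_, buf_ + consumed, len_); // shift leftovers
                return true;
            }
            // No separator yet: read another block with fread.
            std::size_t got = std::fread(buf_ + len_, 1, sizeof buf_ - len_, f_);
            if (got == 0) { // end of file, or the buffer is full
                if (len_ == 0) return false;
                std::size_t n = len_ < outSize ? len_ : outSize - 1;
                std::memcpy(out, buf_, n);
                out[n] = '\0';
                len_ -= n;
                std::memmove(buf_, buf_ + n, len_);
                return true;
            }
            len_ += got;
        }
    }

private:
    std::FILE* f_;
    char buf_[4096];
    std::size_t len_;
};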

Taking inspiration from Douglas Daseeco's response, I resolved my issue by simply reading the existing file, writing its lines into a new file, then renaming the new file to overwrite the original file. Below is a skeleton of my solution.
#include <fstream>
#include <cstdio> // for rename
using namespace std;

char line[1024];
ifstream inFile("test.file");
ofstream outFile("testOut.file");
if (inFile.is_open() && outFile.is_open())
{
    while (inFile.getline(line, 1024))
    {
        // do some evaluation
        bool keep = true; // result of the evaluation above
        if (keep)
        {
            outFile << line;
            outFile << "\n";
        }
    }
    inFile.close();
    outFile.close();
    rename("testOut.file", "test.file");
}

If you read and write to the same file as you go, you might end up with duplicate lines in the file.
You could find this very useful. Imagine reaching the while loop for the first time, starting from the beginning of the file, and doing file.getline(line, MAX_BUFFER). The get pointer (for reading) now moves up to MAX_BUFFER places from the beginning of the file (your starting point).
After you've determined that you want to write back to the file, seekp() helps to specify, with respect to a reference point, the location you want to write to. Syntax: file.seekp(num_bytes, ref); where ref is ios::beg (beginning), ios::end, or ios::cur (current position in file).
As in your code, after reading, use the number of characters consumed (at most MAX_BUFFER) to compute a location with respect to one of those reference points.
while (file.getline(line, MAX_BUFFER))
{
    ...
    if (/* for some reason you want to write back */)
    {
        // set put pointer to the location for writing
        file.seekp(num_bytes, ref); // ref is ios::beg, ios::cur, or ios::end
        file << line;
    }
    // set get pointer to the desired location for the next read
    file.seekg(num_bytes, ref);
}

Related

Is seekg or ignore more efficient?

I'm using C++ to write a very time-sensitive application, so efficiency is of the utmost importance.
I have a std::ifstream, and I want to jump a specific number of characters (i.e., a byte offset; I'm not using wchar_t) to get to a specific line instead of using std::getline to read every single line, because that is too inefficient for me.
Is it better to use seekg or ignore to skip a specified number of characters and start reading from there?
size_t n = 100;
std::ifstream f("test");
f.seekg(n, std::ios_base::beg);
// vs.
f.ignore(n);
Looking at cplusplus.com for both functions, it seems like ignore will use sbumpc or sgetc to skip the requested number of characters. This means that it works even on streams that do not natively support skipping (which ifstream does), but it also processes every single byte.
seekg on the other hand uses pubseekpos or pubseekoff, which is implementation defined. For files, this should directly skip to the desired position without processing the bytes up to it.
I would expect seekg to be much more efficient, but as others have said, doing your own tests with a big file would be the best way to find out.
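For example, a rough timing harness along these lines (the file name big.bin and the 100 MiB offset are placeholders) lets you measure both approaches on your own system:

#include <chrono>
#include <fstream>
#include <iostream>

int main()
{
    const std::streamsize n = 100 * 1024 * 1024; // placeholder offset

    // Time one skipping strategy against a freshly opened stream.
    auto time = [&](auto&& skip) {
        std::ifstream f("big.bin", std::ios::binary);
        auto t0 = std::chrono::steady_clock::now();
        skip(f);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    };

    std::cout << "seekg:  "
              << time([&](std::ifstream& f) { f.seekg(n, std::ios_base::beg); })
              << " s\n";
    std::cout << "ignore: "
              << time([&](std::ifstream& f) { f.ignore(n); })
              << " s\n";
}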

Using Getline on a Binary File

I have read that getline behaves as an unformatted input function, which I believe should allow it to be used on a binary file. Let's say for example that I've done this:
#include <fstream>
#include <string>
#include <cstring>
using namespace std;

ofstream output("foo.txt", ios_base::binary);
const auto foo = "lorem ipsum";
output.write(foo, strlen(foo) + 1);
output.close();

ifstream input("foo.txt", ios_base::binary);
string bar;
getline(input, bar, '\0');
Is that breaking any rules? It seems to work fine; I think I've just traditionally seen arrays handled by writing the size and then writing the array.
No, it's not breaking any rules that I can see.
Yes, it's more common to write an array with a prefixed size, but using a delimiter to mark the end can work perfectly well also. The big difference is that (as with a text file) you have to read through the data to find the next item. With a prefixed size, you can look at the size and skip directly to the next item if you don't need the current one. Of course, you also need to ensure that, if you're using something to mark the end of a field, it can never occur inside the field (or come up with some way of detecting when it's inside a field, so you can read the rest of the field when it does).
Depending on the situation, that can mean (for example) using Unicode text. This gives you a lot of options for values that can't occur inside the text (because they aren't legal Unicode). That, on the other hand, would also mean that your "binary" file is really a text file, and has to follow some basic text-file rules to make sense.
Which is preferable depends on how likely it is that you'll want to read random pieces of the file rather than reading through it from beginning to end, as well as the difficulty (if any) of finding a unique delimiter and, if you don't have one, the complexity of making the delimiter recognizable from data inside a field. If the data is only meaningful when written in order, then having to read it in order doesn't really pose a problem. If you can read individual pieces meaningfully, then being able to do so is much more likely to be useful.
In the end, it comes down to a question of what you want out of your file being "binary". In the typical case, all "binary" really means is that end-of-line markers, which might otherwise be translated from a newline character to (for example) a carriage-return/line-feed pair, won't be. Depending on the OS you're using, it might not even mean that much; for example, on Linux there's normally no difference between binary and text mode at all.
Well, there are no rules broken and you'll get away with that just fine, except that you may miss some of the precision of reading binary data from a stream object.
With binary input, you usually want to know how many characters were read successfully, which you can obtain afterwards with gcount(). Using std::getline will not reflect the bytes read in gcount().
Of course, you can simply get such info from the size of the string you passed into std::getline. But the stream will no longer encapsulate the number of bytes you consumed in the last unformatted operation.
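To make the difference concrete, here is a small sketch (reusing the foo.txt written above; the 32-byte read size is arbitrary):

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream input("foo.txt", std::ios_base::binary);

    std::string bar;
    std::getline(input, bar, '\0');
    // The free std::getline is not an unformatted member function,
    // so gcount() is not updated; use the string's size instead.
    std::cout << "after getline: gcount() = " << input.gcount()
              << ", bar.size() = " << bar.size() << '\n';

    input.clear();
    input.seekg(0);
    char buf[32];
    input.read(buf, sizeof buf); // may hit end of file here
    // read() is an unformatted member function, so gcount()
    // reports how many bytes actually arrived.
    std::cout << "after read: gcount() = " << input.gcount() << '\n';
}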

How do file-reading functions recognize the end of a text file in C++?

As you know, there are two standard ways to read a text file in C++ (in this case, two numbers on every line):
The two standard methods are:
Assume that every line consists of 2 numbers and read token by token:
#include <fstream>
std::ifstream infile("thefile.txt");
int a, b;
while (infile >> a >> b)
{
// process pair (a,b)
}
Line-based parsing, using string streams:
#include <sstream>
#include <string>
#include <fstream>
std::ifstream infile("thefile.txt");
std::string line;
while (std::getline(infile, line))
{
std::istringstream iss(line);
int a, b;
if (!(iss >> a >> b)) { break; } // error
// process pair (a,b)
}
And I can also use the code below to see whether the file has ended or not:
while (!infile.eof())
My question is:

Question 1: How do these functions understand that one line is the last line? I mean, how does eof() return false/true?

As far as I know, they read a part of memory. What is the difference between the part that belongs to the file and the parts that do not?

Question 2: Is there any way to cheat this function? I mean, is it possible to add something in the middle of the text file (for example with a hex editor tool) and make eof() wrongly return true in the middle of the text file?
Appreciate your time and consideration.
Question 1: How do these functions understand that one line is the last line? I mean, how does eof() return false/true?
It doesn't. The functions know when you've tried to read past the very last character in the file. They don't necessarily know whether a line is the last line. "Files" aren't the only things you can read with streams: keyboard input, a special-purpose device, internet sockets can all be read with the right kind of I/O stream. When reading from standard input, the stream has no way of knowing whether the very next thing I type is control-Z.
With regard to files on a computer disk, most modern operating systems store metadata about a file separately from the file itself. These metadata include the length of the file (and oftentimes when the file was last modified and when it was last read). On these systems, the stream buffer that underlies the I/O stream knows the current read location within the file and knows how long the file is. The stream buffer signals EOF when the read location reaches the length of the file.
That's not universal, however. There are some not-so-common operating systems that don't use this concept of metadata stored elsewhere. End of file on a disk file is just as surprising on these systems as is end of file from user input on a keyboard.
As far as I know, they read a part of memory. What is the difference between the part that belongs to the file and the parts that do not?
Learn the difference between memory and disk files. There's a huge difference between the two. Unless you're working with an embedded computer, memory is much more limited than is disk space.
Question 2: Is there any way to cheat this function? I mean, is it possible to add something in the middle of the text file (for example with a hex editor tool) and make eof() wrongly return true in the middle of the text file?
That depends very much on how the operating system implements files. On most modern operating systems, the answer is not just "no" but "No!". The concept of using some special signature that indicates end of file in a disk file is one of many computer science concepts that for the most part have been dumped into the pile of "that wasn't very smart" ideas. You asked your question on the internet. That most likely means you are using a Windows machine, a Linux machine, or a Mac. All of them store the length of a file as metadata separate from the contents of a file.
However, there is a need for the ability to clear the end-of-file indicator. One program might be writing to a file while another is reading from it. The reader might hit EOF while the writer is still active. The reader needs to clear the EOF indicator to continue reading what the writer has written. The C++ I/O streams provide the ability to do just that: every I/O stream has a clear function. Whether it works is a different story. The clear will work temporarily, but the very next read might well set the EOF bit again. For example, when I type control-Z on my keyboard, that means I am done interacting with the program, period. My next action might well be to go out for lunch.
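As a sketch of that reader/writer pattern (log.txt and the polling interval are placeholders; assume another process keeps appending to the file):

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

int main()
{
    std::ifstream in("log.txt");
    std::string line;
    for (;;) {
        while (std::getline(in, line))
            std::cout << line << '\n';
        if (in.eof()) {
            // Clear eofbit (and failbit) so reading can resume.
            // As noted above, the next read may simply set them
            // again until the writer has produced more data.
            in.clear();
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        } else {
            break; // a real error, not just end of file
        }
    }
}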

C++ reading data from a file

Disclaimer: this question is directly related to my programming homework.
My C++ assignment consists of opening a .txt file, performing a bunch of operations on it, and then saving the .txt file. Problem is, I'm having a hard time just grasping the basic concepts of reading and writing files.
My code:
#include <iostream>
#include <fstream>
using namespace std;

int main () {
    ifstream inData;
    ofstream outData;

    // is it necessary to open datalist.txt for both the in and out streams?
    inData.open("datalist.txt");
    outData.open("datalist.txt");

    if (inData.is_open()) {
        cout << "yay, i opened it\n"; // this outputs as expected
        char fileData[100]; // have to use char arrays as per instructor. no strings
        inData >> fileData; // store text from datalist.txt in fileData char array
        cout << fileData; // nothing happens here... why?
        outData << "changing file text cause I can"; // this works just fine.
    }
    else {
        cout << "boo, i couldn't open it";
    }

    inData.close();
    outData.close();
    return 0;
}
The main issue I'm encountering is that I don't understand how to read the data in a file at even a basic level, let alone parse the file into relevant information (the purpose of the program is to read, write, and manipulate information in a semi-colon delimited list).
In addition to this question, I am also a little confused about two other things. First, is it necessary to open datalist.txt for both the in and out streams? For some reason it just feels weird that I have to open the same file twice. Second, my instructor doesn't want us to use the string class, and instead wants us to use char arrays. I don't understand the logic behind this and was hoping someone could elaborate on why strings are bad (or perhaps give a counter-argument).
You don't open a file for reading and writing at the same time. Well, not with two different objects that don't know about each other at any rate. You either use a std::fstream (which can do simultaneous reading and writing), or you read first, close the file, process the data, then write it.
Also:
//have to use char arrays as per instructor. no strings
I think you may want to get a better instructor. The use of a naked, stack-based char array is not something that any C++ teacher worth their salt should endorse.
This is where buffer overruns come from.
Opening the same file for reading and writing via two different file objects is generally a poor idea. In your case, it has also led to (part of) your problem. By default, opening an ofstream truncates it. So when you read from inData you get nothing. That is why nothing happens here:
cout << fileData; // nothing happens here... why?
At the end, your file contains:
changing file text cause I can
And nothing else.
So to read the file, you must not open it for writing as you have. If you want to change the file text to just your string, you can simply do two separate operations. Open inData, read it, and close it. Now open outData, write our string, and close it.
On the other hand, if what you wanted was to append your string to the end of the existing file, you should open a single stream for reading and writing. Read until end of file, then with the file pointer still at the end, write your string.
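A minimal sketch of that single-stream append (the file name and the appended text are placeholders; std::string stands in here for the char array your assignment requires):

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::fstream file("datalist.txt", std::ios::in | std::ios::out);
    std::string word;
    while (file >> word)           // read the existing contents
        std::cout << word << '\n';

    file.clear();                  // reading to the end set eofbit
    file.seekp(0, std::ios::end);  // make the put position explicit
    file << "appended text\n";
}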
That's the basic idea. Any more and I'd be doing your homework for you. :)

std::ios::failure (ios::badbit) problem with fstream.write()

I've been trying to debug my code for the past few hours and I couldn't figure out the problem. I eventually set my filestream to throw exceptions on failbit, and I found that my filestream was setting the failbit for some reason. I have absolutely no idea why the failbit is being set, because all I'm doing is writing 2048-byte chunks of data to the stream until suddenly it fails (at the same spot each time).
I would like to show you my code to see if anyone can see a problem and possibly see what might cause a std::ios::failure to be thrown:
bool abstractBlock::encryptBlockRC4(char* key)
{   // This encryption method can be chunked :)
    getStream().seekg(0, std::ios::end);
    int sLen = int(getStream().tellg()) - this->headerSize;
    seekg(0); // Seek to beginning of data
    seekp(0);

    char* encryptionChunkBuffer = new char[2048]; // 2KB chunk buffer
    for (int chunkIterator = 0; chunkIterator < sLen; chunkIterator += 2048)
    {
        if (chunkIterator + 2048 <= sLen)
        {
            getStream().read(encryptionChunkBuffer, 2048);
            char* encryptedData = EnDeCrypt(encryptionChunkBuffer, 2048, key);
            getStream().write(encryptedData, 2048);
            free(encryptedData);
        } else {
            int restLen = sLen - chunkIterator;
            getStream().read(encryptionChunkBuffer, restLen);
            char* encryptedData = EnDeCrypt(encryptionChunkBuffer, restLen, key);
            getStream().write(encryptedData, restLen);
            delete encryptedData;
        }
    }
    delete [] encryptionChunkBuffer;
    dataFlags |= DATA_ENCRYPTED_RC4; // Set the "encrypted (rc4)" bit
    seekp(0); // Seek to beginning of data
    seekg(0); // Seek to beginning of data
    return true;
}
The above code essentially encrypts a file in 2048-byte chunks. It reads 2048 bytes, encrypts them, and then writes them back to the stream (overwriting the "unencrypted" data that was previously there).
getStream() simply returns the fstream handle to the file that's being operated on.
The error always occurs when chunkIterator==86116352 on the line getStream().write(encryptedData,2048);
I know my code may be hard to decode, but maybe you can tell me some possible things that might trigger a failbit? Currently, I think the problem lies in the fact that I am reading from and writing to the same stream, but as I mentioned, any ideas about what can cause a failbit would help me investigate the problem further.
You must seekp when changing from reads to writes (and seekg when changing from writes to reads). It appears your implementation allows you to avoid this some of the time. (You're probably running into implicit flushes or other buffer manipulation which hide the problem sometimes.)
C++03, §27.5.1p1, Stream buffer requirements
The controlled sequences can impose limitations on how the program can read characters from a
sequence, write characters to a sequence, put characters back into an input sequence, or alter the stream
position.
This just generally states these aspects are controlled by the specific stream buffer.
C++03, §27.8.1.1p2, Class template basic_filebuf
The restrictions on reading and writing a sequence controlled by an object of class
basic_filebuf<charT,traits> are the same as for reading and writing with the Standard C library
FILEs.
fstream, ifstream, and ofstream use filebufs.
C99, §7.19.5.3p6, The fopen function
When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.
You may need to look up these calls to translate to iostreams terminology, but it is fairly straight-forward.
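In iostreams terms, that means the chunk loop needs a file-positioning call at each switch between reading and writing. A minimal sketch of one iteration, mirroring the loop above (names are placeholders):

#include <fstream>

void processChunk(std::fstream& stream, char* buf, std::streamsize n)
{
    std::streampos pos = stream.tellg();
    stream.read(buf, n);
    // ... transform buf in place ...
    stream.seekp(pos);             // required before switching to writing
    stream.write(buf, n);
    stream.seekg(stream.tellp());  // required before the next read
}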
You sometimes free the result of EnDeCrypt and sometimes delete it (but with single-object delete and not the array form); this most likely doesn't contribute to the problem you see, but it's either an error on your part or, less likely, on the part of the designer of EnDeCrypt.
You're using:
char* encryptionChunkBuffer = new char[2048]; //2KB chunk buffer
//...
getStream().read(encryptionChunkBuffer,2048);
//...
delete[] encryptionChunkBuffer;
But it would be better and easier to use:
vector<char> encryptionChunkBuffer (2048); //2KB chunk buffer
//...
getStream().read(&encryptionChunkBuffer[0], encryptionChunkBuffer.size());
//...
// no delete
If you don't want to type encryptionChunkBuffer.size() twice, then use a local constant for it.