C++ reading data from a file

Disclaimer: this question is directly related to my programming homework.
My C++ assignment consists of opening a .txt file, performing a bunch of operations on it, and then saving the .txt file. Problem is, I'm having a hard time just grasping the basic concepts of reading and writing files.
My code:
#include <iostream>
#include <fstream>
using namespace std;

int main () {
    ifstream inData;
    ofstream outData;

    // is it necessary to open datalist.txt for both the in and out streams?
    inData.open("datalist.txt");
    outData.open("datalist.txt");

    if (inData.is_open()) {
        cout << "yay, i opened it\n";  // this outputs as expected
        char fileData[100];            // have to use char arrays as per instructor. no strings
        inData >> fileData;            // store text from datalist.txt in fileData char array
        cout << fileData;              // nothing happens here... why?
        outData << "changing file text cause I can"; // this works just fine.
    }
    else {
        cout << "boo, i couldn't open it";
    }

    inData.close();
    outData.close();
    return 0;
}
The main issue I'm encountering is that I don't understand how to read the data in a file at even a basic level, let alone parse the file into relevant information (the purpose of the program is to read, write, and manipulate information in a semicolon-delimited list).
Beyond that, I'm also a little confused about two other things. First, is it necessary to open datalist.txt for both the in and out streams? For some reason it just feels weird that I have to open the same file twice. Second, my instructor doesn't want us to use the string class, only char arrays. I don't understand the logic behind this and was hoping someone could elaborate on why (or perhaps give a counter-argument to why) strings are bad.

You don't open a file for reading and writing at the same time. Well, not with two different objects that don't know about each other at any rate. You either use a std::fstream (which can do simultaneous reading and writing), or you read first, close the file, process the data, then write it.
Also:
//have to use char arrays as per instructor. no strings
I think you may want to get a better instructor. The use of a naked, stack-based char array is not something that any C++ teacher worth their salt should endorse.
This is where buffer overruns come from.

Opening the same file for reading and writing via two different file objects is generally a poor idea. In your case, it has also led to (part of) your problem. By default, opening an ofstream truncates it. So when you read from inData you get nothing. That is why nothing happens here:
cout << fileData; // nothing happens here... why?
At the end, your file contains:
changing file text cause I can
And nothing else.
So to read the file, you must not open it for writing as you have. If you want to change the file text to just your string, you can simply do two separate operations: open inData, read it, and close it; then open outData, write your string, and close it.
On the other hand, if what you wanted was to append your string to the end of the existing file, you should open a single stream for reading and writing. Read until end of file, then with the file pointer still at the end, write your string.
That's the basic idea. Any more and I'd be doing your homework for you. :)

Related

If I extract something from a stream, does the stream not contain what I extracted anymore?

If I have code like below and I store "This" in str first from stream streem:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    istringstream streem("This is the content in the stream.");
    string str;
    streem >> str;
    cout << str;  // This will cout "This"
}
If I do streem>>str again and cout<<str again, this will display "is".
So does this mean that "This" does not exist in the istringstream anymore?
What about file streams, since they retain their data?
The answer is different for different streams.
The stringstream has a memory buffer and an indicator that remembers where you were up to in the reading, so the next read starts where the previous read left off.
File streams work in a similar way: they remember which point of the file they are up to. In both cases you can change the position (including resetting to the beginning) using seekg.
File streams don't have separate read and write positions, so this same code might behave differently for a file stream. (In fact, I think it causes undefined behaviour to read and write without an intervening seek.)
Other input streams might not have seekable buffers, e.g. cin.

Reading and writing the same file simultaneously with C++

I'm trying to read and write a file as I loop through its lines. At each line, I will do an evaluation to determine if I want to write it into the file or skip it and move onto the next line. This is a basically a skeleton of what I have so far.
void readFile(char* fileName)
{
    char line[1024];
    fstream file("test.file", ios::in | ios::out);
    if (file.is_open())
    {
        while (file.getline(line, MAX_BUFFER))
        {
            //evaluation
            file.seekg(file.tellp());
            file << line;
            file.seekp(file.tellg());
        }
    }
}
As I'm reading in the lines, I seem to be having issues with the starting index of the string copied into the line variable. For example, I may be expecting the string in the line variable to be "000/123/FH/" but it actually goes in as "123/FH/". I suspect that I have an issue with file.seekg(file.tellp()) and file.seekp(file.tellg()) but I am not sure what it is.
It is not clear from your code [1] and problem description what is in the file and why you expect "000/123/FH/", but I can state that the getline function is a buffered input, and you don't have code to access the buffer. In general, it is not recommended to use buffered and unbuffered i/o together because it requires deep knowledge of the buffer mechanism and then relies on that mechanism not to change as libraries are upgraded.
You appear to want to do byte or character[2] level manipulation. For small files, you should read the entire file into memory, manipulate it, and then overwrite the original, requiring an open, read, close, open, write, close sequence. For large files you will need to use fread and/or some of the other lower level C library functions.
The best way to do this, since you are using C++, is to create your own class that handles reading up to and including a line separator [3] into one of the off-the-shelf circular buffers (which use malloc or a plug-in allocator, as in the case of STL-like containers), or into a circular buffer you develop as a template over a statically allocated array of bytes (if you want high speed and low resource utilization). In the latter case, the size will need to be at least as large as the longest line. [4]
Either way, you would want the class to open the file in binary mode and expose the desired methods for line-level manipulation of an arbitrary line. One advantage of C++ class encapsulation (and I personally agree with this view) is that such classes are easier to test carefully. A line-manipulation class like this would encapsulate the random-access C functions and unbuffered I/O, leaving open the opportunity to maximize speed while allowing for plug-and-play usage in systems and applications.
Notes
[1] The seeking of the current position is just testing the functions and does not yet, in the current state of the code, re-position the current file pointer.
[2] Note that there is a difference between character and byte level manipulations in today's computing environment where utf-8 or some other unicode standard is now more common than ASCII in many domains, especially that of the web.
[3] Note that line separators are dependent on the operating system, its version, and sometimes settings.
[4] The advantage of circular buffers in terms of speed is that you can read more than one line using fread at a time and use fast iteration to find the next end of line.
Taking inspiration from Douglas Daseeco's response, I resolved my issue by simply reading the existing file, writing its lines into a new file, then renaming the new file to overwrite the original file. Below is a skeleton of my solution.
char line[1024];
ifstream inFile("test.file");
ofstream outFile("testOut.file");
if (inFile.is_open() && outFile.is_open())
{
    while (inFile.getline(line, 1024))
    {
        // do some evaluation
        if (keep)
        {
            outFile << line;
            outFile << "\n";
        }
    }
    inFile.close();
    outFile.close();
    rename("testOut.file", "test.file");
}
Since you are reading and writing to the same file, you might end up with duplicate lines in the file.
You could find this very useful. Imagine your first time reaching the while loop: starting from the beginning of the file, you do file.getline(line, MAX_BUFFER). The get pointer (for reading) has now moved forward from the beginning of the file (your starting point).
After you've determined that you want to write back to the file, seekp() lets you specify the location to write to relative to a reference point. Syntax: file.seekp(num_bytes, ref); where ref is ios::beg (beginning), ios::end, or ios::cur (current position in file).
As in your code, after reading, use the number of bytes consumed to compute a location relative to one of those reference points.
while (file.getline(line, MAX_BUFFER))
{
    ...
    if (/* for some reason you want to write back */)
    {
        // set put-pointer to the location for writing
        file.seekp(num_bytes, ios::beg);
        file << line;
    }
    // set get-pointer to the desired location for the next read
    file.seekg(num_bytes, ios::beg);
}

Using ifstream's getline in C++

Hello World,
I am fairly new to C++ and I am trying to read a text file Line by Line. I did some research online and stumbled across ifstream.
What is troubling me is the getline method.
Its signature is: istream& getline (char* s, streamsize n);
I understand that the variable s is where the line being read is saved. (Correct me if I am wrong)
What I do not understand is what the streamsize n is used for.
The documentation states that:
Maximum number of characters to write to s (including the terminating null character).
However, if I do not know how long a given line is, what do I set the streamsize n to be?
Also, what is the difference between ifstream and istream? Would istream be more suitable for reading lines? Is there a difference in performance?
Thanks for your time
You almost never want to use this getline function. It's a leftover from back before std::string had been defined. It's for reading into a fixed-size buffer, so you'd do something like this:
static const int N = 1024;
char mybuffer[N];
myfile.getline(mybuffer, N);
...and the N was there to prevent getline from writing into memory past the end of the space you'd allocated.
For new code you usually want to use an std::string, and let it expand to accommodate the data being read into it:
std::string input;
std::getline(myfile, input);
In this case, you don't need to specify the maximum size, because the string can/will expand as needed for the size of the line in the input. Warning: in a few cases, this can be a problem--if (for example) you're reading data being fed into a web site, it could be a way for an attacker to stage a DoS attack by feeding an immense string, and bringing your system to its knees trying to allocate excessive memory.
Between istream and ifstream: an istream is mostly a base class that defines an interface that can be used to work with various derived classes (including ifstream objects). When/if you want to open a file from disk (or something similar) you want to use an ifstream object.

How read file functions recognize end of a text file in C++?

As far as I know, there are two standard ways to read a text file in C++ (in this case, two numbers on every line).
The two standard methods are:
Assume that every line consists of 2 numbers and read token by token:
#include <fstream>

std::ifstream infile("thefile.txt");
int a, b;
while (infile >> a >> b)
{
    // process pair (a,b)
}
Line-based parsing, using string streams:
#include <sstream>
#include <string>
#include <fstream>

std::ifstream infile("thefile.txt");
std::string line;
while (std::getline(infile, line))
{
    std::istringstream iss(line);
    int a, b;
    if (!(iss >> a >> b)) { break; } // error
    // process pair (a,b)
}
And I can also use the code below to check whether the file has ended or not:
while (!infile.eof())
My question is :
Question 1: How do these functions understand that one line is the last line? I mean, how does eof() return false/true?
As far as I know, they read a part of memory. What is the difference between the part that belongs to the file and the parts that don't?
Question 2: Is there any way to cheat this function? I mean, is it possible to add something in the middle of the text file (for example with a hex editor tool) and make eof() wrongly return true in the middle of the text file?
Appreciate your time and consideration.
Question 1: How do these functions understand that one line is the last line? I mean, how does eof() return false/true?
It doesn't. The functions know when you've tried to read past the very last character in the file. They don't necessarily know whether a line is the last line. "Files" aren't the only things that can be read with streams. Keyboard input, a special-purpose device, internet sockets: all can be read with the right kind of I/O stream. When reading from standard input, the stream has no way of knowing whether the very next thing I type is control-Z.
With regard to files on a computer disk, most modern operating systems store metadata regarding the file separately from the file itself. These metadata include the length of the file (and oftentimes when the file was last modified and when it was last read). On these systems, the stream buffer that underlies the I/O stream knows the current read location within the file and knows how long the file is. The stream buffer signals EOF when the read location reaches the length of the file.
That's not universal, however. There are some not-so-common operating systems that don't use this concept of metadata stored elsewhere. End of file on a disk file is just as surprising on these systems as is end of file from user input on a keyboard.
As far as I know, they read a part of memory. What is the difference between the part that belongs to the file and the parts that don't?
Learn the difference between memory and disk files. There's a huge difference between the two. Unless you're working with an embedded computer, memory is much more limited than is disk space.
Question 2: Is there any way to cheat this function? I mean, is it possible to add something in the middle of the text file (for example with a hex editor tool) and make eof() wrongly return true in the middle of the text file?
That depends very much on how the operating system implements files. On most modern operating systems, the answer is not just "no" but "No!". The concept of using some special signature that indicates end of file in a disk file is one of many computer science concepts that for the most part have been dumped into the pile of "that wasn't very smart" ideas. You asked your question on the internet. That most likely means you are using a Windows machine, a Linux machine, or a Mac. All of them store the length of a file as metadata separate from the contents of a file.
However, there is a need for the ability to clear the end-of-file indicator. One program might be writing to a file while at the same time another is reading from it. The reader might hit EOF while the writer is still active. The reader needs to clear the EOF indicator to continue reading what the writer has written. The C++ I/O streams provide the ability to do just that: every I/O stream has a clear function. Whether it works is a different story. The clear will work temporarily, but the very next read might well set the EOF bit again. For example, when I type control-Z on my keyboard, that means I am done interacting with the program, period. My next action might well be to go out for lunch.

C++ IO file streams: writing from one file to another using operator<< and rdbuf()

I have a question about copying data from one file to another in C++ (fstream) using operator<<. Here is a code snippet that works for me:
#include <fstream>
#include <string>

void writeTo(const std::string &fname, std::ofstream &out){
    std::ifstream in;
    in.open(fname.c_str(), std::fstream::binary);
    if (in.good()){
        out << in.rdbuf();
        in.close();
    } else {
        // error
    }
}
I would like to be certain that after writing, the end of the input file in stream in has been reached. However, if I test for in.eof(), it is false, despite the fact that checking the input and output files confirms that the entire contents have been properly copied over. Any ideas on how I would check for in.eof()?
EOF-bit is set when trying to read a character, but none is available (i.e. you have already consumed everything in the string). Apparently std::ostream::operator<<() does not attempt to read past the end of the string, so the bit is never set.
You should be able to get around this by attempting to access the next character: add in.peek() before you check in.eof(). I have tested this fix and it works.
The reason none of the status bits are set in the input file is because you are reading through the streambuf, not the istream; the actual reading takes place in ostream::operator<<, which doesn't have access to the istream.
I'm not sure it matters, however. The input will be read until streambuf::sgetc returns EOF, which would cause the eofbit to be set in the istream if you were reading through the istream. The only thing which might prevent this if you were reading through the istream is if streambuf::sgetc threw an exception, which would cause badbit to be set in the istream; there is no other mechanism provided for an input streambuf to report a read error. So wrap your out << in.rdbuf() in a try ... catch block, and hope that the implementation actually does check for hardware errors. (I haven't checked recently, but a lot of early implementations totally ignored read errors, treating them as a normal end of file.)
And of course, since you're literally reading bytes (despite the <<, I don't see how one could call this formatted input), you don't have to consider the third possible source of errors, a format error (such as "abc" when inputting an int).
Try in.rdbuf()->sgetc() == EOF.
Reference: http://www.cplusplus.com/reference/iostream/streambuf/sgetc/