Difference between CSV files which makes different outcome using getline() - c++

I'm writing a function which reads a CSV file using getline() and converts the data to a vector of vectors. To test it I've tried to read two files with the same delimiter: one downloaded from the internet and a second exported from R datasets. The first few lines of each look like:
File1.csv
User ID,Category 1,Category 2,Category 3,Category 4,Category 5,Category 6,Category 7,Category 8,Category 9,Category 10
User 1,0.93,1.8,2.29,0.62,0.8,2.42,3.19,2.79,1.82,2.42
User 2,1.02,2.2,2.66,0.64,1.42,3.18,3.21,2.63,1.86,2.32
User 3,1.22,0.8,0.54,0.53,0.24,1.54,3.18,2.8,1.31,2.5
User 4,0.45,1.8,0.29,0.57,0.46,1.52,3.18,2.96,1.57,2.86
File2.csv
"","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width"
"1",5.1,3.5,1.4,0.2
"2",4.9,3,1.4,0.2
"3",4.7,3.2,1.3,0.2
"4",4.6,3.1,1.5,0.2
However, getline() works only for the first one. In the second case it simply returns whitespace. The function behaves the same even if I copy single lines from one file to the other (adding or removing columns as needed): the rows from file1 are always read properly, while those from file2 never are. I've even tried removing the " characters, but without much improvement. However, switching from commas to '\t' solves the problem.
I'm curious: what is the difference between those two files that produces such different outcomes?
The source code of my function:
vector<vector<string>> readData(string fileName,int firstLine,char delimeter){
//Open data file
fstream fin;
fin.open(fileName, ios::in);
//Data stored in 2d vector of strings
vector<vector<string>> data;
vector<string> row;
string line,word,temp;
//Read data
int i=0;
while(fin>>temp){
row.clear();
//Read line and store in 'line'
getline(fin,line);
//Don't read first n lines
if (i<firstLine){
i++;
continue;
}
cout<<line<<endl;
//Break words
stringstream s(line);
//Read every column and store in in 'word;
while(getline(s,word,delimeter)){
row.push_back(word);
}
//Append row to the data vector
data.push_back(row);
}
//Close file
fin.close();
return data;
}

The problem is here:
while(fin>>temp){
row.clear();
//Read line and store in 'line'
getline(fin,line);
fin >> temp skips leading whitespace and then reads everything up to the next space or newline. It is not clear why you do that, since temp is never used and getline(fin,line) is then supposed to read the full line. In the first file, fin>>temp consumes only "User" and getline reads the rest of the line. In the second file there are no spaces, so fin>>temp consumes the whole line and getline is left with only the empty remainder before the newline.
If you look at the data read from the first file you will also notice that the first part of each line is missing.
Tip: Use more meaningful names for your variables. I didn't manage to fully understand your logic, because a variable named s and the presence of row and line at the same time give me headaches.
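A minimal sketch of the fix described above: make getline the loop condition so nothing is consumed before the line is read. The name readRows and the std::istream& parameter are my own choices for illustration (they are not from the question); pass a std::ifstream to read a real file. Quote characters are kept as part of each field, and an empty field at the very end of a line is dropped.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads delimiter-separated rows from `in`,
// skipping the first `skipLines` lines (for example a header).
std::vector<std::vector<std::string>> readRows(std::istream& in,
                                               int skipLines,
                                               char delimiter) {
    assert(skipLines >= 0);
    std::vector<std::vector<std::string>> rows;
    std::string line;
    int lineNumber = 0;
    // getline is the loop condition, so each iteration sees a whole line.
    while (std::getline(in, line)) {
        if (lineNumber++ < skipLines) continue;
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        std::vector<std::string> fields;
        std::istringstream fieldStream(line);
        std::string field;
        while (std::getline(fieldStream, field, delimiter)) {
            fields.push_back(field);
        }
        rows.push_back(fields);
    }
    return rows;
}
```

With this loop both sample files should be read the same way, because whether a line contains spaces no longer matters.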

Related

Fstream getline() only reading from the very first line of text file, ignoring every other line

I am working on a coding project where I sort and organize data from a text file, and I cannot get the getline() function to read past the first line.
The idea is to capture the entire line, split it into 3 sections, assign it to an object, then move on. I can do everything except get getline() to work properly, here is the snippet of code I am having trouble with:
ifstream fin;
fin.open("textFile.txt");
while (!fin.eof()) // while loop to grab lines until the end of file is reached
{
getline(fin, line);
fin >> first >> last >> pace; // assigning the data to their respective variables
ClassObject obj(first, last, pace); // creating an object with those variables
ClassVector.push_back(obj); // assignment object to vector
}
This has been the closest I have gotten to reading every line while also sorting the data into a vector, but as I mentioned before, getline() will read line 1, and skip the rest of the file (1000 lines).
Rather than looping on !fin.eof(), I prefer to use the extraction itself as the loop condition, something like this:
ifstream file ( fileName.c_str() );
while (file >> first >> last >> pace ) // assuming the file is delimited with spaces
{
// Do whatever you want with first, last, and pace
}
The while loop keeps reading the next three fields until an extraction fails, which normally happens at the end of the file.
If the lengths of first, last, and pace are constant, you can also just read the whole line into a string variable and call substr on it, but this only works in the specific case where the lengths are constant throughout the entire file.
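If you still want to go line by line (so one malformed line cannot shift every field after it), a rough sketch is to combine getline with a string stream. The Runner struct, the readRunners name, and the assumption that pace is a number are mine, not from the question.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the real ClassObject is not shown in the question.
struct Runner {
    std::string first;
    std::string last;
    double pace;  // assumed to be numeric
};

// Reads one record per line; lines that do not parse are skipped.
std::vector<Runner> readRunners(std::istream& in) {
    std::vector<Runner> runners;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Runner r;
        // Only keep the record if all three extractions succeed.
        if (fields >> r.first >> r.last >> r.pace) {
            runners.push_back(r);
        }
    }
    return runners;
}
```

The important part is that each line is read exactly once, by getline, and the fields are then taken from that line rather than from the file stream.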

C++ Reading lines from a file and storing each line as a temporary vector with each element of the vector being a word from the line

I am currently having trouble converting my thoughts to code. Essentially, I am being tasked with reading lines from a text file. Because I will be accessing individual words from each line, I believe I need to store each line as a vector with each element of the vector being a word from the line. The words from each line will have to be accessed before moving onto the next line in the file. I understand I will probably use getline and stringstream, but I am stuck as to how to store each line from the getline as a vector of individual words.
For example, if I have the following lines in a text file:
This is the first line of the text file.
This is the second line of the text file.
I should be able to extract the words "This" and "first" from the first line and the words "This" and "second" from the second line. I do not need the extracted words from the first line when I move onto the second line.
ifstream textFile("stack.txt"); // opens the text file
string sentences; // holds one word at a time
vector<string> data; // collects every word in the file
while (textFile >> sentences)
{
    data.push_back(sentences);
}
This stores all of the words in your textfile into a single vector.
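If you need the words grouped per line instead, as the question asks, one possible sketch is to read each line with getline and split it into a temporary vector. The helper name splitWords is made up for this example.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into whitespace-separated words.
std::vector<std::string> splitWords(const std::string& line) {
    std::istringstream words(line);
    // A default-constructed istream_iterator marks the end of the stream.
    std::istream_iterator<std::string> first(words), last;
    return std::vector<std::string>(first, last);
}
```

Inside a loop such as while (std::getline(textFile, line)), call splitWords(line) and read the elements you need (index 0 and index 3 for the sample lines). The vector is rebuilt for every line, so nothing from the previous line is kept.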

How to DELETE a line(s) in C++?

I am new to file-handling...
I am writing a program that saves data in text-files in the following format:
3740541120991
Syed Waqas Ali
Rawalpindi
Lahore
12-12-2012
23:24
1
1
(Sorry for the bad alignment, it's NOT a part of the program)
Now I'm writing a delete function for the program that would delete a record.
So far this is my code:
void masterSystem::cancelReservation()
{
string line;
string searchfor = "3740541120991";
int i=0;
ifstream myfile("records.txt");
while (getline(myfile, line))
{
cout << line << endl;
if (line==searchfor)
{
// DELETE THIS + THE NEXT 8 LINES
}
}
}
I've done a bit of research and found out that there is no easy way to remove a line from a text file in place, so we have to create another text file.
But then the problem is: how do I COPY the records/data that come before the record being deleted into the NEW text file?
Open the input file; read one line at a time from the input file. If you decide to want to keep that line, write it to the output file. On the other hand, if you want to 'delete' that line, don't write it to the output file.
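A minimal sketch of that copy loop, assuming a record is a fixed number of consecutive lines that starts with the key line. The function name copyWithoutRecord and the recordLines parameter are mine; streams are used instead of file names so the logic is easy to try out.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out`, dropping every record whose
// first line equals `key`. Returns the number of records dropped.
int copyWithoutRecord(std::istream& in, std::ostream& out,
                      const std::string& key, int recordLines) {
    assert(recordLines > 0);
    int removed = 0;
    int linesToSkip = 0;  // lines of the current record still to drop
    std::string line;
    while (std::getline(in, line)) {
        if (linesToSkip == 0 && line == key) {
            linesToSkip = recordLines;
            ++removed;
        }
        if (linesToSkip > 0) {
            --linesToSkip;  // "delete" the line by not writing it
            continue;
        }
        out << line << '\n';
    }
    return removed;
}
```

In the real program you would pass a std::ifstream for records.txt and a std::ofstream for a temporary file, close both, and then replace the original with std::remove and std::rename from <cstdio>.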
You could store one record per line to make this even easier, for example:
3740541120991|Syed Waqas Ali|Rawalpindi|Lahore|12-12-2012|23:24|1|1
with the | character separating each field. This is a well-known technique, a variant of CSV (comma-separated values) that uses | as the delimiter.
This way you don't have to worry about reading consecutive lines to erase a record, and adding a record touches the file only once.
So your code becomes:
void masterSystem::cancelReservation()
{
string line;
string searchfor = "3740541120991";
ifstream myfile("records.txt");
while (getline(myfile, line))
{
// Here each line is a record
// You only have to decide whether you will copy
// this line to the output file or not.
}
}
Don't think only about removing a record; there are other operations you will need to perform on this file: saving a new record, reading into memory, and searching.
Think for a moment about searching and, with your current design in mind, try to answer this: how many reservations exist for date 12-12-2012 and past 12:00 AM?
In your code you have to read from the file 8 times per record, even if the other data is irrelevant to the question. But if you have each record on one line, you only have to read from the file once per record.
With a few reservations the difference is near 0, but it grows with the number of records: 8n reads instead of n.
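To make the search question concrete, here is a rough sketch for the one-record-per-line layout. The function names are made up, and the field positions (date at index 4, time at index 5) are an assumption taken from the sample record above. Times are compared as strings, which only works for zero-padded 24-hour HH:MM values.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a '|' separated record into its fields.
// An empty field at the very end of the line is dropped.
std::vector<std::string> splitRecord(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '|')) fields.push_back(field);
    return fields;
}

// Counts the records on `date` whose time is later than `afterTime`.
int countReservations(std::istream& in, const std::string& date,
                      const std::string& afterTime) {
    int count = 0;
    std::string line;
    while (std::getline(in, line)) {  // one read per record
        std::vector<std::string> fields = splitRecord(line);
        if (fields.size() >= 6 && fields[4] == date && fields[5] > afterTime) {
            ++count;
        }
    }
    return count;
}
```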

Reading a text file in c++

string numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
inputFile >> numbers;
inputFile.close();
cout << numbers;
And my text.txt file is:
1 2 3 4 5
basically a set of integers separated by tabs.
The problem is the program only reads the first integer in the text.txt file and ignores the rest for some reason. If I remove the tabs between the integers it works fine, but with tabs between them, it won't work. What causes this? As far as I know it should ignore any white space characters or am I mistaken? If so is there a better way to get each of these numbers from the text file?
When reading formatted strings the input operator starts with ignoring leading whitespace. Then it reads non-whitespace characters up to the first space and stops. The non-whitespace characters get stored in the std::string. If there are only whitespace characters before the stream reaches end of file (or some error for that matter), reading fails. Thus, your program reads one "word" (in this case a number) and stops reading.
Unfortunately, you only said what you are doing and what the problems are with your approach (where your problem description failed to cover the case where reading the input fails in the first place). Here are a few things you might want to try:
If you want to read multiple words, you can do so, e.g., by reading all words:
std::vector<std::string> words;
std::copy(std::istream_iterator<std::string>(inputFile),
std::istream_iterator<std::string>(),
std::back_inserter(words));
This will read all words from inputFile and store them as a sequence of std::strings in the vector words. Since your file contains numbers, you might want to replace std::string by int to read numbers in a readily accessible form.
If you want to read a line rather than a word you can use std::getline() instead:
if (std::getline(inputFile, line)) { ... }
If you want to read multiple lines, you'd put this operation into a loop: there is, unfortunately, no ready-made approach to read a sequence of lines as there is for words.
If you want to read the entire file, not just the first line, into a string, you can also use std::getline(), but you'd need to know one character value which doesn't occur in your file, e.g., the null character:
if (std::getline(inputFile, text, char())) { ... }
This approach considers a "line" to be a sequence of characters up to a null character. You can use any other character value as well. If you can't be sure about the character values, you can read an entire file using std::string's constructor taking iterators:
std::string text((std::istreambuf_iterator<char>(inputFile)),
std::istreambuf_iterator<char>());
Note that the extra pair of parentheses around the first argument is, unfortunately, necessary (if you are using C++ 2011 you can avoid them by using braces instead of parentheses).
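For the file in the question, the first suggestion with int in place of std::string might look like the sketch below. The function name readInts is mine, and it takes any std::istream so it works with an std::ifstream as well as a string stream.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical helper: reads every whitespace-separated integer from `in`.
// Tabs, spaces, and newlines are all skipped by operator>>.
std::vector<int> readInts(std::istream& in) {
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}
```

Reading stops at the end of the stream or at the first token that is not an integer, so check the stream state afterwards if the file may contain other data.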
Use getline to do the reading.
string numbers;
if (inputFile.is_open())//checking if open
{
getline (inputFile,numbers); //fetches entire line into string numbers
inputFile.close();
}
Your program behaves exactly as you describe: inputFile >> numbers; extracts just the first token from the input file, so if you remove the tabs, inputFile >> will extract the single token 12345, not the five numbers [1,2,3,4,5].
A better method:
vector< int > numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
char c;
while (inputFile.get(c)) // loop while a character can be extracted (isdigit needs <cctype>)
{
    if ( isdigit(static_cast<unsigned char>(c)) ) // skips tabs, spaces, and newlines
    {
        numbers.push_back(c - '0'); // digit character to its value; single-digit numbers only
    }
}
inputFile.close();

Tokenization of a text file with frequency and line occurrence. Using C++

Once again I ask for help. I haven't coded anything for some time!
Now I have a text file filled with random gibberish. I already have a basic idea on how I will count the number of occurrences per word.
What really stumps me is how I will determine what line the word is in. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards it will do no good.
I already am getting the words via the following code:
vector<string> words;
string currentWord;
while(!inputFile.eof())
{
inputFile >> currentWord;
words.push_back(currentWord);
}
This is for a text file with no set structure. Using the above code gives me a nice little(big) vector of words, but it doesn't give me the line they occur in.
Would I have to get the entire line, then process it into words to make this possible?
Use a std::map<std::string, int> to count the word occurrences -- the int is the number of times it exists.
If you need line-by-line input, use std::getline(std::istream&, std::string&), like this:
std::vector<std::string> lines;
std::ifstream file(...); //Fill in accordingly.
std::string currentLine;
while(std::getline(file, currentLine))
lines.push_back(currentLine);
You can split a line apart by putting it into a std::istringstream first and then using operator>>. (Alternatively, you could cobble up some sort of splitter using std::find and other algorithmic primitives.)
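Putting the two ideas together, a rough sketch could keep both the count and the line numbers for every word. The WordInfo struct and the indexWords name are invented for this example, and lines are numbered from 1.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-word record: total occurrences and the lines they are on.
struct WordInfo {
    int count = 0;
    std::vector<int> lines;  // each line number is stored once
};

std::map<std::string, WordInfo> indexWords(std::istream& in) {
    std::map<std::string, WordInfo> index;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            WordInfo& info = index[word];
            ++info.count;
            // Avoid repeating the line number when a word occurs twice on it.
            if (info.lines.empty() || info.lines.back() != lineNumber) {
                info.lines.push_back(lineNumber);
            }
        }
    }
    return index;
}
```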
EDIT: This is the same thing as in @dash-tom-bang's answer, but modified to be correct with respect to error handling:
vector<pair<string, int> > words; // each word together with its line number
int currentLine = 1; // or 0, however you wish to count...
string line;
while (getline(inputFile, line))
{
    istringstream inputString(line);
    string word;
    while (inputString >> word)
        words.push_back(make_pair(word, currentLine));
    ++currentLine; // move on to the next line number
}
Short and sweet.
vector< map< string, size_t > > line_word_counts;
string line, word;
while ( getline( cin, line ) ) {
    line_word_counts.push_back( map< string, size_t >() ); // one map per line
    map< string, size_t > &word_counts = line_word_counts.back();
    istringstream line_is( line );
    while ( line_is >> word ) ++ word_counts[ word ];
}
cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
     << " times\n";
You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.
This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.
Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() (yes, returning by value here is good) method will:
First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
Then read non-whitespace characters, putting them into the string object you'll be returning.
If it runs out of stuff to read, read the next block and continue.
If you hit the end of file, the string you have is the whole word (which may be empty) and should be returned.
If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
Then you can call this method at the same place in your code as your inputFile >> currentWord, and the rest of the code doesn't need to know the details of your block buffering.
An alternative approach is to read things a line at a time. The stream member functions for this (istream::getline and istream::get) require you to create a fixed-size buffer beforehand, and if the line is longer than that buffer, you have to deal with it somehow; the free function std::getline(istream&, string&) grows the string as needed and avoids that problem.
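A simplified sketch of the word-reading class described above. It is not the block-buffered version: it leans on the buffering the stream already does and reads one character at a time with get(). The class and member names are invented for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical front end: returns words one at a time and tracks line numbers.
class WordReader {
public:
    explicit WordReader(std::istream& in) : in_(in), lineNumber_(1) {}

    // Returns the next word, or an empty string at end of file.
    std::string getWord() {
        std::string word;
        char c;
        // Skip whitespace, counting every newline that is passed.
        while (in_.get(c)) {
            if (c == '\n') {
                ++lineNumber_;
            } else if (!std::isspace(static_cast<unsigned char>(c))) {
                word += c;
                break;
            }
        }
        // Collect the remaining non-whitespace characters of the word.
        while (in_.get(c)) {
            if (std::isspace(static_cast<unsigned char>(c))) {
                in_.unget();  // leave it for the next call to count
                break;
            }
            word += c;
        }
        return word;
    }

    // Line of the word most recently returned (1-based).
    int lineNumber() const { return lineNumber_; }

private:
    std::istream& in_;
    int lineNumber_;
};
```

Call getWord() in a loop until it returns an empty string, and query lineNumber() right after each call to learn which line the word came from.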