How do I deal with a carriage return line feed when trying to read in a file - C++

So I am working on a file that I need to read in, which contains both commas separating words and a carriage return line feed at the end of each line, and I can't figure out a way to handle it. I am trying to read in each word before the comma and put it into a vector until it hits the carriage return line feed, but I am having problems.
Here is my text file (as seen in Notepad++, so you can see the symbols; in the actual text, the things inside [] don't appear):
microwave,lamp,guitar,couch,bed,dog,cat[cr][lf]
P1:microwave,couch,bed,dog,chair,bookcase,fish[cr][lf]
I have tried multiple solutions, but nothing seems to work. Here is what I have tried so far, but it obviously isn't working. I have seen some users suggest using substring to somehow read out the comma and read in the words, but I am not sure how to do that, and I couldn't find a good tutorial or example. In my head I have the algorithm (or at least the steps to go about it), but I am not sure how to implement it:
Import file (istream)
Read until comma, take the string and place it in vector1: getline(input, word, ','), then vector1.push_back(word)
Repeat the previous step until you reach \r\n, then stop reading (getline(input, word, '\r'))
Move on to the next line
Read until comma, take the string and place it in vector2
Repeat
Read the line until \r\n
Here is the code I put into practice, using some of the steps above:
string input;
vector<string> v1;
vector<string> v2;
ifstream infile;
infile.open("example.txt");
while (getline(infile, input)) // read until end of line
{
    while (getline(infile, input, '\r')) // read until it reaches a carriage return
    {
        while (getline(infile, input, ',')) // read until it reaches a comma
        {
            v1.push_back(input); // take the word and put it in the vector
        }
    }
}
infile.close();
Any help would be appreciated.
Edit: I forgot to mention that when I used this code, nothing seemed to get imported into the vectors. I am sure all the words got lost somewhere in the getline calls, but I don't know how to read just up to the comma and the carriage return line feed without losing them.

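You should use getline() to get a whole line first. If the file is opened in text mode on Windows, each CR LF pair is translated to '\n', so getline handles it for you; on Linux or macOS, however, the '\r' stays at the end of the line and you have to remove it yourself. Then put the line into a stringstream and use getline() on it to split the line at the commas.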
My code that reads input into a vector of vectors:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
int main()
{
    std::ifstream fin("input.txt");
    std::vector<std::vector<std::string>> result; // one inner vector per line
    for (std::string line; std::getline(fin, line);)
    {
        result.emplace_back();             // start a new row for this line
        std::stringstream ss(line);
        for (std::string word; std::getline(ss, word, ',');)
        {
            result.back().push_back(word); // add each comma-separated word
        }
    }
    for (const auto &i : result)
    {
        for (const auto &j : i)
        {
            std::cout << j << ' ';
        }
        std::cout << '\n';
    }
}
You can modify it to read into two vectors by removing the outer loop and using two separate loops, one for each of the two vectors/lines.
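A minimal sketch of that two-vector variant, assuming the file has the two lines shown in the question. It also removes a trailing '\r', which std::getline leaves at the end of each line when a Windows (CR LF) file is read on Linux or macOS. The helper names readTwoLines, splitCommas and stripCarriageReturn are hypothetical; pass readTwoLines an std::ifstream opened on your file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Remove a trailing '\r' left behind when a CR LF file is read on a system
// whose native line ending is just '\n'.
void stripCarriageReturn(std::string& line)
{
    if (!line.empty() && line.back() == '\r')
        line.pop_back();
}

// Split one line at the commas and append each word to 'out'.
void splitCommas(const std::string& line, std::vector<std::string>& out)
{
    std::istringstream ss(line);
    for (std::string word; std::getline(ss, word, ',');)
        out.push_back(word);
}

// Hypothetical helper: the first line goes into v1, the second into v2.
void readTwoLines(std::istream& in, std::vector<std::string>& v1,
                  std::vector<std::string>& v2)
{
    std::string line;
    if (std::getline(in, line)) {
        stripCarriageReturn(line);
        splitCommas(line, v1);
    }
    if (std::getline(in, line)) {
        stripCarriageReturn(line);
        splitCommas(line, v2);
    }
}
```

For example, std::ifstream infile("example.txt"); followed by readTwoLines(infile, v1, v2); fills v1 with the words of the first line and v2 with those of the second. Stripping the '\r' explicitly keeps the code working regardless of which platform wrote or reads the file.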
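In your code, you first have a loop that reads line by line until the end of the file. After you read a line, you have a loop that reads from the file until a '\r'. When the file is opened in text mode on Windows, there is no '\r' left to find, because each CR LF pair has already been translated to '\n'; that getline call therefore reads the whole rest of the file, and the comma loop inside it finds nothing left to read, which is why your vectors stay empty. Even if there are '\r's in the file, each inner loop reads the next part of the file into input, overwriting what the outer loop just read instead of splitting it. The same goes for the loop inside that.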
Were you taught that while(getline(fin, str)) reads from a file without knowing how it works?

Related

C++ file conversion: pipe delimited to comma delimited

I am trying to figure out how to turn this input file, which is pipe delimited, into a comma-delimited one. I have to open the file, read it into an array, convert it to comma-delimited form in an output CSV file, and then close all files. I have been told that the easiest way to do it is within Excel, but I am not quite sure how.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
    ifstream inFile;
    ofstream outFile;
    string inFileName;
    string myArray[5];
    cout << "Enter the input filename:";
    cin >> inFileName;
    inFile.open(inFileName);
    if (inFile.is_open())
        std::cout << "File Opened" << std::endl;
    // read file line by line into array
    cout << "Read";
    for (int i = 0; i < 5; ++i)
    {
        getline(inFile, myArray[i]); // a whole line, including spaces
    }
    // File conversion
    // close input file
    inFile.close();
    // close output file
    outFile.close();
...
What I need to convert is:
Miles per hour|6,445|being the "second" team |5.54|9.98|6,555.00
"Ending" game| left at "beginning"|Elizabeth, New Jersey|25.25|6.78|987.01
|End at night, or during the day|"Let's go"|65,978.21|0.00|123.45
Left-base night|10/07/1900|||4.07|777.23
"Let's start it"|Start Baseball Game|Starting the new game to win
What the output should look like in comma-delimited form:
Miles per hour,"6,445","being the ""second"" team member",5.54,9.98,"6,555.00",
"""Ending"" game","left at ""beginning""","Denver, Colorado",25.25,6.78,987.01,
,"End at night, during the day","""Let's go""","65,978.21",0.00,123.45,
Left-base night, 10/07/1900,,,4.07,777.23,
"""Let's start it""", Start Baseball Game, Starting the new game to win,
I will show you a complete solution and explain it to you. But first, let's have a look at it:
#include <iostream>
#include <vector>
#include <fstream>
#include <regex>
#include <string>
#include <algorithm>
// Note: requires C++17 (if statements with initializers)
// I omit the manual input of the filenames in this example. This exercise can be done by somebody else.
// Use fixed filenames in this example.
const std::string inputFileName("r:\\input.txt");
const std::string outputFileName("r:\\output.txt");
// The delimiter for the source csv file
std::regex re{ R"(\|)" };
std::string addQuotes(const std::string& s) {
    // Escape every double-quote character by doubling it (" becomes "")
    std::string result = std::regex_replace(s, std::regex(R"(")"), R"("")");
    // If there is any quote (") or comma in the field, then quote the complete string
    if (std::any_of(result.begin(), result.end(), [](const char c) { return ((c == '\"') || (c == ',')); })) {
        result = "\"" + result + "\"";
    }
    return result;
}
// Some output function
void printData(const std::vector<std::vector<std::string>>& v, std::ostream& os) {
    // Go through all rows
    std::for_each(v.begin(), v.end(), [&os](const std::vector<std::string>& vs) {
        // Define delimiter
        std::string delimiter{ "" };
        // Show the delimited strings
        for (const std::string& s : vs) {
            os << delimiter << s;
            delimiter = ",";
        }
        os << "\n";
    });
}
int main() {
    // We first open the output file, because if it cannot be opened, there is no point in doing the rest of the exercise
    // Open output file and check, if it could be opened
    if (std::ofstream outputFileStream(outputFileName); outputFileStream) {
        // Open the input file and check, if it could be opened
        if (std::ifstream inputFileStream(inputFileName); inputFileStream) {
            // In this variable we will store all lines from the CSV file including the split-up columns
            std::vector<std::vector<std::string>> data{};
            // Now read all lines of the CSV file and split them into tokens
            for (std::string line{}; std::getline(inputFileStream, line); ) {
                // Split line into tokens and add to our resulting data vector
                data.emplace_back(std::vector<std::string>(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {}));
            }
            // Convert every field to its CSV representation
            std::for_each(data.begin(), data.end(), [](std::vector<std::string>& vs) {
                std::transform(vs.begin(), vs.end(), vs.begin(), addQuotes);
            });
            // Output, to file
            printData(data, outputFileStream);
            // And to the screen
            printData(data, std::cout);
        }
        else {
            std::cerr << "\n*** Error: could not open input file '" << inputFileName << "'\n";
        }
    }
    else {
        std::cerr << "\n*** Error: could not open output file '" << outputFileName << "'\n";
    }
    return 0;
}
So, let's have a look. We have three functions:
main: reads the CSV file, splits it into tokens, converts them, and writes the result
addQuotes: adds quotes where necessary
printData: prints the converted data to an output stream
Let's start with main. main first opens the output file and then the input file.
The input file contains a kind of structured data usually called CSV (comma-separated values). Here, however, the delimiter is not a comma but a pipe symbol.
The result is typically stored in a 2D vector: the first dimension holds the rows and the second holds the columns.
So, what do we need to do next? First, we need to read all complete text lines from the source stream. This can easily be done with a one-liner:
for (std::string line{}; std::getline(inputFileStream, line); ) {
As you can see, the for statement has a declaration/initialization part, then a condition, and then an expression that is evaluated at the end of each iteration. This is well known.
We first define a variable "line" of type std::string, which is default-initialized to an empty string. Then we use std::getline to read a complete line from the stream into that variable. std::getline returns a reference to the stream, and the stream has an overloaded bool operator that returns false if there was a failure (for example, end of file). So the for loop does not need an additional check for end of file. And we do not use the last part of the for statement, because reading a line advances the file position automatically.
This gives us a very simple for loop for reading a complete file line by line.
Please note: defining the variable "line" in the for statement scopes it to the loop, meaning it is only visible inside the loop. This is generally a good way to avoid polluting the enclosing scope.
OK, now the next line:
data.emplace_back(std::vector<std::string>(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {}));
Uh Oh, what is that?
OK, let's go step by step. First, we obviously want to add something to our 2-dimensional data vector, using std::vector's member function emplace_back, which constructs the new element in place. We could also have used push_back: since the argument is a temporary vector, either function moves it into data instead of copying it.
And what do we want to add? A complete row, that is, a vector of columns; in our case a std::vector<std::string>. And because we want to construct this vector in place, we call its range constructor (constructor number 5 in the std::vector constructor reference). The range constructor takes 2 iterators, a begin and an end iterator, as parameters, and copies all values pointed to by the iterators into the vector.
So, we expect a begin and an end iterator. And what do we see here:
The begin iterator is: std::sregex_token_iterator(line.begin(), line.end(), re, -1)
And the end iterator is simply {}
But what is this thing, the sregex_token_iterator?
This is an iterator that iterates over patterns in a line, where the pattern is given by a regex. The C++ regex library is very powerful, so unfortunately it takes a while to learn, and I cannot cover it here. But let us describe its basic functionality for our purpose: you describe a pattern in a kind of meta language, and std::sregex_token_iterator looks for that pattern and, for each match, returns the related data. In our case the pattern is the delimiter, a single pipe symbol. Because | has a special meaning in regular expressions, it must be escaped, so the regex is \| (written as the raw string R"(\|)" in the code). The last argument, -1, tells the iterator to return not the matches themselves but the text between them, i.e. the fields.
Now to the {} as the end iterator. {} performs default construction/initialization, and the default constructor of std::regex_token_iterator (constructor number 1 in its reference) constructs an end-of-sequence iterator. So, exactly what we need.
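To see what the -1 and the {} end iterator do on their own, here is a small sketch that splits a single line at the pipe symbols. The function name splitOnPipe is hypothetical; the technique is the same std::sregex_token_iterator call used in main above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split 'line' at every '|' using std::sregex_token_iterator.
// The submatch index -1 asks for the text *between* matches
// (the fields), not for the matched delimiters themselves.
std::vector<std::string> splitOnPipe(const std::string& line)
{
    static const std::regex delimiter{ R"(\|)" };
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1),
        {}); // {} is the default-constructed end-of-sequence iterator
}
```

Called on "Left-base night|10/07/1900|||4.07|777.23", it returns six fields, with empty strings for the two consecutive delimiters. Note that a delimiter at the very end of a line does not produce a trailing empty field.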
After we have read all the data, we transform each individual string into the required output format. This is done with std::transform and the function addQuotes.
The strategy here is to first replace every single quote character (") with two quotes ("").
Then, if the string contains any comma or quote, we additionally enclose the whole string in quotes.
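The same quoting rule can also be written without a regex. Below is a sketch of such an alternative; quoteField is a hypothetical name, and unlike addQuotes it makes a single pass over the string.

```cpp
#include <string>

// CSV-style quoting: double every '"' and wrap the whole field in quotes
// if it contains a '"' or a ','. Hypothetical alternative to addQuotes.
std::string quoteField(const std::string& s)
{
    std::string result;
    bool needsQuotes = false;
    for (char c : s) {
        if (c == '"') {
            result += "\"\"";   // escape a quote by doubling it
            needsQuotes = true;
        } else {
            if (c == ',')
                needsQuotes = true;
            result += c;
        }
    }
    return needsQuotes ? "\"" + result + "\"" : result;
}
```

With the input "Let's go" (quotes included) it yields """Let's go""", matching the expected output in the question, while a field such as 5.54 is returned unchanged.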
And last but not least, a simple output function prints the converted data to a file and to the screen.

C++: Getline stops reading at first whitespace

Basically, my issue is that I'm trying to read data from a .txt file full of numbers and comments and store each line in a string vector, but my getline function stops reading at the first whitespace character, so a comment like (* comment *) gets broken up into
str[0] = "(*";
str[1] = "comment";
str[2] = "*)";
This is what my codeblock for the getline function looks like:
int main() {
    string line;
    string fileName;
    cout << "Enter the name of the file to be read: ";
    cin >> fileName;
    ifstream inFile{fileName};
    istream_iterator<string> infile_begin {inFile};
    istream_iterator<string> eof{};
    vector<string> data {infile_begin, eof};
    while (getline(inFile, line))
    {
        data.push_back(line);
    }
And this is what the .txt file looks like:
101481
10974
1013
(* comment *) 0
28292
35040
35372
0000
7155
7284
96110
26175
I can't figure out why it's not reading the whole line.
This is for the very simple reason that your code is not using std::getline to read the input file.
If you look at your code very carefully, you will see that before you even get to that point, your code constructs an istream_iterator<string> on the file, and by passing it, and the ending istream_iterator<string> value to the vector's constructor, this effectively swallows the entire file, one whitespace-delimited word at a time, into the vector.
And by the time execution reaches the getline loop, the entire file has already been read, so the loop does absolutely nothing.
Get rid of everything that involves istream_iterators and simply let getline do the job it was intended for.
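With the istream_iterator lines removed, the reading part reduces to a plain getline loop. Here it is sketched as a function taking any std::istream (readLines is a hypothetical name), so you can pass it the ifstream from the question; each element of the result is one complete line, comments included.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying it on a string
#include <string>
#include <vector>

// Read every line of 'in' into a vector, one complete line per element.
// std::getline stops only at '\n', so spaces inside a line are kept.
std::vector<std::string> readLines(std::istream& in)
{
    std::vector<std::string> data;
    for (std::string line; std::getline(in, line);)
        data.push_back(line);
    return data;
}
```

In the question's main this becomes std::ifstream inFile{fileName}; std::vector<std::string> data = readLines(inFile);.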

Problems using getline()

I'm running out of hair to pull out, so I thought maybe someone here could help me with this frustration.
I'm trying to read a file line by line, which seems simple enough, using getline(). Problem is, my code seems to keep ignoring the \n, and putting the entire file into one string, which is problematic to say the least.
void MakeRandomLayout(int rows, int cols)
{
    string fiveByFive = "cubes25.txt";
    string fourByFour = "cubes16.txt";
    ifstream infile;
    while (true) {
        infile.open(fourByFour.c_str());
        if (infile.fail()) {
            infile.clear();
            cout << "No such file found";
        } else {
            break;
        }
    }
    Vector<string> cubes;
    string cube;
    while (std::getline(infile, cube)) {
        cubes.add(cube);
    }
}
Edits: Running OSX 10.7.
The infinite loop for the file is unfinished, will eventually ask for a file.
No luck with extended getline() version, tried that earlier.
Same system for dev and build/run.
The text file I'm reading in looks as follows:
AAEEGN
ABBJOO
ACHOPS
AFFKPS
AOOTTW
CIMOTU
DEILRX
DELRVY
DISTTY
EEGHNW
EEINSU
EHRTVW
EIOSST
ELRTTY
HIMNQU
HLNNRZ
Each string is on its own line in the file. The second file, which I'm not reading in yet, is the same but has 25 lines instead of 16.
Some Mac software recognizes either '\r' or '\n' as a line ending, for backward compatibility with Mac OS Classic, but std::getline only splits on '\n'. Make sure that your text editor hasn't put '\r' line endings in your file when your processing code is expecting '\n' (and verify that the '\n' characters you think are in the middle of the string aren't in fact '\r').
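If you cannot control which line endings the file uses, one option is a getline replacement that treats '\n', '\r' and "\r\n" all as line breaks. This sketch is not from the answer above; getlineAnyEnding is a hypothetical name, and it reads through the stream buffer inside a sentry, the usual way to write a custom extraction function.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying it on a string
#include <string>

// Like std::getline, but ends a line at '\n', '\r', or "\r\n".
// Returns the stream so it can be used as a loop condition.
std::istream& getlineAnyEnding(std::istream& in, std::string& line)
{
    line.clear();
    std::istream::sentry guard(in, true); // true: do not skip whitespace
    if (!guard)
        return in;
    std::streambuf* sb = in.rdbuf();
    for (;;) {
        int c = sb->sbumpc();
        if (c == std::char_traits<char>::eof()) {
            // End of input: fail only if nothing at all was read.
            if (line.empty())
                in.setstate(std::ios::eofbit | std::ios::failbit);
            else
                in.setstate(std::ios::eofbit);
            return in;
        }
        if (c == '\n')
            return in;
        if (c == '\r') {
            if (sb->sgetc() == '\n') // swallow the '\n' of a "\r\n" pair
                sb->sbumpc();
            return in;
        }
        line += static_cast<char>(c);
    }
}
```

It can replace std::getline in the question's loop: while (getlineAnyEnding(infile, cube)) cubes.add(cube);.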
I suspect that you are failing to display the contents of Vector correctly. When you dump the Vector, do you print a \n after each entry? You should, because getline discards the newlines on input.
FYI: the typical pattern for reading line-by-line is this:
Vector<string> cubes;
string cube;
while (std::getline(infile, cube)) {
    cubes.add(cube);
}
Note that this will discard the newlines, but will put one line per entry in Vector.
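EDIT: For what it's worth, if you were using an std::vector, you could slurp the file in like this. Note that istream_iterator<string> reads whitespace-separated words, not lines, so this only works because each line of this file is a single word without spaces: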
std::ifstream ifile(av[1]);
std::vector<std::string> v(
(std::istream_iterator<std::string>(ifile)),
std::istream_iterator<std::string>());

Very specific parsing in C++

Basically, I'm trying to read the words from a file and, without punctuation, insert each word into a multimap, which is then inserted into a vector, with each pair being a word and the line of the file that word is found on. I've got the function to remove punctuation working perfectly and I'm fairly certain my insert code works properly, but I can't get the line number part to work. I've included this section of my code as follows:
ifstream in("textfile.txt");
string line;
string keys;
stringstream keystream;
int line_number = 1;
while (getline(in, line, '\n')) {
    alphanum(line);
    keystream << line;
    while (getline(keystream, keys, ' '))
        table.insert(keys, line_number); //this just inserts the pair into my vector (table is an instance of a class I created)
    keystream.str("");
    line_number++;
}
The problem seems to be related to the stringstream. It doesn't seem to clear when I use keystream.str(""). This particular method only seems to read line 1 and then exits the loop, whereas some other variations I've tried (I can't remember exactly what I did) read the entire file but don't flush the stringstream, so it reads like word 1, word 1, word 2, word 1, word 2, word 3, etc. Anyway, if anyone could point me in the right direction or link to a guide specific to parsing input in C++, that would be greatly appreciated! Thanks!
Don't keep the string stream object; just make a new one in each round:
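string line;
int line_number = 1;
while (getline(in, line, '\n'))
{
    alphanum(line);
    istringstream keystream(line); // fresh stream, with clean state flags, for every line
    string keys;
    while (getline(keystream, keys, ' ')) // or even "while (keystream >> keys)"
    {
        table.insert(keys, line_number); // as in the question
    }
    ++line_number;
}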
I think the problem is that the inner getline() loop sets the EOF flag on the stringstream, and calling str() does not clear it. You also need to call keystream.clear().
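If you prefer to keep a single stream, here is a sketch of the fix this answer describes: reset both the contents with str() and the state flags with clear() before parsing each line. wordsWithLines is a hypothetical stand-in that collects (word, line number) pairs instead of calling the question's alphanum and table.insert.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Collect (word, line number) pairs while reusing a single stringstream.
std::vector<std::pair<std::string, int>> wordsWithLines(std::istream& in)
{
    std::vector<std::pair<std::string, int>> result;
    std::stringstream keystream;
    std::string line, keys;
    int line_number = 1;
    while (std::getline(in, line)) {
        keystream.str(line); // replace the buffer contents...
        keystream.clear();   // ...and reset eofbit/failbit left by the last line
        while (std::getline(keystream, keys, ' '))
            result.emplace_back(keys, line_number);
        ++line_number;
    }
    return result;
}
```

Without the clear() call, the eofbit set while parsing the first line makes every later getline on keystream fail immediately, which matches the "only line 1 is read" symptom in the question.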

Tokenization of a text file with frequency and line occurrence. Using C++

Once again I'm asking for help. I haven't coded anything for some time!
Now I have a text file filled with random gibberish. I already have a basic idea on how I will count the number of occurrences per word.
What really stumps me is how I will determine which line the word is on. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards, it will do no good.
I already am getting the words via the following code:
vector<string> words;
string currentWord;
while (!inputFile.eof())
{
    inputFile >> currentWord;
    words.push_back(currentWord);
}
This is for a text file with no set structure. Using the above code gives me a nice little(big) vector of words, but it doesn't give me the line they occur in.
Would I have to get the entire line, then process it into words to make this possible?
Use a std::map<std::string, int> to count the word occurrences: the int holds the number of times the word occurs.
If you need line-by-line input, use std::getline(std::istream&, std::string&), like this:
std::vector<std::string> lines;
std::ifstream file(...); // Fill in accordingly.
std::string currentLine;
while (std::getline(file, currentLine))
    lines.push_back(currentLine);
You can split a line apart by putting it into an std::istringstream first and then using operator>>. (Alternatively, you could cobble together some sort of splitter using std::find and other algorithmic primitives.)
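Putting those pieces together for the original goal, word frequency plus the lines each word occurs on, one possible sketch is below. WordStats and countWords are hypothetical names, and recording the line numbers in a std::map<std::string, std::vector<int>> goes one step beyond the map<string, int> suggested above.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: how often each word occurs, and on which lines.
struct WordStats {
    std::map<std::string, int> counts;
    std::map<std::string, std::vector<int>> lines;
};

WordStats countWords(std::istream& in)
{
    WordStats stats;
    std::string line;
    int lineNumber = 1;
    while (std::getline(in, line)) {           // one line at a time
        std::istringstream words(line);
        std::string word;
        while (words >> word) {                // split on whitespace
            ++stats.counts[word];
            std::vector<int>& where = stats.lines[word];
            if (where.empty() || where.back() != lineNumber)
                where.push_back(lineNumber);   // record each line only once
        }
        ++lineNumber;
    }
    return stats;
}
```

Because the line is read first and only then split, the line number is known for every word, which answers the "do I have to get the entire line first?" question with a yes.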
EDIT: This is the same as dash-tom-bang's answer, but modified to be correct with respect to error handling:
vector<pair<string, int>> words; // (word, line number) pairs
int currentLine = 1; // or 0, however you wish to count...
string line;
while (getline(inputFile, line))
{
    istringstream inputString(line);
    string word;
    while (inputString >> word)
        words.push_back(make_pair(word, currentLine));
    ++currentLine; // move on to the next line number
}
Short and sweet.
vector< map< string, size_t > > line_word_counts;
string line, word;
while ( getline( cin, line ) ) {
    line_word_counts.emplace_back(); // new, empty map for this line
    map< string, size_t > &word_counts = line_word_counts.back();
    istringstream line_is( line );
    while ( line_is >> word ) ++ word_counts[ word ];
}
// assumes at least 5 lines were read
cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
     << " times\n";
You're going to have to abandon reading into strings, because operator>>(istream&, string&) discards white space, and the contents of that white space (== '\n' or != '\n', that is the question...) is what gives you the line numbers.
This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.
Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() method (yes, returning by value is fine here) will:
First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
Then read non-whitespace characters, putting them into the string object you'll be returning.
If it runs out of stuff to read, read the next block and continue.
If you hit the end of file, the string you have is the whole word (which may be empty) and should be returned.
If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
Then you can call this method where you currently have inputFile >> currentWord, and the rest of the code doesn't need to know the details of your block buffering.
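A minimal sketch of the class described above, assuming a block size of 4096 bytes. WordReader, getWord and lineNumber are hypothetical names. After a call to getWord, lineNumber() gives the line the returned word is on, because the newlines before a word are counted while skipping whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>    // EOF
#include <istream>
#include <sstream>   // std::istringstream, handy for trying it on a string
#include <string>

// Hypothetical "front end" that reads the stream in blocks and hands out
// whitespace-separated words, tracking the line each word is on.
class WordReader {
public:
    explicit WordReader(std::istream& in) : in_(in) {}

    // Returns the next word, or an empty string at end of input.
    std::string getWord()
    {
        std::string word;
        int c;
        // 1. Skip whitespace, counting every '\n' we pass.
        while ((c = next()) != EOF && std::isspace(c)) {
            if (c == '\n')
                ++lineNumber_;
        }
        // 2. Collect non-whitespace characters.
        while (c != EOF && !std::isspace(c)) {
            word += static_cast<char>(c);
            c = next();
        }
        // Put back the whitespace that ended the word, so the next call
        // sees it (and counts it if it is a '\n').
        if (c != EOF)
            --pos_;
        return word;
    }

    int lineNumber() const { return lineNumber_; }

private:
    // Return the next character, refilling the buffer one block at a time.
    int next()
    {
        if (pos_ == len_) {
            in_.read(buf_, sizeof buf_);
            len_ = static_cast<std::size_t>(in_.gcount());
            pos_ = 0;
            if (len_ == 0)
                return EOF;
        }
        return static_cast<unsigned char>(buf_[pos_++]);
    }

    std::istream& in_;
    char buf_[4096];
    std::size_t pos_ = 0;
    std::size_t len_ = 0;
    int lineNumber_ = 1;
};
```

A loop such as for (std::string w = r.getWord(); !w.empty(); w = r.getWord()) followed by whatever you do with w and r.lineNumber() would then replace the inputFile >> currentWord loop. Putting the delimiter back with --pos_ is safe because next() has just returned that character from the current buffer.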
An alternative approach is to read things a line at a time, but the read functions that fill a character array (such as istream::getline) require you to create a fixed-size buffer beforehand, and if the line is longer than that buffer, you have to deal with it somehow. That could get more complicated than the class I described.