Converting a text file into std::vector<string> - c++

I want a vector document variable that will look like
document[0] = "I"
document[1] = " "
document[2] = "want"
document[3] = " "
document[4] = "cake"
document[5] = "."
document[6] = "\n"
With the given line in the file "I want cake.\n"
I'm not sure how to go about doing this and everything I found on delimiters will get rid of whitespace or something.
I have an unordered_set of stopwords that I want to remove from a file. The method I have set up will iterate over a vector and remove_if the word is in my stop words.
The goal is to put all the elements in the document vector into a new file without the stop words.
std::vector<string> MakeFileVector(string filename){
//Get the input from the file
std::ifstream input(filename.c_str());
std::vector<string> doc;
string line;
//For each line in the text file
while (getline(input, line))
{
//somehow split up each word/space/period/comma/newline char
//and add to the doc vector
//for each word/space/period/comma/newline char (str is a placeholder for it)
doc.push_back(str);
}
return doc;
}

#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>
#include <string>
std::ifstream myfile("textline.txt");
std::vector<std::string> myLines;
// Note: operator>> skips whitespace, so only the words are stored
std::copy(std::istream_iterator<std::string>(myfile),
std::istream_iterator<std::string>(),
std::back_inserter(myLines));
Here you go!

You can use std::noskipws. It makes sure that whitespace is not skipped when reading from the stream, for example when extracting single characters. Alternatively, you can use std::getline to read each line into a std::string and then process the whitespace yourself.
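A minimal sketch of one way to get the vector described in the question, assuming that a "word" is a run of letters and digits and that every other character (space, period, comma, newline) becomes a token of its own. The names tokenizeKeepingSeparators and makeFileVector are made up for this example, and the whole stream is read at once instead of line by line so that the newlines are kept:

```cpp
#include <cctype>
#include <istream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: splits text into word tokens and single-character
// tokens for everything else, so no character of the input is lost.
std::vector<std::string> tokenizeKeepingSeparators(const std::string& text) {
    std::vector<std::string> tokens;
    std::string word;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            word += static_cast<char>(c);          // still inside a word
        } else {
            if (!word.empty()) {                   // a word just ended
                tokens.push_back(word);
                word.clear();
            }
            // every non-word character is a token of length 1
            tokens.push_back(std::string(1, static_cast<char>(c)));
        }
    }
    if (!word.empty()) tokens.push_back(word);     // word at the very end
    return tokens;
}

// Hypothetical helper: reads the whole stream, whitespace included,
// and tokenizes it.
std::vector<std::string> makeFileVector(std::istream& input) {
    const std::string text((std::istreambuf_iterator<char>(input)),
                           std::istreambuf_iterator<char>());
    return tokenizeKeepingSeparators(text);
}
```

With this split, an apostrophe also ends a word ("Let's" becomes "Let", "'", "s"), which may or may not suit a stop word list. Writing the elements back out in order reproduces the original text, so removing the stop words from the vector first gives the filtered file.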

Related

C++ file conversion: pipe delimited to comma delimited

I am trying to figure out how to turn this input file, which is in pipe-delimited form, into a comma-delimited one. I have to open the file, read it into an array, convert it into comma-delimited form in an output CSV file, and then close all files. I have been told that the easiest way to do it is within Excel, but I am not quite sure how.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream inFile;
ofstream outFile;
string inFileName;
string myArray[5];
cout << "Enter the input filename:";
cin >> inFileName;
inFile.open(inFileName);
if(inFile.is_open())
std::cout<<"File Opened"<<std::endl;
// read file line by line into array
cout<<"Read";
for(int i = 0; i < 5; ++i)
{
getline(inFile, myArray[i]);
}
// File conversion
// close input file
inFile.close();
// close output file
outFile.close();
...
What I need to convert is:
Miles per hour|6,445|being the "second" team |5.54|9.98|6,555.00
"Ending" game| left at "beginning"|Elizabeth, New Jersey|25.25|6.78|987.01
|End at night, or during the day|"Let's go"|65,978.21|0.00|123.45
Left-base night|10/07/1900|||4.07|777.23
"Let's start it"|Start Baseball Game|Starting the new game to win
What the output should look like in comma-delimited form:
Miles per hour,"6,445","being the ""second"" team member",5.54,9.98,"6,555.00",
"""Ending"" game","left at ""beginning""","Denver, Colorado",25.25,6.78,987.01,
,"End at night, during the day","""Let's go""","65,978.21",0.00,123.45,
Left-base night, 10/07/1900,,,4.07,777.23,
"""Let's start it""", Start Baseball Game, Starting the new game to win,
I will show you a complete solution and explain it to you. But let's first have a look at it:
#include <iostream>
#include <vector>
#include <fstream>
#include <regex>
#include <string>
#include <algorithm>
// The manual input of the filenames is omitted in this example; it is left as an exercise.
// Fixed filenames are used instead.
const std::string inputFileName("r:\\input.txt");
const std::string outputFileName("r:\\output.txt");
// The delimiter for the source csv file
std::regex re{ R"(\|)" };
std::string addQuotes(const std::string& s) {
// Replace every quote (") in the string with two quotes ("")
std::string result = std::regex_replace(s, std::regex(R"(")"), R"("")");
// If there is any quote (") or comma in the string, then quote the complete string
if (std::any_of(result.begin(), result.end(), [](const char c) { return ((c == '\"') || (c == ',')); })) {
result = "\"" + result + "\"";
}
return result;
}
// Some output function
void printData(std::vector<std::vector<std::string>>& v, std::ostream& os) {
// Go through all rows
std::for_each(v.begin(), v.end(), [&os](const std::vector<std::string>& vs) {
// Define delimiter
std::string delimiter{ "" };
// Show the delimited strings
for (const std::string& s : vs) {
os << delimiter << s;
delimiter = ",";
}
os << "\n";
});
}
int main() {
// We first open the output file because, if it cannot be opened, there is no point in doing the rest of the work
// Open output file and check, if it could be opened
if (std::ofstream outputFileStream(outputFileName); outputFileStream) {
// Open the input file and check, if it could be opened
if (std::ifstream inputFileStream(inputFileName); inputFileStream) {
// In this variable we will store all lines from the CSV file, split up into columns
std::vector<std::vector<std::string>> data{};
// Now read all lines of the CSV file and split it into tokens
for (std::string line{}; std::getline(inputFileStream, line); ) {
// Split line into tokens and add to our resulting data vector
data.emplace_back(std::vector<std::string>(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {}));
}
std::for_each(data.begin(), data.end(), [](std::vector<std::string>& vs) {
std::transform(vs.begin(), vs.end(), vs.begin(), addQuotes);
});
// Output, to file
printData(data, outputFileStream);
// And to the screen
printData(data, std::cout);
}
else {
std::cerr << "\n*** Error: could not open input file '" << inputFileName << "'\n";
}
}
else {
std::cerr << "\n*** Error: could not open output file '" << outputFileName << "'\n";
}
return 0;
}
So, let's have a look. We have three functions:
main: reads the CSV file, splits it into tokens, converts them, and writes the result
addQuotes: adds quotes if necessary
printData: prints the converted data to an output stream
Let's start with main. main first opens the output file and then the input file.
The input file contains structured data of the kind usually called CSV (comma separated values). But here we do not have a comma, but a pipe symbol as delimiter.
The result is typically stored in a 2d-vector. One dimension is for the rows and the other dimension is for the columns.
So, what do we need to do next? As we can see, we first need to read all complete text lines from the source stream. This can be easily done with a one-liner:
for (std::string line{}; std::getline(inputFileStream, line); ) {
As you can see here, the for statement has a declaration/initialization part, then a condition, and then a statement that is carried out at the end of each loop iteration. This is well known.
We first define a variable "line" of type std::string and use the default initializer to create an empty string. Then we use std::getline to read a complete line from the stream and put it into our variable. std::getline returns a reference to the stream, and the stream has an overloaded bool operator that returns false if there was a failure (for example, because the end of the file was reached and nothing could be read). So, the for loop does not need an additional check for the end of file. And we do not use the last statement of the for loop, because by reading a line, the file position is advanced automatically.
This gives us a very simple for loop for reading a complete file line by line.
Please note: Defining the variable "line" in the for loop scopes it to the for loop, meaning it is only visible in the for loop. This is generally a good way to avoid polluting the outer scope.
OK, now the next line:
data.emplace_back(std::vector<std::string>(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {}));
Uh oh, what is that?
OK, let's go step by step. First, we obviously want to add something to our 2-dimensional data vector. We use the std::vector member function emplace_back. push_back would work as well: because we pass an explicitly constructed temporary, the row is moved into the data vector either way, so the strings are not copied unnecessarily.
And what do we want to add? We want to add a complete row, that is, a vector of columns. In our case that is a std::vector<std::string>. We construct this vector with the vector's range constructor. The range constructor takes 2 iterators, a begin and an end iterator, as parameters, and copies all values in the range denoted by the iterators into the vector.
So, we expect a begin and an end iterator. And what do we see here:
The begin iterator is: std::sregex_token_iterator(line.begin(), line.end(), re, -1)
And the end iterator is simply {}
But what is this thing, the sregex_token_iterator?
This is an iterator that iterates over patterns in a line. The pattern is given by a regex. The C++ regex library is very powerful, so learning it takes a while, and I cannot cover it here. But let us describe its basic functionality for our purpose: you describe a pattern in a kind of meta language, and the std::sregex_token_iterator looks for that pattern in the line. In our case the pattern is very simple: the pipe symbol. Because | has a special meaning in a regex, it is escaped and written as "\|". The last argument, -1, tells the iterator to return not the matches themselves but the text between the matches, and those are exactly the columns that we want.
Now to the {} as the end iterator. You may have read that {} performs default construction/initialization. The default constructor of std::sregex_token_iterator constructs an end-of-sequence iterator. So, exactly what we need.
After we have read all data, we will transform the single strings, to the required output. This will be done with std::transform and the function addQuotes.
The strategy here is to first replace every quote character (") with two quote characters ("").
Then we check whether there is any comma or quote in the string and, if so, enclose the whole string in an additional pair of quotes.
And last, but not least, we have a simple output function and print the converted data into a file and on the screen.
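To see the splitting and the quoting work together on a single line, here is a small sketch. It is not the code from the answer above: quoteCsvField and pipeLineToCsv are names invented for this example, and the quote doubling is done with a plain loop instead of std::regex_replace.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: escape one field for CSV output.
std::string quoteCsvField(const std::string& field) {
    std::string result;
    for (char c : field) {
        if (c == '"') result += '"';   // double every embedded quote
        result += c;
    }
    // Fields containing a comma or a quote are wrapped in quotes.
    if (field.find_first_of(",\"") != std::string::npos) {
        result = "\"" + result + "\"";
    }
    return result;
}

// Hypothetical helper: convert one pipe-delimited line to a CSV line.
std::string pipeLineToCsv(const std::string& line) {
    static const std::regex pipe{R"(\|)"};
    // Index -1 selects the text between the matches, i.e. the fields.
    std::sregex_token_iterator first(line.begin(), line.end(), pipe, -1);
    std::sregex_token_iterator last;   // default-constructed end iterator
    const std::vector<std::string> fields(first, last);

    std::string csv;
    std::string delimiter;             // empty before the first field
    for (const std::string& field : fields) {
        csv += delimiter + quoteCsvField(field);
        delimiter = ",";
    }
    return csv;
}
```

For example, the line a|b,c|"d" has three fields, of which the second contains a comma and the third contains quotes, so only those two end up wrapped in quotes.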

C++ std::getline result string won't let me concatenate another string to it

I am attempting to read data from a .txt file which contains nothing but a list of names. I want to do the following for each name:
1) read a name and store it in a string variable.
2) Add quotes to the name ("name")
3) make a map entry using each name (map["name"]= x)
I am using the std::getline function to read each line and I'm trying to add the quotes simply by using ( string name="\""+line+"\"" ).
The problem is that every time I add something to the end of the line string, nothing is added!
This is my code:
#include <iostream>
#include <string>
#include <map>
#include <fstream>
#include <stdlib.h>
using namespace std;
int main(){
ifstream reader("input.txt");
string line;
string name;
map<string,int> arr;
int np=5;
for(int i=0;i<np;i++){
getline(reader,line);
name="\"" +line +"\"";
cout<< name << endl;
}
return 0;
}
This is my input txt file:
dave
laura
owen
vick
amr
this is the output I'm currently getting:
"dave
"laura
"owen
"vick
"amr"
Thank you very much!
I suppose your input lines end with \r\n, while getline reads only until '\n'. If that is true, then the solution is to manually remove the \r char at the end of the line:
getline(reader,line);
if (!line.empty() && line.back() == '\r')
line.pop_back();
[edit]
or instead of pop_back():
auto cr_pos = line.rfind('\r');
if ( cr_pos != std::string::npos )
line = line.substr(0, cr_pos);
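If you need this in more than one place, the check can be wrapped in a small function. This is only a sketch: getlineAnyEol and quoteNames are hypothetical names, and the second function reads from a string instead of input.txt so that it is easy to try out.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper around std::getline that also drops a trailing
// '\r', so files with Windows line endings behave like Unix ones.
bool getlineAnyEol(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return true;
}

// Usage: wrap every name in quotes, as in the question.
std::vector<std::string> quoteNames(const std::string& text) {
    std::vector<std::string> names;
    std::istringstream reader(text);
    for (std::string line; getlineAnyEol(reader, line); ) {
        names.push_back("\"" + line + "\"");
    }
    return names;
}
```

Because the loop stops when getline fails, it also avoids the hard-coded count of 5 names.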

how to replace a line with another/ c++ code

I am working on Ubuntu. I have a file called test.txt. I would like to replace the second line with another line. How can I do that? I don't want to create a new file and delete the first one.
I would like to specify that the length of the new line is the same as the length of the old one.
Try something like:
#include <fstream>
#include <string>
int main() {
const int lineToReplace = 14;
std::fstream file("myfile.txt", std::ios::in | std::ios::out);
char line[255];
for (int i=0; i<lineToReplace-1; ++i)
file.getline(line, 255); // This already skips the newline
file.seekp(file.tellg()); // a seek is required when switching from reading to writing
file << "Your contents here" << std::endl;
file.close();
return 0;
}
Note that line can hold up to 254 bytes (plus the null terminator), so if your line takes more than that, adjust accordingly.
If the file is small enough you can read it into memory, do whatever modifications you want on the in-memory copy, and then write it back out.
Edit: Code as requested:
// A vector to store all lines
std::vector<std::string> lines;
// The input file
std::ifstream is("test.txt");
// Get all lines into the vector
std::string line;
while (std::getline(is, line))
lines.push_back(line);
// Close the input file
is.close();
// All of the file is now in memory, each line a single entry in the vector
// "lines". The items in the vector can now be modified as you please.
// Replace the second line with something else
lines[1] = "Something else";
// Open output file
std::ofstream os("test.txt");
// Write all lines to the file
for(const auto& l : lines)
os << l << '\n';
// All done, close output file
os.close();
This is Python, but it's significantly more readable and terse for this purpose:
import tempfile
f = open('test.txt', 'r+') # open for read/write ('w+' would truncate the file)
g = tempfile.TemporaryFile('w+') # temp file to build replacement data
g.write(next(f)) # add the first line
next(f) # discard the second line
g.write(second_line) # add this one instead; second_line is the new line, ending in '\n'
g.writelines(f) # copy the rest of the file
f.seek(0) # go back to the start
g.seek(0) # start of the tempfile
f.writelines(g) # copy the file data over
f.truncate() # drop the rest of the file
You could also use shutil.copyfileobj instead of writelines to do block copying between the files.
Here's how I would do it, without a hard limit on the line length:
#include <fstream>
#include <string>
using namespace std;
int main()
{
fstream file("test.txt",std::ios::in|std::ios::out);
string line;
string line_new="LINE2";
// Skip the first line, file pointer points to beginning of second line now.
getline(file,line);
fstream::pos_type pos=file.tellg();
// Read the second line.
getline(file,line);
if (line.length()==line_new.length()) {
// Go back to start of second line and replace it.
file.seekp(pos);
file << line_new;
}
return 0;
}
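The same idea can be written as a function that works on any read/write stream, which makes it easy to try out on a std::stringstream before pointing it at a real file. This is a sketch under that assumption; overwriteLineInPlace and demoOverwrite are names invented here.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: overwrite line `index` (0-based) of a read/write
// stream in place, but only if the replacement has exactly the same length.
bool overwriteLineInPlace(std::iostream& io, std::size_t index,
                          const std::string& replacement) {
    std::string line;
    for (std::size_t i = 0; i < index; ++i) {
        if (!std::getline(io, line)) return false;   // fewer lines than expected
    }
    const std::iostream::pos_type start = io.tellg(); // start of the target line
    if (!std::getline(io, line)) return false;
    if (line.length() != replacement.length()) return false;
    io.clear();        // reading the last line may have set eofbit
    io.seekp(start);   // reposition before switching from reading to writing
    io << replacement;
    return static_cast<bool>(io);
}

// Small demonstration on an in-memory stream instead of a real file.
std::string demoOverwrite() {
    std::stringstream file("line one\nline two\nline six\n");
    overwriteLineInPlace(file, 1, "LINE 2!!");
    return file.str();
}
```

For a real file you would pass a std::fstream opened with std::ios::in | std::ios::out, as in the code above. If the lengths differ, nothing is written and the function returns false.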

C++: How to iterate over a text in a std::string line by line with STL?

I have a text in a std::string object. The text consists of several lines. I want to iterate over the text line by line using STL (or Boost). All solutions I come up with seem to be far from elegant. My best approach is to split the text at the line breaks. Is there a more elegant solution?
UPDATE: This is what I was looking for:
std::string input;
// get input ...
std::istringstream stream(input);
std::string line;
while (std::getline(stream, line)) {
std::cout << line << std::endl;
}
Why do you keep the text in your source file? Keep it in a separate text file. Open it with std::ifstream and iterate over it with while(getline(...))
#include <iostream>
#include <fstream>
#include <string>
int main()
{
std::ifstream fin("MyText.txt");
std::string file_line;
while(std::getline(fin, file_line))
{
//current line of text is in file_line, not including the \n
}
}
Alternatively, if the text HAS to be in a std::string variable, read it line by line using std::istringstream in a similar manner.
If your question is how to put the text lexically into your code without using +, please note that adjacent string literals are concatenated before compilation, so you could do this:
std::string text =
"Line 1 contents\n"
"Line 2 contents\n"
"Line 3 contents\n";
Use Boost.Tokenizer:
std::string text("foo\n\nbar\nbaz");
typedef boost::tokenizer<boost::char_separator<char> > line_tokenizer;
line_tokenizer tok(text, boost::char_separator<char>("\n\r"));
for (line_tokenizer::const_iterator i = tok.begin(), end = tok.end();
i != end ; ++i)
std::cout << *i << std::endl;
prints
foo
bar
baz
Note that it skips over empty lines, which may or may not be what you want.
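If you do want to keep the empty lines, a standard-library-only alternative is std::getline on a std::istringstream. A minimal sketch, with splitLines being a name chosen for this example:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text into lines, keeping empty lines.
std::vector<std::string> splitLines(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream stream(text);
    for (std::string line; std::getline(stream, line); ) {
        // Drop the '\r' left over from Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

For the text "foo\n\nbar\nbaz" this yields four entries, the second one being an empty string.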
If you want to loop line by line, as you say, why would splitting the text at line breaks not be exactly what you want?
You didn't post code showing how you're doing it, but your approach seems correct to accomplish what you said you wanted. Why does it feel inferior?

C++ length of file and vectors

Hi I have a file with some text in it. Is there some easy way to get the number of lines in the file without traversing through the file?
I also need to put the lines of the file into a vector. I am new to C++ but I think vector is like ArrayList in java so I wanted to use a vector and insert things into it. So how would I do it?
Thanks.
There is no way of finding the number of lines in a file without reading it. To read all lines:
1) create a std::vector of std::string
2) open a file for input
3) read a line as a std::string using getline()
4) if the read failed, stop
5) push the line into the vector
6) goto 3
You would need to traverse the file to detect the number of lines (or at least call a library method that traverses the file).
Here is sample code for reading a text file with the getline method, assuming that you pass the file name as an argument:
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
int main(int argc, char* argv[])
{
std::vector<std::string> lines;
std::string line;
lines.clear();
// open the desired file for reading
std::ifstream infile (argv[1], std::ios_base::in);
// read each line individually (watch out for Windows new lines)
while (getline(infile, line, '\n'))
{
// add line to vector
lines.push_back (line);
}
// do anything you like with the vector. Output the size for example:
std::cout << "Read " << lines.size() << " lines.\n";
return 0;
}
Update: The code could fail for many reasons (e.g. file not found, concurrent modifications to file, permission issues, etc). I'm leaving that as an exercise to the user.
1) No way to find number of lines without reading the file.
2) Take a look at getline function from the C++ Standard Library. Something like:
string line;
fstream file;
vector <string> vec;
...
while (getline(file, line)) vec.push_back(line);
Traversing the file is fundamentally required to determine the number of lines, regardless of whether you do it or some library routine does it. New lines are just another character, and the file must be scanned one character at a time in its entirety to count them.
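If you only need the count and not the lines themselves, the scan can be written with std::count over the characters of the stream. A sketch, with countNewlines and countNewlinesInText as made-up names:

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Counts the '\n' characters by scanning the whole stream once.
std::size_t countNewlines(std::istream& in) {
    return static_cast<std::size_t>(
        std::count(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>(), '\n'));
}

// Convenience wrapper for text that is already in memory.
std::size_t countNewlinesInText(const std::string& text) {
    std::istringstream in(text);
    return countNewlines(in);
}
```

Note that this counts newline characters, so a last line that does not end in '\n' is not included in the result.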
Since you have to read the lines into a vector anyways, you might as well combine the two steps:
// Read lines from input stream in into vector out
// Return the number of lines read
int getlines(std::vector<std::string>& out, std::istream& in = std::cin) {
out.clear(); // remove any data in vector
std::string buffer;
while (std::getline(in, buffer))
out.push_back(buffer);
// return number of lines read
return static_cast<int>(out.size());
}