C++ length of file and vectors - c++

Hi I have a file with some text in it. Is there some easy way to get the number of lines in the file without traversing through the file?
I also need to put the lines of the file into a vector. I am new to C++, but I think a vector is like an ArrayList in Java, so I wanted to use a vector and insert things into it. How would I do that?
Thanks.

There is no way of finding the number of lines in a file without reading it. To read all lines:
1) create a std::vector of std::string
2) open a file for input
3) read a line as a std::string using getline()
4) if the read failed, stop
5) push the line into the vector
6) goto 3

You would need to traverse the file to detect the number of lines (or at least call a library method that traverses the file).
Here is sample code for parsing a text file with the getline function, assuming that you pass the file name as an argument:
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
int main(int argc, char* argv[])
{
std::vector<std::string> lines;
std::string line;
lines.clear();
// open the desired file for reading
std::ifstream infile (argv[1], std::ios_base::in);
// read each line individually (watch out for Windows newlines)
while (getline(infile, line, '\n'))
{
// add line to vector
lines.push_back (line);
}
// do anything you like with the vector. Output the size for example:
std::cout << "Read " << lines.size() << " lines.\n";
return 0;
}
Update: The code could fail for many reasons (e.g. file not found, concurrent modifications to file, permission issues, etc). I'm leaving that as an exercise to the user.
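As a minimal sketch of the error handling mentioned in the update, the helper below reports whether the file could be opened and whether reading ended normally. The name readLines is hypothetical and not part of the answer above; it only covers the open failure and read error cases, not concurrent modification.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of `path` into `out`.
// Returns false if the file cannot be opened or a read error occurs.
bool readLines(const std::string& path, std::vector<std::string>& out)
{
    std::ifstream infile(path);
    if (!infile.is_open())
        return false;              // missing file, no permission, ...

    std::string line;
    while (std::getline(infile, line))
        out.push_back(line);

    // getline stops at end of file (normal) or on a read error (badbit set).
    return !infile.bad();
}
```

A caller would check the return value before trusting out.size() as the line count.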

1) No way to find number of lines without reading the file.
2) Take a look at getline function from the C++ Standard Library. Something like:
string line;
fstream file;
vector <string> vec;
...
while (getline(file, line)) vec.push_back(line);

Traversing the file is fundamentally required to determine the number of lines, regardless of whether you do it or some library routine does it. New lines are just another character, and the file must be scanned one character at a time in its entirety to count them.
Since you have to read the lines into a vector anyway, you might as well combine the two steps:
// Read lines from input stream in into vector out
// Return the number of lines read
int getlines(std::vector<std::string>& out, std::istream& in = std::cin) {
out.clear(); // remove any data in vector
std::string buffer;
while (std::getline(in, buffer))
out.push_back(buffer);
// return number of lines read
return out.size();
}

Related

C++ file conversion: pipe delimited to comma delimited

I am trying to figure out how to turn this input file, which is in pipe-delimited form, into comma-delimited form. I have to open the file, read it into an array, convert it to comma-delimited form in an output CSV file, and then close all files. I have been told that the easiest way to do it is within Excel, but I am not quite sure how.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream inFile;
ofstream outFile;
string inFileName;
string myArray[5];
cout << "Enter the input filename:";
cin >> inFileName;
inFile.open(inFileName);
if(inFile.is_open())
std::cout<<"File Opened"<<std::endl;
// read file line by line into array
cout<<"Read";
for(int i = 0; i < 5; ++i)
{
inFile >> myArray[i];
}
// File conversion
// close input file
inFile.close();
// close output file
outFile.close();
...
What I need to convert is:
Miles per hour|6,445|being the "second" team |5.54|9.98|6,555.00
"Ending" game| left at "beginning"|Elizabeth, New Jersey|25.25|6.78|987.01
|End at night, or during the day|"Let's go"|65,978.21|0.00|123.45
Left-base night|10/07/1900|||4.07|777.23
"Let's start it"|Start Baseball Game|Starting the new game to win
What the output should look like in comma-delimited form:
Miles per hour,"6,445","being the ""second"" team member",5.54,9.98,"6,555.00",
"""Ending"" game","left at ""beginning""","Denver, Colorado",25.25,6.78,987.01,
,"End at night, during the day","""Let's go""","65,978.21",0.00,123.45,
Left-base night, 10/07/1900,,,4.07,777.23,
"""Let's start it""", Start Baseball Game, Starting the new game to win,
I will show you a complete solution and explain it to you. But let's first have a look at it:
#include <iostream>
#include <vector>
#include <fstream>
#include <regex>
#include <string>
#include <algorithm>
// I omit in the example here the manual input of the filenames. This exercise can be done by somebody else
// Use fixed filenames in this example.
const std::string inputFileName("r:\\input.txt");
const std::string outputFileName("r:\\output.txt");
// The delimiter for the source csv file
std::regex re{ R"(\|)" };
std::string addQuotes(const std::string& s) {
// Double every quote character in the string: each " becomes ""
std::string result = std::regex_replace(s, std::regex(R"(")"), R"("")");
// If there is any quote (") or comma in the string, then quote the complete string
if (std::any_of(result.begin(), result.end(), [](const char c) { return ((c == '\"') || (c == ',')); })) {
result = "\"" + result + "\"";
}
return result;
}
// Some output function
void printData(std::vector<std::vector<std::string>>& v, std::ostream& os) {
// Go through all rows
std::for_each(v.begin(), v.end(), [&os](const std::vector<std::string>& vs) {
// Define delimiter
std::string delimiter{ "" };
// Show the delimited strings
for (const std::string& s : vs) {
os << delimiter << s;
delimiter = ",";
}
os << "\n";
});
}
int main() {
// We open the output file first, because if it cannot be opened there is no point in doing the rest of the exercise
// Open output file and check, if it could be opened
if (std::ofstream outputFileStream(outputFileName); outputFileStream) {
// Open the input file and check, if it could be opened
if (std::ifstream inputFileStream(inputFileName); inputFileStream) {
// In this variable we will store all lines from the CSV file, each split into its columns
std::vector<std::vector<std::string>> data{};
// Now read all lines of the CSV file and split it into tokens
for (std::string line{}; std::getline(inputFileStream, line); ) {
// Split line into tokens and add to our resulting data vector
data.emplace_back(std::vector<std::string>(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {}));
}
std::for_each(data.begin(), data.end(), [](std::vector<std::string>& vs) {
std::transform(vs.begin(), vs.end(), vs.begin(), addQuotes);
});
// Output, to file
printData(data, outputFileStream);
// And to the screen
printData(data, std::cout);
}
else {
std::cerr << "\n*** Error: could not open input file '" << inputFileName << "'\n";
}
}
else {
std::cerr << "\n*** Error: could not open output file '" << outputFileName << "'\n";
}
return 0;
}
So, let's have a look. We have three functions:
main: reads the CSV file, splits it into tokens, converts it, and writes it
addQuotes: adds quotes if necessary
printData: prints the converted data to an output stream
Let's start with main. main first opens the output file and then the input file.
The input file contains a kind of structured data and is also called CSV (comma separated values). But here we do not have a comma, but a pipe symbol as delimiter.
The result is typically stored in a 2D vector: one dimension holds the rows and the other holds the columns.
So, what do we need to do next? As we can see, we first need to read all complete text lines from the source stream. This can be easily done with a one-liner:
for (std::string line{}; std::getline(inputFileStream, line); ) {
As you can see here, the for statement has a declaration/initialization part, then a condition, and then a statement that is carried out at the end of each iteration. This is well known.
We first define a variable "line" of type std::string and use the default initializer to create an empty string. Then we use std::getline to read a complete line from the stream and put it into our variable. std::getline returns a reference to the stream, and the stream has an overloaded bool operator that reports whether there was a failure (or end of file). So, the for loop does not need an additional check for the end of file. And we do not use the last statement of the for loop, because by reading a line, the file pointer is advanced automatically.
This gives us a very simple for loop for reading a complete file line by line.
Please note: Defining the variable "line" in the for loop scopes it to the for loop. Meaning, it is only visible in the for loop. This is generally a good solution to prevent the pollution of the outer namespace.
OK, now the next line:
data.emplace_back(std::vector<std::string>(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {}));
Uh Oh, what is that?
OK, let's go step by step. First, we obviously want to add something to our 2-dimensional data vector. We use std::vector's function emplace_back, which constructs the new element in place from the arguments it is given. We could also have used push_back; since we pass a temporary here, both calls move the row into the vector instead of copying it.
And what do we want to add? We want to add a complete row, so a vector of columns. In our case a std::vector<std::string>. We build this vector with the vector's range constructor (see the std::vector constructor documentation). The range constructor takes 2 iterators, a begin and an end iterator, as parameters, and copies all values in that range into the vector.
So, we expect a begin and an end iterator. And what do we see here:
The begin iterator is: std::sregex_token_iterator(line.begin(), line.end(), re, -1)
And the end iterator is simply {}
But what is this thing, the sregex_token_iterator?
This is an iterator that iterates over patterns in a line. And the pattern is given by a regex. You can read about the C++ regex library in its reference documentation. Since it is very powerful, you unfortunately need some time to learn it, and I cannot cover it here. But let us describe its basic functionality for our purpose: You can describe a pattern in some kind of meta language, and the std::sregex_token_iterator will look for that pattern and, if it finds a match, return the related data. In our case the pattern is very simple: the pipe symbol. It is described with R"(\|)" (the backslash is needed because | has a special meaning in a regex). The last argument, -1, tells the iterator to return not the matches themselves but the text between them, which is exactly the content of the columns.
Now to the {} as the end iterator. You may have read that {} will do default construction/initialization. The default constructor of std::sregex_token_iterator constructs an end-of-sequence iterator. So, exactly what we need.
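To look at the token iterator in isolation, here is a small sketch that wraps the same construction in a function. The name splitOnPipe is an assumption made for this illustration; the regex and the -1 submatch index are the ones used in the solution above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits `line` on the pipe character. The submatch index -1 asks the
// iterator for the text between the matches instead of the matches themselves.
std::vector<std::string> splitOnPipe(const std::string& line)
{
    static const std::regex pipe{ R"(\|)" };
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), pipe, -1),
        std::sregex_token_iterator{});
}
```

Called with the fourth input line, Left-base night|10/07/1900|||4.07|777.23, this should give six columns, two of them empty.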
After we have read all data, we will transform the single strings, to the required output. This will be done with std::transform and the function addQuotes.
The strategy here is to first replace every quote character with two quote characters.
And then, next, we look, if there is any comma or quote in the string, then we enclose the whole string additionally in quotes.
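The two quoting rules can also be written without std::regex. The following is only a sketch under that assumption; the function name quoteField is hypothetical, and it is meant to behave like addQuotes for the cases described above.

```cpp
#include <string>

// Sketch of the quoting rule without std::regex: double every quote
// character, then wrap the field in quotes if it holds a quote or a comma.
std::string quoteField(const std::string& field)
{
    std::string result;
    bool needsQuotes = false;
    for (const char c : field) {
        if (c == '"') {
            result += "\"\"";      // one quote becomes two
            needsQuotes = true;
        } else {
            if (c == ',')
                needsQuotes = true;
            result += c;
        }
    }
    return needsQuotes ? "\"" + result + "\"" : result;
}
```

With this rule a plain field such as 5.54 stays untouched, while 6,445 is wrapped in quotes because of the comma.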
And last, but not least, we have a simple output function and print the converted data into a file and on the screen.

Reading a specific line from a .txt file

I have a text file full of names:
smartgem
marshbraid
seamore
stagstriker
meadowbreath
hydrabrow
startrack
wheatrage
caskreaver
seaash
I want to code a random name generator that will copy a specific line from the .txt file and return it.
While reading in from a file you must start from the beginning and continue on. My best advice would be to read in all of the names, store them in a container such as a std::vector, and randomly access them that way if you don't have stringent concerns over efficiency.
You cannot pick a random string from the end of the file without first reading up to that name in the file.
You may also want to look at fseek() which will allow you to "jump" to a location within the input stream. You could randomly generate an offset and then provide that as an argument to fseek().
http://www.cplusplus.com/reference/cstdio/fseek/
You cannot do that unless you do one of two things:
Generate an index for that file, containing the address of each line, then you can go straight to that address and read it. This index can be stored in many different ways, the easiest one being on a separate file, this way the original file can still be considered a text file, or;
Structure the file so that each line starts at a fixed distance in bytes from the previous one, so you can just go to the line you want by multiplying (desired index * size). This does not mean the texts on each line need to have the same length; you can pad the end of the line with null terminators (character '\0'). In this case it is not recommended to treat this file as a text file anymore, but as a binary file instead.
You can write a separate program that will generate this index or generate the structured file for your main program to use.
All this of course, considering you want the program to run and read the line without having to load the entire file in memory first. If your program will constantly read lines from the file, you should probably just load the entire file into a std::vector<std::string> and then read the lines at will from there.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstdlib>
#include <ctime>
using namespace std;
int main()
{
string filePath = "test.txt";
vector<std::string> qNames;
ifstream openFile(filePath.data());
if (openFile.is_open())
{
string line;
while (getline(openFile, line))
{
qNames.push_back(line);
}
openFile.close();
}
if (!qNames.empty())
{
srand((unsigned int)time(NULL));
for (int i = 0; i < 10; i++)
{
int num = rand();
int linePos = num % qNames.size();
cout << qNames.at(linePos).c_str() << endl;
}
}
return 0;
}

How do I deal with a carriage return line feed when trying to read in file

So I am working on a file that I need to read in, which contains both commas separating words and a carriage return/line feed at the end of each line, and I can't figure out a way to handle it. I am trying to read in each word before the comma and put it into a vector until it hits the carriage return/line feed, but I am having problems.
Here is my text file (as seen in Notepad++ so you can see the symbols; in the actual text, the things inside [] don't appear)
microwave,lamp,guitar,couch,bed,dog,cat[cr][lf]
P1:microwave,couch,bed,dog,chair,bookcase,fish[cr][lf]
I have tried multiple solutions, but nothing seems to work. Here is what I have tried so far, but it obviously isn't working. I have seen some users suggest using substring to somehow read out the comma and read in the words, but I am not sure how to do that. I couldn't find a good tutorial or example of one. In my head, I have the algorithm (or at least, steps on how to go about it), but I am not sure how to go about implementing it.
Import file (istream)
Read until comma, take string and place it in vector1 (getline, input, ,), vector.push_back(input)
Repeat previous step until you reach \cr\lf stop reading. (getline(input, '/r'))
move on to the next line
Read until comma, take string and place it in vector2
Repeat
Read the line until /cr/lf
Here is the code I put in practice using part of the above steps i made.
string input;
vector<string> v1;
vector<string> v2;
ifstream infile;
infile.open("example.txt");
while(getline(infile, input)) //read until end of line
{
while(getline(infile, input, '\r')) //read until it reaches a carriage return
{
while(getline(infile, input, ',')) // read until it reaches a comma
{
v1.push_back(input); //take the word and put in vector.
}
}
}
infile.close();
Any help would be appreciated.
Edit: I forgot to mention. When I used this code, it seemed to not import anything into the vectors. I am sure all the words got lost somewhere in the getline functions, but I don't know how to just read up to comma and carriage return line feed without using it.
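You should use getline() to get a whole line first. On Windows, a stream opened in text mode translates the carriage return/line feed pair for you; on other platforms a trailing '\r' may be left at the end of the line and has to be removed by hand. Then, put the result into a stringstream and use getline() on it to separate the line at the commas.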
My code that reads input into a vector of vectors:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
int main()
{
std::ifstream fin("input.txt");
std::vector<std::vector<std::string>> result;
for(std::string line; std::getline(fin, line);)
{
result.emplace_back();
std::stringstream ss(line);
for(std::string word; std::getline(ss, word, ',');)
{
result.back().push_back(word);
}
}
for(const auto &i : result)
{
for(const auto &j : i)
{
std::cout << j << ' ';
}
std::cout << '\n';
}
}
You can modify it to read into two vectors by just removing the outer loop and use two separate loops for each of the two vectors/lines.
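If the file with CR LF line endings is read on a platform that does not translate them, each line returned by getline still ends in '\r'. A minimal sketch of a per-line splitter that removes it follows; the function name splitLine is made up for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line at the commas, dropping a trailing '\r' left over from a
// CR LF line ending.
std::vector<std::string> splitLine(std::string line)
{
    if (!line.empty() && line.back() == '\r')
        line.pop_back();

    std::vector<std::string> words;
    std::istringstream ss(line);
    for (std::string word; std::getline(ss, word, ','); )
        words.push_back(word);
    return words;
}
```

Without the pop_back, the last word of the first sample line would be stored as "cat\r" rather than "cat".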
In your code, you first have a loop that reads line by line until the end of the file. After you read a line, you have a loop that reads until a '\r', which as far as I know does not occur in a normal text file. Even if there are '\r's in the file, you would be overwriting what you just read in from the outer loop. Same thing with the loop inside that.
Were you taught that while(getline(fin, str)) reads from a file without knowing how it works?

Replace line in txt file c++

I am just wondering, because I have a text file, accounts.txt, containing STATUS:USERID:PASSWORD.
For example, it would look like this:
OPEN:bob:askmehere:
OPEN:john:askmethere:
LOCK:rob:robmypurse:
I have user input in my main such that a user can try to log in 3 times; otherwise the status will change from OPEN to LOCK.
For example, after 3 tries by john:
before:
OPEN:bob:askmehere:
OPEN:john:askmethere:
LOCK:rob:robmypurse:
after:
OPEN:bob:askmehere:
LOCK:john:askmethere:
LOCK:rob:robmypurse:
What I have done is:
void lockUser(Accounts& in){
// Accounts class consist 3 attributes (string userid, string pass, status)
ofstream oFile;
fstream iFile;
string openFile="accounts.txt";
string status, userid, garbage;
Accounts toupdate;
oFile.open(openFile);
iFile.open(openFile);
while(!iFile.eof()){
getline(iFile, status, ':');
getline(iFile, userid, ':');
getline(iFile, garbage, '\n');
if(userid == in.getUserId()){
toupdate.setUserId(in.getUserId());
toupdate.setPassword(in.getPassword());
toupdate.setStatus("LOCK");
break;
}
//here i should update the account.txt how do i do that?
oFile.open(openFile);
oFile<<toupdate.getStatus()<<":"<<toupdate.getUserId()<<":"<<toupdate.getPassword()<<":"<<endl;
}
There are two common ways to replace or otherwise modify a file. The first and the "classic" way is to read the file, line by line, check for the line(s) that need to be modified, and write to a temporary file. When you reach the end of the input file you close it, and rename the temporary file as the input file.
The other common way, when the file is relatively small or you have a lot of memory, is to read it all into memory, do the modification needed, and then write out the contents of the memory to the file. How to store it in memory can be different, like a vector containing lines from the file, or a vector (or other buffer) containing all characters from the file without separation.
Your implementation is flawed because you open the output file (which is the same as the input file) inside the loop. The first problem with this is that the operating system may not allow you to open a file for writing if you already have it open for reading, and as you don't check for failure from opening the files you will not know about this. Another problem is that if the operating system allows it, then your call to open will truncate the existing file, causing you to lose all but the very first line.
Simple pseudo-ish code to explain
std::ifstream input_file("your_file");
std::vector<std::string> lines;
std::string input;
while (std::getline(input_file, input))
lines.push_back(input);
for (auto& line : lines)
{
if (line_needs_to_be_modified(line))
modify_line_as_needed(line);
}
input_file.close();
std::ofstream output_file("your_file");
for (auto const& line : lines)
output_file << line << '\n';
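The first approach described above (temporary file plus rename) could look roughly like this for the accounts file. This is a sketch, not a definitive implementation: the function name lockUserInFile and the ".tmp" suffix are assumptions, and every line is assumed to have the form STATUS:USERID:PASSWORD:.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: rewrites `path` so that the account `userId` has the
// status LOCK. Returns false if a file cannot be opened or replaced.
bool lockUserInFile(const std::string& path, const std::string& userId)
{
    const std::string tempPath = path + ".tmp";   // assumed temporary name
    {
        std::ifstream input(path);
        if (!input)
            return false;
        std::ofstream output(tempPath);
        if (!output)
            return false;

        for (std::string line; std::getline(input, line); ) {
            const std::size_t firstColon = line.find(':');
            // Compare the field after the first colon with userId + ':'
            if (firstColon != std::string::npos &&
                line.compare(firstColon + 1, userId.size() + 1, userId + ":") == 0)
                line.replace(0, firstColon, "LOCK");
            output << line << '\n';
        }
    }   // both streams are closed here, before the files are swapped

    std::remove(path.c_str());    // needed where rename will not overwrite
    return std::rename(tempPath.c_str(), path.c_str()) == 0;
}
```

Because the original file is only removed after the temporary file has been written completely, a failure while writing leaves accounts.txt as it was.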
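Read the file line by line with std::getline, find the text you want to replace with std::string::find, and use std::string::replace to replace it. A stream has no replace member, so do the replacement on the string and then write the result back to the file. For example:
#include <fstream>
#include <iostream>
#include <string>
int main() {
const std::string exampleText = "BOB"; // the text to find
std::ifstream inFile("accounts.txt");
std::string contents, line;
while (std::getline(inFile, line)) {
// find returns std::string::npos when the text is not in the line
const std::size_t pos = line.find(exampleText);
if (pos != std::string::npos)
line.replace(pos, exampleText.size(), "Hello");
contents += line + '\n';
}
inFile.close();
// reopening the same file for output truncates it, then we write everything back
std::ofstream outFile("accounts.txt");
outFile << contents;
}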

Efficiently read CSV file with optional columns

I'm trying to write a program that reads in a CSV file (no need to worry about escaping anything, it's strictly formatted with no quotes) but any numeric item with a value of 0 is instead just left blank. So a normal line would look like:
12,string1,string2,3,,,string3,4.5
instead of
12,string1,string2,3,0,0,string3,4.5
I have some working code using vectors but it's way too slow.
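#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
int main(int argc, char** argv)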
{
string filename("path\\to\\file.csv");
string outname("path\\to\\outfile.csv");
ifstream infile(filename.c_str());
if(!infile)
{
cerr << "Couldn't open file " << filename.c_str();
return 1;
}
vector<vector<string>> records;
string line;
while( getline(infile, line) )
{
vector<string> row;
string item;
istringstream ss(line);
while(getline(ss, item, ','))
{
row.push_back(item);
}
records.push_back(row);
}
return 0;
}
Is it possible to overload operator<< of ostream, similar to the question "How to use C++ to read in a .csv file and output in another form?", when fields can be blank?
Would that improve the performance?
Or is there anything else I can do to get this to run faster?
Thanks
The time spent reading the string data from the file is greater than the time spent parsing it. You won't make significant time savings in the parsing of the string.
To make your program run faster, read bigger "chunks" into memory; get more data per read. Look into memory-mapped files.
An alternative that gives better performance is to read the whole file into a buffer. Then go through the buffer and set pointers to where the values start; whenever you find a comma or an end of line, overwrite it with '\0'.
e.g. https://code.google.com/p/csv-routine/
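A variation on the buffer idea, sketched under a few assumptions: instead of writing '\0' into the buffer and keeping raw pointers, it uses std::string_view (C++17) to refer to each field, so no field text is copied. The names readWholeFile and parseCsv are hypothetical, and blank fields simply come back as empty views that the caller can treat as 0.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Reads the whole file into one string.
std::string readWholeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Splits the buffer into rows and fields without copying any field text.
// The views point into `buffer`, which must outlive the result.
std::vector<std::vector<std::string_view>> parseCsv(std::string_view buffer)
{
    std::vector<std::vector<std::string_view>> records;
    std::size_t pos = 0;
    while (pos < buffer.size()) {
        std::size_t lineEnd = buffer.find('\n', pos);
        if (lineEnd == std::string_view::npos)
            lineEnd = buffer.size();
        std::string_view line = buffer.substr(pos, lineEnd - pos);
        if (!line.empty() && line.back() == '\r')
            line.remove_suffix(1);          // tolerate CR LF line endings

        std::vector<std::string_view> row;
        std::size_t fieldStart = 0;
        for (;;) {
            const std::size_t comma = line.find(',', fieldStart);
            if (comma == std::string_view::npos) {
                row.push_back(line.substr(fieldStart));   // last field
                break;
            }
            row.push_back(line.substr(fieldStart, comma - fieldStart));
            fieldStart = comma + 1;
        }
        records.push_back(std::move(row));
        pos = lineEnd + 1;
    }
    return records;
}
```

The string returned by readWholeFile has to be stored in a named variable before it is passed to parseCsv; passing a temporary would leave the views dangling. Whether this is actually faster than the getline version depends on the file and the platform, so it is worth measuring both.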