C++ Read in file element by element, but executing functions every line - c++

I have a file that I need to read in. Each line of the file is exceedingly long, so I'd rather not read each line to a temporary string and then manipulate those strings (unless this isn't actually inefficient - I could be wrong). Each line of the file contains a string of triplets - two numbers and a complex number, separated by a colon (as opposed to a comma, which is used in the complex number). My current code goes something like this:
while (states.eof() == 0)
{
std::istringstream complexString;
getline(states, tmp_str, ':');
tmp_triplet.row() = stoi(tmp_str);
getline(states, tmp_str, ':');
tmp_triplet.col() = stoi(tmp_str);
getline(states, tmp_str, ':');
complexString.str (tmp_str);
complexString >> tmp_triplet.value();
// Then something useful done with the triplet before moving onto the next one
}
tmp_triplet is a variable that stores these three numbers. I want some way to run a function every line (specifically, the triplets in every line are pushed into a vector, and each line in the file denotes a different vector). I'm sure there's an easy way to go about this, but I just want a way to check whether the end of the line has been reached, and to run a function when this is the case.

When trying to plan stuff out, abstraction can be your best friend. If you break down what you want to do by abstract functionality, you can more easily decide what data types should be used and how different data types should be planned out, and often you can find some functions almost write themselves. And typically, your code will be more modular (almost by definition), which will make it easy to reuse, maintain, and adapt if future changes are needed.
For example, it sounds like you want to parse a file. So that should be a function.
To do that function, you want to read in the file lines then process the file lines. So you can make two functions, one for each of those actions, and just call the functions.
To read in file lines you just want to take a file stream, and return a collection of strings for each line.
To process file lines you want to take a collection of strings and for each one parse the string into a triplet value. So you can create a method that takes a string and breaks it into a triplet, and just use that method here.
To process a string you just need to take a string and assign the first part as the row, the second part as the column, and the third part as the value.
struct TripletValue
{
int Row;
int Col;
int Val;
};
std::vector<TripletValue> ParseFile(std::istream& inputStream)
{
std::vector<std::string> fileLines = ReadFileLines(inputStream);
std::vector<TripletValue> parsedValues = GetValuesFromData(fileLines);
return parsedValues;
}
std::vector<std::string> ReadFileLines(std::istream& inputStream)
{
std::vector<std::string> fileLines;
while (!inputStream.eof())
{
std::string fileLine;
getline(inputStream, fileLine);
fileLines.push_back(fileLine);
}
return fileLines;
}
std::vector<TripletValue> GetValuesFromData(std::vector<std::string> data)
{
std::vector<TripletValue> values;
for (int i = 0; i < data.size(); i++)
{
TripletValue parsedValue = ParseLine(data[i]);
values.push_back(parsedValue);
}
return values;
}
TripletValue ParseLine(std::string fileLine)
{
std::stringstream sstream;
sstream << fileLine;
TripletValue parsedValue;
std::string strValue;
sstream >> strValue;
parsedValue.Row = stoi(strValue);
sstream >> strValue;
parsedValue.Col = stoi(strValue);
sstream >> strValue;
parsedValue.Val = stoi(strValue);
return parsedValue;
}

Related

C++ file conversion: pipe delimited to comma delimited

I am trying to figure out how to turn this input file that is in pipe delimited form into comma delimited. I have to open the file, read it into an array, convert it into comma delimited in an output CSV file and then close all files. I have been told that the easiest way to do is within excel but I am not quite sure how.
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
ifstream inFile;
string myArray[5];
cout << "Enter the input filename:";
cin >> inFileName;
inFile.open(inFileName);
if(inFile.is_open())
std::cout<<"File Opened"<<std::endl;
// read file line by line into array
cout<<"Read";
for(int i = 0; i < 5; ++i)
{
file >> myArray[i];
}
// File conversion
// close input file
inFile.close();
// close output file
outFile.close();
...
What I need to convert is:
Miles per hour|6,445|being the "second" team |5.54|9.98|6,555.00
"Ending" game| left at "beginning"|Elizabeth, New Jersey|25.25|6.78|987.01
|End at night, or during the day|"Let's go"|65,978.21|0.00|123.45
Left-base night|10/07/1900|||4.07|777.23
"Let's start it"|Start Baseball Game|Starting the new game to win
What the output should look like in comma-delimited form:
Miles per hour,"6,445","being the ""second"" team member",5.54,9.98,"6,555.00",
"""Ending"" game","left at ""beginning""","Denver, Colorado",25.25,6.78,987.01,
,"End at night, during the day","""Let's go""","65,978.21",0.00,123.45,
Left-base night, 10/07/1900,,,4.07,777.23,
"""Let's start it""", Start Baseball Game, Starting the new game to win,
I will show you a complete solution and explain it to you. But let's first have view on it:
#include <iostream>
#include <vector>
#include <fstream>
#include <regex>
#include <string>
#include <algorithm>
// I omit in the example here the manual input of the filenames. This exercise can be done by somebody else
// Use fixed filenames in this example.
const std::string inputFileName("r:\\input.txt");
const std::string outputFileName("r:\\output.txt");
// The delimiter for the source csv file
std::regex re{ R"(\|)" };
std::string addQuotes(const std::string& s) {
// if there are single quotes in the string, then replace them with double quotes
std::string result = std::regex_replace(s, std::regex(R"(")"), R"("")");
// If there is any quote (") or comma in the file, then quote the complete string
if (std::any_of(result.begin(), result.end(), [](const char c) { return ((c == '\"') || (c == ',')); })) {
result = "\"" + result + "\"";
}
return result;
}
// Some output function
void printData(std::vector<std::vector<std::string>>& v, std::ostream& os) {
// Go throug all rows
std::for_each(v.begin(), v.end(), [&os](const std::vector<std::string>& vs) {
// Define delimiter
std::string delimiter{ "" };
// Show the delimited strings
for (const std::string& s : vs) {
os << delimiter << s;
delimiter = ",";
}
os << "\n";
});
}
int main() {
// We first open the ouput file, becuse, if this cannot be opened, then no meaning to do the rest of the exercise
// Open output file and check, if it could be opened
if (std::ofstream outputFileStream(outputFileName); outputFileStream) {
// Open the input file and check, if it could be opened
if (std::ifstream inputFileStream(inputFileName); inputFileStream) {
// In this variable we will store all lines from the CSV file including the splitted up columns
std::vector<std::vector<std::string>> data{};
// Now read all lines of the CSV file and split it into tokens
for (std::string line{}; std::getline(inputFileStream, line); ) {
// Split line into tokens and add to our resulting data vector
data.emplace_back(std::vector<std::string>(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {}));
}
std::for_each(data.begin(), data.end(), [](std::vector<std::string>& vs) {
std::transform(vs.begin(), vs.end(), vs.begin(), addQuotes);
});
// Output, to file
printData(data, outputFileStream);
// And to the screen
printData(data, std::cout);
}
else {
std::cerr << "\n*** Error: could not open input file '" << inputFileName << "'\n";
}
}
else {
std::cerr << "\n*** Error: could not open output file '" << outputFileName << "'\n";
}
return 0;
}
So, then let's have a look. We have function
main, read csv files, split it into tokens, convert it, and write it
addQuotes. Add quote if necessary
printData print he converted data to an output stream
Let's start with main. main will first open the input file and the output file.
The input file contains a kind of structured data and is also called csv (comma separted values). But here we do not have a comma, but a pipe symbol as delimter.
And the result will be typically stored in a 2d-vector. In dimension 1 is the rows and the other dimension is for the columns.
So, what do we need to do next? As we can see, we need to read first all complete text lines form the source stream. This can be easily done with a one-liner:
for (std::string line{}; std::getline(inputFileStream, line); ) {
As you can see here, the for statement has an declaration/initialization part, then a condition, and then a statement, carried out at the end of the loop. This is well known.
We first define a variable "line" of type std::string and use the default initializer to create an empty string. Then we use std::getline to read from the stream a complete line and put it into our variable. The std::getline returns a reference to sthe stream, and the stream has an overloaded bool operator, where it returns, if there was a failure (or end of file). So, the for loop does not need an additional check for the end of file. And we do not use the last statement of the for loop, because by reading a line, the file pointer is advanced automatically.
This gives us a very simple for loop, fo reading a complete file line by line.
Please note: Defining the variable "line" in the for loop, will scope it to the for loop. Meaning, it is only visible in the for loop. This is generally a good solution to prevent the pollution of the outer name space.
OK, now the next line:
data.emplace_back(std::vector<std::string>(std::sregex_token_iterator(line.begin(), line.end(), digit), {}));
Uh Oh, what is that?
OK, lets go step by step. First, we obviously want to add someting to our 2-dimensionsal data vector. We will use the std::vectors function emplace_back. We could have used also used push_back, but this would mean that we need to do unnecessary copying of data. Hence, we selected emplace_back to do an in place construction of the thing that we want to add to our 2-dimensionsal data vector.
And what do we want to add? We want to add a complete row, so a vector of columns. In our case a std::vector<std::string>. And, becuase we want to do in inplace construction of this vector, we call it with the vectors range constructor. Please see here: Constructor number 5. The range constructor takes 2 iterators, a begin and an end iterator, as parameter, and copies all values pointed to by the iterators into the vector.
So, we expect a begin and an end iterator. And what do we see here:
The begin iterator is: std::sregex_token_iterator(line.begin(), line.end(), digit)
And the end iterator is simply {}
But what is this thing, the sregex_token_iterator?
This is an iterator that iterates over patterns in a line. And the pattern is given by a regex. You may read here about the C++ regex libraray. Since it is very powerful, you unfortunately need to learn about it a little longer. And I cannot cover it here. But let us describe its basic functionality for our purpose: You can describe a pattern in some kind of meta language, and the
std::sregex_token_iterator will look for that pattern, and, if it finds a match, return the related data. In our case the pattern is very simple: Digits. This can be desribed with "\d+" and means, try to match one or more digits.
Now to the {} as the end iterator. You may have read that the {} will do default construction/initialization. And if you read here, number 1, then you see that the "default-constructor" constructs an end-of-sequence iterator. So, exactly what we need.
After we have read all data, we will transform the single strings, to the required output. This will be done with std::transform and the function addQuotes.
The strategy here is to first replace the single quotes with double quotes.
And then, next, we look, if there is any comma or quote in the string, then we enclose the whole string additionally in quotes.
And last, but not least, we have a simple output function and print the converted data into a file and on the screen.

How to read a complex input with istream&, string& and getline in c++?

I am very new to C++, so I apologize if this isn't a good question but I really need help in understanding how to use istream.
There is a project I have to create where it takes several amounts of input that can be on one line or multiple and then pass it to a vector (this is only part of the project and I would like to try the rest on my own), for example if I were to input this...
>> aaa bb
>> ccccc
>> ddd fff eeeee
Makes a vector of strings with "aaa", "bb", "ccccc", "ddd", "fff", "eeeee"
The input can be a char or string and the program stops asking for input when the return key is hit.
I know getline() gets a line of input and I could probably use a while loop to try and get the input such as...(correct me if I'm wrong)
while(!string.empty())
getline(cin, string);
However, I don't truly understand istream and it doesn't help that my class has not gone over pointers so I don't know how to use istream& or string& and pass it into a vector. On the project description, it said to NOT use stringstream but use functionality from getline(istream&, string&). Can anyone give somewhat of a detailed explanation as to how to make a function using getline(istream&, string&) and then how to use it in the main function?
Any little bit helps!
You're on the right way already; solely, you'd have to pre-fill the string with some dummy to enter the while loop at all. More elegant:
std::string line;
do
{
std::getline(std::cin, line);
}
while(!line.empty());
This should already do the trick reading line by line (but possibly multiple words on one line!) and exiting, if the user enters an empty line (be aware that whitespace followed by newline won't be recognised as such!).
However, if anything on the stream goes wrong, you'll be trapped in an endless loop processing previous input again and again. So best check the stream state as well:
if(!std::getline(std::cin, line))
{
// this is some sample error handling - do whatever you consider appropriate...
std::cerr << "error reading from console" << std::endl;
return -1;
}
As there might be multiple words on a single line, you'd yet have to split them. There are several ways to do so, quite an easy one is using an std::istringstream – you'll discover that it ressembles to what you likely are used to using std::cin:
std::istringstream s(line);
std::string word;
while(s >> word)
{
// append to vector...
}
Be aware that using operator>> ignores leading whitespace and stops after first trailing one (or end of stream, if reached), so you don't have to deal with explicitly.
OK, you're not allowed to use std::stringstream (well, I used std::istringstream, but I suppose this little difference doesn't count, does it?). Changes matter a little, it gets more complex, on the other hand, we can decide ourselves what counts as words an what as separators... We might consider punctuation marks as separators just like whitespace, but allow digits to be part of words, so we'd accept e. g. ab.7c d as "ab", "7c", "d":
auto begin = line.begin();
auto end = begin;
while(end != line.end()) // iterate over each character
{
if(std::isalnum(static_cast<unsigned char>(*end)))
{
// we are inside a word; don't touch begin to remember where
// the word started
++end;
}
else
{
// non-alpha-numeric character!
if(end != begin)
{
// we discovered a word already
// (i. e. we did not move begin together with end)
words.emplace_back(begin, end);
// ('words' being your std::vector<std::string> to place the input into)
}
++end;
begin = end; // skip whatever we had already
}
}
// corner case: a line might end with a word NOT followed by whitespace
// this isn't covered within the loop, so we need to add another check:
if(end != begin)
{
words.emplace_back(begin, end);
}
It shouldn't be too difficult to adjust to different interpretations of what is a separator and what counts as word (e. g. std::isalpha(...) || *end == '_' to detect underscore as part of words, but digits not). There are quite a few helper functions you might find useful...
You could input the value of the first column, then call functions based on the value:
void Process_Value_1(std::istream& input, std::string& value);
void Process_Value_2(std::istream& input, std::string& value);
int main()
{
// ...
std::string first_value;
while (input_file >> first_value)
{
if (first_value == "aaa")
{
Process_Value_1(input_file, first_value);
}
else if (first_value = "ccc")
{
Process_Value_2(input_file, first_value);
}
//...
}
return 0;
}
A sample function could be:
void Process_Value_1(std::istream& input, std::string& value)
{
std::string b;
input >> b;
std::cout << value << "\t" << b << endl;
input.ignore(1000, '\n'); // Ignore until newline.
}
There are other methods to perform the process, such as using tables of function pointers and std::map.

Logic for reading rows and columns from a text file (textparser) C++

I'm really stuck with this problem I'm having for reading rows and columns from a text file. We're using text files that our prof gave us. I have the functionality running so when the user in puts "numrows (file)" the number of rows in that file prints out.
However, every time I enter the text files, it's giving me 19 for both. The first text file only has 4 rows and the other one has 7. I know my logic is wrong, but I have no idea how to fix it.
Here's what I have for the numrows function:
int numrows(string line) {
ifstream ifs;
int i;
int row = 0;
int array [10] = {0};
while (ifs.good()) {
while (getline(ifs, line)) {
istringstream stream(line);
row = 0;
while(stream >>i) {
array[row] = i;
row++;
}
}
}
}
and here's the numcols:
int numcols(string line) {
int col = 0;
int i;
int arrayA[10] = {0};
ifstream ifs;
while (ifs.good()) {
istringstream streamA(line);
col = 0;
while (streamA >>i){
arrayA[col] = i;
col++;
}
}
}
edit: #chris yes, I wasn't sure what value to return as well. Here's my main:
int main() {
string fname, line;
ifstream ifs;
cout << "---- Enter a file name : ";
while (getline(cin, fname)) { // Ctrl-Z/D to quit!
// tries to open the file whose name is in string fname
ifs.open(fname.c_str());
if(fname.substr(0,8)=="numrows ") {
line.clear();
for (int i = 8; i<fname.length(); i++) {
line = line+fname[i];
}
cout << numrows (line) << endl;
ifs.close();
}
}
return 0;
}
This problem can be more easily solved by opening the text file as an ifstream, and then using std::get to process your input.
You can try for comparison against '\n' as the end of line character, and implement a pair of counters, one for columns on a line, the other for lines.
If you have variable length columns, you might want to store the values of (numColumns in a line) in a std::vector<int>, using myVector.push_back(numColumns) or similar.
Both links are to the cplusplus.com/reference section, which can provide a large amount of information about C++ and the STL.
Edited-in overview of possible workflow
You want one program, which will take a filename, and an 'operation', in this case "numrows" or "numcols". As such, your first steps are to find out the filename, and operation.
Your current implementation of this (in your question, after editing) won't work. Using cin should however be fine. Place this earlier in your main(), before opening a file.
Use substr like you have, or alternatively, search for a space character. Assume that the input after this is your filename, and the input in the first section is your operation. Store these values.
After this, try to open your file. If the file opens successfully, continue. If it won't open, then complain to the user for a bad input, and go back to the beginning, and ask again.
Once you have your file successfully open, check which type of calculation you want to run. Counting a number of rows is fairly easy - you can go through the file one character at a time, and count the number that are equal to '\n', the line-end character. Some files might use carriage-returns, line-feeds, etc - these have different characters, but are both a) unlikely to be what you have and b) easily looked up!
A number of columns is more complicated, because your rows might not all have the same number of columns. If your input is 1 25 21 abs 3k, do you want the value to be 5? If so, you can count the number of space characters on the line and add one. If instead, you want a value of 14 (each character and each space), then just count the characters based on the number of times you call get() before reaching a '\n' character. The use of a vector as explained below to store these values might be of interest.
Having calculated these two values (or value and set of values), you can output based on the value of your 'operation' variable. For example,
if (storedOperationName == "numcols") {
cout<< "The number of values in each column is " << numColsVal << endl;
}
If you have a vector of column values, you could output all of them, using
for (int pos = 0; pos < numColsVal.size(); pos++) {
cout<< numColsVal[pos] << " ";
}
Following all of this, you can return a value from your main() of 0, or you can just end the program (C++ now considers no return value from main to a be a return of 0), or you can ask for another filename, and repeat until some other method is used to end the program.
Further details
std::get() with no arguments will return the next character of an ifstream, using the example code format
std::ifstream myFileStream;
myFileStream.open("myFileName.txt");
nextCharacter = myFileStream.get(); // You should, before this, implement a loop.
// A possible loop condition might include something like `while myFileStream.good()`
// See the linked page on std::get()
if (nextCharacter == '\n')
{ // You have a line break here }
You could use this type of structure, along with a pair of counters as described earlier, to count the number of characters on a line, and the number of lines before the EOF (end of file).
If you want to store the number of characters on a line, for each line, you could use
std::vector<int> charPerLine;
int numberOfCharactersOnThisLine = 0;
while (...)
{
numberOfCharactersOnThisLine = 0
// Other parts of the loop here, including a numberOfCharactersOnThisLine++; statement
if (endOfLineCondition)
{
charPerLine.push_back(numberOfCharactersOnThisLine); // This stores the value in the vector
}
}
You should #include <vector> and either specific std:: before, or use a using namespace std; statement near the top. People will advise against using namespaces like this, but it can be convenient (which is also a good reason to avoid it, sort of!)

Clarification required regarding Arrays, Vectors and Maps in usage of a C++ Application

I want to know the right algorithm and a container class for my application. I am trying to build one Client-Server communication system where the Server contains group of files (.txt). The file structure (prototype) is like:
A|B|C|D....|Z$(some integer value)#(some integer value). Again the contents of A to Z are a1_a2_a3_a4......aN|b1_b2_b3_b4......bN|......|z1_z2_z3_z4.....zN. So what I wanted to do is when Server application has started, it has to load these files one-by-one and save the contents of each file in a Container class and again the contents of the file into particular variables based on the delimiters i.e.
for (int i=0; i< (Number of files); i++)
{
1) Load the file[0] in Container class[0];
2) Read the Container class[0] search for occurences of delimiters "_" and "|"
3) Till next "|" occurs, save the value occurred at "_" to an array or variable (save it in a buffer)
4) Do this till the file length completes or reaches EOF
5) Next read the second file, save it in Container class[1] and follow the steps as in 2),3) and 4)
}
I want to know if Vector or Map suits my requirement? As I need to search for occurrences of delimiters and push_back them and access while necessity comes.
Can I read whole single file as block and manipulate with the buffer or while file read only using seekg I can push the values to stack? One which will be better and easier to implement? What are the possibilities of using regex?
According to the format of input, and its size, I'd suggest doing something along these lines for reading and parsing the input:
void ParseOneFile (std::istream & inp)
{
std::vector<std::vector<std::string>> data;
int some_int_1 = 0, some_int_2 = 0;
std::string temp;
data.push_back ({});
while (0 == 0)
{
int c = inp.get();
if ('$' == c)
{
data.back().emplace_back (std::move(temp));
break;
}
else if ('|' == c)
{
data.back().emplace_back (std::move(temp));
data.push_back ({});
}
else if ('_' == c)
data.back().emplace_back (std::move(temp));
else
temp += char(c);
}
char sharp;
inp >> some_int_1 >> sharp >> some_int_2;
assert ('#' == sharp);
// Here, you have your data and your two integers...
}
The above function does not return the information it extracts, so you will want to change that. But it does read one of your files into a vector of vector of strings called data and two integers (some_int_1 and some_int_2.) It uses C++11 and does this reading and parsing quite efficiently, both in terms of processing and memory.
And, the above code does not check for any errors and inconsistent formatting in the input file.
Now, for your data structure problem. Since I have no idea about the nature of your data, I can't say for sure. All I can say is that a two-dimensional array and two integers on the side feels like a natural fit for this data. Since you have several files, you can store them all in another dimension of vector (or perhaps in a map, mapping a file name to a data structure like the following:
struct OneFile
{
vector<vector<string>> data;
int i1, i2;
};
vector<OneFile> all_files;
// or...
// map<string, OneFile> all_files;
The above function would fill one instance of the OneFile struct above.
As an example, all_files[0].data[0][0] will be a string referring to data item A0 in the first file, and all_files[7].data[25][3] will be another string referring to data item Z3 in the 8th file.

Using multiple instances of getline in C++

I've been working on a class assignment for C++ and we're required to acquire input from a text file and assign those values to an array....one is a string, the second an int, and the third a double.
We've only been introduced to arrays and I don't know anything yet about pointers or linked lists, or any of the higher end stuff, so I feel like I'm somewhat limited in my options. I've worked all day trying to figure out a way to acquire input from the text file and assign it to the appropriate array. I've tried to use getline to read the input file and set a delimiter to separate each piece of data but I get an error when I try to use it more than once. From what I've read, this has to do with how I'm overloading the function but I'm at a loss at resolving it. Every explanation I've read about it goes beyond my current level of familiarity. Right now, I'm focused on this fragment of code:
for (int i = 0; i < EMP_NUM; i++) // Get input from text file for names.
getline(inFile, nameAr[i], '*');
for (int i = 0; i < EMP_NUM; i++) // Input for hours.
getline(inFile, hoursAr[i], '*');
for (int i=0; i < EMP_NUM; i++) // Input for hourly rate.
getline(inFile, hrateAr[i], '*');
I'm trying to use getline three times and write the data to three separate arrays, then make a series of calculations with them later and output them to another text file. The first instance of getline doesn't produce any compiler errors but the latter two do. I'm not quite sure of another solution to get the data into my arrays, so I'm at a loss. Any help would be great!
If I understand correctly you merely have three values in a file: a string, an int and a double. I assume they are delimited by whitespace.
If that is so then you don't need std::getline(). Rather, use the extraction operator:
std::ifstream file("input.txt");
std::string s;
if( ! (file >> s) ) { // a single word extracted from the file
// failure
}
int n;
// ...
1) Instead of three different iteration, use only one
2) Pass string object in getline instead of pointers
string buf;
for (int i = 0; i < EMP_NUM; i++) // Get input from text file for names.
{
getline(inFile, buf, '*');
nameAr[i] = buf;
getline(inFile, buf, '*'); //assuming delimiter is again *
hoursAr[i] = atoi(buf.c_str() ); //C way to doing it...however in c++ u have to use stringstreams....
getline(inFile, buf);
hrateAr[i] = atof(buf.c_str() );;
}
What do the compiler errors say? Are you sure that the error is caused by getline? Maybe it's not because the getline calls but because of multiple declarations of i.