Reading a truth table in from plain text, translating it to a map<int,list<int>> in C++

I'm writing a file parser for standard C++ (no third-parties like Boost, unfortunately)...
I'm dealing with a situation where I have a plain-text file formatted like this:
1 ..header line 1, unimportant
2 ..header line 2, unimportant
3 ..header line 3, unimportant
4 1 0 0 0 0 0 0 1
5 2 0 1 0 2 1 0 0
...skipping ahead
14 11 1 0 0 0 0 1 1
15 12 0 0 1 0 0 1 2
16 13 2 0 0 0 1 0 0
...etc
(Note: The first column, 1 - 16, holds line numbers. The skipped portion stands for the rows in between: the gap of 8 spaces at the start of each line gets shorter as the numbers in the second column, 1 - 13, get longer.)
This text file denotes a truth table whereby items must be grouped by the columns, and each group will be composed of corresponding numbers from the first column. For instance, by the end of parsing this example, a map of type <int, list<int>> should look like (assuming there are no truths between lines 6 and 13):
[1: {11, 13}]
[2: {5, 15}]
[3: {12}]
[4: {5}]
[5: {5,16}]
[6: {14,15}]
[7: {4,14,15}]
In general, the number of columns in the text file can change, meaning the number of groups will change, so this must be accounted for. The number of rows is also variable, but the numbering of both rows and columns will always start at 1; the columns are not numbered in the file (but we can do that ourselves).
Now, were I to do this in Java I'd have a working solution rather quickly. However, I've never done work in C++ and am having trouble figuring out how to perform the operations properly, between its different structures and syntax. Despite scouring and finding lots of good guides, my lack of C++ foundation makes it hard to understand even the syntax differences that, I speculate, must be very basic.
Still, I've designed a procedure, and it should work according to the following pseudocode:
//Begin Parse
//Create filereader "strmFileIn"
//To get past the first three lines, which will always be needless header info
string dummyLine;
for (i = 1; i <= 3; i++)
getline(strmFileIn, strDummyLine);
//Read first line to get count of how many groups are present
//(Copied from internet: gets the first line and puts the cursor back at its start)
int startPos = strmFileIn.tellg();
string strFirstLine;
getline(strmFileIn, strFirstLine);
strmFileIn.seekg(startPos, std::ios_base::beg);
//Tokenize strFirstLine into Array<int> tempArray
int numGroups = tempArray.size() - 1 //accounting for the row-header column, 1 - 13
//Create map (going to use java syntax, sorry)
Map<int,list<int>> myMap = new Map<int,list<int>>;
//Populate map with ints and empty lists (java again, sorry)
for (int i = 1; i <= numGroups; i++)
myMap.put(i, new List<int>);
//Iterate over lines in the file and appropriately populate the map's lists
while (fileIn != eof)
{
string fileInLine;
getline(strmFileIn, fileInLine);
//Tokenize fileInLine into Array<int> tempFileInArray
int intElemID = tempFileInArray[0];
//Remove element [0] from tempFileInArray (will be the row number, 1 - 13
//Iterate over remaining items in tempFileInArray, affect myMap where necessary
for (int i = 1; int i <= groupNum; i++)
if (tempFileInArray[i] != 0) //is not a strict truth-table, as any nonzero will be a truth
myMap.get[i].add(intElemID);
}
//Remove any entries in myMap with empty lists
//Kill strmFileIn for memory's sake
//End Parse
As you can see, my code is a broken mix of pseudocode and comparable Java I've already figured out. I just don't know how to turn this into C++; even with similar data structures, the syntax is a little daunting to someone with no experience. Is anyone here willing to help me out with it?
I really appreciate any insight.

Your code seems overly complicated, so let's do this one step at a time. Additionally, neither your code nor your file format shows how many bool columns should exist on each row, so I've ignored that part for this answer.
But first, a tip: In C++, the containers you care about 99.99% of the time are std::unordered_map, std::vector, and in very rare cases, std::map, boost::stable_vector and std::deque. In your case, you have rows with sequential indices, and the data for each row appears to be better stored as a vector of booleans. However, we'll do it your way, with the replacement of std::vector instead of std::list, and std::unordered_map instead of std::map.
The major data structures are mostly obvious:
std::unordered_map<int,std::vector<int>> myMap;
std::ifstream strmFileIn("input_file.txt");
Next your code reads in the first line, then ignores it entirely. I have no idea why, so I'll skip over that. Then, we parse out the lines one by one:
std::string full_current_line;
//for as long as we can read more lines, read them in
while(std::getline(strmFileIn, full_current_line))
{
//make the line into a stream so that we can parse data out
std::stringstream cur_line_stream(full_current_line);
//read in the line identifier
int identifier = 0;
cur_line_stream >> identifier;
//if that failed, abort.
if (!cur_line_stream)
{
//invalid identifier!
std::cerr << "identifier is invalid!\n"; //report
strmFileIn.setstate(std::ios::failbit); //failed to parse the data
break; //do not continue this loop
}
After that, we parse out the data for each row, which is surprisingly simple:
int column = 0;
int is_true = 0;
//for each number remaining in the row...
while(cur_line_stream >> is_true)
{
//hooray we read a column!
++column;
if (is_true ==0)
{
//if it's zero, skip it
}
else if (is_true == 1)
{
//get the data for this column, and add this row's identifier
//myMap[column] will create a new empty entry if it didn't exist yet
//NOTE: This syntax only creates when used with map and unordered_map.
// This syntax does NOT create for vector and deque.
//once we have the vector, we push_back the new identifier into it.
myMap[column].push_back(identifier);
}
else
{
//invalid data!
std::cerr << is_true << " is invalid! found on row " << identifier << '\n';
cur_line_stream.setstate(std::ios::failbit); //failed to parse the data
strmFileIn.setstate(std::ios::failbit); //failed to parse the data
break; //do not continue this loop
}
}
}
If you knew that groupNum contained the number of bools, you could replace that second while with something more like what you already have:
for (int i = 1; i <= groupNum; i++)
{
cur_line_stream >> is_true;
//if that failed, abort
if (!cur_line_stream)
{
//invalid data!
std::cerr << "data could not be read on row " << identifier << '\n';
cur_line_stream.setstate(std::ios::failbit); //failed to parse the data
strmFileIn.setstate(std::ios::failbit); //failed to parse the data
break; //do not continue this loop
}
else if (is_true == 0)
{
//if it's zero, skip it
}
etc etc etc
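Putting the pieces of this answer together, a minimal sketch could look like the following. It is an illustration under a few assumptions, not the exact code from the answer: the function name parse_truth_table and the header_lines parameter are made up here, every data line is assumed to start with the row identifier, any nonzero value counts as true (as the question states), and std::map is used instead of std::unordered_map so the groups come out in column order.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Maps each column number (1-based) to the identifiers of the rows
// that hold a nonzero value in that column.
// parse_truth_table and header_lines are hypothetical names.
std::map<int, std::vector<int>> parse_truth_table(std::istream& in,
                                                  int header_lines) {
    std::string line;
    // Skip the header lines (stops early if the input is shorter).
    for (int i = 0; i < header_lines && std::getline(in, line); ++i) {
    }

    std::map<int, std::vector<int>> groups;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        int identifier = 0;
        if (!(row >> identifier)) {
            continue;  // assumption: blank or malformed lines are skipped
        }
        int column = 0;
        int value = 0;
        while (row >> value) {
            ++column;
            if (value != 0) {
                // operator[] creates the empty vector on first use
                groups[column].push_back(identifier);
            }
        }
    }
    return groups;
}
```

Taking a std::istream& rather than a file name lets the same function be fed from a std::ifstream or, for testing, from a std::istringstream.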

Work the other way. Code only in C++ (not in Java, and don't think in Java), but start by parsing a small chunk of your syntax. First, code the lexer. Test it. Then code the parser, probably a recursive descent parser, and test it on short, simple subelements of your language. Perhaps you'll need some small look-ahead (an easy task, use a std::list<Token>). Keep going up.
Start by formalizing your input language with pencil and paper. Could you, for instance, write a simple BNF grammar for it? (Your question does not explain what the input is; it just gives an example.)
In C++ parlance: to parse a map<int,list<int>> you certainly need to be able to parse int and list<int>. So write first the parsers for these.
As commented by Mooing Duck, your input language (which you did not define, just gave an example) seems simple enough to avoid most of this. But still, the idea is the same, think directly in C++ and start by reading a simple subpart of the input. Test your code. When that works, increase the part that is accepted. Repeat all this.
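As an illustration of starting with a small subpart, here is one possible first step: a function that parses a single row and can be tested on its own before any file handling exists. The name parse_row and its signature are hypothetical; the row layout (an identifier followed by integers) is taken from the example in the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parses one data row: a row identifier followed by zero or more integers.
// Returns false if the line does not start with an integer or if it
// contains a token that is not an integer.
// (parse_row is a hypothetical helper, not part of any answer above.)
bool parse_row(const std::string& line, int& identifier,
               std::vector<int>& values) {
    std::istringstream row(line);
    if (!(row >> identifier)) {
        return false;
    }
    values.clear();
    int value = 0;
    while (row >> value) {
        values.push_back(value);
    }
    // The loop ends either at the end of the line (fine) or at a token
    // that is not an integer (an error); eof() tells the two apart.
    return row.eof();
}
```

Once this works on a handful of hand-written strings, the next step would be to call it from a loop over std::getline.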

Here's a very simple solution that uses nothing but C++ and standard libraries. It just reads line by line and pulls each element out of the line with stream extraction using operator>>.
#include <iostream>
#include <fstream>
#include <sstream>
#include <map>
#include <list>
#include <string>
int main(int argc, char* argv[])
{
// Parse command line
if( argc != 2 )
return 1;
std::ifstream fin(argv[1]); // input only; std::fstream would also request write access
if( !fin.good() )
{
std::cerr << "Error opening file for reading: " << argv[1] << std::endl;
return 1;
}
// Skip first three lines
std::string line;
for( int i=0; i<3; ++i )
{
std::getline(fin, line);
}
// Read each line
std::map<int, std::list<int> > hits;
while( std::getline(fin, line) )
{
// Extract each element from the line
std::stringstream sstr(line);
// Read line number from first column
int linenum = 0;
sstr >> linenum;
// Interpret remaining columns as truth values
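int truth = 0; // int rather than bool: extracting a value such as 2 into a bool sets failbit and ends the loop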
int col=1;
while( sstr >> truth )
{
// Store position in map if true
if( truth )
{
hits[col].push_back(linenum);
}
col++;
}
}
// Print results
std::map<int, std::list<int> >::const_iterator col_iter;
for( col_iter = hits.begin(); col_iter != hits.end(); ++col_iter )
{
std::cout << "[" << col_iter->first << ": {";
std::list<int>::const_iterator line_iter;
for( line_iter = col_iter->second.begin(); line_iter != col_iter->second.end(); ++line_iter )
{
std::cout << *line_iter << " ";
}
std::cout << "} ]" << std::endl;
}
return 0;
}

Related

seekg doesn't go to beginning of file

I'm trying to make a random name generator. The problem is that in order to get the count of lines in the file, I have to loop through it.
So when I need to loop through it again in getRandomName() to get a name, it has already reached the end of the file
I tried solving the issue with seekg(0, std::ios::beg) but it doesn't work for some reason.
int getLineCount(std::fstream &names) {
int count{};
while (names) {
std::string name;
getline(names, name);
++count;
};
// last line is empty
return count - 1;
}
std::string getRandomName(std::fstream &names, int lineCount) {
int randomNum{getRandomNumber(1, lineCount)};
std::string name;
names.seekg(1, std::ios::beg); // here i try to go to the beginning but it doesnt work
for (int i{0}; i < randomNum; ++i) {
names >> name;
};
return name;
};
int main() {
std::srand(static_cast<unsigned int>(std::time(nullptr)));
std::rand();
std::fstream names{"names.txt"};
int lineCount{getLineCount(names)};
std::cout << getRandomName(names, lineCount);
}
The problem is that a file stream treats reaching the end of the file as an error condition and sets the corresponding bits, both the fail bit and the EOF bit. As long as this state persists, any further file operations fail. You can return the stream to its normal operating state by clearing the error state; once you do, you will be able to proceed as intended.
If you need these lookups frequently, it may be worth buffering the lines in a std::vector<std::string>. Unless you have to handle extremely large data (provoking paging effects), this will be far more efficient. Even with paging, given enough disk space you still come out ahead on every lookup, as at most one memory page has to be loaded back from disk.
If you need the lookup just once, you might get along without the getLineCount function entirely: select a random value from the entire maximum range and count lines until you reach the desired line or the end of the file. If the latter happens, recalculate the random index based on the number of lines found and iterate over the file again. The larger your file is, the greater the chance that you only need to iterate once, and if you do need a second pass, nothing is lost anyway... Note, though, that this approach requires your random number generator to produce uniformly distributed numbers!
This would work for multiple calls as well, though the chance of a benefit gets smaller: with every further call, the chance of having read beyond the end of the file at least once increases.
Your getLineCount() function reads through the file with getline() until nothing more can be read. When it arrives at the end, an error state is set, including eofbit.
All subsequent operations on the stream will fail, including seekg(0, std::ios::beg);, until you clear the error state with names.clear();.
By the way, looping on getline() directly means getline() can no longer fail inside the loop body, which makes the -1 unnecessary. Another thing you could do is make your function neutral with respect to the read position of the file. It's optional, but it would be more consistent with the name of your function, which suggests that it just gets something, not that it consumes the stream to the end.
int getLineCount(std::fstream &names) {
int count{};
std::string name;
auto old_pos = names.tellg(); // backup current position
while (getline(names, name))
++count;
names.clear(); // reset eof error caused by loop
names.seekg(old_pos); // restore position (absolute-position overload)
return count;
}
Not related
Your random position might lead to inconsistencies if names on a line can include whitespace, because >> reads space-separated strings and not full lines. E.g. if your file has two lines:
Bjarne Stroustrup
B.W.Kernighan
Your random read could return Bjarne or Stroustrup but never B.W.Kernighan, because there are 2 lines but 3 space-separated strings. So it is better to read the random line with getline() as well, just as you do when counting.
Your first step should be to add logging (I've also fixed minor issues, like inconsistent reading of data, 1 instead of 0 and so on).
#define LOG(x) std::cerr << __LINE__ << " " #x " = "<< x << '\n'
int getRandomNumber(int a, int b)
{
static std::random_device rd;
static std::mt19937 gen(rd());
std::uniform_int_distribution<int> distrib(a, b);
return distrib(gen);
}
int getLineCount(std::istream &names) {
int count{};
std::string name;
while (getline(names, name)) {
++count;
LOG(count);
LOG(names.tellg());
};
return count - name.empty();
}
std::string getRandomName(std::istream &names, int lineCount) {
int randomNum{getRandomNumber(1, lineCount)};
LOG(randomNum);
std::string name;
LOG(names.tellg());
names.seekg(0, std::ios::beg);
LOG(names.tellg());
for (int i{0}; i < randomNum; ++i) {
getline(names, name);
};
return name;
};
int main() {
std::ifstream names{"names.txt"};
LOG(names.tellg());
int lineCount{getLineCount(names)};
LOG(lineCount);
std::cout << getRandomName(names, lineCount);
}
This produces this output https://wandbox.org/permlink/EPLHqMBg1s8Awe9F :
44 names.tellg() = 0
23 count = 1
24 names.tellg() = 13
23 count = 2
24 names.tellg() = 27
23 count = 3
24 names.tellg() = 40
23 count = 4
24 names.tellg() = 53
23 count = 5
24 names.tellg() = -1
47 lineCount = 5
31 randomNum = 4
33 names.tellg() = -1
35 names.tellg() = -1
-1 indicates that the stream is in an error state.
This is expected: you have read the file to the end, so there was an attempt to read beyond the end of the file and the error flag was set.
When the error flag is set, the stream is unusable until the flag is cleared. So adding names.clear(); in the proper place fixes the issue: https://wandbox.org/permlink/Us6b3Jw3v6JFwlpX
In your getLineCount() function use while (names.peek() != EOF) instead.

Summing comma separated ints from a text file and then storing to an array in C++

I was tasked to read 3 rows of 5 comma separated values from a text file, sum up each column, and store the result in an array called bins. I am struggling to read the ints from the text file as they are comma separated. I first need to clarify how to read just the ints.
My next thought was to store the ints from the file into an array called "calc", and use the index of each element to sum up the values. I would then store these results into the "bins" array.
Here is some code I have tried to read the comma separated ints yet I cannot seem to get it to work.
int a,b,c,d,e,f,g,h,i,j,k,l,m,n,o;
int calc[15] = {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o};
ifstream myfile;
myfile.open("values.txt");
for(int i = 0; i <= 15; i++)
{
myfile >> calc[i];
myfile.close();
a = calc[0];
b = calc[1];
c = calc[2];
d = calc[3];
e = calc[4];
f = calc[5];
g = calc[6];
h = calc[7];
i = calc[8];
j = calc[9];
k = calc[10];
l = calc[11];
m = calc[12];
n = calc[13];
o = calc[14];
cout << calc[i] << endl;
}
I am really new to working with code and I dont quite understand how to work with values in this manner. It is a simple task yet I cannot seem how to implement it with code.
I am really new to working with code and I dont[sic] quite understand how to work with values in this manner.
OK, I have several tips for you:
① separate your tasks
You ran into a hitch parsing the input in the supplied format, dealing with the comma. Parsing the supplied input files is a totally different problem from the real work, which is summing the columns. Write them separately in the code.
In general you should isolate the "real work" in its own function and have it take parameters as input and return results as a function return value. The input and output are written separately.
That gives you the added bonus of automating the testing by calling the "work" function with built-in test cases. In this case, it allows you to defer figuring out the parsing. You just pass in test data for now, to get the "work" part working, and then you can come back to parsing the input. Then, when you do need help, it will be specific to "parsing comma separated values" and have nothing to do with why you want them.
② To handle groups of values, you use the array.
This means subscripting or iterating, using loops (or library algorithms) to take what you want to do, written once, and apply it to each value in the array.
Given arrays input and sum, you can accumulate the current row (input) into the running sum with code like this:
for (size_t i = 0; i < COLS; ++i) {
sum[i] += input[i];
}
overall program sketch
open the file
repeat three times:
read a row of input
accumulate the sum with the new input
print the results
Note, as explained in the first topic, that read a row and accumulate the sum are separate functions and separate sub-tasks to figure out. This is called top-down decomposition of a problem.
It's best to use parameters for input and return for output of the function, but for this simple task I'll just use a global variable. Passing/returning is probably harder than the task you are learning! Note though that this is unrealistic in that in real code you would not want to use global variables like this. However, you might turn this into an "object", which you'll learn later.
#include <fstream>
constexpr size_t ROWS = 3;
constexpr size_t COLS = 5;
int input[COLS];
int sum[COLS];
std::ifstream infile;
int main()
{
infile.open("values.txt");
// todo: check for success and feedback to the user if failed
// skipped: zero out the sum array. Global variable start at 0,
// but more generally, you would need to initialize this.
for (size_t row= 0; row < ROWS; ++row) {
read_row();
sum_row();
}
print_results();
}
The sum_row function is what you saw earlier.
Note that with top-down decomposition, you can stub out parts that you will work on later. In particular, you can have read_row return hard-coded result at first, or read from a different format, so you can test the overall program. Then, go back and get that part working for real.
Top-Down Decomposition is critical for any kind of programming project.
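For comparison, here is a rough sketch of how the "work" part could look with parameters and a return value instead of global variables, which also makes it easy to call with built-in test data. The Row alias and the name sum_columns are assumptions made for illustration; the parsing step is deliberately left out.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t COLS = 5;
using Row = std::array<int, COLS>;  // hypothetical alias for one input row

// Adds up each column over all rows and returns the per-column totals.
Row sum_columns(const std::vector<Row>& rows) {
    Row sum{};  // value-initialized, so every element starts at 0
    for (const Row& input : rows) {
        for (std::size_t i = 0; i < COLS; ++i) {
            sum[i] += input[i];
        }
    }
    return sum;
}
```

With this shape, the function can be exercised with hard-coded rows long before the comma parsing works.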
Oops... most of your code is useless, and what remains is not really good.
Writing good programs is not a matter of adding C++ instructions one after the other. You must first design the overall structure of your program.
Here you have an input file containing lines of 5 comma separated values and want to compute an array (of size 5) containing the sum of the columns.
Let's start from a high level:
open the file
loop over the lines
loop 5 times:
read a field up to a comma (or end of the line)
decode that field into an int value
sum that value into an array
close the file
Ok, to be able to sum the values into an array, we will have to define the array before the loop and initialize its elements to 0.
Ok, C++ provides std::ifstream to read a file, std::getline to read a stream up to a delimiter (the default being newline), std::istringstream to read the content of a string as an input stream, and std::stoi to decode a string representing an int value.
Once this is done, but only after:
the program structure is clearly designed
the required tools from the standard library have been identified
it is possible to sit down in front of your keyboard and start coding.
BTW, this program will never require the declaration of 15 variables a to o nor an array of 15 integers: only int bins[5]...
It could be (tests omitted for brevity):
int bins[5] = {0}; // initializing the first value is enough, others will be 0
std::ifstream in("values.txt");
std::string line;
while (std::getline(in, line)) {
// std::cout << line << '\n'; // uncomment for debug
std::stringstream ss(line);
for(int& val: bins) { // directly loop over the bins array
std::string field;
std::getline(ss, field, ',');
val += std::atoi(field.c_str());
}
}
Of course, for a professional grade (or simply robust) program, every input operation should be followed by a test on the stream...
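As a rough idea of what those tests could look like, the sketch below wraps the same loop in a function that reports failure instead of silently continuing. The name sum_csv_columns is hypothetical, std::stoi is used in place of std::atoi so that non-numeric fields can be detected, and extra fields beyond the fifth are simply ignored (an assumption, adjust as needed).

```cpp
#include <array>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Sums the five comma-separated columns of every line into bins.
// Returns false as soon as a line has too few fields or a field that
// is not an integer. (sum_csv_columns is a hypothetical name.)
bool sum_csv_columns(std::istream& in, std::array<int, 5>& bins) {
    bins.fill(0);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        for (int& bin : bins) {
            std::string field;
            if (!std::getline(fields, field, ',')) {
                return false;  // fewer than five fields on this line
            }
            try {
                bin += std::stoi(field);
            } catch (const std::exception&) {
                return false;  // field is not a number or is out of range
            }
        }
    }
    return true;
}
```

On failure the contents of bins are a partial sum, so the caller should only use them when the function returns true.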
You can use the std::getline function within the string library to get each comma separated integer.
std::ifstream myfile("values.txt");
for(int i = 0; i < 15; i++)
{
std::string integer_as_string;
std::getline(myfile, integer_as_string, ',');
calc[i] = std::stoi(integer_as_string);
}
myfile.close();
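Here we specify that the getline function will read characters from the input until a , character is found. This string is assigned to the integer_as_string variable, which is then converted to an integer and assigned to the array element. Be aware that with ',' as the delimiter a newline is not treated as a separator: if rows end without a trailing comma, the last value of one row and the first value of the next land in the same string, and std::stoi converts only the first of them.
Also note that i <= 15 will result in undefined behavior, because it accesses an element past the end of the 15-element array. You can read more about undefined behavior on Wikipedia. And the myfile.close() call was placed inside the for loop, which means the file is closed in every iteration. This is not needed. I think what you're looking for is something like this.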
std::ifstream myfile("values.txt");
for(int i = 0; i < 15; i++)
{
std::string integer_as_string;
std::getline(myfile, integer_as_string, ',');
calc[i] = std::stoi(integer_as_string);
std::cout << calc[i] << std::endl;
}
myfile.close();
a = calc[0];
b = calc[1];
c = calc[2];
d = calc[3];
e = calc[4];
f = calc[5];
g = calc[6];
h = calc[7];
i = calc[8];
j = calc[9];
k = calc[10];
l = calc[11];
m = calc[12];
n = calc[13];
o = calc[14];
References:
std::stoi
Why is "using namespace std;" considered bad practice?
First, your array has elements with indices from 0 to 14, thus for(int i = 0; i <= 15; i++) should be for(int i = 0; i < 15; i++)
The loop itself might benefit from error-checking. What if the file contains fewer than 15 values?
for(int i = 0; i < 15; i++)
{
// you want check status of myfile here.
}
myfile >> calc[i] wouldn't work well with commas unless you make that stream treat the comma as a separator (whitespace) character. That can be done, but it is a fairly large change; one can instead use getline (see the other answers here for examples) to specify the separator.
If you want named variables to refer to elements of the array, you can make them references and bind them to the array with a structured binding (this also works for other tuple-like data structures, e.g. a struct), provided you have access to C++17:
int calc[15] = {};
auto& [a,b,c,d,e,f,g,h,i,j,k,l,m,n,o] = calc;
a would become a reference to calc[0], b to calc[1] and so on.

Parsing a CSV file - C++

C++14
Generally, the staff in university has recommended us to use Boost to parse the file, but I've installed it and not succeeded to implement anything with it.
So I have to parse a CSV file line-by-line, where each line is of 2 columns, separated of course by a comma. Each of these two columns is a digit. I have to take the integral value of these two digits and use them to construct my Fractal objects at the end.
The first problem is: The file can look like for example so:
1,1
<HERE WE HAVE A NEWLINE>
<HERE WE HAVE A NEWLINE>
This format of file is okay. But my solution outputs "Invalid input" for that one, where the correct solution is supposed to print only once the respective fractal - 1,1.
The second problem is: The file can look like:
1,1
<HERE WE HAVE A NEWLINE>
1,1
This is supposed to be an invalid input but my solution treats it like a correct one - and just skips over the middle NEWLINE.
Maybe you can guide me how to fix these issues, it would really help me as I'm struggling with this exercise for 3 days from morning to evening.
This is my current parser:
#include <iostream>
#include "Fractal.h"
#include <fstream>
#include <stack>
#include <sstream>
const char *usgErr = "Usage: FractalDrawer <file path>\n";
const char *invalidErr = "Invalid input\n";
const char *VALIDEXT = "csv";
const char EXTDOT = '.';
const char COMMA = ',';
const char MINTYPE = 1;
const char MAXTYPE = 3;
const int MINDIM = 1;
const int MAXDIM = 6;
const int NUBEROFARGS = 2;
int main(int argc, char *argv[])
{
if (argc != NUBEROFARGS)
{
std::cerr << usgErr;
std::exit(EXIT_FAILURE);
}
std::stack<Fractal *> resToPrint;
std::string filepath = argv[1]; // Can be a relative/absolute path
if (filepath.substr(filepath.find_last_of(EXTDOT) + 1) != VALIDEXT)
{
std::cerr << invalidErr;
exit(EXIT_FAILURE);
}
std::stringstream ss; // Treat it as a buffer to parse each line
std::string s; // Use it with 'ss' to convert char digit to int
std::ifstream myFile; // Declare on a pointer to file
myFile.open(filepath); // Open CSV file
if (!myFile) // If failed to open the file
{
std::cerr << invalidErr;
exit(EXIT_FAILURE);
}
int type = 0;
int dim = 0;
while (myFile.peek() != EOF)
{
getline(myFile, s, COMMA); // Read to comma - the kind of fractal, store it in s
ss << s << WHITESPACE; // Save the number in ss delimited by ' ' to be able to perform the double assignment
s.clear(); // We don't want to save this number in s anymore as we won't it to be assigned somewhere else
getline(myFile, s, NEWLINE); // Read to NEWLINE - the dim of the fractal
ss << s;
ss >> type >> dim; // Double assignment
s.clear(); // We don't want to save this number in s anymore as we won't it to be assigned somewhere else
if (ss.peek() != EOF || type < MINTYPE || type > MAXTYPE || dim < MINDIM || dim > MAXDIM)
{
std::cerr << invalidErr;
std::exit(EXIT_FAILURE);
}
resToPrint.push(FractalFactory::factoryMethod(type, dim));
ss.clear(); // Clear the buffer to update new values of the next line at the next iteration
}
while (!resToPrint.empty())
{
std::cout << *(resToPrint.top()) << std::endl;
resToPrint.pop();
}
myFile.close();
return 0;
}
You do not need anything special to parse .csv files, the STL containers from C++11 on provide all the tools necessary to parse virtually any .csv file. You do not need to know the number of values per-row you are parsing before hand, though you will need to know the type of value you are reading from the .csv in order to apply the proper conversion of values. You do not need any third-party library like Boost either.
There are many ways to store the values parsed from a .csv file. The basic "handle any type" approach is to store the values in a std::vector<std::vector<type>> (which essentially provides a vector of vectors holding the values parsed from each line). You can specialize the storage as needed depending on the type you are reading and how you need to convert and store the values. Your base storage can be struct/class, std::pair, std::set, or just a basic type like int. Whatever fits your data.
In your case you have basic int values in your file. The only caveat to a basic .csv parse is the fact that you may have blank lines in between the lines of values. That's easily handled by any number of tests. For instance, you can check if the .length() of the line read is zero, or, for a bit more flexibility (in handling lines containing multiple whitespace or other non-value characters), you can use .find_first_of() to find the first wanted value in the line to determine if it is a line to parse.
For example, in your case, your read loop for your lines of value can simply read each line and check whether the line contains a digit. It can be as simple as:
...
std::string line; /* string to hold each line read from file */
std::vector<std::vector<int>> values {}; /* vector vector of int */
std::ifstream f (argv[1]); /* file stream to read */
while (getline (f, line)) { /* read each line into line */
/* if no digits in line - get next */
if (line.find_first_of("0123456789") == std::string::npos)
continue;
...
}
Above, each line is read into line and then line is checked on whether or not it contains digits. If so, parse it. If not, go get the next line and try again.
If it is a line containing values, then you can create a std::stringstream from the line and read integer values from the stringstream into a temporary int value and add the value to a temporary vector of int, consume the comma with getline and the delimiter ',', and when you run out of values to read from the line, add the temporary vector of int to your final storage. (Repeat until all lines are read).
Your complete read loop could be:
while (getline (f, line)) { /* read each line into line */
/* if no digits in line - get next */
if (line.find_first_of("0123456789") == std::string::npos)
continue;
int itmp; /* temporary int */
std::vector<int> tmp; /* temporary vector<int> */
std::stringstream ss (line); /* stringstream from line */
while (ss >> itmp) { /* read int from stringstream */
std::string tmpstr; /* temporary string to ',' */
tmp.push_back(itmp); /* add int to tmp */
if (!getline (ss, tmpstr, ',')) /* read to ',' w/tmpstr */
break; /* done if no more ',' */
}
values.push_back (tmp); /* add tmp vector to values */
}
There is no limit on the number of values read per-line, or the number of lines of values read per-file (up to the limits of your virtual memory for storage)
Putting the above together in a short example, you could do something similar to the following which just reads your input file and then outputs the collected integers when done:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
int main (int argc, char **argv) {
if (argc < 2) { /* validate at least 1 argument given for filename */
std::cerr << "error: insufficient input.\nusage: ./prog <filename>\n";
return 1;
}
std::string line; /* string to hold each line read from file */
std::vector<std::vector<int>> values {}; /* vector vector of int */
std::ifstream f (argv[1]); /* file stream to read */
while (getline (f, line)) { /* read each line into line */
/* if no digits in line - get next */
if (line.find_first_of("0123456789") == std::string::npos)
continue;
int itmp; /* temporary int */
std::vector<int> tmp; /* temporary vector<int> */
std::stringstream ss (line); /* stringstream from line */
while (ss >> itmp) { /* read int from stringstream */
std::string tmpstr; /* temporary string to ',' */
tmp.push_back(itmp); /* add int to tmp */
if (!getline (ss, tmpstr, ',')) /* read to ',' w/tmpstr */
break; /* done if no more ',' */
}
values.push_back (tmp); /* add tmp vector to values */
}
for (auto row : values) { /* output collected values */
for (auto col : row)
std::cout << " " << col;
std::cout << '\n';
}
}
Example Input File
Using an input file with miscellaneous blank lines and two-integers per-line on the lines containing values as you describe in your question:
$ cat dat/csvspaces.csv
1,1
2,2
3,3
4,4
5,5
6,6
7,7
8,8
9,9
Example Use/Output
The resulting parse:
$ ./bin/parsecsv dat/csvspaces.csv
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
Example Input Unknown/Uneven No. of Columns
You don't need to know the number of values per-line in the .csv or the number of lines of values in the file. The STL containers handle the memory allocation needs automatically allowing you to parse whatever you need. Now you may want to enforce some fixed number of values per-row, or rows per-file, but that is simply up to you to add simple counters and checks to your read/parse routine to limit the values stored as needed.
Without any changes to the code above, it will handle any number of comma-separated-values per-line. For example, changing your data file to:
$ cat dat/csvspaces2.csv
1
2,2
3,3,3
4,4,4,4
5,5,5,5,5
6,6,6,6,6,6
7,7,7,7,7,7,7
8,8,8,8,8,8,8,8
9,9,9,9,9,9,9,9,9
Example Use/Output
Results in the expected parse of each value from each line, e.g.:
$ ./bin/parsecsv dat/csvspaces2.csv
1
2 2
3 3 3
4 4 4 4
5 5 5 5 5
6 6 6 6 6 6
7 7 7 7 7 7 7
8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9
Let me know if you have questions that I didn't cover or if you have additional questions about something I did and I'm happy to help further.
I will not update your code. Looking at your title, Parsing a CSV file - C++, I would like to show you how to read CSV files in a more modern way. Unfortunately you are still on C++14. With C++20 or the ranges library it would be ultra simple using getlines and split.
And in C++17 we could use CTAD, if with initializer, and so on.
But we do not need Boost. C++'s standard library is sufficient. And we never use scanf and old stuff like that.
And in my very humble opinion the link to the 10-year-old question How can I read and parse CSV files in C++? should not be given any longer. It is the year 2020 now, and the more modern language elements that are now available should be used. But as said, everybody is free to do what he wants.
In C++ we can use the std::sregex_token_iterator, and its usage is ultra simple. It will also not slow down your program dramatically. A double std::getline would also be OK, although it is not as flexible. The number of columns must be known for that. The std::sregex_token_iterator does not care about the number of columns.
Please see the following example code. In it, we create a tiny proxy class and overload its extraction operator. Then we use the std::istream_iterator and read and parse the whole CSV file in a small one-liner.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>
// Define Alias for easier Reading
// using Columns = std::vector<std::string>;
using Columns = std::vector<int>;
// The delimiter
const std::regex re(",");
// Proxy for the input Iterator
struct ColumnProxy {
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {
// Read a line
std::string line;
cp.columns.clear();
if(std::getline(is, line) && !line.empty()) {
// Split values and copy into resulting vector
std::transform(
std::sregex_token_iterator(line.begin(), line.end(), re, -1), {},
std::back_inserter(cp.columns),
[](const std::string& s) { return std::stoi(s); });
}
return is;
}
// Conversion operator. Converts the proxy to the type 'Columns'
// (std::vector<int> with the alias above)
operator Columns() const { return columns; }
protected:
// Temporary to hold the read vector
Columns columns{};
};
int main() {
std::ifstream myFile("r:\\log.txt");
if(myFile) {
// Read the complete file, parse everything and store the result in a vector
std::vector<Columns> values(std::istream_iterator<ColumnProxy>(myFile), {});
// Show complete csv data
std::for_each(values.begin(), values.end(), [](const Columns& c) {
std::copy(c.begin(), c.end(),
std::ostream_iterator<int>(std::cout, " "));
std::cout << "\n";
});
}
return 0;
}
Please note: There are tons of other possible solutions. Please feel free to use whatever you want.
EDIT
Because I see a lot of complicated code here, I would like to show a second example for
Parsing a CSV file - C++
Basically, you do not need more than 2 statements in the code. You first define a regex for digits. Then you use a C++ element that has been designed exactly for the purpose of tokenizing strings into substrings: the std::sregex_token_iterator. And because such a well-fitting element has been available in C++ for years, it may be worth considering. You can basically do the task in 2 lines instead of 10 or more, and it is easy to understand.
But of course, there are thousands of possible solutions; some like to continue in C style and others prefer more modern C++ features. That is everybody's personal decision.
The code below reads the CSV file as specified, regardless of how many rows (lines) it contains and how many columns there are in each row. Even foreign characters can be in it. An empty row will be an empty entry in the csv vector. This can also easily be prevented with an "if !empty" before the emplace_back.
Some like it one way, others another. Whatever people want.
Please see a general example:
#include <algorithm>
#include <iterator>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
// Test data. Can of course also be taken from a file stream.
std::stringstream testFile{ R"(1,2
3, a, 4
5 , 6 b , 7
abc def
8 , 9
11 12 13 14 15 16 17)" };
std::regex digits{R"((\d+))"};
using Row = std::vector<std::string>;
int main() {
// Here we will store all the data from the CSV as std::vector<std::vector<std::string>>
std::vector<Row> csv{};
// These 2 extremely simple lines read the complete CSV and parse the data
for (std::string line{}; std::getline(testFile, line); )
csv.emplace_back(Row(std::sregex_token_iterator(line.begin(), line.end(), digits, 1), {}));
// Now you can do whatever you want with the data. For example: print double the value
std::for_each(csv.begin(), csv.end(), [](const Row& r) {
if (!r.empty()) {
std::transform(r.begin(), r.end(), std::ostream_iterator<int>(std::cout, " "),
[](const std::string& s) { return std::stoi(s) * 2; });
std::cout << "\n";
}
});
return 0;
}
So, now, you may get the idea, you may like it, or you do not like it. Whatever. Feel free to do whatever you want.

Logic for reading rows and columns from a text file (textparser) C++

I'm really stuck with a problem reading rows and columns from a text file. We're using text files that our prof gave us. I have the functionality running so that when the user inputs "numrows (file)", the number of rows in that file prints out.
However, every time I enter the text files, it's giving me 19 for both. The first text file only has 4 rows and the other one has 7. I know my logic is wrong, but I have no idea how to fix it.
Here's what I have for the numrows function:
int numrows(string line) {
ifstream ifs;
int i;
int row = 0;
int array [10] = {0};
while (ifs.good()) {
while (getline(ifs, line)) {
istringstream stream(line);
row = 0;
while(stream >>i) {
array[row] = i;
row++;
}
}
}
}
and here's the numcols:
int numcols(string line) {
int col = 0;
int i;
int arrayA[10] = {0};
ifstream ifs;
while (ifs.good()) {
istringstream streamA(line);
col = 0;
while (streamA >>i){
arrayA[col] = i;
col++;
}
}
}
edit: @chris yes, I wasn't sure what value to return as well. Here's my main:
int main() {
string fname, line;
ifstream ifs;
cout << "---- Enter a file name : ";
while (getline(cin, fname)) { // Ctrl-Z/D to quit!
// tries to open the file whose name is in string fname
ifs.open(fname.c_str());
if(fname.substr(0,8)=="numrows ") {
line.clear();
for (int i = 8; i<fname.length(); i++) {
line = line+fname[i];
}
cout << numrows (line) << endl;
ifs.close();
}
}
return 0;
}
This problem can be solved more easily by opening the text file as an ifstream, and then using get() (std::istream::get) to process your input.
You can compare each character against '\n', the end-of-line character, and keep a pair of counters: one for columns on a line, the other for lines.
If the number of columns varies from line to line, you might want to store each line's column count in a std::vector<int>, using myVector.push_back(numColumns) or similar.
Both get() and std::vector are documented in the cplusplus.com/reference section, which can provide a large amount of information about C++ and the STL.
Edited-in overview of possible workflow
You want one program, which will take a filename, and an 'operation', in this case "numrows" or "numcols". As such, your first steps are to find out the filename, and operation.
Your current implementation of this (in your question, after editing) won't work. Using cin should however be fine. Place this earlier in your main(), before opening a file.
Use substr like you have, or alternatively, search for a space character. Assume that the input after this is your filename, and the input in the first section is your operation. Store these values.
After this, try to open your file. If the file opens successfully, continue. If it won't open, then complain to the user for a bad input, and go back to the beginning, and ask again.
Once you have your file successfully open, check which type of calculation you want to run. Counting the number of rows is fairly easy: you can go through the file one character at a time and count the characters that are equal to '\n', the line-end character. Some files use other line endings, such as a lone carriage return ('\r'); these are a) unlikely to be what you have and b) easily looked up!
A number of columns is more complicated, because your rows might not all have the same number of columns. If your input is 1 25 21 abs 3k, do you want the value to be 5? If so, you can count the number of space characters on the line and add one. If instead, you want a value of 14 (each character and each space), then just count the characters based on the number of times you call get() before reaching a '\n' character. The use of a vector as explained below to store these values might be of interest.
Having calculated these two values (or value and set of values), you can output based on the value of your 'operation' variable. For example,
if (storedOperationName == "numcols") {
cout<< "The number of values in each column is " << numColsVal << endl;
}
If you have a vector of column values, you could output all of them, using
for (int pos = 0; pos < numColsVal.size(); pos++) {
cout<< numColsVal[pos] << " ";
}
Following all of this, you can return a value of 0 from your main(), or you can just end the program (in C++, reaching the end of main() without a return statement is treated as a return of 0), or you can ask for another filename and repeat until some other method is used to end the program.
Further details
get() with no arguments (std::istream::get) will return the next character of an ifstream, using the example code format
std::ifstream myFileStream;
myFileStream.open("myFileName.txt");
int nextCharacter = myFileStream.get(); // get() returns an int so that end-of-file can be signalled
// You should, before this, implement a loop.
// A possible loop condition is the read itself, e.g. `while (myFileStream.get(c))` with a `char c`
// See the reference page for get()
if (nextCharacter == '\n')
{
// You have a line break here
}
You could use this type of structure, along with a pair of counters as described earlier, to count the number of characters on a line, and the number of lines before the EOF (end of file).
If you want to store the number of characters on a line, for each line, you could use
std::vector<int> charPerLine;
int numberOfCharactersOnThisLine = 0;
char nextCharacter;
while (myFileStream.get(nextCharacter)) // Stops when a read fails, e.g. at end of file
{
if (nextCharacter == '\n')
{
charPerLine.push_back(numberOfCharactersOnThisLine); // This stores the value in the vector
numberOfCharactersOnThisLine = 0; // Start counting again for the next line
}
else
{
numberOfCharactersOnThisLine++;
}
}
You should #include <vector> and either write std:: before each name, or put a using namespace std; statement near the top. People will advise against using namespaces like this, but it can be convenient (which is also a good reason to avoid it, sort of!)

need help with C++ using maps to keep track of words in a INPUT file

Let's say I have a text file with
today is today but
tomorrow is today tomorrow
Then, using maps, how can I keep track of the words that are repeated, and on which lines they repeat?
So far I read each string in the file into a temp, and it is stored in the following way:
map<string,int> storage;
int count = 1 // for the first line of the file
if(infile.is_open()){
while( !infile.eof() ){
getline(in, line);
istringstream my_string(line);
while(my_string.good()){
string temp;
my_string >> temp;
storage[temp] = count
}
count++;// so that every string read in the next line will be recorded as that line.
}
}
map<string,int>::iterator m;
for(int m = storage.begin(); m!= storage.end(); m++){
out<<m->first<<": "<<"line "<<m->second<<endl;
}
right now the output is just
but: line 1
is: line 2
today: line 2
tomorrow: line 2
Instead, it should print out (no repeating strings):
today : line 1 occurred 2 times, line 2 occurred 1 time.
is: line 1 occurred 1 time, line 2 occurred 1 time.
but: line 1 occurred 1 time.
tomorrow: line 2 occurred 2 times.
Note: the order of the string does not matter.
Any help would be appreciated. Thanks.
map stores a (key, value) pair with a unique key. Meaning that if you assign to the same key more than once, only the last value that you assigned will be stored.
It sounds like, instead of storing the line as the value, you want to store another map of lines -> occurrences.
So you could make your map like this:
typedef int LineNumber;
typedef int WordHits;
typedef map< LineNumber, WordHits> LineHitsMap;
typedef map< string, LineHitsMap > WordHitsMap;
WordHitsMap storage;
Then to insert:
WordHitsMap::iterator wordIt = storage.find(temp);
if(wordIt != storage.end())
{
LineHitsMap::iterator lineIt = (*wordIt).second.find(count);
if(lineIt != (*wordIt).second.end())
{
(*lineIt).second++;
}
else
{
(*wordIt).second[count] = 1;
}
}
else
{
LineHitsMap lineHitsMap;
lineHitsMap[count] = 1;
storage[temp] = lineHitsMap;
}
You're trying to get 2 items of information out of the collection when you only store 1 item of information in it.
The easiest way to extend your current implementation is to store a struct instead of an int.
So instead of:
storage[temp] = count
you'd do:
storage[temp].linenumber = count;
storage[temp].wordcount++;
where the map is defined:
struct worddata { int linenumber; int wordcount; };
std::map<string, worddata> storage;
print the results using:
out << m->first << ": " << "line " << m->second.linenumber << " count: " << m->second.wordcount << endl;
edit: use a typedef for the definitions, e.g.:
typedef std::map<std::string, worddata> MYMAP;
MYMAP storage;
then MYMAP::iterator iter;
Your storage data type is insufficient to store all the information you want to report. You could get there by using a vector for count storage but you'd have to do a lot of book-keeping to make sure you actually insert a 0 when a word is not encountered and create the vector with the right size when a new word is encountered. Not a trivial task.
You could switch your count part to a map of numbers, first being line and second being count... That would reduce the complexity of your code but wouldn't exactly be the most efficient method.
At any rate, you can't do what you need to do with just a std::map<string,int>.
Edit: just thought of an alternative version that would be easier to generate but harder to report with: std::vector< std::map<std::string, unsigned int> >. For each new line in a file you'd generate a new map<string, unsigned int> and push it onto the vector. You could create a helper of type set<string> to contain all the words that appear in a file to use in your reporting.
That's probably how I'd do it anyway except I'd encapsulate all that crap in a class so that I'd just do something like:
my_counter.word_appearance(word,line_no);
Apart from anything else, your loops are all wrong. You should never loop on the eof or good flags, but on the success of the read operation. You want something like:
while( getline(in, line) ){
istringstream my_string(line);
string temp;
while(my_string >> temp ){
// do something with temp
}
}