How to loop through vectors for specific strings - c++

I am struggling to write a loop that takes one field of each vector and checks whether its value appears for the first time; if it does not, the loop should jump to the next vector until that field contains a new string.
My input file (.csvx) looks something like:
No.; ID; A; B; C;...;Z;
1;1_380; Value; Value; Value;...; Value;
2;1_380; Value; Value; Value;...; Value;
3;1_380; Value; Value; Value;...; Value;
...
41;2_380; Value; Value; Value;...; Value;
42;2_380; Value; Value; Value;...; Value;
...
400000; 6_392; Value; Value; Value;...; Value;
Note: the file is relatively large.
I managed to parse my file into a vector<vector<string> > and split lines at semicolons to access any field.
Now I would like to access the first "ID", i.e. 1_380 and store parameters from same line, then go to the next ID 2_380 and store again those parameters and so on...
This is my code so far:
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <boost/algorithm/string.hpp>
using namespace std;
/*
* CSVX Reader defined to fetch data from
* CSVX file into vectors
*/
class CSVXReader
{
string fileName, delimiter;
public:
CSVXReader(string filename, string delm = ";") :
fileName(filename), delimiter(delm)
{}
vector<vector<string> > getData(); //Function to fetch data
}; //from CSVX file
/*
* Parse through CSVX file line by line
* and return the data in vector of vector
* of strings
*/
vector<vector<string> > CSVXReader::getData()
{
ifstream file(fileName);
vector<vector<string> > dataList; //Vector of vector
//contains all data
string line = "";
while (getline(file, line)) //Iterate through each line
//and split the content
//using delimiter
{
vector<string> vec; //Vector contains a row from
//input file
boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
dataList.push_back(vec);
}
file.close();
return dataList;
}
int main(int argc, char** argv)
{
CSVXReader reader("file.csvx"); //Creating an object
//of CSVXReader
vector<vector<string> > dataList = reader.getData();//Get the data from
//CSVX file
for(vector<string> vec : datalist) //Loop to go through
//each line of
//dataList
//(vec1,vec2;vec3...)
if(vec[1] contains "_" && "appears for the first time")
{store parameters...};
else{go to next line};
return 0;
}
As you can see, I have no clue how to declare my loop properly...
To be clear, I want to check the second field of each vector "vec": is it new? -> Store data of same line, if not -> jump to next line, i.e. vector until a new ID appears.
Looking forward to any advice!

Since you wrote pseudo-code, it is difficult to write real code.
But in general, if you want to detect if an item has occurred already, you can utilize a std::unordered_set to implement the "appears for the first time".
Using your pseudo-code:
#include <unordered_set>
//...
std::unordered_set<std::string> stringSet;
//...
for(vector<string>& vec : dataList)
{
if(vec.size() > 1 && vec[1].find('_') != string::npos && !stringSet.count(vec[1]))
{
//... store the parameters of this line
stringSet.insert(vec[1]);
}
}
The condition checks if the item is in the unordered_set. If it is, then we skip, if not, then we process the item and add it to the unordered_set.
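To make the idea concrete, here is a minimal sketch of the same check wrapped in a function. The name firstRowPerId and the Row alias are made up for this illustration; it assumes the ID sits at index 1, as in the question, and it relies on insert() reporting whether the element was new instead of calling count() first.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

using Row = std::vector<std::string>;   // hypothetical alias for one parsed line

// Returns the first row seen for each distinct ID (column index 1).
// Rows that are too short, or whose ID has no '_' (e.g. the header), are skipped.
std::vector<Row> firstRowPerId(const std::vector<Row>& dataList)
{
    std::unordered_set<std::string> seenIds;
    std::vector<Row> result;
    for (const Row& vec : dataList) {
        if (vec.size() < 2) continue;                      // no ID field in this row
        const std::string& id = vec[1];
        if (id.find('_') == std::string::npos) continue;   // does not look like an ID
        if (seenIds.insert(id).second)                     // true only on first insertion
            result.push_back(vec);
    }
    return result;
}
```

Note that the comparison is on the raw field text, so " 6_392" (with a leading space) and "6_392" would count as two different IDs unless the fields are trimmed first.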

Basically you do not need all the code that the other answers provide. You need just one statement to copy the data to where you want to have them.
Let us assume that you have read your data already in your dataList. And you defined a new std::vector<std::vector<std::string>> parameter{}; where you want to store the unique result.
The algorithm library has a function called std::copy_if. It copies an element only if a predicate (a condition) is true. Your condition is that a line is different from the previous line. Then it is a new line with new data and you copy it. If a line has the same ID as the previous line, you do not copy it.
So, we will remember the important data from the last line. And then compare in the next line the data with the stored value. If it is different, store the parameter. If not, then not. After each check, we assign the current value to the last value. As initial "last Value" we will use an empty string. So the first line will always be different. The statement will then look like this:
std::copy_if(dataList.begin(), dataList.end(), std::back_inserter(parameter),
[lastID = std::string{}](const std::vector<std::string> & sv) mutable {
bool result = (lastID != sv[1]);
lastID = sv[1];
return result;
}
);
So we copy all data from the begin to the end of the dataList to the parameter vector, if and only if, the second string in the source vector (index=1) is different than our old remembered value.
Rather straightforward.
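As a side note, the same "keep the first row of each run" behaviour can also be sketched with std::unique_copy, which avoids the stateful lambda. This is an alternative to the copy_if statement above, not a restatement of it; the function name firstOfEachRun and the Row alias are made up for the illustration, and every row is assumed to have at least two fields.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using Row = std::vector<std::string>;   // hypothetical alias for one parsed line

// Copies the first row of every run of consecutive rows that share the same ID (column 1).
// Assumes every row has at least two fields.
std::vector<Row> firstOfEachRun(const std::vector<Row>& dataList)
{
    std::vector<Row> parameter;
    std::unique_copy(dataList.begin(), dataList.end(), std::back_inserter(parameter),
                     [](const Row& a, const Row& b) { return a[1] == b[1]; });
    return parameter;
}
```

Like the copy_if version, this only compares neighbouring rows, so an ID that shows up again later in the file starts a new run and is copied again.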
A further optimization is to filter the lines while reading, so that only the needed rows are stored instead of the complete file. This reduces the required memory drastically.
Modify your while loop to:
string line = "";
string oldValue{};
while (getline(file, line)) //Iterate through each line
//and split the content
//using delimiter
{
vector<string> vec; //Vector contains a row from
//input file
boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
if (vec.size() < 2) continue; //Skip lines without an ID field
if (oldValue != vec[1]) {
dataList.push_back(vec);
}
oldValue = vec[1];
}
With that you get it right from the beginning.
An additional solution is like below
#include <vector>
#include <iostream>
#include <string>
#include <iterator>
#include <regex>
#include <fstream>
#include <sstream>
#include <algorithm>
std::istringstream testFile{R"(1;1_380; Value1; Value2; Value3; Value4
2;1_380; Value5; Value6; Value7; Value8
3;1_380; Value9 Value10
41;2_380; Value11; Value12; Value13
42;2_380; Value15
42;2_380; Value16
500;3_380; Value99
400000; 6_392; Value17; Value18; Value19; Value20
400001; 6_392; Value21; Value22; Value23; Value24)"
};
class LineAsVector { // Proxy for the input Iterator
public:
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, LineAsVector& lv) {
// Read a line
std::string line; lv.completeLine.clear();
std::getline(is, line);
// The delimiter
const std::regex re(";");
// Split values and copy into resulting vector
std::copy( std::sregex_token_iterator(line.begin(), line.end(), re, -1),
std::sregex_token_iterator(),
std::back_inserter(lv.completeLine));
return is;
}
// Cast the type 'LineAsVector' to std::vector<std::string>
operator std::vector<std::string>() const { return completeLine; }
protected:
// Temporary to hold the read vector
std::vector<std::string> completeLine{};
};
int main()
{
// This is the resulting vector which will contain the result
std::vector<std::vector<std::string>> parameter{};
// One copy statement to copy all necessary data from the file to the parameter list
std::copy_if (
std::istream_iterator<LineAsVector>(testFile),
std::istream_iterator<LineAsVector>(),
std::back_inserter(parameter),
[lastID = std::string{}](const std::vector<std::string> & sv) mutable {
bool result = (lastID != sv[1]);
lastID = sv[1];
return result;
}
);
// For debug purposes: Show result on screen
std::for_each(parameter.begin(), parameter.end(), [](std::vector<std::string> & sv) {
std::copy(sv.begin(), sv.end(), std::ostream_iterator<std::string>(std::cout, " "));
std::cout << '\n';
}
);
return 0;
}
Please note: In function main, we do everything in one statement: std::copy_if. The source in this case is a std::istream, so it can be a std::ifstream (a file) or whatever you want. On SO I use a std::istringstream because I cannot use files here, but it works the same way. Just replace the variable in the std::istream_iterator. We iterate over the file with the std::istream_iterator.
What a pity that nobody will read this . . .

Ok fellas, I was playing around with my code and realized that #Armins second solution (modified while loop) doesn't consider unordered lists, i.e. if an element shows up again much later, it is compared with previous element (oldValue) and inserted, although it exists already in my container...
After some reading (and more has to come obviously), I tend to #Paul's unordered_set. My first question arises right here: why didn't you suggest set instead? From what I found, unordered_set is apparently faster for search operations. In my personal very limited mind this is difficult to understand... but I don't want to dig too deep here.
Is this your reason? Or are there other advantages that I missed?
Despite your suggestion, I tried to use set, which seems the better fit in my situation because it keeps the elements ordered. And again my code refuses to compile:
set<vector<string> > CSVReader::getData() {
ifstream file(fileName);
set<vector<string> > container;
string line = "";
string uniqueValue{};
while (getline(file, line)) //Iterate through each line and split the content using delimiter
{
//Vector contains a row from RAO file
vector<string> vec;
boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
uniqueValue = vec[2];
//Line (or vector) is added to container if the uniqueValue, e.g. 1_380, appears for the first time
if(!container.count(uniqueValue))
{
container.insert(vec);
}
}
file.close();
return container;
}
The error says:
error: no matching function for call to 'std::set<std::vector<std::__cxx11::basic_string<char> > >::count(std::__cxx11::string&)'
if(!localDetails.count(localDetail))
Since I followed your example, what did I do wrong?
PS: Just reading about SO policies... hope this additional question is acceptable though
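For what it is worth, the error comes from the key type: a set<vector<string> > stores whole rows, so count() expects a vector<string> and cannot be called with a single string. One possible way around it, sketched below under a few assumptions, is to key the container on the ID itself with a std::map, which also keeps the ordering. The function name firstRowById, the Row alias and the idColumn parameter are made up for this sketch, and std::getline stands in for boost::algorithm::split.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Row = std::vector<std::string>;   // hypothetical alias for one parsed line

// Reads ';'-separated lines from 'in' and keeps the first row seen per ID.
// 'idColumn' is the zero-based index of the ID field (an assumption; adjust to the file).
std::map<std::string, Row> firstRowById(std::istream& in, std::size_t idColumn = 1)
{
    std::map<std::string, Row> container;   // ordered by ID, one row per ID
    std::string line;
    while (std::getline(in, line)) {
        Row vec;
        std::istringstream fields(line);
        for (std::string field; std::getline(fields, field, ';'); )
            vec.push_back(field);
        if (vec.size() <= idColumn) continue;    // line too short to hold an ID
        container.emplace(vec[idColumn], vec);   // no effect if the ID is already present
    }
    return container;
}
```

The map orders its keys as strings, so "10_380" sorts before "2_380". If the original file order matters, a separate set of seen IDs next to a vector of rows is the simpler choice.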

Related

c++ graph input problem, finding the ways of an input?

We had an algorithm coding event at our school today, and they asked a question that no one could answer. I am trying to find an answer using only the standard library. (I am trying to solve this without .h files because contests want us to solve it like that.) So basically the question is as follows:
*Write a (C / C++) program that will get a graph model as an argument.
*You must get this values from console while we are starting your application.
*Your program must write down all the possible word combination by using the graph model.
Ex Input on Console to call your app: “yourapp.exe 5ABCD1BCD1CDE”
After your application name, second word gives you information about the graph.
Notation: [STEPS][FROM1-TO1-TO2-...TOn]1[FROM2-TO1-TO2-...TOn]1 .....
[STEPS] First integer value ( 5 in our example) is the maximum word length to measure.
[FROM TO ... TO] blocks show connections in the graph. Each node is symbolized by one uppercase letter. The first one is the connection's start position; the others are destinations. Each connection (link) is one-way. So “ABCD” means we have connections from A to B, A to C and A to D.
The first node in the text is the start point for word creation.
This input means you have a graph like: https://imgur.com/BioHGqA
Desired Output:
A
AB
AC
AD
ABD
ABC
ABCD
ABCE
ACD
ACE
--------------------------------------------END OF THE QUESTION-----------------------------------------------------
I personally tried to find the index numbers in the input, the connection starts, etc., but I couldn't figure out how to solve this properly. Please help :=)
#include <iostream>
#include <string>
using namespace std;
int inputLength,maxLength,digitLength;
string word,digitIndex,starters;
int main(int argc, char *argv[])
{
//Saving the graph input as a variable named word
word = argv[1];
//Finding the max word and input lengths
maxLength=word[0] -'0';
inputLength=word.length();
cout<<"Your graph input "<<word<<endl;
cout<<"Maximum word length : "<<maxLength<<endl;
//Finding the digitIndexes in input.
for(int i=0;i<inputLength;i++){
if(isdigit(word[i])){
digitIndex+=to_string(i);
}
}
digitLength=digitIndex.length();
cout<<"digit indexes : "<<digitIndex<<endl;
cout<<"digitindex[1] : "<<digitIndex[1]-'0'<<endl;
cout<<"your word : "<<word<<endl;
//Finding the connection starts
for(int i=0; i<inputLength;i++){
if(isdigit(static_cast<unsigned char>(word[i]))){ //isdigit returns non-zero, not necessarily 1
starters+=word[i+1];
}
}
cout<<"starters : "<<starters<<endl;
}
Interesting problem. But easy to implement using std::algorithms and recursive calls. Also, the data structure selection may help to design such an application.
Unfortunately, the description of the input format is not fully clear. I understand the “steps” part, but there is no description for the rest of the digits. I assume that they are simply delimiters and have no further meaning.
We will split the big task into some subtasks. And, we will use a class Graph, where we store all needed data and functions.
So, obviously the first task is, to split the input string. For that we extract the first characters consisting of digits and convert them to the integer value “steps”. The rest of the string will be tokenized by using a C++ standard functionality: The std::sregex_token_iterator.
This thing is an iterator for iterating over a string, hence “sregex”. The begin()/end() part defines the range of input we operate on; then there is a std::regex for what should or should not be matched in the input string. The last parameter selects what is returned:
0 --> give me the whole match (this is the default),
1 --> give me what the first capture group of the regex matched, and
-1 --> give me what is NOT matched by the regex
We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameter, and copies the data between the first iterator and 2nd iterator to the std::vector. The statement:
std::vector<std::string> split(std::sregex_token_iterator(init.begin(), init.end(), re, 1), {});
defines a variable “split” as a std::vector and uses the so called range-constructor of the std::vector.
You can see that I do not pass the "end()" iterator of the std::sregex_token_iterator explicitly. It is constructed from the empty brace-enclosed initializer list with the correct type: the std::vector range constructor requires both iterators to have the same type, so the type is deduced from the first argument.
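To see the two selection modes side by side, here is a small sketch. The function names matchedTokens and splitOnDigits are made up for the illustration, and the end iterator is written out explicitly instead of using {}.

```cpp
#include <regex>
#include <string>
#include <vector>

// Submatch 1: collect what the first capture group matched (the runs of letters).
std::vector<std::string> matchedTokens(const std::string& text)
{
    static const std::regex letters(R"(([A-Za-z]+))");
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), letters, 1),
        std::sregex_token_iterator());
}

// Submatch -1: collect the pieces that lie between matches of the delimiter regex.
std::vector<std::string> splitOnDigits(const std::string& text)
{
    static const std::regex digits(R"([0-9]+)");
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), digits, -1),
        std::sregex_token_iterator());
}
```

For the input "ABCD1BCD1CDE" both functions return the three tokens ABCD, BCD and CDE. The regex objects are kept as named variables on purpose: std::sregex_token_iterator cannot be constructed from a temporary std::regex.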
Then we transform the resulting “from-to”-strings to our target storage. This is a std::map consisting of the “from” character as a key, and a std::vector of char’s (the targets) as the value. With that we always have an association from one start point (Vertex) to all end points and with that, implicitly the “Edges”. This data structure will span a virtual tree, which we can later traverse to find the required result.
We put all this in an “init” function, which can be called from the class constructor and also from the overloaded extractor operator. I added the extractor as additional functionality to make life easier. So you do not need to use the main function's argc and argv, but can read directly from std::cin via:
Graph graph;
std::cin >> graph;
Now we have all data in our map and can start to build the solution. We will store all resulting “ways” in a std::vector of std::string. For building the “ways” we “track” the way through the “graph”. So, every time, when we see a new vertex, we add it to the “track” and if we reach the end of a route or if the route is longer than “steps”, we store the new track in “ways”.
So, the OP requested a special output format. To create this, we must use something close to a “Breadth First” or “Level Order” traversal. Meaning, before we descend (with a simple recursive algorithm) we need to go horizontally, resulting in 2 for loops. But no problem. Very simple.
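For comparison, a strict level-order listing can be sketched with a queue instead of recursion. This is not the code used in the full example; the function name levelOrderWays and the maxLength parameter (playing the role of “steps”) are made up for the illustration.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

using Map = std::map<char, std::vector<char>>;

// Lists every path that starts at 'root' and has at most 'maxLength' vertices,
// shortest paths first (level order).
std::vector<std::string> levelOrderWays(const Map& fromTo, char root, std::size_t maxLength)
{
    std::vector<std::string> ways;
    if (maxLength == 0) return ways;
    std::queue<std::string> pending;
    pending.push(std::string(1, root));
    while (!pending.empty()) {
        std::string track = pending.front();
        pending.pop();
        ways.push_back(track);
        if (track.size() == maxLength) continue;   // depth limit reached
        auto node = fromTo.find(track.back());
        if (node == fromTo.end()) continue;        // vertex has no outgoing edges
        for (char to : node->second)
            pending.push(track + to);              // extend the path by one edge
    }
    return ways;
}
```

For the graph from the question (A to B, C, D; B to C, D; C to D, E) with a maximum length of 5, this yields A, AB, AC, AD, ABC, ABD, ACD, ACE, ABCD, ABCE. These are the same ten words as in the desired output, but ordered strictly by length, which differs slightly from the order shown in the question.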
For simple output, I have overloaded the inserter operator.
And I do validate the input (I also allow lowercase letters).
Please see the full working example below.
#include <iostream>
#include <vector>
#include <utility>
#include <regex>
#include <map>
#include <iterator>
#include <string>
#include <algorithm>
//std::string test{"5ABCD1BCD1CDE"};
std::regex re1(R"(([a-zA-Z]+))");
std::regex re2(R"(([0-9]+[a-zA-Z]+)+)");
using Map = std::map<char, std::vector<char>>;
class Graph {
public:
// Constructor
Graph() : steps(), fromTo(), root(), ways(), track() {}
Graph(const std::string input) : steps(), fromTo(), root(), ways(), track() { init(input); }
// Build the result
void build() { int level{ 0 }; rBuild(root, level); }
// inserter
friend std::ostream& operator << (std::ostream& os, const Graph& g) {
std::copy(g.ways.begin(), g.ways.end(), std::ostream_iterator<std::string>(os, " "));
return os << "\n";
}
// extractor
friend std::istream& operator >> (std::istream& is, Graph& g) {
if (std::string input{}; std::getline(is, input)) g.init(input);
return is;
}
private:
// Values derived from input
int steps{};
Map fromTo{};
char root{};
// The result
std::vector<std::string> ways{};
std::string track{};
// Recursive function to build all ways
void rBuild(const char vertex, int& level);
// Initialize source values
void init(const std::string& input);
};
void Graph::init(const std::string& input) {
fromTo.clear(), ways.clear(); track.clear(); steps = 0; root = '\0';
if (std::regex_match(input, re2)) {
// Get steps
size_t pos{}; steps = std::stoi(input, &pos); std::string init = input.substr(pos);
// Split string into substrings
std::vector<std::string> split(std::sregex_token_iterator(init.begin(), init.end(), re1, 1), {});
// Get root
root = split[0][0]; track += root; ways.push_back(track);
// Convert substrings to map entries
std::transform(split.begin(), split.end(), std::inserter(fromTo, fromTo.end()), [](std::string & s) {
return std::make_pair(s[0], std::vector<char>(std::next(s.begin()), s.end())); });
}
else
std::cerr << "\n***** Error: Wrong input format\n";
}
// Recursive function to build all ways through the graph
void Graph::rBuild(const char vertex, int& level) {
// Allow only a certain depth while descending
if (level < steps-1) {
// Search the start point for this entry
if (Map::iterator node{ fromTo.find(vertex) }; node != fromTo.end()) {
// Go through all edges to just the next vertex. This is not a breadth first traversal
// So, first we will go horizontally
for (const char to : node->second) {
// We want to track the way that we were going so far
track.push_back(to);
// Saving this track as a new way
ways.push_back(track);
// Restoring the original track from before this way, so that we can generate the next way
track.pop_back();
}
// and now we will descent
for (const char to : node->second) {
// One level further down
++level;
// track will be one vertex longer
track.push_back(to);
// Recursive call, descent
rBuild(to, level);
// And backwards
track.pop_back();
--level;
}
}
}
}
int main(int argc, char* argv[]) {
if (argc == 2) {
std::string test = argv[1];
// Define and initialize the graph
Graph graph(test);
// Build the required strings
graph.build();
// Show result
std::cout << graph;
}
else {
std::cout << "\nEnter init string: \n";
if (Graph graph; std::cin >> graph) {
// Build the required strings
graph.build();
// Show result
std::cout << "\nResult:\n" << graph << "\n";
}
}
return 0;
}
What a pity that nobody will read that . . .

Parsing a CSV file - C++

C++14
Generally, the staff in university has recommended us to use Boost to parse the file, but I've installed it and not succeeded to implement anything with it.
So I have to parse a CSV file line-by-line, where each line is of 2 columns, separated of course by a comma. Each of these two columns is a digit. I have to take the integral value of these two digits and use them to construct my Fractal objects at the end.
The first problem is: The file can look like for example so:
1,1
<HERE WE HAVE A NEWLINE>
<HERE WE HAVE A NEWLINE>
This format of file is okay. But my solution outputs "Invalid input" for that one, where the correct solution is supposed to print only once the respective fractal - 1,1.
The second problem is: The file can look like:
1,1
<HERE WE HAVE A NEWLINE>
1,1
This is supposed to be an invalid input but my solution treats it like a correct one - and just skips over the middle NEWLINE.
Maybe you can guide me how to fix these issues, it would really help me as I'm struggling with this exercise for 3 days from morning to evening.
This is my current parser:
#include <iostream>
#include "Fractal.h"
#include <fstream>
#include <stack>
#include <sstream>
const char *usgErr = "Usage: FractalDrawer <file path>\n";
const char *invalidErr = "Invalid input\n";
const char *VALIDEXT = "csv";
const char EXTDOT = '.';
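const char COMMA = ',';
const char WHITESPACE = ' '; // Assumed definition: used below but missing from the post
const char NEWLINE = '\n'; // Assumed definition: used below but missing from the post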
const char MINTYPE = 1;
const char MAXTYPE = 3;
const int MINDIM = 1;
const int MAXDIM = 6;
const int NUBEROFARGS = 2;
int main(int argc, char *argv[])
{
if (argc != NUBEROFARGS)
{
std::cerr << usgErr;
std::exit(EXIT_FAILURE);
}
std::stack<Fractal *> resToPrint;
std::string filepath = argv[1]; // Can be a relative/absolute path
if (filepath.substr(filepath.find_last_of(EXTDOT) + 1) != VALIDEXT)
{
std::cerr << invalidErr;
exit(EXIT_FAILURE);
}
std::stringstream ss; // Treat it as a buffer to parse each line
std::string s; // Use it with 'ss' to convert char digit to int
std::ifstream myFile; // Declare on a pointer to file
myFile.open(filepath); // Open CSV file
if (!myFile) // If failed to open the file
{
std::cerr << invalidErr;
exit(EXIT_FAILURE);
}
int type = 0;
int dim = 0;
while (myFile.peek() != EOF)
{
getline(myFile, s, COMMA); // Read to comma - the kind of fractal, store it in s
ss << s << WHITESPACE; // Save the number in ss delimited by ' ' to be able to perform the double assignment
s.clear(); // We don't want to save this number in s anymore as we won't it to be assigned somewhere else
getline(myFile, s, NEWLINE); // Read to NEWLINE - the dim of the fractal
ss << s;
ss >> type >> dim; // Double assignment
s.clear(); // We don't want to save this number in s anymore as we won't it to be assigned somewhere else
if (ss.peek() != EOF || type < MINTYPE || type > MAXTYPE || dim < MINDIM || dim > MAXDIM)
{
std::cerr << invalidErr;
std::exit(EXIT_FAILURE);
}
resToPrint.push(FractalFactory::factoryMethod(type, dim));
ss.clear(); // Clear the buffer to update new values of the next line at the next iteration
}
while (!resToPrint.empty())
{
std::cout << *(resToPrint.top()) << std::endl;
resToPrint.pop();
}
myFile.close();
return 0;
}
You do not need anything special to parse .csv files, the STL containers from C++11 on provide all the tools necessary to parse virtually any .csv file. You do not need to know the number of values per-row you are parsing before hand, though you will need to know the type of value you are reading from the .csv in order to apply the proper conversion of values. You do not need any third-party library like Boost either.
There are many ways to store the values parsed from a .csv file. The basic "handle any type" approach is to store the values in a std::vector<std::vector<type>> (which essentially provides a vector of vectors holding the values parsed from each line). You can specialize the storage as needed depending on the type you are reading and how you need to convert and store the values. Your base storage can be struct/class, std::pair, std::set, or just a basic type like int. Whatever fits your data.
In your case you have basic int values in your file. The only caveat to a basic .csv parse is the fact you may have blank lines in between the lines of values. That's easily handled by any number of tests. For instance you can check if the .length() of the line read is zero, or for a bit more flexibility (in handling lines containing multiple whitespace or other non-value characters), you can use .find_first_of() to find the first wanted value in the line to determine if it is a line to parse.
For example, in your case, your read loop for your lines of value can simply read each line and check whether the line contains a digit. It can be as simple as:
...
std::string line; /* string to hold each line read from file */
std::vector<std::vector<int>> values {}; /* vector vector of int */
std::ifstream f (argv[1]); /* file stream to read */
while (getline (f, line)) { /* read each line into line */
/* if no digits in line - get next */
if (line.find_first_of("0123456789") == std::string::npos)
continue;
...
}
Above, each line is read into line and then line is checked on whether or not it contains digits. If so, parse it. If not, go get the next line and try again.
If it is a line containing values, then you can create a std::stringstream from the line and read integer values from the stringstream into a temporary int value and add the value to a temporary vector of int, consume the comma with getline and the delimiter ',', and when you run out of values to read from the line, add the temporary vector of int to your final storage. (Repeat until all lines are read).
Your complete read loop could be:
while (getline (f, line)) { /* read each line into line */
/* if no digits in line - get next */
if (line.find_first_of("0123456789") == std::string::npos)
continue;
int itmp; /* temporary int */
std::vector<int> tmp; /* temporary vector<int> */
std::stringstream ss (line); /* stringstream from line */
while (ss >> itmp) { /* read int from stringstream */
std::string tmpstr; /* temporary string to ',' */
tmp.push_back(itmp); /* add int to tmp */
if (!getline (ss, tmpstr, ',')) /* read to ',' w/tmpstr */
break; /* done if no more ',' */
}
values.push_back (tmp); /* add tmp vector to values */
}
There is no limit on the number of values read per-line, or the number of lines of values read per-file (up to the limits of your virtual memory for storage)
Putting the above together in a short example, you could do something similar to the following which just reads your input file and then outputs the collected integers when done:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
int main (int argc, char **argv) {
if (argc < 2) { /* validate at least 1 argument given for filename */
std::cerr << "error: insufficient input.\nusage: ./prog <filename>\n";
return 1;
}
std::string line; /* string to hold each line read from file */
std::vector<std::vector<int>> values {}; /* vector vector of int */
std::ifstream f (argv[1]); /* file stream to read */
while (getline (f, line)) { /* read each line into line */
/* if no digits in line - get next */
if (line.find_first_of("0123456789") == std::string::npos)
continue;
int itmp; /* temporary int */
std::vector<int> tmp; /* temporary vector<int> */
std::stringstream ss (line); /* stringstream from line */
while (ss >> itmp) { /* read int from stringstream */
std::string tmpstr; /* temporary string to ',' */
tmp.push_back(itmp); /* add int to tmp */
if (!getline (ss, tmpstr, ',')) /* read to ',' w/tmpstr */
break; /* done if no more ',' */
}
values.push_back (tmp); /* add tmp vector to values */
}
for (auto row : values) { /* output collected values */
for (auto col : row)
std::cout << " " << col;
std::cout << '\n';
}
}
Example Input File
Using an input file with miscellaneous blank lines and two-integers per-line on the lines containing values as you describe in your question:
$ cat dat/csvspaces.csv
1,1
2,2
3,3
4,4
5,5
6,6
7,7
8,8
9,9
Example Use/Output
The resulting parse:
$ ./bin/parsecsv dat/csvspaces.csv
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
Example Input Unknown/Uneven No. of Columns
You don't need to know the number of values per-line in the .csv or the number of lines of values in the file. The STL containers handle the memory allocation needs automatically allowing you to parse whatever you need. Now you may want to enforce some fixed number of values per-row, or rows per-file, but that is simply up to you to add simple counters and checks to your read/parse routine to limit the values stored as needed.
Without any changes to the code above, it will handle any number of comma-separated-values per-line. For example, changing your data file to:
$ cat dat/csvspaces2.csv
1
2,2
3,3,3
4,4,4,4
5,5,5,5,5
6,6,6,6,6,6
7,7,7,7,7,7,7
8,8,8,8,8,8,8,8
9,9,9,9,9,9,9,9,9
Example Use/Output
Results in the expected parse of each value from each line, e.g.:
$ ./bin/parsecsv dat/csvspaces2.csv
1
2 2
3 3 3
4 4 4 4
5 5 5 5 5
6 6 6 6 6 6
7 7 7 7 7 7 7
8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9
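If you do want to enforce a fixed layout, as mentioned above, a stricter parse could look like the following sketch. It follows the rules from your question as I understand them: exactly two integers per line, blank lines accepted only after the last data line, and the ranges taken from your MINTYPE/MAXTYPE and MINDIM/MAXDIM constants. The function name parseStrict is made up, and treating a leading blank line as invalid is my assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parses "type,dim" lines. Blank lines are accepted only after the last data line.
// Returns false on the first malformed line; 'out' then holds the rows parsed so far.
bool parseStrict(std::istream& in, std::vector<std::pair<int, int>>& out)
{
    bool blankSeen = false;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();   // tolerate CRLF files
        if (line.empty()) { blankSeen = true; continue; }
        if (blankSeen) return false;                 // data after a blank line
        std::istringstream ss(line);
        int type = 0, dim = 0;
        char comma = 0;
        if (!(ss >> type >> comma >> dim) || comma != ',') return false;
        if (ss.peek() != std::char_traits<char>::eof()) return false;   // trailing characters
        if (type < 1 || type > 3 || dim < 1 || dim > 6) return false;   // ranges from the question
        out.emplace_back(type, dim);
    }
    return true;
}
```

With this, a file containing "1,1" followed by two empty lines is accepted with one row, while "1,1", an empty line, and another "1,1" is rejected. Because operator>> skips whitespace, a line such as "1 , 1" is still accepted; tighten that if your format forbids it.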
Let me know if you have questions that I didn't cover or if you have additional questions about something I did and I'm happy to help further.
I will not update your code. Looking at your title, Parsing a CSV file - C++, I would like to show you how to read CSV files in a more modern way. Unfortunately you are still on C++14. With C++20 or the ranges library it would be ultra simple, using something like getlines and split.
And in C++17 we could use CTAD, if with initializer, and so on.
But what we do not need is Boost. The C++ standard library is sufficient. And we never use scanf and old stuff like that.
And in my very humble opinion the link to the 10 years old question How can I read and parse CSV files in C++? should not be given any longer. It is the year 2020 now. And more modern and now available language elements should be used. But as said. Everybody is free to do what he wants.
In C++ we can use the std::sregex_token_iterator, and its usage is ultra simple. It will also not slow down your program dramatically. A double std::getline would also be OK, although it is less flexible: in its simplest form (one std::getline call per column) the number of columns must be known. The std::sregex_token_iterator does not care about the number of columns.
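For completeness, the double std::getline variant could be sketched like this. The function name readCsv is made up; the fields are kept as strings and empty lines are skipped. Written with an inner loop, it does not need a fixed column count either.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// The outer getline reads one line; the inner getline cuts it at each ','.
// Fields are kept as strings; empty lines produce no row.
std::vector<std::vector<std::string>> readCsv(std::istream& in)
{
    std::vector<std::vector<std::string>> rows;
    for (std::string line; std::getline(in, line); ) {
        if (line.empty()) continue;
        std::vector<std::string> fields;
        std::istringstream ss(line);
        for (std::string field; std::getline(ss, field, ','); )
            fields.push_back(field);
        rows.push_back(fields);
    }
    return rows;
}
```

One limitation of this sketch: an empty field after a final comma is dropped, so "1," yields a single field.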
Please see the following example code. In it, we create a tiny proxy class and overload its extractor operator. Then we use the std::istream_iterator and read and parse the whole CSV file in a small one-liner.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>
// Define Alias for easier Reading
// using Columns = std::vector<std::string>;
using Columns = std::vector<int>;
// The delimiter
const std::regex re(",");
// Proxy for the input Iterator
struct ColumnProxy {
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {
// Read a line
std::string line;
cp.columns.clear();
if(std::getline(is, line) && !line.empty()) {
// Split values and copy into resulting vector
std::transform(
std::sregex_token_iterator(line.begin(), line.end(), re, -1), {},
std::back_inserter(cp.columns),
[](const std::string& s) { return std::stoi(s); });
}
return is;
}
// Type cast operator overload. Converts the proxy to 'Columns'
// (here std::vector<int>)
operator Columns() const { return columns; }
protected:
// Temporary to hold the read vector
Columns columns{};
};
int main() {
std::ifstream myFile("r:\\log.txt");
if(myFile) {
// Read the complete file, parse everything and store the result in a vector
std::vector<Columns> values(std::istream_iterator<ColumnProxy>(myFile), {});
// Show complete csv data
std::for_each(values.begin(), values.end(), [](const Columns& c) {
std::copy(c.begin(), c.end(),
std::ostream_iterator<int>(std::cout, " "));
std::cout << "\n";
});
}
return 0;
}
Please note: There are tons of other possible solutions. Please feel free to use whatever you want.
EDIT
Because I see a lot of complicated code here, I would like to show a 2nd example of how to parse a CSV file in C++.
Basically, you do not need more than 2 statements in the code. You first define a regex for digits. And then you use a C++ language element that has been designed exactly for the purpose of tokenizing strings into substrings: the std::sregex_token_iterator. And because such a well-fitting language element has been available in C++ for years, it may be worth considering. You could basically do the task in 2 lines instead of 10 or more. And it is easy to understand.
But of course, there are thousands of possible solutions; some like to continue in C style and others prefer more modern C++ features. That's up to everybody's personal decision.
The code below reads the CSV file as specified, regardless of how many rows (lines) it contains and how many columns there are in each row. Even foreign characters can be in it. An empty row will be an empty entry in the csv vector. This can also be easily prevented with an "if !empty" before the emplace_back.
But some like so and the other like so. Whatever people want.
Please see a general example:
#include <algorithm>
#include <iterator>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
// Test data. Can of course also be taken from a file stream.
std::stringstream testFile{ R"(1,2
3, a, 4
5 , 6 b , 7
abc def
8 , 9
11 12 13 14 15 16 17)" };
std::regex digits{R"((\d+))"};
using Row = std::vector<std::string>;
int main() {
// Here we will store all the data from the CSV as std::vector<std::vector<std::string>>
std::vector<Row> csv{};
// These 2 extremely simple lines will read the complete CSV and parse the data
for (std::string line{}; std::getline(testFile, line); )
csv.emplace_back(Row(std::sregex_token_iterator(line.begin(), line.end(), digits, 1), {}));
// Now, you can do with the data, whatever you want. For example: Print double the value
std::for_each(csv.begin(), csv.end(), [](const Row& r) {
if (!r.empty()) {
std::transform(r.begin(), r.end(), std::ostream_iterator<int>(std::cout, " "), [](const std::string& s) {
return std::stoi(s) * 2; }
); std::cout << "\n";}});
return 0;
}
So, now, you may get the idea, you may like it, or you do not like it. Whatever. Feel free to do whatever you want.

Input two matrices whose sizes are not specified

I need to input two matrices with their sizes unfixed, using a blank row to declare the end of inputting each matrix.
For example, input:
1 2
3 4
(blank row here, end of input matrix 1)
5 6 7
8 9 10
(blank row here, end of input matrix 2)
will get a 2*2 matrix and a 2*3 matrix.
My current idea is to build a matrix large enough (like 1000*1000), then set loops and use cin to input each element (the code only shows how I input matrix 1):
int matx1[1000][1000];
for (i = 0;i < 1000;i++)
{
for (j = 0;j < 1000;j++)
{
temp = getchar();
if (temp == '\n')
{
mat1.col = j;
break;
}
else
{
putchar(temp);
}
cin>>matx1[i][j];
}
temp = getchar();
if (temp == '\n')
{
mat1.row = i;
break;
}
else
{
putchar(temp);
}
}
When I run this in Xcode, it goes wrong: the putchar() function interrupts my input in the terminal by printing a number each time I press Enter, and the input ends up in chaos.
I also tried the following code to avoid use of putchar():
for (i = 0; i < 1000; i++)
{
temp = getchar();
if (temp == '\n')
{
break;
}
else
{
matx1[i][0] = temp;
for (j = 1; j < 1000; j++)
{
cin >> matx1[i][j];
if (getchar() == '\n')
{
break;
}
}
}
}
Still, there are serious problems. The temp variable stores a char, and even if I convert it to int using its ASCII value, it works only if the first element of each line is smaller than 10; otherwise the first element of each line is stored incorrectly.
So, the main question is:
How do I switch to a new line of the same matrix after pressing Enter once, and switch to inputting the next matrix after pressing Enter again?
Or, put differently: how do I detect a '\n' without interfering with the original input stream?
There is a more or less standard approach to the problem at hand. You want to read csv data.
In your case it is a little more difficult, because your csv data has a special format: a " "-separated list per row, and an empty line between 2 entries.
Now, how can this be done? C++ is an object-oriented language with many existing algorithms. You can define a proxy class and overload the extractor operator. The proxy class, and especially the extractor, will do all the work.
The extractor, which is the core of the question, is a little more tricky. How can it be done?
In the extractor we first read a complete line from a std::istream using the function std::getline. We then have a std::string containing "data fields" delimited by a space. The std::string needs to be split up and the contents of the "data fields" stored.
The process of splitting up strings is also called tokenizing, and the content of a "data field" is also called a "token". C++ has a standard facility for this purpose: std::sregex_token_iterator.
And because we have something that has been designed for this purpose, we should use it.
This thing is an iterator for iterating over a std::string, hence the "s" in sregex. The first two arguments define the range of input to operate on; then comes a std::regex describing what should (or should not) be matched in the input string. The last parameter selects the matching strategy:
1 --> give me the stuff that I defined in the regex and
-1 --> give me that what is NOT matched based on the regex.
We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameters and copies the data between the first and the 2nd iterator into the std::vector.
The statement
std::vector token(std::sregex_token_iterator(line.begin(), line.end(), separator, -1), {});
defines a variable "token" of type std::vector<std::string>, splits up the std::string and puts the tokens into the std::vector. For your case we will use std::transform to change your strings into integers.
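To make the two matching strategies concrete, here is a minimal sketch. The helper names splitToInts and extractNumbers are made up for this illustration and are not part of the answer's code. splitToInts assumes a non-empty line whose values are separated by exactly one space; an empty token would make std::stoi throw.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split a line on single spaces and convert every
// token to int. Assumes a non-empty line with exactly one space between values.
std::vector<int> splitToInts(const std::string& line) {
    static const std::regex separator(" ");
    std::vector<int> result;
    // -1 selects the text BETWEEN the separator matches, i.e. the tokens.
    std::transform(std::sregex_token_iterator(line.begin(), line.end(), separator, -1),
                   std::sregex_token_iterator(),
                   std::back_inserter(result),
                   [](const std::string& s) { return std::stoi(s); });
    return result;
}

// Hypothetical helper: collect every run of digits in a line.
std::vector<std::string> extractNumbers(const std::string& line) {
    static const std::regex digits(R"((\d+))");
    // 1 selects the first capture group of each match, i.e. the digits themselves.
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), digits, 1),
        std::sregex_token_iterator());
}
```

With -1 the regex describes the separator, with 1 it describes the data. Which one is more convenient depends on whether the delimiter or the content is easier to describe.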
Very simple.
Next step. We want to read from a file. The file also contains repeated data of the same kind: the rows.
As above, we can iterate over such similar data, whether it comes from a file or from anywhere else. For this purpose C++ has the std::istream_iterator. It is a template: as template parameter it gets the type of data that it should read and, as constructor parameter, a reference to an input stream. It doesn't matter whether the input stream is std::cin, a std::ifstream or a std::istringstream. The behaviour is identical for all kinds of streams.
And since we do not have files on SO, I use (in the example below) a std::istringstream to hold the input csv file. But of course you can open a file by defining a std::ifstream csvFile(filename). No problem.
We can now read the complete csv-file and split it into tokens and get all data, by simply defining a new variable and use again the range constructor.
Matrix matrix1( std::istream_iterator<ColumnProxy>(testCsv), {} );
This very simple one-liner will read the complete csv-file and do all the expected work.
Please note: I am using C++17 and can define the std::vector without template argument. The compiler can deduce the argument from the given function parameters. This feature is called CTAD ("class template argument deduction").
Additionally, you can see that I do not use the "end()"-iterator explicitly.
This iterator will be constructed from the empty brace-enclosed initializer list with the correct type, because it will be deduced to be the same as the type of the first argument due to the std::vector constructor requiring that.
I hope I could answer your basic question. Please see the full-blown C++ example below:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <algorithm>
std::istringstream testCsv{ R"(1 2
3 4
5 6 7
8 9 10
)" };
// Define Alias for easier Reading
//using Columns = std::vector<std::string>;
using Columns = std::vector<int>;
using Matrix = std::vector<Columns>;
// The delimiter
const std::regex re(" ");
// Proxy for the input Iterator
struct ColumnProxy {
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {
// Read a line
cp.columns.clear();
if (std::string line; std::getline(is, line)) {
if (!line.empty()) {
// Split values and copy into resulting vector
std::transform(std::sregex_token_iterator(line.begin(), line.end(), re, -1),
std::sregex_token_iterator(),
std::back_inserter(cp.columns),
[](const std::string & s) {return std::stoi(s); });
}
else {
// Notify the caller. End of matrix
is.setstate(std::ios::eofbit | std::ios::failbit);
}
}
return is;
}
// Type cast operator overload. Converts the proxy to the type 'Columns' (std::vector<int>)
operator Columns() const { return columns; }
protected:
// Temporary to hold the read vector
Columns columns{};
};
int main()
{
// Define variable matrix with its range constructor. Read complete CSV in this statement, So, one liner
Matrix matrix1( std::istream_iterator<ColumnProxy>(testCsv), {} );
// Reset failbit and eofbit
testCsv.clear();
// Read 2nd matrix
Matrix matrix2(std::istream_iterator<ColumnProxy>(testCsv), {});
return 0;
}
Again:
What a pity that nobody will read this . . .
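If the proxy class feels like too much machinery, the same blank-line-terminated input can also be read with a plain std::getline loop. This is only a minimal sketch of that alternative, not the approach shown above. The name readMatrix is made up, and the sketch assumes integers separated by whitespace and a truly empty line (no stray spaces) between the matrices.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Hypothetical helper: read rows until a blank line or the end of input.
Matrix readMatrix(std::istream& in) {
    Matrix m;
    for (std::string line; std::getline(in, line) && !line.empty(); ) {
        std::istringstream row(line);          // parse one line on its own
        std::vector<int> values;
        for (int v; row >> v; ) values.push_back(v);
        m.push_back(values);
    }
    return m;
}
```

Calling readMatrix(std::cin) twice in a row would then give the first and the second matrix. Because each line is parsed from its own std::istringstream, the newline never has to be detected with getchar().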

How to (elegantly) read a file whose columns have different types and store the columns appropriately?

I am learning about templates and wanted to solve the following tasks along the way: I would like to read a csv file whose columns have different types (string, int, etc.), store each column in a vector, and then access the vectors. Can somebody kindly point out how I can store the columns well?
For the time being, one example of a csv file the program might encounter looks like this:
first_column,second_column
int,string
1, line1
2, line2
The csv files will always have the column names in the first row and the data types in the second row, followed by the actual data. However, the potential number of columns is unrestricted, as is their ordering and their types. Hence, another example might be
first_column,second_column,third_colum
string, double, string
foo, -19.8, mario
bar, 20.1, anna
Based on the second row, the program knows the datatype of the columns (it also knows the total number of columns from the first row) and it can allocate the appropriate memory.
I imagine the header file of the class solving the task looks as:
#include <fstream>
#include <string>
#include <vector>
class ColumnarCSV {
public:
ColumnarCSV(std::string filename) {read_data(filename);}
std::vector<std::string> get_names() { return column_names; }
std::vector<std::string> get_types() { return column_types; }
// pseudocode
template <typename T>
std::vector<T> get_column(std::string column_name) {
return column;
} //
private:
void read_data(std::string filename);
std::vector<std::string> column_names;
std::vector<std::string> column_types;
// storage for the columns;
};
The class ColumnarCSV is constructed with a string stating the location of the CSV file. Two public functions provide the column names and column types, encoded in a vector<string>. The function get_column requires a column name and returns its data. Please note that I do not know how to write this function. The return type can be different if necessary. Does somebody have an idea how to store the columns appropriately and then populate them at run-time depending on the column type?
What I tried so far:
Inheritance: I tried to work with a base class BaseColumn which contains a column name and datatype. The derived class template <typename T>ActualColumn: public BaseColumn contains the actual data. I wanted to access the data through a virtual function but learned that I cannot define virtual template functions.
std::variant: I was thinking of working with std::variant and specifying all possible types of the columns. However, I thought that there must be a way without resorting to C++17 features.
Create empty vector<vector<T>> for all contingencies: A brute-force idea would be to equip ColumnarCSV with a member of type vector<vector<T>> for every data type I can think of and populate them at run time. While this did its job, the code was very convoluted.
Is there a better way to define the class ColumnarCSV?
I think you are overcomplicating the problem. You do not really need templates, and you definitely don't need inheritance or any form of type erasure when you always have an int and a string. If one row corresponds to one "entry" in the file, all you need is a
struct entry {
int id;
std::string x;
};
and an input operator
std::istream& operator>>(std::istream& in, entry& e) {
in >> e.id;
in >> e.x;
return in;
}
Now reading the entries is straightforward. To read a single line you do
std::ifstream file("file.name");
entry x;
file >> x;
I think you can store the data line by line as full std::string.
Knowing the types of the data, you will be able to easily convert the std::string into the real type (std::string, int, double, ...). For example, if you have a std::string that is a double in reality, you can use std::stod to convert it.
I have made an example to be more clear. Consider the following struct to handle the data:
typedef std::vector<std::string> StringVec;
struct FileData
{
StringVec col_names;
StringVec type_names;
StringVec data_lines;
bool loadData(const std::string & file_path);
bool getColumn(const std::string & col_name, StringVec & result);
};
The typedef is only here to simplify the code and make it more readable.
The method loadData() reads the file and stores its content in the structure: col_names is the list of column names, type_names is the list of types and data_lines is the list of the lines read.
The method getColumn() writes the content of the column named by the col_name argument into the result argument.
Both methods return a boolean which indicates whether the operation was performed successfully (true) or an error occurred (false).
loadData() may return false if the given file could not be opened or if it is corrupted. getColumn() may return false if the given column name does not exist.
A possible implementation of these methods could be:
#include <fstream>
#include <string>
#include <vector>
// ========== ========== ========== ========== ==========
StringVec split(const std::string & s, char c)
{
StringVec splitted;
std::string word;
for(char ch : s)
{
if(ch == c) // delimiter: finish the current word, never store the delimiter itself
{
if(!word.empty())
{
splitted.push_back(word);
word.clear();
}
}
else
word += ch;
}
if(!word.empty())
splitted.push_back(word);
return splitted;
}
void removeExtraSpaces(std::string & word)
{
while(!word.empty() && (word[0] == ' '))
word.erase(word.begin());
while(!word.empty() && (word[word.size()-1] == ' '))
word.erase(word.end()-1);
}
// ========== ========== ========== ========== ==========
bool FileData::loadData(const std::string & file_path)
{
bool success(false);
std::ifstream in_s(file_path);
if(in_s)
{
bool names_read(false);
bool types_read(false);
std::string line;
while(getline(in_s, line))
{
if(!names_read) // first line
{
col_names = split(line, ',');
if(col_names.empty())
return false; // FILE CORRUPTED
for(std::string & word : col_names)
removeExtraSpaces(word);
names_read = true;
}
else if(!types_read) // second line
{
type_names = split(line, ',');
if(type_names.size() != col_names.size())
{
col_names.clear();
type_names.clear();
return false; // FILE CORRUPTED
}
for(std::string & word : type_names)
removeExtraSpaces(word);
types_read = true;
}
else // other lines
{
if(split(line, ',').size() != col_names.size())
{
col_names.clear();
type_names.clear();
data_lines.clear();
return false; // FILE CORRUPTED
}
data_lines.push_back(line);
}
}
in_s.close();
success = true;
}
return success;
}
bool FileData::getColumn(const std::string & col_name, StringVec & result)
{
bool success(false);
bool contains(false);
size_t index(0);
while(!contains && (index < col_names.size()))
{
if(col_names[index] == col_name)
contains = true;
else
++index;
}
if(contains)
{
for(const std::string & line : data_lines)
{
std::string field(split(line, ',').at(index));
removeExtraSpaces(field);
result.push_back(field);
}
success = true;
}
return success;
}
// ========== ========== ========== ========== ==========
The functions split() and removeExtraSpaces() are defined to simplify the code (and make this example more readable).
From the user's side, this can be used as follows:
FileData df;
bool loadSuccessful = df.loadData("data.txt"); // if true, df contains now the content of the file.
StringVec col;
bool columnFound = df.getColumn("col_name", col); // if true, col contains now the content of the desired column.
As you can see, very easy to use :) I know that at this point you have a vector of std::string, but as the structure contains the names of the real type of each column, you can convert what you got into the real type. Perhaps you can add a templated convert() method to the structure to make this imperceptible for the user.
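Such a conversion step could look roughly like the sketch below. It is written as free functions rather than as a member of FileData, and the names convertField and convertColumn are invented for this illustration. It relies on stream extraction, so it accepts any type that has an operator>>, and it ignores trailing characters (for example "12abc" converts to 12).

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert one text field to T via stream extraction.
template <typename T>
T convertField(const std::string& field) {
    std::istringstream in(field);
    T value{};
    if (!(in >> value))
        throw std::invalid_argument("cannot convert field: " + field);
    return value;
}

// Hypothetical helper: convert a whole column of strings to a vector of T.
template <typename T>
std::vector<T> convertColumn(const std::vector<std::string>& column) {
    std::vector<T> result;
    result.reserve(column.size());
    for (const std::string& field : column)
        result.push_back(convertField<T>(field));
    return result;
}
```

The caller still has to pick T, for example by looking at the matching entry of type_names and calling convertColumn<double>(col) when it says "double".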
I have made the tests with the following data files:
data.txt:
first_col, second_col
string, double
line1, 1.1
line2, -2.5
line3, 10.03
other_data.txt:
first_col, second_col, third_col
int, string, char
0, line1, a
5, line2, b
And it worked successfully for both.
I don't know if handling the data as std::string is elegant enough for you but I hope it can help you.

Trouble getting string to print random line from text file

I picked up this bit of code a while back as a way to select a random line from a text file and output the result. Unfortunately, it only seems to output the first letter of the line that it selects, and I can't figure out why it's doing so or how to fix it. Any help would be appreciated.
#include "stdafx.h"
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <string>
#include <time.h>
using namespace std;
#define MAX_STRING_SIZE 1000
string firstName()
{
string firstName;
char str[MAX_STRING_SIZE], pick[MAX_STRING_SIZE];
FILE *fp;
int readCount = 0;
fp = fopen("firstnames.txt", "r");
if (fp)
{
if (fgets(pick, MAX_STRING_SIZE, fp) != NULL)
{
readCount = 1;
while (fgets (str, MAX_STRING_SIZE, fp) != NULL)
{
if ((rand() % ++readCount) == 0)
{
strcpy(pick, str);
}
}
}
}
fclose(fp);
firstName = *pick;
return firstName;
}
int main()
{
srand(time(NULL));
int n = 1;
while (n < 10)
{
string fn = firstName();
cout << fn << endl;
++n;
}
system("pause");
}
firstName = *pick;
I am guessing this is the problem.
pick here is essentially a pointer to the first element of the array (a char*), so of course *pick is of type char: the first character of the array.
Another way to see it is that *pick == *(pick +0) == pick[0]
There are several ways to fix it. Simplest is to just do the below.
return pick;
Since the function returns a string, the std::string constructor will automatically make the conversion from the char array for you.
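The difference between the two can be seen in isolation in this small sketch. The function names firstCharOnly and wholeLine are made up for the comparison.

```cpp
#include <string>

// Reproduces the original behaviour: only the first character is kept.
std::string firstCharOnly(const char* pick) {
    std::string s;
    s = *pick;      // *pick is a single char, so s ends up with length 1
    return s;
}

// Returns the whole C string: std::string is constructed from const char*.
std::string wholeLine(const char* pick) {
    return pick;
}
```

Note that fgets keeps the trailing '\n' in the buffer, so the returned string will usually end with a newline unless you strip it.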
Since you didn't specify the format of your file, I'll cover both cases: fixed record length and variable record length; assuming each text line is a record.
Reading Random Names, Fixed Length Records
This one is straightforward.
Determine the index (random) of the record you want.
Calculate the file position = record length * index.
Set file to the position.
Read text from file, using std::getline.
Reading Random Names, Variable Length Records
This assumes that the length of the text lines vary. Since they vary, you can't use math to determine the file position.
To randomly pick a line from a file you will either have to put each line into a container, or put the file offset of the beginning of the line into a container.
Once your container is established, determine the random name number and use that as an index into the container. If you stored the file offsets, position the file to the offset and read the line. Otherwise, pull the text from the container.
Which container should be used? It depends. Storing the text is faster but takes up memory (you are essentially storing the file into memory). Storing the file positions takes up less room but you will end up reading each line twice (once to find the position, second to fetch the data).
A further augmentation of these algorithms is to memory-map the file, which is left as an exercise for the reader.
Edit 1: Example
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using std::string;
using std::vector;
using std::fstream;
// Create a container for the file positions.
std::vector< std::streampos > file_positions;
// Create a container for the text lines
std::vector< std::string > text_lines;
// Load both containers.
// The number of lines is the size of either vector.
void
Load_Containers(std::ifstream& inp)
{
std::string text_line;
std::streampos file_pos;
file_pos = inp.tellg();
while (std::getline(inp, text_line))
{
file_positions.push_back(file_pos);
file_pos = inp.tellg();
text_lines.push_back(text_line);
}
}