C++ Data files, arrays, and calculations assignment - c++

I'm new to C++ and I'm running into an issue on one of my assignments. The goal is to load data from a data file that looks like this.
item number date quantity cost per each
1000 6/1/2018 2 2.18
1001 6/2/2018 3 4.44
1002 6/3/2018 1 15.37
1001 6/4/2018 1 4.18
1003 6/5/2018 7 25.2
Basically, I need to calculate the average number of each item used per date using arrays, and do some other calculations with the cost. I'm getting really hung up on loading the data from the file and manipulating it for the equations. This is what I have so far:
#include <cmath> //for math operations
#include <iostream> //for cout
#include <cstdlib> //for compatibility
#include <fstream>
#include <string>
using namespace std;
int main()
{
string date;
int EOQ, rp;
int count;
int itemnum[][];
double quantity[][];
double cost[][];
ifstream myfile;
string filename;
cout << "Data File: " << endl;
cin >> filename; // user enters filename
myfile.open(filename.c_str());
if(myfile.is_open())
{
cout << "file opened" << endl;
string head;
while(getline(myfile, head))
{
break; // so header won't interfere with data
}
while(!myfile.eof())
{ // do this until reaching the end of file
int x,y;
myfile >> itemnum[x][y] >> date >> quantity[x][y] >> cost[x][y];
cout << "The numbers are:" << endl;
for(count = 0; count < y; count++)
{
cout << itemnum[x][y] << endl;
break;
}
//cout << "Item: Reorder Point: EOQ: " << endl;
//cout << itemnum << " " << rp << " " << EOQ << endl;
break;
}
}
else
{
cout << "" << endl; //in case of user error
cerr << "FILE NOT FOUND" << endl;
}
cout << endl;
cout << "---------------------------------------------" << endl;
cout << " End of Assignment A8" << endl;
cout << "---------------------------------------------" << endl;
cout << endl;
system("pause");
return 0;
I haven't started working with the equations yet since I still can't get the file loaded in a simple array!!!
Thank you!
Link for data file : https://drive.google.com/file/d/1QtAC1bu518PEnk4rXyIXFZw3AYD6OBAv/view?usp=sharing

When working on these kinds of problems, I like to break them down into the parts related to parsing. I'm using some of the standard library to do part of the work for me. I also created a couple of structures to keep the data organized. As for your date, I could have left it as a single std::string, but I chose to break it into three separate integers stored in a data structure, just to show what one of the parsing functions can do.
What I prefer to do is either read a single line from the file into a string, or read the entire contents of the file into a large buffer or a vector of strings, unless I'm handling a format where that is not applicable, such as parsing a wav file. I then close the file handle as soon as I'm done reading from it. Once I have all of the information I need, I parse the strings rather than the open file, because a string is easier to parse. After parsing, we can populate the data types we need.
I had to modify your data file slightly to deal with the extra white space: I saved it as a text file with only a single space between fields on each line. I also omitted the first line (the header) completely. This should still act as a guide to designing a workflow that is readable, reusable, and as portable and generic as possible. Now, what you have been waiting for, the demonstration of my version of your code:
#include <string>
#include <sstream>
#include <iostream>
#include <fstream>
#include <stdexcept> // std::runtime_error
#include <cstdlib> // EXIT_SUCCESS, EXIT_FAILURE
#include <vector>
struct Date {
int month;
int day;
int year;
Date() = default;
Date( int monthIn, int dayIn, int yearIn ) :
month( monthIn ),
day( dayIn ),
year( yearIn )
{}
};
struct DataSheetItem {
int itemNumber;
Date date;
int quantity;
double costPerEach;
DataSheetItem() = default;
DataSheetItem( int itemNumberIn, Date& dateIn, int quantityIn, double costPerEachIn ) :
itemNumber( itemNumberIn ),
date( dateIn ),
quantity( quantityIn ),
costPerEach( costPerEachIn )
{}
};
std::vector<std::string> splitString( const std::string& s, char delimiter ) {
std::vector<std::string> tokens;
std::string token;
std::istringstream tokenStream( s );
while( std::getline( tokenStream, token, delimiter ) ) {
tokens.push_back( token );
}
return tokens;
}
void getDataFromFile( const char* filename, std::vector<std::string>& output ) {
std::ifstream file( filename );
if( !file ) {
std::stringstream stream;
stream << "failed to open file " << filename << '\n';
throw std::runtime_error( stream.str() );
}
std::string line;
while( std::getline( file, line ) ) {
if ( line.size() > 0 )
output.push_back( line );
}
file.close();
}
DataSheetItem parseDataSheet( std::string& line ) {
std::vector<std::string> tokens = splitString( line, ' ' ); // First parse with delimeter of a " "
int itemNumber = std::stoi( tokens[0] );
std::vector<std::string> dateInfo = splitString( tokens[1], '/' );
int month = std::stoi( dateInfo[0] );
int day = std::stoi( dateInfo[1] );
int year = std::stoi( dateInfo[2] );
Date date( month, day, year );
int quantity = std::stoi( tokens[2] );
double cost = std::stod( tokens[3] );
return DataSheetItem( itemNumber, date, quantity, cost );
}
void generateDataSheets( std::vector<std::string>& lines, std::vector<DataSheetItem>& dataSheets ) {
for( auto& l : lines ) {
dataSheets.push_back( parseDataSheet( l ) );
}
}
int main() {
try {
std::vector<std::string> fileContents;
getDataFromFile( "test.txt", fileContents );
std::vector<DataSheetItem> data;
generateDataSheets( fileContents, data );
// test to see if info is correct
for( auto& d : data ) {
std::cout << "Item #: " << d.itemNumber << " Date: "
<< d.date.month << "/" << d.date.day << "/" << d.date.year
<< " Quantity: " << d.quantity << " Cost: " << d.costPerEach << '\n';
}
} catch( const std::exception& e ) { // also catches the exceptions thrown by std::stoi and std::stod
std::cerr << e.what() << '\n';
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
NOTE: This will not work with your file as it currently is: it does not account for the first line of text (the header) or for any extra white space between the data fields. If you read and discard one line right after opening the file and then run the loop that collects the remaining lines, your vector will contain the data, but splitting on a single space produces empty tokens wherever there are extra spaces, so the fields will not be at the expected indices. This is something you need to be aware of! Other than that, this is basically how I would design a program to parse data. It is by no means 100% foolproof and may not even be 100% bug free, but from a quick glance and a few runs through my debugger it appears to have no noticeable bugs. There is also room to improve runtime efficiency, but this is just a general illustration of basic parsing.
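One possible way around both limitations is to let operator>> do the field splitting, since it skips any run of spaces or tabs. The following is only a minimal sketch under my own assumptions: Row and readRows are names I made up for illustration, the date is kept as a plain string, and the first line is assumed to always be the header.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; the date stays a plain string.
struct Row {
    int itemNumber;
    std::string date;
    int quantity;
    double costPerEach;
};

// Reads every data row from `in`. The first line is assumed to be the
// header and is discarded. operator>> skips any run of spaces or tabs
// between fields, so the amount of white space does not matter.
std::vector<Row> readRows(std::istream& in) {
    std::vector<Row> rows;
    std::string line;
    std::getline(in, line); // drop the header line
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Row r;
        if (fields >> r.itemNumber >> r.date >> r.quantity >> r.costPerEach) {
            rows.push_back(r);
        } // blank or malformed lines are skipped silently
    }
    return rows;
}
```

Because readRows takes a std::istream&, an opened std::ifstream can be passed to it directly, and the resulting vector can then feed the per-item calculations.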

Related

How to read into Nested Structure C++

I am trying to read into a nested struct from a txt file. The output keeps repeating the same nested output. I attempted to nest two for loops, but then it didn't read at all and my display was a blank screen. I was able to get the file to read now, but it repeats the same title and yearPub information for all entries.
#include<iostream>
#include<string>
#include<fstream>
#include<iomanip>
using namespace std;
struct Discography {
string title;
int yearPub;
};
struct Collection {
string name;
string genres;
int startYear;
int endYear;
Discography records;
};
void readFile(ifstream&, string, Collection [], int&);
const int DATA_FILE = 10;
int main() {
ifstream inputMusic;
Collection music[DATA_FILE];
Discography records;
const int DISPLAY_ALL = 1,
SEARCH_ARTIST_NAME = 2,
SEARCH_GENRE = 3,
SEARCH_TITLE = 4,
SEARCH_YEAR = 5,
QUIT_CHOICE = 6;
int choice;
int count = 0;
int numFile = count;
string nameArtist,results;
readFile(inputMusic, "My_Artists.txt", music, count);
void readFile(ifstream& inputMusic, string data, Collection music[], int &count)
{
inputMusic.open(data);
if (!inputMusic)
{
cout << "Error in opening file\n";
exit(1);
}
else
{
while (!inputMusic.eof())
{
inputMusic >> music[count].name
>> music[count].genres
>> music[count].startYear
>> music[count].endYear
>> music[count].records.title
>> music[count].records.yearPub;
count++;
}
inputMusic.close();
}
return;
};
InputFile:
MJ
Pop
1980
2020
BAD 1990
DRE
Rap
1970
2022
CRONIC 1995
EMINEM
Rap
1998
2022
ENCORE 2002
WHITNEY
R&B
1974
2008
SOMEBODY 1987
OUTPUT:
Name : MJ
Genre: Pop
Start Year: 1980
End Year: 2020
Title: BAD Year Published: 1990
----------------------------------------------------
Name : DRE
Genre: Rap
Start Year: 1970
End Year: 2022
Title: BAD Year Published: 1990
----------------------------------------------------
Name : EMINEM
Genre: Rap
Start Year: 1998
End Year: 2022
Title: BAD Year Published: 1990
----------------------------------------------------
Name : WHITNEY
Genre: R&B
Start Year: 1974
End Year: 2008
Title: BAD Year Published: 1990
----------------------------------------------------
I would suggest going in a slightly different direction so you don't write a lot of unneeded code. Here's the same example with a few improvements. The first improvement is the way you store your data: instead of an array with a separate size variable and extra arguments, why not use a vector? The next is the way you get data into your program, which is not very clear. A better way would be to save it in a CSV-style file with one record per line. Here's the code:
#include<iostream>
#include<string>
#include<fstream>
#include<iomanip>
#include<vector>
using namespace std;
struct Discography {
string title;
int yearPub;
Discography(string name, int publish) : title(name), yearPub(publish) {}
Discography() {}
};
struct Collection {
string name;
string genres;
int startYear;
int endYear;
Discography records;
Collection(string artistName, string genre, int start, int end, string title, int yearPublished)
: name(artistName), genres(genre), startYear(start), endYear(end), records(title, yearPublished) {}
Collection() {}
};
void readFile(vector<Collection> &music)
{
ifstream file;
file.open("myArtists.txt");
if (file.is_open())
{
string line;
while (getline(file,line))
{
int pos;
pos = line.find(';');
string name = line.substr(0, pos);
line = line.substr(pos+1);
pos = line.find(';');
string genre = line.substr(0, pos);
line = line.substr(pos+1);
pos = line.find(';');
int start = stoi(line.substr(0, pos));
line = line.substr(pos+1);
pos = line.find(';');
int end = stoi(line.substr(0, pos));
line = line.substr(pos+1);
pos = line.find(';');
string title = line.substr(0, pos);
int publish = stoi(line.substr(pos+1));
music.emplace_back(Collection(name,genre,start,end,title,publish));
}
}
else
{
cout << "Error in opening file" << endl;
}
file.close();
return;
};
void print(vector<Collection> &music)
{
for(int i = 0; i < music.size(); i++)
{
cout << "Name: " << music[i].name << endl
<< "Genre: " << music[i].genres << endl
<< "Start Year: " << music[i].startYear << endl
<< "End Year: " << music[i].endYear << endl
<< "Title: " << music[i].records.title << " Year Published: " << music[i].records.yearPub << endl
<< "----------------------------------------------------" << endl << endl;
}
}
int main() {
vector<Collection> music;
readFile(music);
print(music);
}
The structs have constructors, which allow quick one-line creation of the objects. Coupled with the vector's dynamic size and its emplace_back function, this makes it really easy to add to the list. The only thing left is to gather the data from the file. To make that clearer, it helps to use a CSV (comma-separated values) style file where each line is its own object in your list; in this instance the fields are separated with semicolons. You go through the entire file with the while(getline(file, line)) loop, separating the values with the string function find(character), which returns the position of the next semicolon in that line of text. After you find and separate your data, make sure your numbers are numbers and your strings are strings; you can convert a string to a number with the stoi function (String TO Integer). All your data ends up stored in the music vector of Collection objects.
Like @Monogeon, I would use std::vector<> over an array for storage.
But I think the more idiomatic way to read and write objects in C++ is to define the input and output operators for a class. That way you can stream the objects not only to files but also to other stream-like objects.
This then allows you to use further C++ idioms to manipulate the streams, like std::istream_iterator, which reads objects using the input operator (i.e. operator>>).
I added the input and output operators below for you.
Then I create a facade, FancyPrintCollection, to print a Collection object in a way that is nice for a human to read.
#include <iostream>
#include <string>
#include <fstream>
#include <iomanip>
#include <vector>
#include <iterator>
#include <limits> // std::numeric_limits
struct Discography {
std::string title;
int yearPub;
friend std::ostream& operator<<(std::ostream& stream, Discography const& src)
{
return stream << src.title << " " << src.yearPub;
}
friend std::istream& operator>>(std::istream& stream, Discography& dst)
{
// The problem is that "title" could be multiple words.
// We just know that we have a title followed by a number.
// For V1 lets assume "title" is one word
return stream >> dst.title >> dst.yearPub;
}
};
struct Collection {
std::string name;
std::string genres;
int startYear;
int endYear;
Discography records;
friend std::ostream& operator<<(std::ostream& stream, Collection const& src)
{
return stream << src.name << "\n"
<< src.genres << "\n"
<< src.startYear << "\n"
<< src.endYear << "\n"
<< src.records << "\n";
}
friend std::istream& operator>>(std::istream& stream, Collection& dst)
{
std::getline(stream, dst.name);
std::getline(stream, dst.genres);
std::string numberLine;
if (std::getline(stream, numberLine)) {
dst.startYear = std::stoi(numberLine);
}
if (std::getline(stream, numberLine)) {
dst.endYear = std::stoi(numberLine);
}
stream >> dst.records;
stream.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
return stream;
}
};
struct FancyPrintCollection
{
Collection const& collection;
public:
FancyPrintCollection(Collection const& collection)
: collection(collection)
{}
friend std::ostream& operator<<(std::ostream& stream, FancyPrintCollection const& src)
{
stream << "Name: " << src.collection.name << "\n"
<< "Genre: " << src.collection.genres << "\n"
<< "Start Year: " << src.collection.startYear << "\n"
<< "End Year: " << src.collection.endYear << "\n"
<< "Title: " << src.collection.records.title << " Year Published: " << src.collection.records.yearPub << "\n"
<< "----------------------------------------------------" << "\n" << "\n";
return stream;
}
};
int main()
{
std::ifstream inputMusic("Music.data");
std::vector<Collection> music(std::istream_iterator<Collection>(inputMusic), std::istream_iterator<Collection>{});
for (auto const& m: music) {
std::cout << FancyPrintCollection(m);
}
}
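The comment in Discography's operator>> notes that a title could be several words. One way to handle that, sketched here under the assumption that the year is always the last space-separated token on the line, is to read the whole line and split at the last space. splitTitleAndYear is a hypothetical helper of mine, not part of the code above.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Splits a line such as "Off The Wall 1979" into title and year, assuming
// the year is the last space-separated token. Returns false if the line has
// no space, ends in a space, or the last token does not begin with a number.
bool splitTitleAndYear(const std::string& line, std::string& title, int& year) {
    const std::size_t pos = line.find_last_of(' ');
    if (pos == std::string::npos || pos + 1 >= line.size()) return false;
    try {
        year = std::stoi(line.substr(pos + 1));
    } catch (const std::exception&) {
        return false; // std::stoi found no digits or the value was out of range
    }
    title = line.substr(0, pos);
    return true;
}
```

Inside an operator>> this could be applied to a line obtained with std::getline, setting the stream's failbit when the helper returns false.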

Assertion failure when writing to a text file using ofstream

I am trying to write some string data that I read from the user to a .txt file, but after doing so the program shuts down instead of continuing. When I check the results inside the .txt file, I see part of the data and then some gibberish, followed by an assertion failure error! Here's the code:
#include "std_lib_facilities.h"
#include <fstream>
using namespace std;
using std::ofstream;
void beginProcess();
string promptForInput();
void writeDataToFile(vector<string>);
string fileName = "links.txt";
ofstream ofs(fileName.c_str(),std::ofstream::out);
int main() {
// ofs.open(fileName.c_str(),std::ofstream::out | std::ofstream::app);
beginProcess();
return 0;
}
void beginProcess() {
vector<string> links;
string result = promptForInput();
while(result == "Y") {
for(int i=0;i <= 5;i++) {
string link = "";
cout << "Paste the link skill #" << i+1 << " below: " << '\n';
cin >> link;
links.push_back(link);
}
writeDataToFile(links);
links.clear(); // erases all of the vector's elements, leaving it with a size of 0
result = promptForInput();
}
std::cout << "Thanks for using the program!" << '\n';
}
string promptForInput() {
string input = "";
std::cout << "Would you like to start/continue the process(Y/N)?" << '\n';
std::cin >> input;
return input;
}
void writeDataToFile(vector<string> links) {
if(!ofs) {
error("Error writing to file!");
} else {
ofs << "new ArrayList<>(Arrays.AsList(" << links[0] << ',' << links[1] << ',' << links[2] << ',' << links[3] << ',' << links[4] << ',' << links[5] << ',' << links[6] << ',' << "));\n";
}
}
The problem lies probably somewhere in the ofstream writing procedure but i can't figure it out. Any ideas?
You seem to be filling a vector with 6 elements, at indices 0-5, but your writeDataToFile function accesses links[6], which is out of bounds for that vector.
Another thing which is unrelated to your problem, but is good practice:
void writeDataToFile(vector<string> links)
declares a function which receives a copy of your vector. Unless you specifically want to copy the input vector, you most probably want to pass a const reference, like so:
void writeDataToFile(const vector<string>& links)

Reading or writing binary file incorrectly

The output of the code shows gibberish values for all the members of the Student struct when the display function is run.
I've included the relevant code from the add and display functions for the binary file.
For the second function, does the seekg pointer automatically move to read the next record each time the for loop runs?
//Student struct
struct Student
{
char name [30];
float labTest;
float assignments;
float exam;
};
//Writing function
afile.open(fileName,ios::out|ios::binary);
Student S;
strcpy(S.name,"test");
S.labTest = rand()%100+1;
S.assignments = rand()%100+1;
S.exam = rand()%100+1;
afile.write(reinterpret_cast<char*>(&S),sizeof(S));
afile.close();
//Reading function
afile.open(fileName,ios::in|ios::binary);
afile.seekg(0,ios::end);
int nobyte = afile.tellg();
int recno = nobyte / sizeof(Student);
Student S;
//Loop and read every record
for(int i = 0;i<recno;i++)
{
afile.read(reinterpret_cast<char*>(&S),sizeof(S));
cout << "Name of Student: " << S.name << endl
<< "Lab mark: " << S.labTest << endl
<< "Assignment mark: " << S.assignments << endl
<< "Exam mark: " << S.exam << endl << endl;
}
afile.close();
There are a lot of problems with your code:
Each call to your write function overwrites the previously written data, because opening with ios::out alone truncates the file. You have to add ios::app so that new data is appended after the data you wrote before.
After you move to the end with afile.seekg(0,ios::end) to get the file size from tellg, you have to go back to the start of the file with afile.seekg(0,ios::beg) before reading.
It looks like you use a char array to store a string. This is not C++ style, and the way you use it is dangerous: strcpy will happily copy a string that is longer than the space you reserved for it. So you should prefer std::string. But you can't simply write a struct which contains a std::string as binary! To get a bounds-checked copy you can use strncpy, but that is still not C++ ;)
For the second function, does the seekg pointer automatically move to read the next record each time the for loop runs?
Yes, the file position moves with each successful read and write.
A general remark on writing binary data by simply dumping memory content:
That is not a good idea, because you can only read the data back if you use the same machine type and the same compiler options. That means a machine with different endianness will read totally corrupted data. A different integer size (32 bit vs 64 bit) will also break that code!
So you should invest some time in learning how to serialize data in a portable way. There are a lot of libraries around which can read and write complex data types like std::string or container types.
A hint on using SO:
Please provide code which everybody can simply cut, paste, and compile. I did not know what your Student struct is, so I had to make a lot of assumptions. Is your struct really using char[]? We don't know!
#include <iostream>
#include <fstream>
#include <cstring>
#include <cstdlib> // rand
const char* fileName="x.bin";
struct Student
{
char name[100]; // not c++ style!
int labTest;
int assignments;
int exam;
};
// Writing function
void Write()
{
std::ofstream afile;
afile.open(fileName,std::ios::out|std::ios::binary|std::ios::app);
Student S;
strcpy(S.name,"test"); // should not be done this way!
S.labTest = rand()%100+1;
S.assignments = rand()%100+1;
S.exam = rand()%100+1;
afile.write(reinterpret_cast<char*>(&S),sizeof(S));
afile.close();
}
void Read()
{
//Reading function
std::ifstream afile;
afile.open(fileName,std::ios::in|std::ios::binary);
afile.seekg(0,std::ios::end);
int nobyte = afile.tellg();
int recno = nobyte / sizeof(Student);
afile.seekg(0, std::ios::beg);
Student S;
//Loop and read every record
for(int i = 0;i<recno;i++)
{
afile.read(reinterpret_cast<char*>(&S),sizeof(S));
std::cout << "Name of Student: " << S.name << std::endl
<< "Lab mark: " << S.labTest << std::endl
<< "Assignment mark: " << S.assignments << std::endl
<< "Exam mark: " << S.exam << std::endl << std::endl;
}
afile.close();
}
int main()
{
for ( int ii= 0; ii<10; ii++) Write();
Read();
}
EDIT. Apparently, I was a bit too late in responding. Klaus has compiled a better, more comprehensive response delving into other problems regarding C-style char [], std::string and the endianness of the platform.
You should open the file in append mode for every record. Your code doesn't do this at all. Please write the code in a way that we can copy, paste, and test. As a working example, here is some code that can be compiled and run:
#include <algorithm>
#include <cstdlib> // rand
#include <cstring>
#include <fstream>
#include <iostream>
#include <iterator> // std::back_inserter
#include <string>
#include <vector>
// Student struct
struct Student {
char name[30];
float labTest;
float assignments;
float exam;
};
// Serializer
void serialize_student(const Student &s, const std::string &filename) {
// Append to the file, do not overwrite it
std::ofstream outfile(filename, std::ios::binary | std::ios::app);
if (outfile)
outfile.write(reinterpret_cast<const char *>(&s), sizeof(Student));
}
// Deserializer
std::vector<Student> deserialize_students(const std::string &filename) {
std::ifstream infile(filename, std::ios::binary);
std::vector<Student> students;
Student s;
while (infile.read(reinterpret_cast<char *>(&s), sizeof(Student)))
students.push_back(s);
return students; // returning by name lets the compiler move or elide the copy
}
int main(int argc, char *argv[]) {
// Generate records
std::vector<Student> mystudents;
std::generate_n(std::back_inserter(mystudents), 10, []() {
Student s;
std::strcpy(s.name, "test");
s.labTest = rand() % 100 + 1;
s.assignments = rand() % 100 + 1;
s.exam = rand() % 100 + 1;
return s;
});
// Print and write the records
for (const auto &student : mystudents) {
std::cout << student.name << ": [" << student.labTest << ','
<< student.assignments << ',' << student.exam << "].\n";
serialize_student(student, "students.bin");
}
// Read and print the records
auto records = deserialize_students("students.bin");
std::cout << "===\n";
for (const auto &student : records)
std::cout << student.name << ": [" << student.labTest << ','
<< student.assignments << ',' << student.exam << "].\n";
return 0;
}

C++ - Reading Columns of a CSV files and only keeping ones that start with a specific string

So I am trying to figure out how to sort CSV files to help organize data that I need for an economics paper. The files are massive and there are a lot of them (about 587 MB of zipped files). The files are organized by columns: all the variable names are in the first line and all the data for each variable is below it. My goal is to take only the columns that start with an indicated string (e.g. input: "MC1", get: MC10RT2, MC1WE02, ...) and then save them into a separate file. Does anyone have any advice as to what form the code should take?
Just for fun, here is a small program that should work for you. The part you'll be interested in is boost::split(columns, str, boost::is_any_of(","), boost::token_compress_off); which here creates a vector of strings from your CSV-style string.
It's a very basic example, but your question was an excuse to play a bit with the Boost string algorithms, which I knew about but had never used...
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <set>
#include <iterator> // std::ostream_iterator
// Typedefs for eye candy
typedef std::vector<std::string> Fields;
typedef std::vector<Fields> Results;
typedef std::set<unsigned long> Columns;
// Split the CSV string to a vector of string
Fields split_to_fields(const std::string& str)
{
Fields columns;
boost::split(columns, str, boost::is_any_of(","),
boost::token_compress_off);
return columns;
}
// Read all the wanted columns
Results read_columns_of_csv(std::istream& stream, const Columns& wanted_columns)
{
std::string str;
Results results;
while (getline(stream, str))
{
Fields line{split_to_fields(str)};
Fields fields;
for (unsigned long wanted_column: wanted_columns)
{
if (line.size() <= wanted_column) // wanted_column is a zero-based index
{
std::cerr << "Line " << (results.size() + 1 )
<< " does not contain enough fields: "
<< line.size() << " < " << wanted_column
<< std::endl;
}
else
{
fields.push_back(line[wanted_column]);
}
}
results.push_back(fields);
}
return results;
}
// Read the ids of the columns you want to get
Columns read_wanted_columns(unsigned long max_id)
{
Columns wanted_columns;
unsigned long column;
do
{
std::cin >> column;
if ((column < max_id)
&& (column > 0))
{
wanted_columns.insert(column - 1);
}
}
while (column > 0);
return wanted_columns;
}
// Whole read process (header + columns)
Results read_csv(std::istream& stream)
{
std::string str;
if (!getline(stream, str))
{
std::cerr << "Empty file !" << std::endl;
return Results{};
}
// Get the column name
Fields columns{split_to_fields(str)};
// Output the column with id
unsigned long column_id = 1;
std::cout
<< "Select one of the column by entering its id (enter 0 to end): "
<< std::endl;
for (const std::string elem: columns)
{
std::cout << column_id++ << ": " << elem << std::endl;
};
// Read the choosen cols
return read_columns_of_csv(stream, read_wanted_columns(column_id));
}
int main(int argc, char* argv[])
{
// Manage errors for filename
if (argc < 2)
{
std::cerr << "Please specify a filename" << std::endl;
return -1;
}
std::ifstream file(argv[1]);
if (!file)
{
std::cerr << "Invalid filename: " << argv[1] << std::endl;
return -2;
}
// Process
Results results{read_csv(file)};
// Output
unsigned long line = 1;
std::cout << "Results: " << results.size() << " lines" << std::endl;
for (Fields fields: results)
{
std::cout << line++ << ": ";
std::copy(fields.begin(), fields.end(),
std::ostream_iterator<std::string>(std::cout, ","));
std::cout << std::endl;
}
return 0;
}
I suggest using a vector of structures.
The structure will allow each field of a row to have a different type.
Your program would take on the following structure:
Read the data into the vector.
Extract the necessary fields from each structure in the vector and write them to a new file.
Close all files.
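Since the question asks for columns selected by name prefix rather than by id, here is a rough sketch of that variant using only the standard library. It makes a few assumptions of its own: fields are separated by plain commas with no quoted fields, rows are streamed one at a time instead of being stored in a vector, and splitCsvLine and filterColumns are names I invented for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV line on commas. Quoted fields containing commas are not
// handled in this sketch.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) fields.push_back(field);
    return fields;
}

// Copies to `out` only the columns whose header starts with `prefix`.
// Rows are processed one at a time, so the whole file never sits in memory.
void filterColumns(std::istream& in, std::ostream& out, const std::string& prefix) {
    std::string line;
    if (!std::getline(in, line)) return; // empty input
    const std::vector<std::string> header = splitCsvLine(line);
    std::vector<std::size_t> keep; // indices of the matching columns
    for (std::size_t i = 0; i < header.size(); ++i)
        if (header[i].compare(0, prefix.size(), prefix) == 0) keep.push_back(i);

    // The first pass of this loop writes the filtered header itself.
    do {
        const std::vector<std::string> fields = splitCsvLine(line);
        for (std::size_t k = 0; k < keep.size(); ++k) {
            if (k != 0) out << ',';
            if (keep[k] < fields.size()) out << fields[keep[k]]; // short rows leave the cell empty
        }
        out << '\n';
    } while (std::getline(in, line));
}
```

With a std::ifstream as input and a std::ofstream as output, a call such as filterColumns(input, output, "MC1") would write the reduced file. Streaming row by row seems a reasonable fit for files as large as the ones described.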

Storing data from an unknown number of files

I have used the following piece of code to read from multiple .dat files and parse them. This code uses 3D vectors to store data after the reading process. However, I would like that the data corresponding to each single file be independent from the others. The issue is that the number of files varies, and is unknown at compile time; hence, the number of vectors varies too. I would like to know if there is any solution for this.
vector<vector<vector<string>>> masterList;
for (int i = 0; i < files.size(); ++i) {
cout << "file name: " << files[i] << endl;
fin.open(files[i].c_str());
if (!fin.is_open()) {
// error occurs!!
// break or exit according to your needs
cout<<"error"<<endl;
}
std::vector<vector<string>> tokens;
int current_line = 0;
std::string line;
while (std::getline(fin, line))
{
cout<<"line number: "<<current_line<<endl;
// Create an empty vector for this line
tokens.push_back(vector<string>());
//copy line into is
std::istringstream is(line);
std::string token;
int n = 0;
//parsing
while (getline(is, token, DELIMITER))
{
tokens[current_line].push_back(token);
cout<<"token["<<current_line<<"]["<<n<<"] = " << token <<endl;
n++;
}
cout<<"\n";
current_line++;
}
fin.clear();
fin.close();
masterList.push_back(tokens);
}
So, the main issue I'm facing is: how to create a variable number of 2D vectors to store the data corresponding to each single file, when I don't know how many files there are at compile time.
Modify the list of files in main to adapt the size of your "master data". If the number of files is variable, obtain the list of file names first (one way or another), and then run the parsing on the .dat files. If the file names become known only at run time, and asynchronously, then add a new element to the list each time you get a new file name (you can use events for that, for example; take a look at https://github.com/Sheljohn/siglot).
Note that list elements are independent in memory, and that lists support insertion and deletion in constant time. That way, the data corresponding to each file is independent of the others. If you want to retrieve the data for a specific file (knowing the file name), either iterate over the list to find it (linear time) or trade the list for an unordered_map (constant time on average).
#include <string>
#include <list>
#include <vector>
#include <iostream>
#include <sstream>
#include <fstream>
#include <iterator>
#include <algorithm>
using namespace std;
#define AVG_LINES_PER_FILE 100
/**
* [tokenize_string Tokenize input string 'words' and put elements in vector 'tokens'.]
* @param words [Space separated data-string.]
* @param tokens [Vector of strings.]
*/
void tokenize_string( string& words, vector<string>& tokens )
{
unsigned n = count( words.begin(), words.end(), ' ' );
tokens.reserve(n);
istringstream iss(words);
copy(
istream_iterator<string>(iss),
istream_iterator<string>(),
back_inserter<vector<string> >(tokens)
);
}
/**
* Contains data parsed from a single .dat file
*/
class DATFileData
{
public:
typedef vector<string> line_type;
typedef vector<line_type> data_type;
DATFileData( const char* fname = nullptr )
{
m_fdata.reserve(AVG_LINES_PER_FILE);
m_fdata.clear();
if ( fname ) parse_file(fname);
}
// Check if the object contains data
inline operator bool() const { return m_fdata.size(); }
// Parse file
bool parse_file( const char* fname )
{
string line;
m_fdata.clear();
ifstream fin( fname );
if ( fin.is_open() )
{
while ( getline(fin,line) ) // stop as soon as a read fails, so no phantom last line is stored
{
m_fdata.push_back(line_type());
tokenize_string( line, m_fdata.back() );
}
fin.close();
m_fname = fname;
cout << "Parsed " << m_fdata.size() << " lines in file '" << fname << "'." << endl;
return true;
}
else
{
cerr << "Could not parse file '" << fname << "'!" << endl;
return false;
}
}
// Get data
inline unsigned size() const { return m_fdata.size(); }
inline const char* filename() const { return m_fname.empty() ? nullptr : m_fname.c_str(); }
inline const data_type& data() const { return m_fdata; }
inline const line_type& line( const unsigned& i ) const { return m_fdata.at(i); }
private:
string m_fname;
data_type m_fdata;
};
int main()
{
unsigned fcount = 0;
vector<string> files = {"some/file/path.dat","another/one.dat"};
list<DATFileData> data(files.size());
for ( DATFileData& d: data )
d.parse_file( files[fcount++].c_str() );
cout << endl << files.size() << " files parsed successfully." << endl;
}
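For the unordered_map variant mentioned above, a minimal sketch could look like the following. It is independent of the DATFileData class: FileTable, DataStore, parseStream and addFile are hypothetical names, and tokens are assumed to be separated by white space as in tokenize_string.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// One file's worth of data: a vector of lines, each a vector of tokens.
using FileTable = std::vector<std::vector<std::string>>;

// Tokenizes a whole stream on white space, one inner vector per line.
FileTable parseStream(std::istream& in) {
    FileTable table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        std::vector<std::string> tokens;
        std::string token;
        while (iss >> token) tokens.push_back(token);
        table.push_back(tokens);
    }
    return table;
}

// Maps a file name to its parsed contents.
using DataStore = std::unordered_map<std::string, FileTable>;

// Parses `in` and stores the result under `name`, replacing any earlier
// entry with the same name.
void addFile(DataStore& store, const std::string& name, std::istream& in) {
    store[name] = parseStream(in);
}
```

Each file gets its own entry however many files turn up at run time, and store.find(name) locates a given file's table without scanning the others.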