simple CSV parser (C++) to deal with commas inside quotes - c++

Seems to be a perennial issue, CSVs. In my case, I have data like this:
"Incident Number","Incident Types","Reported Date","Nearest Populated Centre"
"INC2008-008","Release of Substance","01/29/2008","Fort Nelson"
"INC2008-009","Release of Substance, Adverse Environmental Effects","01/29/2008","Taylor"
I built a parser that makes it into a lovely vector<vector<string>>:
string message = "Loading CSV File...\n";
genericMessage(message);
vector<vector<string>> content;
vector<string> row;
string line, word, block;
vector<string> incidentNoVector;
fstream file(fname, ios::in);
if (file.is_open())
{
while (getline(file, line))
{
row.clear();
stringstream str(line);
while (getline(str, word, ','))
row.push_back(word);
content.push_back(row);
}
}
else
cout << "Could not open the file\n";
but I didn't notice the extra comma inside a quoted field in some of the data (row 3), which my parser splits into two fields. Any ideas? I've already built a huge amount of code based on the original vector<vector<string>> expected output, so I really can't afford to change that.
Once I've gotten the vector, I strip the first line out (the header) and place it in its own object, then put all the remaining rows in a separate object that I can call using [][].
// Place the header information in an object, then remove it from the vector
Data_Headers colHeader;
colHeader.setColumn_headers(content[0]);
content.erase(content.begin());
// Place the row data in an object.
Data_Rows allData;
allData.setColumn_data(content);
Row_Key incidentNumbers;
for (int i = 0; i < allData.getColumn_data().size(); i++)
{
incidentNoVector.push_back(allData.getColumn_data()[i][0]);
}
incidentNumbers.setIncident_numbers(incidentNoVector);
Any help would be hugely appreciated!

If you don't want to use a ready CSV parser library you could create a class that stores the values in one row and overload operator>> for that class. Use std::quoted when reading the individual fields in that operator.
Example:
struct Eater { // A class mainly used for eating up commas in an istream
char ch;
};
std::istream& operator>>(std::istream& is, const Eater& e) {
char ch;
if(is.get(ch) && ch != e.ch) is.setstate(std::istream::failbit);
return is;
}
struct Row {
std::string number;
std::string types;
std::string date;
std::string nearest;
};
// Read one Row from an istream
std::istream& operator>>(std::istream& is, Row& r) {
Eater comma{','};
is >> std::quoted(r.number) >> comma >> std::quoted(r.types) >> comma
>> std::quoted(r.date) >> comma >> std::quoted(r.nearest) >> Eater{'\n'};
return is;
}
// Write one Row to an ostream
std::ostream& operator<<(std::ostream& os, const Row& r) {
os << std::quoted(r.number) << ',' << std::quoted(r.types) << ','
<< std::quoted(r.date) << ',' << std::quoted(r.nearest) << '\n';
return os;
}
After you've opened the file, you could then create and populate a std::vector<Row> in a very simple way:
Row heading;
if(file >> heading) {
std::vector<Row> rows(std::istream_iterator<Row>(file),
std::istream_iterator<Row>{});
// All Rows are now in the vector
}

Related

reading comma-seprated txt file in c++ and putting each column in an array [duplicate]

c++ question
Hi, I have a comma-separated txt file. It is a database for a school. The txt file looks like this:
AI323,12,soma gomez,Wed,department of art AM324,6,tony wang,Tue;Thu, CC+ dance school
I want my code to read the txt file and put each column in an array/vector. the result should be like:
class_code={} ; num_students={}; teacher_name={}; date={};location={}
Thank you in advance for your help. I'm very new to C++.
I tried to put each of them in an array with the getline command. However, I am struggling to get the result.
A very common approach to reading records from a CSV database is to read the file line by line with std::getline.
Then this line is again put in a stream, the std::istringstream.
With that it is possible to use formatted or unformatted stream extraction functions again, like >> or std::getline, to extract parts from the string (via the stringstream).
And here especially useful is the std::getline function with delimiter. You can then extract parts of a string until a certain character is found, in your case the comma (',').
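As a small illustration of the line-by-line idea, here is a sketch that fills one vector per column, as the question asks for. It rests on assumptions that are not confirmed by the question: every record sits on its own line and has exactly five comma-separated fields. Columns and readColumns are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One vector per column, named as in the question.
struct Columns {
    std::vector<std::string> class_code;
    std::vector<int> num_students;
    std::vector<std::string> teacher_name;
    std::vector<std::string> date;
    std::vector<std::string> location;
};

// Reads lines of the assumed form "code,count,teacher,days,location".
// Lines that do not match this form are skipped.
Columns readColumns(std::istream& in) {
    Columns cols;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        std::string code, teacher, days, place;
        int count = 0;
        char comma = 0;
        // getline with a delimiter extracts text up to the next comma;
        // >> converts the student count to an int.
        if (std::getline(iss, code, ',') &&
            (iss >> count >> comma) && comma == ',' &&
            std::getline(iss, teacher, ',') &&
            std::getline(iss, days, ',') &&
            std::getline(iss >> std::ws, place)) {  // rest of line, leading spaces skipped
            cols.class_code.push_back(code);
            cols.num_students.push_back(count);
            cols.teacher_name.push_back(teacher);
            cols.date.push_back(days);
            cols.location.push_back(place);
        }
    }
    return cols;
}
```

Because readColumns takes a std::istream&, it can be called with an opened std::ifstream as well as with a std::istringstream for testing.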
Then you need to use modern C++ elements to implement a solution. Data and methods operating on this data are put in a class or struct.
To model the real-world data in your program, you would need a data type (struct) "Record" and a data type (struct) "Database".
And then things will evolve.
Please see one example below. You can build your solution on this proposal:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <iomanip>
// One record, so one row in the database
struct Record{
std::string ID{};
int no{};
std::string name{};
std::string day{};
std::string department{};
int number{};
std::string teacher{};
std::string days{};
std::string location{};
// For reading one record in CSV format from a stream with the C++ extraction operator >>
friend std::istream& operator >> (std::istream& is, Record& r) {
// We will read a complete line and check, if this was OK
std::string line{}; char c;
if (std::getline(is, line)) {
// Now we will put this line into a std::istringstream, so that we can extract
// parts of it using stream functions. Basically again std::getline
std::istringstream iss{ line };
// and now we try to extract all parts of the record from the std::istringstream
std::getline(iss, r.ID,',');
iss >> r.no >> c;
std::getline(iss>> std::ws, r.name, ',');
std::getline(iss, r.day, ',');
std::getline(iss, r.department, ',');
iss >> r.number >> c;
std::getline(iss >> std::ws, r.teacher, ',');
std::getline(iss, r.days, ',');
std::getline(iss, r.location, ',');
}
return is;
}
// For writing one record to a stream with the C++ inserter operator << in CSV format
friend std::ostream& operator << (std::ostream& os, const Record& r) {
return os << r.ID << ',' << r.no << ',' << r.name << ',' << r.day << ',' << r.department << ',' <<
r.number << ',' << r.teacher << ',' << r.days << ',' << r.location;
}
};
struct Database {
// Here we will store all records
std::vector<Record> records{};
// This will read a complete CSV database from a stream
friend std::istream& operator >> (std::istream& is, Database& d) {
// Clear/delete potential existing old data
d.records.clear();
// Now read complete database from stream
for (Record r; is >> r; d.records.push_back(r))
;
return is;
}
// And this will write all records to a stream;
friend std::ostream& operator << (std::ostream& os, const Database& d) {
for (const Record& r : d.records)
os << r << '\n';
return os;
}
bool load(const std::string& path) {
// Open file and check, if it could be opened
std::ifstream ifs(path);
if (!ifs) {
// Show error message and report the failure
std::cerr << "\n\n*** Error: Could not open file '" << path << "'\n\n";
return false;
}
// Read all data
ifs >> *this;
// Reading until end of file sets failbit, so only badbit indicates a real I/O error
return not ifs.bad();
}
bool save(const std::string& path) {
// Open file and check, if it could be opened
std::ofstream ofs(path);
if (!ofs) {
// Show error message and report the failure
std::cerr << "\n\n*** Error: Could not open file '" << path << "'\n\n";
return false;
}
// Write all data
ofs << *this;
// return status of operation
return not ofs.bad();
}
// Please add all functions here to work with your data in the database
// Example: pretty print
void display() {
for (const Record& r : records)
std::cout << r.ID << '\n' << r.no << '\n' << r.name << '\n' << r.day << '\n' << r.department << '\n' <<
r.number << '\n' << r.teacher << '\n' << r.days << '\n' << r.location << '\n';
}
};
const std::string Filename{"database.txt"};
int main() {
Database database{};
database.load(Filename);
database.display();
database.save(Filename);
}

read .txt file with doubles and strings - c++ [duplicate]

I am trying to read a file of the following format
id1 1 2 3
id2 2 4 6
id3 5 6 7
...
using this code
Dataset::Dataset(ifstream &file) {
string token;
int i = 0;
while (!file.eof() && (file >> token)){
// read line tokens one-by-one
string ID = token;
vector<int> coords;
while ((file.peek()!='\n') && (!file.eof()) && (file >> token)) {
coords.push_back(atoi(token.c_str()));
}
points.push_back(new Point(ID, coords));
i++;
}
cout << "Loaded " << i << " points." << endl;
}
But it tells me I have read 0 points. What am I doing wrong?
Edit: I am opening this using input_stream.open(input_file) and file.good() returns true.
Edit #2: actually .good() returns true the first time and then false. What is that all about?
Edit #3: GUYS. IT'S FREAKING WINDOWS. When i put the path as Dataset/test.txt by cin it works and when I do it like Dataset\test.txt by the commandline it doesn't...
Now the problem is it seems not stop at new lines!
Edit #4: Freaking windows again! It was peeking '\r' instead of '\n'.
Here's an idea: overload operator>>:
struct Point
{
int x, y, z;
friend std::istream& operator>>(std::istream& input, Point& p);
};
std::istream& operator>>(std::istream& input, Point& p)
{
input >> p.x;
input >> p.y;
input >> p.z;
input.ignore(10000, '\n'); // eat chars until end of line.
return input;
}
struct Point_With_ID
: public Point
{
std::string id;
friend std::istream& operator>>(std::istream& input, Point_With_ID& p);
};
std::istream& operator>>(std::istream& input, Point_With_ID& p)
{
input >> p.id;
input >> static_cast<Point&>(p); // Read in the parent items.
return input;
}
Your input could look like this:
std::vector<Point_With_ID> database;
Point_With_ID p;
while (file >> p)
{
database.push_back(p);
}
I separated the Point class so that it can be used in other programs or assignments.
I managed to make it work by accounting for both '\r' and '\n' endings and ignoring trailing whitespace like this:
Dataset::Dataset(ifstream &file) {
string token;
int i = 0;
while (file >> token){
// read line tokens one-by-one
string ID = token;
vector<int> coords;
while ((file.peek()!='\n' && file.peek()!='\r') && (file >> token)) { // '\r' for windows, '\n' for unix
coords.push_back(atoi(token.c_str()));
if (file.peek() == '\t' || file.peek() == ' ') { // ignore these
file.ignore(1);
}
}
Point p(ID, coords);
points.emplace_back(p);
i++;
// ignore anything until '\n'
file.ignore(32, '\n');
}
cout << "Loaded " << i << " points." << endl;
}
Probably not the best of the solutions suggested but it's working!
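The peek() checks for '\r' and '\n' can be avoided by reading whole lines first. The sketch below is one possible variant under stated assumptions, not the code used above: ParsedPoint and readPoints are hypothetical names. It relies on operator>> treating '\r' as ordinary whitespace, so files with Windows and Unix line endings are parsed the same way.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Point class of the question.
struct ParsedPoint {
    std::string id;
    std::vector<int> coords;
};

// Reads "id n n n ..." records, one per line.
std::vector<ParsedPoint> readPoints(std::istream& in) {
    std::vector<ParsedPoint> points;
    std::string line;
    while (std::getline(in, line)) {          // stops cleanly at end of file
        std::istringstream iss(line);
        ParsedPoint p;
        if (!(iss >> p.id)) continue;         // blank line, or only "\r"
        // >> skips spaces, tabs and a trailing '\r' before each number.
        for (int value; iss >> value;) p.coords.push_back(value);
        points.push_back(std::move(p));
    }
    return points;
}
```

The number of loaded points is then simply the size of the returned vector, so the separate counter is no longer needed.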
You should not use eof() in a loop condition. See Why is iostream::eof inside a loop condition considered wrong? for details. You can instead use the following program to read into the vector of Point*.
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
class Point
{
public:
std::string ID; // starts out empty; initializing a std::string from 0 is undefined behavior
std::vector<int> coords;
Point(std::string id, std::vector<int> coord): ID(id), coords(coord)
{
}
};
int main()
{
std::vector<Point*> points;
std::ifstream file("input.txt");
std::string line;
int var = 0;
while (std::getline(file, line, '\n'))//read line by line
{
int j = 0;
std::istringstream ss(line);
std::string ID;
ss >> ID;
std::vector<int> coords(3);//create vector of size 3 since we already know only 3 elements needed
while (ss >> var) {
coords.at(j) = var;
++j;
}
points.push_back(new Point(ID, coords));
}
std::cout<<points.size()<<std::endl;
//...also don't forget to free the memory using `delete` or use smart pointer instead
return 0;
}
Note that if you're using new then you must use delete to free the memory that you've allocated. This was not done in the program above, since I only wanted to show how you can read the data in your desired manner.
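For completeness, the smart pointer variant mentioned in the comment could look roughly like this. It is a sketch under assumptions: loadPoints is a hypothetical name, Point is reduced to a plain struct with the same two members, and std::make_unique requires C++14.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the Point class above, with the same two members.
struct Point {
    std::string ID;
    std::vector<int> coords;
};

// Hypothetical helper: one owned Point per non-empty line of the stream.
std::vector<std::unique_ptr<Point>> loadPoints(std::istream& in) {
    std::vector<std::unique_ptr<Point>> points;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        auto p = std::make_unique<Point>();    // owned by the unique_ptr
        if (!(ss >> p->ID)) continue;          // skip blank lines
        for (int v; ss >> v;) p->coords.push_back(v);
        points.push_back(std::move(p));        // ownership moves into the vector
    }
    return points;  // each Point is deleted when the vector is destroyed
}
```

No delete is needed anywhere: when the returned vector goes out of scope, every unique_ptr releases its Point.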
You've baked everything up in a complex deserializing constructor. This makes the code hard to understand and maintain.
You have a coordinate, so make class for that, we can call it Coord, that is capable of doing its own deserializing.
You have a Point, which consists of an ID and a coordinate, so make a class for that, that is capable of doing its own deserializing.
The Dataset will then just use the deserializing functions of the Point.
Don't limit deserializing to ifstreams. Make it work with any istream.
Deserializing and serializing are often done by overloading operator>> and operator<<, respectively, for the types involved. Here's one way of splitting the problem up into smaller parts that are easier to understand:
struct Coord {
std::vector<int> data;
// read one Coord
friend std::istream& operator>>(std::istream& is, Coord& c) {
if(std::string line; std::getline(is, line)) { // read until end of line
c.data.clear();
std::istringstream iss(line); // put it in an istringstream
// ... and extract the values:
for(int tmp; iss >> tmp;) c.data.push_back(tmp);
}
return is;
}
// write one Coord
friend std::ostream& operator<<(std::ostream& os, const Coord& c) {
if(not c.data.empty()) {
auto it = c.data.begin();
os << *it;
for(++it; it != c.data.end(); ++it) os << ' ' << *it;
}
return os;
}
};
struct Point {
std::string ID;
Coord coord;
// read one Point
friend std::istream& operator>>(std::istream& is, Point& p) {
return is >> p.ID >> p.coord;
}
// write one Point
friend std::ostream& operator<<(std::ostream& os, const Point& p) {
return os << p.ID << ' ' << p.coord;
}
};
struct Dataset {
std::vector<Point> points;
// read one Dataset
friend std::istream& operator>>(std::istream& is, Dataset& ds) {
ds.points.clear();
for(Point tmp; is >> tmp;) ds.points.push_back(std::move(tmp));
if(!ds.points.empty()) is.clear();
return is;
}
// write one Dataset
friend std::ostream& operator<<(std::ostream& os, const Dataset& ds) {
for(auto& p : ds.points) os << p << '\n';
return os;
}
};
If you really want a deserializing constructor in Dataset you just need to add these:
Dataset() = default;
Dataset(std::istream& is) {
if(!(is >> *this))
throw std::runtime_error("Failed reading Dataset");
}
You can then open your file and use operator>> to fill the Dataset and operator<< to print the Dataset on screen - or to another file if you wish.
int main() {
if(std::ifstream file("datafile.dat"); file) {
if(Dataset ds; file >> ds) { // populate the Dataset
std::cout << ds; // print the result to screen
}
}
}

Store several strings in an array of structures C++

I am doing a project with I/O and structs. My text file is below. I need to make my program store each string in a different part of the array of structs I have created. I am having a problem making it separate them in the array when it senses a blank line.
Steps for the program: 1. Read each line with data and store it in the struct array until it reaches a blank line. 2. Output each string in a different group or on a different line.
Text file:
ecl:gry pid:860033327 hcl:#fffffd
byr:1937 iyr:2017 cid:147 hgt:183cm
iyr:2013 ecl:amb cid:350 pid:028048884
hcl:#cfa07d byr:1929
hcl:#ae17e1 iyr:2013 cid:150
eyr:2024
ecl:brn pid:760753108 byr:1931
hgt:179cm
hcl:#cfa07d eyr:2025 pid:166559648
iyr:2011 ecl:brn hgt:59in cid:230
My code:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
const int SIZE = 4;
struct Passports {
std::string singlePass;
};
int main()
{
Passports records[SIZE];
std::string fileName = "some_file.txt";
std::ifstream inFile(fileName);
std::string line, data;
if (inFile.is_open()){
while (!inFile.eof()) {
getline(inFile, line);
std::istringstream ss(line);
for (int i = 0; i < SIZE; i++) {
while (ss >> records[i].singlePass) {
std::cout << records[i].singlePass;
}
}
}
}
else {
std::cout << "Error opening file! " << std::endl;
}
}
You should model the data using a struct.
struct Record
{
std::string input_line1;
std::string input_line2;
friend std::istream& operator>>(std::istream& input, Record& r);
};
std::istream& operator>>(std::istream& input, Record& r)
{
std::getline(input, r.input_line1);
std::getline(input, r.input_line2);
input.ignore(1000000, '\n'); // ignore the blank line.
return input;
}
Your input code would look like:
std::vector<Record> database;
Record r;
while (inFile >> r)
{
database.push_back(r);
}
By placing the detailed input inside of the struct, you can modify the input method later without having to change the input code in main().
Detailed Parsing
You could add a field or two to advance your program (there is no need to add all the fields at this point; they can be added later).
struct Passport
{
std::string ecl;
friend std::istream& operator>>(std::istream& input, Passport& p);
};
std::istream& operator>>(std::istream& input, Passport& p)
{
std::string text_line1;
std::string text_line2;
std::getline(input, text_line1);
std::getline(input, text_line2);
size_t position = text_line1.find("ecl:");
if (position != std::string::npos)
{
// extract the value for ecl and assign to p.ecl
}
return input;
}
There are many different methods for parsing the string, the above is alluding to one of them.
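One of those other methods is to collect the key:value tokens of a record into a map. This is a minimal sketch under assumptions that go beyond the answer above: records are separated by blank lines (as the question describes), a record may span any number of lines, and the names Passport and readPassports are hypothetical.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One record: field name (e.g. "ecl") mapped to its value (e.g. "gry").
using Passport = std::map<std::string, std::string>;

std::vector<Passport> readPassports(std::istream& in) {
    std::vector<Passport> all;
    Passport current;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // Windows line ending
        if (line.empty()) {                    // a blank line ends the current record
            if (!current.empty()) {
                all.push_back(current);
                current.clear();
            }
            continue;
        }
        std::istringstream iss(line);
        for (std::string token; iss >> token;) {   // tokens look like "key:value"
            const std::size_t colon = token.find(':');
            if (colon != std::string::npos)
                current[token.substr(0, colon)] = token.substr(colon + 1);
        }
    }
    if (!current.empty()) all.push_back(current);  // last record may lack a blank line
    return all;
}
```

Using a std::vector instead of a fixed-size array means the number of records does not have to be known in advance, and a field such as ecl can be looked up with find or at on each Passport.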

Understanding reading txt files in c++

I am trying to understand reading different txt file formats in c++
I am currently trying to read a file formatted like this,
val1 val2 val3
val1 val2 val3
val1 val2 val3
When I read the file in and then cout its contents I only get the first line then a random 0 0 at the end.
I want to save each value into its own variable in a struct.
I am doing this like this,
struct Input{
std::string group;
float total_pay;
unsigned int quantity;
Input(std::string const& groupIn, float const& total_payIn, unsigned int const& quantityIn):
group(groupIn),
total_pay(total_payIn),
quantity(quantityIn)
{}
};
int main(){
std::ifstream infile("input.txt");
std::vector<Input> data;
std::string group;
std::string total_pay;
std::string quantity;
std::getline(infile,group);
std::getline(infile,total_pay);
std::getline(infile,quantity);
while(infile) {
data.push_back(Input(group,atof(total_pay.c_str()),atoi(quantity.c_str())));
std::getline(infile,group);
std::getline(infile,total_pay);
std::getline(infile,quantity);
}
//output
for(Input values : data) {
std::cout << values.group << " " << values.total_pay << " " << values.quantity << '\n';
}
return 0;
}
What is the proper way to read this file in the format I have specified? Do I need to specify to go to the next line after the third value?
Or should this be taking each value and putting them in to the right variable?
std::getline(infile,group);
std::getline(infile,total_pay);
std::getline(infile,quantity);
Your input processing has a number of issues. Your prevalent usage of std::getline in places where it is not needed isn't helping.
In short, per-line validation of input is generally done with a model similar to the following. Note that this requires the class provide a default constructor. We use an input-string-stream to process a single item from each line of input from the input file. If it was certain there was at-most one per line, we could forego the per-line processing, but it is a potential place for errors, so better safe than sorry. The mantra presented here is commonly used for per-line input validation when reading a stream of objects from a formatted input file, one item per line.
The following code defines the structure as you have it with a few extra pieces, including a stream extraction operator for input and a stream insertion operator for output. The result makes the code in main() much more manageable.
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
#include <iterator>
struct Input
{
// friends not needed if the members are public, but provided here
// in case you ever do make them protected or private (which you should)
friend std::istream& operator >>(std::istream& inp, Input& item);
friend std::ostream& operator <<(std::ostream& outp, Input const& item);
std::string group;
float total_pay;
unsigned int quantity;
// default constructor. sets up zero-elements
Input() : total_pay(), quantity()
{
}
Input(std::string groupIn, float total_payIn, unsigned int quantityIn)
: group(std::move(groupIn))
, total_pay(total_payIn)
, quantity(quantityIn)
{
}
// you really should be using these for accessors
std::string const& getGroup() const { return group; }
float getTotalPay() const { return total_pay; }
unsigned int getQuantity() const { return quantity; }
};
// global free function for extracting an Input item from an input stream
std::istream& operator >>(std::istream& inp, Input& item)
{
return (inp >> item.group >> item.total_pay >> item.quantity);
}
// global operator for inserting to a stream
std::ostream& operator <<(std::ostream& outp, Input const& item)
{
outp << item.getGroup() << ' '
<< item.getTotalPay() << ' '
<< item.getQuantity();
return outp;
}
int main()
{
std::ifstream infile("input.txt");
if (!infile)
{
std::cerr << "Failed to open input file" << '\n';
exit(EXIT_FAILURE);
}
// one line per item enforced.
std::vector<Input> data;
std::string line;
while (std::getline(infile, line))
{
std::istringstream iss(line);
Input inp;
if (iss >> inp) // calls our extraction operator >>
data.emplace_back(inp);
else
std::cerr << "Invalid input line: " << line << '\n';
}
// dump all of them to stdout. calls our insertion operator <<
std::copy(data.begin(), data.end(),
std::ostream_iterator<Input>(std::cout,"\n"));
return 0;
}
Provided the input is properly formatted, values like this:
group total quantity
group total quantity
will parse successfully. Conversely, if this happens:
group total quantity
group quantity
group total quantity
total quantity
the extractions of the second and fourth items will fail, and appropriate warning will be issued on std::cerr. This is the reason for using the std::istringstream intermediate stream object wrapping extraction of a single line per item.
Best of luck, and I hope it helps you out.
Check this solution.
It has no error checks, but it converts each field to its type:
#include<iostream>
#include<sstream>
#include<string>
using namespace std;
int main()
{
string line="v1 2.2 3";//lets say you read a line to this var...
string group;
float total_pay;
unsigned int quantity;
//we split the line to the 3 fields
istringstream s(line);
s>>group>>total_pay>>quantity;
//print for test
cout<<group<<endl<<total_pay<<endl<<quantity<<endl;
return 0;
}
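If error checks are wanted, the same extraction can report whether a line was well formed. This is a sketch only: parseFields is a hypothetical name, and rejecting lines with extra trailing text is a design assumption rather than something the question requires.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true only if the line holds exactly
// a word, a float and an unsigned integer, in that order.
bool parseFields(const std::string& line, std::string& group,
                 float& total_pay, unsigned int& quantity) {
    std::istringstream s(line);
    if (!(s >> group >> total_pay >> quantity))
        return false;      // a field is missing or has the wrong type
    s >> std::ws;          // skip trailing whitespace, if any
    return s.eof();        // anything left over means the line is malformed
}
```

The output parameters may have been partly overwritten when the function returns false, so they should only be used after a successful call.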

C++ Read file line by line then split each line using the delimiter

I want to read a txt file line by line and after reading each line, I want to split the line according to the tab "\t" and add each part to an element in a struct.
my struct is 1*char and 2*int
struct myStruct
{
char chr;
int v1;
int v2;
};
where chr can contain more than one character.
A line should be something like:
randomstring TAB number TAB number NL
Try:
Note: if chr can contain more than 1 character then use a string to represent it.
std::ifstream file("plop");
std::string line;
while(std::getline(file, line))
{
std::stringstream linestream(line);
std::string data;
int val1;
int val2;
// If you have truly tab delimited data use getline() with third parameter.
// If your data is just white space separated data
// then the operator >> will do (it reads a space separated word into a string).
std::getline(linestream, data, '\t'); // read up-to the first tab (discard tab).
// Read the integers using the operator >>
linestream >> val1 >> val2;
}
Unless you intend to use this struct for C as well, I would replace the intended char* with std::string.
Next, as I intend to be able to read it from a stream I would write the following function:
std::istream & operator>>( std::istream & is, myStruct & my )
{
if( std::getline(is, my.str, '\t') )
is >> my.v1 >> my.v2;
return is; // always return the stream, also when the first read failed
}
with str as the std::string member. This writes into your struct, using tab as the first delimiter and then any white-space delimiter will do before the next two integers. (You can force it to use tab).
To read line by line you can either continue reading these, or read the line first into a string then put the string into an istringstream and call the above.
You will need to decide how to handle failed reads. Any failed read above would leave the stream in a failed state.
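To illustrate both points, forcing the tab and handling failed reads, here is a sketch of a stricter variant. TabRecord is a hypothetical stand-in for the struct with a std::string member; the choice to put the outer stream into a failed state on a malformed line is an assumption about the desired behavior.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the struct: text, TAB, int, TAB, int.
struct TabRecord {
    std::string str;
    int v1 = 0;
    int v2 = 0;
};

std::istream& operator>>(std::istream& is, TabRecord& rec) {
    std::string line;
    if (!std::getline(is, line)) return is;   // end of file or read error
    std::istringstream iss(line);
    iss >> std::noskipws;                     // do not skip whitespace before the numbers
    TabRecord tmp;
    if (std::getline(iss, tmp.str, '\t') &&   // text up to the first tab
        (iss >> tmp.v1) &&
        iss.get() == '\t' &&                  // exactly one tab between the numbers
        (iss >> tmp.v2)) {
        rec = tmp;                            // commit only a fully parsed record
    } else {
        is.setstate(std::ios::failbit);       // report the malformed line to the caller
    }
    return is;
}
```

A loop such as while (in >> rec) then stops at the first malformed line; a caller that prefers to skip bad lines can call in.clear() and continue, because the whole line has already been consumed.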
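std::ifstream in("fname");
std::string line;
while(std::getline(in,line)){ // loop only while a line was actually read
size_t lasttab=line.find_last_of('\t');
if(lasttab==std::string::npos || lasttab==0) continue; // skip lines without two tabs
size_t firsttab=line.find_last_of('\t',lasttab-1);
if(firsttab==std::string::npos) continue;
myStruct data;
data.chr=line.substr(0,firsttab); // assumes chr has been changed to std::string
data.v1=atoi(line.substr(firsttab+1,lasttab-firsttab-1).c_str()); // substr takes (position, length)
data.v2=atoi(line.substr(lasttab+1).c_str());
}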
I had some difficulty following some of the suggestions here, so I'm posting a complete example of overloading both input and output operators for a struct over a tab-delimited file. As a bonus, it also takes the input either from stdin or from a file supplied via the command arguments.
I believe this is about as simple as it gets while adhering to the semantics of the operators.
pairwise.h
#ifndef PAIRWISE_VALUE
#define PAIRWISE_VALUE
#include <string>
#include <iostream>
struct PairwiseValue
{
std::string labelA;
std::string labelB;
float value;
};
std::ostream& operator<<(std::ostream& os, const PairwiseValue& p);
std::istream& operator>>(std::istream& is, PairwiseValue& p);
#endif
pairwise.cc
#include "pairwise.h"
std::ostream& operator<<(std::ostream& os, const PairwiseValue& p)
{
os << p.labelA << '\t' << p.labelB << '\t' << p.value << std::endl;
return os;
}
std::istream& operator>>(std::istream& is, PairwiseValue& p)
{
PairwiseValue pv;
if ((is >> pv.labelA >> pv.labelB >> pv.value))
{
p = pv;
}
return is;
}
test.cc
#include <fstream>
#include "pairwise.h"
int main(const int argc, const char* argv[])
{
std::ios_base::sync_with_stdio(false); // disable synch with stdio (enables input buffering)
std::string ifilename;
if (argc == 2)
{
ifilename = argv[1];
}
const bool use_stdin = ifilename.empty();
std::ifstream ifs;
if (!use_stdin)
{
ifs.open(ifilename);
if (!ifs)
{
std::cerr << "Error opening input file: " << ifilename << std::endl;
return 1;
}
}
std::istream& is = ifs.is_open() ? static_cast<std::istream&>(ifs) : std::cin;
PairwiseValue pv;
while (is >> pv)
{
std::cout << pv;
}
return 0;
}
Compiling
g++ -c pairwise.cc test.cc
g++ -o test pairwise.o test.o
Usage
./test myvector.tsv
cat myvector.tsv | ./test