This question already has answers here:
How can I read and parse CSV files in C++?
(39 answers)
Closed 8 years ago.
Pretty self-explanatory. I tried Google and got a lot of the dreaded Experts Exchange results, and I searched here as well, to no avail. An online tutorial or example would be best. Thanks, guys.
More information would be useful.
But the simplest form:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
int main()
{
std::ifstream data("plop.csv");
std::string line;
while(std::getline(data,line))
{
std::stringstream lineStream(line);
std::string cell;
while(std::getline(lineStream,cell,','))
{
// You have a cell!!!!
}
}
}
Also see this question: CSV parser in C++
You can try the Boost Tokenizer library, in particular the Escaped List Separator
If what you're really doing is manipulating a CSV file itself, Nelson's answer makes sense. However, my suspicion is that the CSV is simply an artifact of the problem you're solving. In C++, that probably means you have something like this as your data model:
struct Customer {
int id;
std::string first_name;
std::string last_name;
struct {
std::string street;
std::string unit;
} address;
char state[2];
int zip;
};
Thus, when you're working with a collection of data, it makes sense to have std::vector<Customer> or std::set<Customer>.
With that in mind, think of your CSV handling as two operations:
// if you wanted to go nuts, you could use a forward iterator concept for both of these
class CSVReader {
public:
CSVReader(const std::string &inputFile);
bool hasNextLine();
void readNextLine(std::vector<std::string> &fields);
private:
/* secrets */
};
class CSVWriter {
public:
CSVWriter(const std::string &outputFile);
void writeNextLine(const std::vector<std::string> &fields);
private:
/* more secrets */
};
void readCustomers(CSVReader &reader, std::vector<Customer> &customers);
void writeCustomers(CSVWriter &writer, const std::vector<Customer> &customers);
Read and write a single row at a time, rather than keeping a complete in-memory representation of the file itself. There are a few obvious benefits:
Your data is represented in a form that makes sense for your problem (customers), rather than the current solution (CSV files).
You can trivially add adapters for other data formats, such as bulk SQL import/export, Excel/OO spreadsheet files, or even an HTML <table> rendering.
Your memory footprint is likely to be smaller (depends on relative sizeof(Customer) vs. the number of bytes in a single row).
CSVReader and CSVWriter can be reused as the basis for an in-memory model (such as Nelson's) without loss of performance or functionality. The converse is not true.
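To make the row-at-a-time idea concrete, here is a minimal sketch of a reader with the hasNextLine/readNextLine shape. Several things in it are my assumptions rather than part of the design above: StreamCSVReader takes a std::istream instead of a file name (so it can be fed a std::istringstream while experimenting), SimpleCustomer is a cut-down three-column stand-in for Customer, and fields are split on plain commas with no quote handling.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of CSVReader that wraps any std::istream.
// It splits on plain commas only; quoted fields are not handled.
class StreamCSVReader {
public:
    explicit StreamCSVReader(std::istream &input) : input_(input) {}

    // True if at least one more character can be read from the stream.
    bool hasNextLine() {
        return input_.good() &&
               input_.peek() != std::istream::traits_type::eof();
    }

    // Replaces the contents of `fields` with the cells of the next line.
    void readNextLine(std::vector<std::string> &fields) {
        fields.clear();
        std::string line;
        if (!std::getline(input_, line)) return;
        std::stringstream lineStream(line);
        std::string cell;
        while (std::getline(lineStream, cell, ',')) fields.push_back(cell);
    }

private:
    std::istream &input_;
};

// Simplified stand-in for the Customer struct (three columns only).
struct SimpleCustomer {
    int id;
    std::string first_name;
    std::string last_name;
};

// Reads rows one at a time and converts each into a SimpleCustomer.
// Rows with fewer than three fields are skipped.
void readCustomers(StreamCSVReader &reader,
                   std::vector<SimpleCustomer> &customers) {
    std::vector<std::string> fields;
    while (reader.hasNextLine()) {
        reader.readNextLine(fields);
        if (fields.size() < 3) continue;
        SimpleCustomer c;
        c.id = std::stoi(fields[0]);  // throws if the column is not numeric
        c.first_name = fields[1];
        c.last_name = fields[2];
        customers.push_back(c);
    }
}
```

Only one row is held in memory at a time; the vector of customers is the only thing that grows. Since std::stoi throws on a non-numeric first column, real code would want to validate or catch that.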
I've worked with a lot of CSV files in my time. I'd like to add the advice:
1 - Depending on the source (Excel, etc), commas or tabs may be embedded in a field. Usually, the rule is that they will be 'protected' because the field will be double-quote delimited, as in "Boston, MA 02346".
2 - Some sources will not double-quote delimit all text fields. Other sources will. Others will delimit all fields, even numerics.
3 - Fields containing double quotes usually get the embedded double quotes doubled up (and the field itself delimited with double quotes), as in "George ""Babe"" Ruth".
4 - Some sources will embed CR/LFs (Excel is one of these!). Sometimes it'll be just a CR. The field will usually be double-quote delimited, but this situation is very difficult to handle.
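Points 1 and 3 can be handled with a small state machine instead of a plain split on commas. The following is only a sketch under assumptions: splitCsvRecord is a name I made up, it expects one complete record per call, and it does not attempt point 4 (line breaks inside quoted fields), which needs the caller to keep reading lines until the quotes balance.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits one CSV record into fields, honouring double-quoted fields.
// Inside quotes, the delimiter is literal and "" stands for one quote.
// Assumes the whole record is already in `record`.
std::vector<std::string> splitCsvRecord(const std::string &record,
                                        char delimiter = ',') {
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;
    for (std::size_t i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < record.size() && record[i + 1] == '"') {
                    current += '"';   // doubled quote -> one literal quote
                    ++i;
                } else {
                    inQuotes = false; // closing quote
                }
            } else {
                current += c;         // delimiters are literal in here
            }
        } else if (c == '"') {
            inQuotes = true;          // opening quote
        } else if (c == delimiter) {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);        // last field (possibly empty)
    return fields;
}
```

Because the quote characters are consumed rather than copied, it does not matter whether a source quotes every field, only text fields, or only the fields that need it (point 2).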
This is a good exercise for yourself to work on :)
You should break your library into three parts
Loading the CSV file
Representing the file in memory so that you can modify it and read it
Saving the CSV file back to disk
So you are looking at writing a CSVDocument class that contains:
Load(const char* file);
Save(const char* file);
GetBody
So that you may use your library like this:
CSVDocument doc;
doc.Load("file.csv");
CSVDocumentBody* body = doc.GetBody();
CSVDocumentRow* header = body->GetRow(0);
for (int i = 0; i < header->GetFieldCount(); i++)
{
CSVDocumentField* col = header->GetField(i);
cout << col->GetText() << "\t";
}
for (int i = 1; i < body->GetRowCount(); i++) // i = 1 so we skip the header
{
CSVDocumentRow* row = body->GetRow(i);
for (int p = 0; p < row->GetFieldCount(); p++)
{
cout << row->GetField(p)->GetText() << "\t";
}
cout << "\n";
}
body->GetRow(1)->GetField(0)->SetText("hello world");
CSVDocumentRow* lastRow = body->AddRow();
lastRow->AddField()->SetText("Hey there");
lastRow->AddField()->SetText("Hey there column 2");
doc.Save("file.csv");
Which gives us the following interfaces:
class CSVDocument
{
public:
void Load(const char* file);
void Save(const char* file);
CSVDocumentBody* GetBody();
};
class CSVDocumentBody
{
public:
int GetRowCount();
CSVDocumentRow* GetRow(int index);
CSVDocumentRow* AddRow();
};
class CSVDocumentRow
{
public:
int GetFieldCount();
CSVDocumentField* GetField(int index);
CSVDocumentField* AddField();
};
class CSVDocumentField
{
public:
const char* GetText();
void SetText(const char* text);
};
Now you just have to fill in the blanks from here :)
Believe me when I say this - investing your time into learning how to make libraries, especially those dealing with the loading, manipulation and saving of data, will not only remove your dependence on the existence of such libraries but will also make you an all-around better programmer.
:)
EDIT
I don't know how much you already know about string manipulation and parsing; so if you get stuck I would be happy to help.
Here is some code you can use. The data from the csv is stored inside an array of rows. Each row is an array of strings. Hope this helps.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
typedef std::string String;
typedef std::vector<String> CSVRow;
typedef CSVRow::const_iterator CSVRowCI;
typedef std::vector<CSVRow> CSVDatabase;
typedef CSVDatabase::const_iterator CSVDatabaseCI;
void readCSV(std::istream &input, CSVDatabase &db);
void display(const CSVRow&);
void display(const CSVDatabase&);
int main(){
std::fstream file("file.csv", std::ios::in);
if(!file.is_open()){
std::cout << "File not found!\n";
return 1;
}
CSVDatabase db;
readCSV(file, db);
display(db);
}
void readCSV(std::istream &input, CSVDatabase &db){
String csvLine;
// read every line from the stream
while( std::getline(input, csvLine) ){
std::istringstream csvStream(csvLine);
CSVRow csvRow;
String csvCol;
// read every element from the line that is separated by commas
// and put it into the vector of strings
while( std::getline(csvStream, csvCol, ',') )
csvRow.push_back(csvCol);
db.push_back(csvRow);
}
}
void display(const CSVRow& row){
if(!row.size())
return;
CSVRowCI i=row.begin();
std::cout<<*(i++);
for(;i != row.end();++i)
std::cout<<','<<*i;
}
void display(const CSVDatabase& db){
if(!db.size())
return;
CSVDatabaseCI i=db.begin();
for(; i != db.end(); ++i){
display(*i);
std::cout<<std::endl;
}
}
Using boost tokenizer to parse records, see here for more details.
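// Needs <boost/tokenizer.hpp>, <fstream>, <iostream>, <iterator>, <algorithm>,
// <string> and <vector>, plus "using namespace std; using namespace boost;".
// `data` is a std::string holding the file name; the code runs inside main().
ifstream in(data.c_str());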
if (!in.is_open()) return 1;
typedef tokenizer< escaped_list_separator<char> > Tokenizer;
vector< string > vec;
string line;
while (getline(in,line))
{
Tokenizer tok(line);
vec.assign(tok.begin(),tok.end());
/// do something with the record
if (vec.size() < 3) continue;
copy(vec.begin(), vec.end(),
ostream_iterator<string>(cout, "|"));
cout << "\n----------------------" << endl;
}
Look at 'The Practice of Programming' (TPOP) by Kernighan & Pike. It includes an example of parsing CSV files in both C and C++. But it would be worth reading the book even if you don't use the code.
(Previous URL: http://cm.bell-labs.com/cm/cs/tpop/)
I found this interesting approach:
CSV to C structure utility
Quote:
CSVtoC is a program that takes a CSV or comma-separated values file as input and dumps it as a C structure.
Naturally, you can't make changes to the CSV file, but if you just need in-memory read-only access to the data, it could work.
I am trying to write up a config parser class in c++.
I'll first give a snippet of my class:
class foo{
private:
struct st{
std::vector<std::pair<std::string,std::string>> dvec;
std::string dname;
};
std::vector<st> svec;
public:
//methods of the class
};
int main(){
foo f;
//populate foo
}
I will populate the vectors with data from a file. The file has some text with delimiters. I'll break up the text into strings using the delimiters. I know the exact size of the file and since I am keeping all data as character string, it's safe to assume the vector svec will take the same memory space as the file size. However, I don't know how many strings there will be. e.g., I know the file size is 100 bytes but I don't know if it's 1 string of 100 characters or 10 strings of 10 characters each or whatever.
I would like to avoid reallocation as much as possible. But std::vector.reserve() and std::vector.resize() both allocate memory in terms of number of elements. And this is my problem, I don't know how many elements there will be. I just know how many bytes it will need. I dug around a bit but couldn't find anything.
I am guessing I will be cursed if I try this- std::vector<st> *svec = (std::vector<st> *) malloc(filesize);
Is there any way to reserve memory for vector in terms of bytes instead of number of elements? Or some other workaround?
Thank you for your time.
Edit: I have already written the entire code and it's working. I am just looking for ways to optimize it. The entire code is too long so I will give the repository link for anyone interested- https://github.com/Rakib1503052/Config_parser
For the relevant part of the code:
class Config {
private:
struct section {
std::string sec_name;
std::vector<std::pair<std::string, std::string>> sec_data;
section(std::string name)
{
this->sec_name = name;
//this->sec_data = data;
}
};
//std::string path;
std::vector<section> m_config;
std::map<std::string, size_t> section_map;
public:
void parse_config(std::string);
//other methods
};
void Config::parse_config(string path)
{
struct stat buffer;
bool file_exists = (stat(path.c_str(), &buffer) == 0);
if (!file_exists) { throw std::runtime_error("File does not exist in the given path."); } // std::runtime_error needs <stdexcept>
else {
ifstream FILE;
FILE.open(path);
if (!FILE.is_open()) { throw std::runtime_error("File could not be opened."); }
string line_buffer, key, value;
char ignore_char;
size_t current_pos = 0;
//getline(FILE, line_buffer);
while (getline(FILE, line_buffer))
{
if (line_buffer == "") { continue; }
else{
if ((line_buffer.front() == '[') && (line_buffer.back() == ']'))
{
line_buffer.erase(0, 1);
line_buffer.erase(line_buffer.size() - 1);
this->m_config.push_back(section(line_buffer));
current_pos = m_config.size() - 1;
section_map[line_buffer] = current_pos;
}
else
{
stringstream buffer_stream(line_buffer);
buffer_stream >> key >> ignore_char >> value;
this->m_config[current_pos].sec_data.push_back(std::make_pair(key, value));
}
}
}
FILE.close();
}
}
It reads an INI file of the format
[section1]
key1 = value1
key2 = value2
[section2]
key1 = value1
key2 = value2
.
.
.
However, after some more digging I found out that std::string works differently than I thought. Simply put, the characters of the strings are not stored in the vector. The vector holds only the fixed-size std::string objects, and each of those manages its own character buffer (typically a separate heap allocation, unless the string is short enough for the small-string optimization). This makes my case moot.
I'll keep the question here for anyone interested. In particular, if the data is changed to fundamental types like int or double, the question stands, and it has a great answer here: https://stackoverflow.com/a/18991810/11673912
Feel free to share other opinions.
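For what it's worth, one possible workaround when the element count is unknown is a cheap first pass that counts delimiters, followed by a single reserve() call. This is a sketch, not taken from the linked answer; splitWithReserve is a hypothetical helper and it assumes the whole text is already in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Splits `text` on `delimiter`, reserving the vector once up front.
// The element count is found with a first pass over the text.
std::vector<std::string> splitWithReserve(const std::string &text,
                                          char delimiter) {
    std::vector<std::string> parts;
    const std::size_t count =
        static_cast<std::size_t>(
            std::count(text.begin(), text.end(), delimiter)) + 1;
    parts.reserve(count);  // reserve() counts elements, not bytes

    std::size_t start = 0;
    while (true) {
        const std::size_t pos = text.find(delimiter, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));  // last piece
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```

This avoids reallocating the vector itself, but each std::string may still allocate its own character buffer, which is exactly the point made above.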
I've been working on a *.csv parser and it works for the most part. Basically the parser works like this:
Open a stream for the designated *.csv file (filepath, number of data columns, and a delimiter must be supplied at constructor).
Get each line until the end of file. For each raw csv line extracted:
i) Stream that line into a line_stream (using std::istringstream).
ii) Break that stream into smaller string tokens (using the delimiter supplied at constructor) and push into a vector of strings.
Extracted data strings can be accessed through a private database interface until destruction of the parser object (or until I implement the string processor to convert them into workable data).
The parser object works, except that the very first character of the first string element of the vector always gets dropped. For example:
0544,1,Kitchenware,2,27
gets turned into:
544,1,Kitchenware,2,27
This is an unacceptable loss of information, but I can't figure out the reason for this problem. I have worked around it by pushing a dummy string at the front of each packet,
,ummy,0544,Kitchenware,27
but I still feel that this is a very poor implementation.
I suspect it's a problem with the line_stream, but I'm not entirely sure how.
Below is the source code:
#ifndef _CSVPARSER_HPP
#define _CSVPARSER_HPP
#include <fstream>
#include <iostream>
#include <sstream>
#include <vector>
using namespace std;
const char DEFAULT_COLUMNS_SEPARATOR = ',';
class csvStream {
private:
void _setOpen() { inStream.open(fpath, ios::in); }
protected:
string fpath;
fstream inStream;
public:
explicit csvStream(const string& path) {
fpath = path;
_setOpen();
}
};
template <typename T>
class csvParser : protected csvStream {
private:
int numDataTypes;
char col_separator;
string extractedRawLine;
// Database of extracted packets of data as strings.
vector<vector<string>*> database;
// Return a reference to <data as a packet of strings>.
vector<string>& _rawToStringPacket() {
// line stream buffer & token
istringstream line_stream;
static string token;
// new packet on the heap
vector<string>* packet = nullptr;
packet = new vector<string>;
// extract strings into packet
line_stream.str(extractedRawLine);
packet->push_back("dummy");
for (int i = 1; i < numDataTypes + 1; ++i) {
getline(line_stream, token, col_separator);
packet->push_back(token);
}
return *packet;
}
/****************************PUBLIC-API****************************/
public:
// Explicit Constructor: Must supply path to a *.csv file and the number of
// data columns.
explicit csvParser(const string& path,
const int ntypes,
const char sptr = ',')
: csvStream(path), numDataTypes{ntypes}, col_separator{sptr} {}
// Extract all data while stream is open.
void extractAllRaw() {
// continue to extract data until the end of file:
while (getline(*inStream, extractedRawLine)) {
static vector<string>* temp = nullptr;
temp = &_rawToStringPacket();
database.push_back(temp);
}
}
// Stream the data elements to standard output.
void printDatabase(ostream& os = cout) {
for (auto i : database) {
for (int j = 0; j < static_cast<int>(i->size()); ++j) {
os << (*i)[j] << ",";
}
os << '\n';
}
}
};
#endif
I have a homework task: copy the elements of a .txt file (a direct access file) into a .bin file (a fixed-length record file).
The .txt file holds strings, one word per line.
I came up with the code below, but I'm not sure whether it's what is needed or even slightly correct. Any help will be useful! (I'm new to C++.)
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
const int buffer_size = 30;
class Word{
char name[buffer_size];
public:
void setName () // Trying to get every word from a line
{
string STRING;
ifstream infile;
infile.open ("text.txt");
while(!infile.eof()) // To get you all the lines.
{
getline(infile,STRING); // Saves the line in STRING.
}
infile.close();
}
};
void write_record()
{
ofstream outFile;
outFile.open("binFILE.bin", ios::binary | ios::app);
Word obj;
obj.setName();
outFile.write((char*)&obj, sizeof(obj));
outFile.close();
}
int main()
{
write_record();
return 0;
}
NEW APPROACH:
class Word
{
char name[buffer_size];
public:
Word(string = "");
void setName ( string );
string getName() const;
};
void readWriteToFile(){
// Read .txt file content and write into .bin file
string temp;
Word object;
ofstream outFile("out.dat", ios::binary);
fstream fin ("text.txt", ios::in);
getline(fin, temp);
while(fin)
{
object.setName(temp);
outFile.write( reinterpret_cast< const char* >( &object ),sizeof(Word) );
getline(fin, temp);
}
fin.close();
outFile.close();
}
int main()
{
readWriteToFile();
return 0;
}
Word::Word(string nameValue)
{
setName(nameValue);
}
void Word::setName( string nameString )
{
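// Copy at most buffer_size - 1 characters so the terminator still fits in name[]
const char *nameValue = nameString.c_str();
int len = static_cast<int>(strlen(nameValue));
len = ( len < buffer_size ? len : buffer_size - 1 );
strncpy(name, nameValue, len);
name[len] = '\0';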
}
string Word::getName() const
{
return name;
}
Quick commentary and walk through
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
Avoid using namespace std; while you are learning. It can lead to some really nasty, hard to pin-down bugs as your functions may be silently replaced by functions with the same name in the standard library.
const int buffer_size = 30;
class Word
{
char name[buffer_size];
Since it looks like you are allowed to use std::string why not use it here?
public:
void setName() // Trying to get every word from a line
Really bad name for a function that, according to its own comment, is supposed to get every word from a line.
{
string STRING;
ifstream infile;
infile.open("text.txt");
while (!infile.eof()) // To get you all the lines.
{
getline(infile, STRING); // Saves the line in STRING.
}
A few things are wrong here. One is the classic problem covered in "Why is iostream::eof inside a loop condition considered wrong?"
Next, while the code reads each line, it doesn't do anything with it. STRING is never stored anywhere.
Finally in a class that sounds as though it should contain and manage a single word, it reads all the words in the file. There may be a case for turning this function into a static factory that churns out a std::vector of Words.
infile.close();
}
};
void write_record()
{
ofstream outFile;
outFile.open("binFILE.bin", ios::binary | ios::app);
ios::app will add onto an existing file. This doesn't sound like what was described in the assignment description.
Word obj;
obj.setName();
We've already covered the failings of the Word class.
outFile.write((char*) &obj, sizeof(obj));
Squirting an object into a stream without defining a data protocol or using any serialization is dangerous. It makes the file non-portable. You will find that some classes, vector and string prominent among these, do not contain their data. Writing a string to a file may get you nothing more than a count and an address that is almost certainly not valid when the file is loaded.
In this case all the object contains is an array of characters and that should write to file cleanly, but it will always write exactly 30 bytes and that may not be what you want.
outFile.close();
}
int main()
{
write_record();
return 0;
}
Since this is homework I'm not writing this sucker for you, but here are a few suggestions:
Read file line by line will get you started on the file reader. Your case is simpler because there is only one word on each line. Your teacher may throw a curveball and add more stuff onto a line, so you may want to test for that.
Read the words from the file into a std::vector. vector will make your job so easy that you might have time for other homework.
A very simplistic implementation is:
std::vector<std::string> words;
while (getline(infile, STRING)) // To get you all the lines.
{
words.push_back(STRING);
}
For writing the file back out in binary, I suggest going Pascal style. First write the length of the string in binary. Use a known, fixed-width unsigned integer (no such thing as a negative string length) and watch out for endianness. Once the length is written, write only the number of characters you need to write.
Ignoring endianness, you should have something like this:
uint32_t length = word.length(); // length will always be 32 bits
out.write((char*)&length, sizeof(length));
out.write(word.c_str(), length);
When you are done writing the writer, write a reader function so that you can test that the writer works correctly. Always test your code, and I recommend not writing anything until you know how you'll test it. Very often coming at a program from the test side first will find problems before they even have a chance to start.
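Putting the last two suggestions together, a writer and a matching reader might look roughly like this. Treat it as a sketch under the same "ignoring endianness" assumption as above; writeWord, readWord and readAllWords are illustrative names, and the functions take generic streams so they can be tried against a std::stringstream before touching a real file.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes one word as a 32-bit length followed by that many characters.
// The length is written in the host's byte order (endianness is ignored).
void writeWord(std::ostream &out, const std::string &word) {
    const std::uint32_t length = static_cast<std::uint32_t>(word.length());
    out.write(reinterpret_cast<const char *>(&length), sizeof(length));
    out.write(word.data(), length);
}

// Reads one word written by writeWord. Returns false at end of input
// or if the record is truncated.
bool readWord(std::istream &in, std::string &word) {
    std::uint32_t length = 0;
    if (!in.read(reinterpret_cast<char *>(&length), sizeof(length)))
        return false;
    word.resize(length);
    if (length > 0 && !in.read(&word[0], length)) return false;
    return true;
}

// Reads every record until the input is exhausted.
std::vector<std::string> readAllWords(std::istream &in) {
    std::vector<std::string> words;
    std::string word;
    while (readWord(in, word)) words.push_back(word);
    return words;
}
```

Writing a few words into a binary std::stringstream and reading them back with readAllWords gives a quick round-trip test. The reader trusts the stored length, so a corrupt file could make it request a very large allocation; a real implementation would want a sanity limit.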
I have a list of cities that I'm formatting like this:
{town, ...},
{...},
...
Reading each town and building the strings town1, town2, ... works.
The problem is the output: the first line, {town, ...}, is written correctly, but the program crashes on the second line.
Any idea why?
I have [region] [town] (excel table).
So each region repeats by how many towns are in it.
Each file has 1 region/town per line.
judete contains each region repeated 1 time.
AB
SD
PC
....
orase contains the towns list.
town1
town2
....
orase-index contains the region of each town
AB
AB
AB
AB
SD
SD
SD
PC
PC
...
I want output like this: {"town1", "town2", ...}, where each row (say, row 5) contains the towns that belong to the region on the same row (row 5) of judete.
Here's my code:
#include<stdio.h>
#include<string.h>
char judet[100][100];
char orase[50][900000];
char oras[100], ceva[100];
void main ()
{
int i=0, nr;
FILE *judete, *index, *ORASE, *output;
judete = fopen("judete.txt", "rt");
index = fopen("orase-index.txt", "rt");
ORASE = fopen("orase.txt", "rt");
output = fopen("output.txt", "wt");
while( !feof(judete) )
{
fgets(judet[i], 100, judete);
i++;
}
nr = i;
char tmp[100];
int where=0;
for(i=0;i<nr;i++)
strcpy(orase[i],"");
while( !feof(index) )
{
fgets(tmp, 100, index);
for(i=0;i<nr;i++)
{
if( strstr(judet[i], tmp) )
{
fgets(oras, 100, ORASE);
strcat(ceva, "\"");
oras[strlen(oras)-1]='\0';
strcat(ceva, oras);
strcat(ceva, "\", ");
strcat(orase[i], ceva);
break;
}
}
}
char out[900000];
for(i=0;i<nr;i++)
{
strcpy(out, "");
strcat(out, "{");
strcat(out, orase[i]); //fails here
fprintf(output, "%s},\n", out);
}
}
The result I get from running the code is:
Unhandled exception at 0x00D4F7A9 (msvcr110d.dll) in orase-judete.exe: 0xC0000005: Access violation writing location 0x00A90000.
You don't clear the orase array, because your loop
for(i-0;i<nr;i++)
strcpy(orase[i],"");
by mistake ('-' instead of '=') executes 0 times.
I think you need to start by making up your mind whether you're writing C or C++. You've tagged this with both, but the code looks like it's pure C. While a C++ compiler will accept most C, the result isn't what most would think of as ideal C++.
Since you have tagged it as C++, I'm going to assume you actually want (or are all right with) C++ code. Well-written C++ code is going to be different enough from your current C code that it's probably easier to start over than to try to rewrite the code line by line or anything like that.
The immediate problem I see with doing that, however, is that you haven't really specified what you want as your output. For the moment I'm going to assume you want each line of output to be something like this: "{" <town> "," <town> "}".
If that's the case, I'd start by noting that the output doesn't seem to depend on your judete file at all. The orase and orase-index seem to be entirely adequate. For that, our code can look something like this:
#include <iostream>
#include <string>
#include <iterator>
#include <fstream>
#include <vector>
// a class that overloads `operator>>` to read a line at a time:
class line {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, line &l) {
return std::getline(is, l.data);
}
operator std::string() const { return data; }
};
int main() {
// open the input files:
std::ifstream town_input("orase.txt");
std::ifstream region_input("orase-index.txt");
// create istream_iterator's to read from the input files. Note
// that these iterate over `line`s, (i.e., objects of the type
// above, so they use its `operator>>` to read each data item).
//
std::istream_iterator<line> regions(region_input),
towns(town_input),
end;
// read in the lists of towns and regions:
std::vector<std::string> town_list {towns, end};
std::vector<std::string> region_list {regions, end};
// write out the file of town-name, region-name:
std::ofstream result("output.txt");
for (int i=0; i<town_list.size(); i++)
result << "{" << town_list[i] << "," << region_list[i] << "}\n";
}
Note that since this is C++, you'll typically need to save the source as something.cpp instead of something.c for the compiler to recognize it correctly.
Edit: Based on the new requirements you've given in the comments, you apparently want something closer to this:
#include <iostream>
#include <string>
#include <iterator>
#include <fstream>
#include <vector>
#include <map>
// a class that overloads `operator>>` to read a line at a time:
class line {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, line &l) {
return std::getline(is, l.data);
}
operator std::string() const { return data; }
};
int main() {
// open the input files:
std::ifstream town_input("orase.txt");
std::ifstream region_input("orase-index.txt");
// create istream_iterator's to read from the input files. Note
// that these iterate over `line`s, (i.e., objects of the type
// above, so they use its `operator>>` to read each data item).
//
std::istream_iterator<line> regions(region_input),
towns(town_input),
end;
// read in the lists of towns and regions:
std::vector<std::string> town_list (towns, end);
std::vector<std::string> region_list (regions, end);
// consolidate towns per region:
std::map<std::string, std::vector<std::string> > consolidated;
for (int i = 0; i < town_list.size(); i++)
consolidated[region_list[i]].push_back(town_list[i]);
// write out towns by region
std::ofstream output("output.txt");
for (auto pos = consolidated.begin(); pos != consolidated.end(); ++pos) {
// write the region name and its towns to the same output file
output << pos->first << ": ";
std::copy(pos->second.begin(), pos->second.end(),
std::ostream_iterator<std::string>(output, "\t"));
output << "\n";
}
}
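Notice that ceva is never reset. As a global array it starts out empty, but every pass through the loop appends to it with strcat without clearing it first, so it keeps growing until it overflows its 100 bytes.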
Instead of using strcpy to initialize strings, I would recommend using static initialization:
char ceva[100]="";
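A related option, sketched here with a hypothetical helper name (formatQuotedTown), is to build each entry with snprintf, which overwrites the buffer from the start and never writes past its end. It is written as C++ to match the rest of this thread, but the same call exists in C as plain snprintf from <stdio.h>.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Formats the text "town", (quoted town, comma, space) into dst without
// relying on dst's previous contents.
// Returns true if the whole entry fit into the buffer.
bool formatQuotedTown(char *dst, std::size_t dstSize, const char *town) {
    const int written = std::snprintf(dst, dstSize, "\"%s\", ", town);
    return written >= 0 && static_cast<std::size_t>(written) < dstSize;
}
```

Calling it twice on the same buffer leaves only the second town in it, which is the behaviour the original loop needed from ceva. The return value tells the caller when an entry was truncated.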