Reading selected lines of a file using mmap - C++

I posted a question earlier about reading a file faster by skipping specific lines, but that does not seem to work well with the standard C++ APIs.
I researched further and learned that memory-mapped files could come in handy for these kinds of cases. Details about memory-mapped files are here.
All in all:
Suppose the file (file.txt) looks like this:
A quick brown fox
// Blah blah
// Blah blah
jumps over the little lazy dog
Then, in code, I open the file, map it into memory, and iterate over the contents through the char* pointer instead of using the file-pointer APIs. I wanted to give it a try before reaching a conclusion on it. The skeleton of my code looks like this:
struct stat filestat;
FILE *file = fopen("file.txt", "r");
if (-1 == fstat(fileno(file), &filestat)) {
    std::cout << "FAILED with fstat" << std::endl;
    return FALSE;
} else {
    char* data = (char*)mmap(0, filestat.st_size, PROT_READ, MAP_PRIVATE, fileno(file), 0);
    if (data == MAP_FAILED) { // mmap reports failure with MAP_FAILED, not a null pointer
        std::cout << "FAILED " << std::endl;
        return FALSE;
    }
    // Filter out 'data'
    // for (unsigned int i = 0; i < filestat.st_size; ++i) {
    //     // Do something here..
    // }
    munmap(data, filestat.st_size);
    return TRUE;
}
In this case, I want to capture the lines that do not start with //. Since the file (file.txt) is already memory mapped, I could walk the data pointer and filter out the lines. Am I correct in doing so?
If so, what is an efficient way to parse the lines?
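For reference, below is a minimal sketch of how the mapped buffer could be scanned line by line while skipping the // lines. The helper name filterLines and the use of std::memchr are illustrative assumptions, not code from the original question:
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Collect every line in the mapped region [data, data + size) that does not
// start with "//". Purely illustrative; 'data' and 'size' are assumed to come
// from the mmap call shown above.
std::vector<std::string> filterLines(const char* data, std::size_t size)
{
    std::vector<std::string> lines;
    const char* pos = data;
    const char* end = data + size;
    while (pos < end) {
        // Find the end of the current line (or the end of the mapping)
        const char* nl = static_cast<const char*>(std::memchr(pos, '\n', end - pos));
        const char* lineEnd = nl ? nl : end;
        // Keep the line unless it starts with "//"
        if (lineEnd - pos < 2 || pos[0] != '/' || pos[1] != '/')
            lines.emplace_back(pos, lineEnd);
        pos = nl ? nl + 1 : end;
    }
    return lines;
}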

Reading selected lines from any source and copying them to any destination can be done with the standard C++ algorithms.
You can use std::copy_if. This copies data from any source to any destination whenever the predicate is true.
Here is a simple example that copies data from a file and skips all lines starting with "//". The result is put into a vector.
It is one statement calling one function, so essentially a classic one-liner.
For debugging purposes, I print the result to the console.
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <string>
#include <fstream>

using LineBasedTextFile = std::vector<std::string>;

class CompleteLine { // Proxy for the input iterator
public:
    // Overload extractor. Read a complete line
    friend std::istream& operator>>(std::istream& is, CompleteLine& cl) {
        std::getline(is, cl.completeLine);
        return is;
    }
    // Cast the type 'CompleteLine' to std::string
    operator std::string() const { return completeLine; }
protected:
    // Temporary to hold the read string
    std::string completeLine{};
};

int main()
{
    // Open the input file
    std::ifstream inputFile("r:\\input.txt");
    if (inputFile)
    {
        // This vector will hold all lines of the file
        LineBasedTextFile lineBasedTextFile{};
        // Read the file and copy all lines that fulfill the required condition into the vector of lines
        std::copy_if(std::istream_iterator<CompleteLine>(inputFile), std::istream_iterator<CompleteLine>(),
                     std::back_inserter(lineBasedTextFile),
                     [](const std::string& s) { return s.find("//") != 0; });
        // Print vector of lines
        std::copy(lineBasedTextFile.begin(), lineBasedTextFile.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
    }
    return 0;
}
I hope this helps

Related

How to read file or stream until string found

I am writing a dictionary program. The input is specified by a file and parsed as follows:
std::string savedDictionary(std::istreambuf_iterator<char>(std::ifstream(DICTIONARY_SAVE_FILE)), {});
// entire file loaded into savedDictionary
for (size_t end = 0; ;)
{
    size_t term = savedDictionary.find("|TERM|", end);
    size_t definition = savedDictionary.find("|DEFINITION|", term);
    if ((end = savedDictionary.find("|END|", definition)) == std::string::npos) break;
    // store term and definition here...
}
This throws std::bad_alloc on some of my users' machines that don't have enough RAM to hold the dictionary string plus the dictionary as it's held inside my program.
If I could do this:
std::string term;
for (std::ifstream file(DICTIONARY_SAVE_FILE); file; std::getline(file, term, "|END|"))
{
    // same as above
}
then it would be great, but std::getline doesn't support a string as delimiter.
So, what's the most idiomatic way to read the file until I find "|END|" without allocating a crap ton of memory up front?
We can achieve the requested functionality by using a very simple proxy class. With that, it is easy to use all std:: algorithms and iterators as usual.
So, we define a small proxy class called LineUntilEnd. It can be used in conjunction with any stream, like a std::ifstream or whatever you like. In particular, you can simply use the extractor operator to extract a value from the input stream and put it into the desired variable.
// Here we will store the lines until |END|
LineUntilEnd lue;
// Simply read the line until |END|
while (testInput >> lue) {
It works as expected.
And if we have such a string, we can parse it afterwards with simple regex operations.
I added a small example and put the resulting values into a std::multimap to build a demo dictionary.
Please see the following code:
#include <iostream>
#include <string>
#include <iterator>
#include <regex>
#include <map>
#include <sstream>

// Ultra simple proxy class to read data until a given word is found
struct LineUntilEnd
{
    // Overload the extractor operator
    friend std::istream& operator >>(std::istream& is, LineUntilEnd& lue);

    // Intermediate storage for the result
    std::string data{};
};

// Read stream until the "|END|" symbol has been found
std::istream& operator >>(std::istream& is, LineUntilEnd& lue)
{
    // Clear destination string
    lue.data.clear();
    // We will count how many bytes of the search string have been matched
    size_t matchCounter{ 0U };
    // Read characters from the stream
    char c{ '\0' };
    while (is.get(c))
    {
        // Add character to the resulting string
        lue.data += c;
        // Check for a match. All characters must be matched
        if (c == "|END|"[matchCounter]) {
            // Check next matching character
            ++matchCounter;
            // If there is a match for all characters in the search string
            if (matchCounter >= (sizeof "|END|" - 1)) {
                // Then stop reading
                break;
            }
        }
        else {
            // Not all characters could be matched. Start from the beginning
            matchCounter = 0U;
        }
    }
    return is;
}
// Input test data
std::istringstream testInput{ "|TERM|bonjour|TERM|hola|TERM|hi|DEFINITION|hello|END||TERM|Adios|TERM|Ciao|DEFINITION|bye|END|" };

// Regex definitions. Used to build up a dictionary
std::regex reTerm(R"(\|TERM\|(\w+))");
std::regex reDefinition(R"(\|DEFINITION\|(\w+)\|END\|)");

// Test code
int main()
{
    // We will store the found values in a dictionary
    std::multimap<std::string, std::string> dictionary{};

    // Here we will store the lines until |END|
    LineUntilEnd lue;
    // Simply read the line until |END|
    while (testInput >> lue) {
        // Search for the definition string
        std::smatch sm{};
        if (std::regex_search(lue.data, sm, reDefinition)) {
            // Definition string found
            // Iterate over all terms
            std::sregex_token_iterator tokenIter(lue.data.begin(), lue.data.end(), reTerm, 1);
            while (tokenIter != std::sregex_token_iterator()) {
                // Store values in the dictionary
                dictionary.insert({ sm[1], *tokenIter++ });
            }
        }
    }

    // And show some results to the user
    for (const auto& d : dictionary) {
        std::cout << d.first << " --> " << d.second << "\n";
    }
    return 0;
}
For those in the future, this is what I ended up writing:
#include <istream>
#include <optional>
#include <string>

std::optional<std::string> ReadEntry(std::istream& stream)
{
    for (struct { char ch; int matched; std::string entry; } i{}; stream.get(i.ch); i.entry.push_back(i.ch))
        if (i.ch == "|END|"[i.matched++]);
        else if (i.matched == sizeof("|END|")) return i.entry;
        else i.matched = 0;
    return {};
}
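For completeness, a small usage sketch of ReadEntry; the file name and the surrounding loop are illustrative assumptions, not part of the original answer:
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream file("dictionary.txt"); // hypothetical save file name
    // Read one |END|-terminated entry at a time instead of loading the whole file.
    // Note that each returned entry still contains the trailing |END| marker.
    while (auto entry = ReadEntry(file)) {
        std::cout << *entry << "\n"; // parse the |TERM|/|DEFINITION| parts from *entry here
    }
}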

How to put characters from file into two-dimensional vector?

I've been trying to read characters from an external file and put them into a two-dimensional vector of type char. The elements must be able to be compared to certain values in order to navigate the maze given in "MazeSample.txt".
While I haven't been able to get characters put into the vector, I was able to read and output the characters with the get and cout functions.
The following code is an attempt to read the characters into the vector in the correct format, but it produces an error in the end:
//MazeSample.txt
SWWOW
OOOOW
WWWOW
WEOOW
//source.cpp
vector<vector<char>> maze;
ifstream mazeFile;
char token;

mazeFile.open("MazeSample.txt");
while (!mazeFile.eof()) {
    mazeFile.get(token); //reads a single character, goes to next char after loop
    for (int row = 0; row < maze.size(); row++) {
        for (int column = 0; column < maze.at(row).size(); row++) {
            maze.push_back(token);
        }
    }
    //cout << token;
}
mazeFile.close();
For the maze provided in "MazeSample.txt", I'd expect the maze vector to read each character row by row, mimicking the format of the maze sample.
In the above code, I'm provided with an error at maze.push_back(token):
"no instance of overloaded function "std::vector<_Ty, _Alloc>::push_back..." matches the argument list"
"argument types are: (char)"
"object type is: std::vector<std::vector<char, std::allocator<char>>, std::allocator<std::vector<char, std::allocator<char>>>>"
You are inserting a char into a vector<vector<char>>. You should create a vector<char>, insert the values of type char into that, and then insert the vector<char> into vector<vector<char>> maze;. Here is the corrected version of your program. It could be written in simpler ways, but for your understanding I have made the corrections on top of your program.
vector<vector<char>> maze;
ifstream mazeFile;
string token;

mazeFile.open("MazeSample.txt");
while (std::getline(mazeFile, token)) { //reads an entire line; also ends the loop cleanly at end of file
    //Copy the characters of the entire row to a vector of char
    vector<char> vecRow;
    vecRow.assign(token.begin(), token.end());
    //Push the entire row of characters into the vector of rows
    maze.push_back(vecRow);
}
mazeFile.close();
The reason for your problem is that you try to put a char into a std::vector of std::vector<char>. So you are pushing the wrong type.
maze.at(row).push_back(token); would do it, but at that point no row exists yet. You also need to push_back an empty row before you can write data to it.
That is your syntax error.
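For illustration, here is a minimal sketch of that character-by-character variant. The newline handling (one maze row per line of MazeSample.txt) is an assumption about the intended format:
#include <fstream>
#include <vector>

int main()
{
    std::vector<std::vector<char>> maze;
    std::ifstream mazeFile("MazeSample.txt");

    maze.push_back({}); // start the first row
    char token;
    while (mazeFile.get(token)) {
        if (token == '\n')
            maze.push_back({}); // a newline starts the next row
        else
            maze.back().push_back(token); // append the character to the current row
    }
    if (maze.back().empty())
        maze.pop_back(); // drop a trailing empty row, if any
}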
Then, your code could be drastically shortened by using C++ algorithms. See:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <sstream>

std::istringstream testDataFile(
R"#(00000
11111
22222
33333
44444
)#");

// This is a proxy to read a complete line with the extractor operator
struct CompleteLineAsVectorOfChar {
    // Overloaded extractor operator
    friend std::istream& operator>>(std::istream& is, CompleteLineAsVectorOfChar& cl) {
        std::string s{};
        cl.completeLine.clear();
        std::getline(is, s);
        std::copy(s.begin(), s.end(), std::back_inserter(cl.completeLine));
        return is;
    }
    operator std::vector<char>() const { return completeLine; } // Type cast operator for the expected value
    std::vector<char> completeLine{};
};

int main()
{
    // Read the complete source file into the maze, by simply defining the variable and using the range constructor
    std::vector<std::vector<char>> maze{ std::istream_iterator<CompleteLineAsVectorOfChar>(testDataFile), std::istream_iterator<CompleteLineAsVectorOfChar>() };

    // Debug output: copy all data to std::cout
    std::for_each(maze.begin(), maze.end(), [](const std::vector<char>& l) {
        std::copy(l.begin(), l.end(), std::ostream_iterator<char>(std::cout, " "));
        std::cout << '\n';
    });
    return 0;
}
But this is not the end. A std::vector<char> has no advantage over a std::string. With a std::string you have nearly all the functionality of a std::vector<char>, and it is an improvement in design. The code would then look more like this:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <sstream>

std::istringstream testDataFile(
R"#(00000
11111
22222
33333
44444
)#");

int main()
{
    // Read the complete source file into the maze, by simply defining the variable and using the range constructor
    std::vector<std::string> maze{ std::istream_iterator<std::string>(testDataFile), std::istream_iterator<std::string>() };

    // Debug output: copy all data to std::cout
    std::copy(maze.begin(), maze.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}
This is by far the simpler solution, and it will serve your needs as well.
Please note: I used an istringstream for reading the data, because I do not have a file on SO. But it is of course the same as using any other stream (like an ifstream).
EDIT
The first solution reads the source and puts it directly into a std::vector<std::vector<char>>.
The 2nd solution puts everything into a std::vector<std::string>, which is the most efficient solution. A std::string is, after all, nearly a std::vector<char>.
The OP requested a 3rd solution, where we use the 2nd solution and then copy the std::vector<std::string> into a std::vector<std::vector<char>>.
Please see below:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <sstream>

std::istringstream testDataFile(
R"#(00000
11111
22222
33333
44444
)#");

int main()
{
    // Read the complete source file into the maze, by simply defining the variable and using the range constructor
    std::vector<std::string> maze{ std::istream_iterator<std::string>(testDataFile), std::istream_iterator<std::string>() };

    // Debug output: copy all data to std::cout
    std::copy(maze.begin(), maze.end(), std::ostream_iterator<std::string>(std::cout, "\n"));

    // Edit: Copy into a std::vector<std::vector<char>> -------------------------------------------------------
    std::cout << "\n\n\nSolution 3:\n\n";

    // Define the new variable with the number of lines from the first maze
    std::vector<std::vector<char>> mazeChar(maze.size());

    // Copy the data from the original maze
    std::transform(
        maze.begin(),       // Source
        maze.end(),
        mazeChar.begin(),   // Destination
        [](const std::string& s) {
            std::vector<char> vc;   // Copy columns
            std::copy(s.begin(), s.end(), std::back_inserter(vc));
            return vc;
        }
    );

    // Debug output
    std::for_each(
        mazeChar.begin(),
        mazeChar.end(),
        [](const std::vector<char>& vc) {
            std::copy(vc.begin(), vc.end(), std::ostream_iterator<char>(std::cout));
            std::cout << '\n';
        }
    );
    return 0;
}
Hope this helps . . .

Reading In .csv File With Different Data Types [duplicate]

This question already has answers here:
How can I read and parse CSV files in C++?
(39 answers)
Closed 8 years ago.
Pretty self-explanatory: I tried Google and got a lot of the dreaded expertsexchange results, and I searched here as well to no avail. An online tutorial or example would be best. Thanks guys.
More information would be useful.
But the simplest form:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>

int main()
{
    std::ifstream data("plop.csv");
    std::string line;
    while (std::getline(data, line))
    {
        std::stringstream lineStream(line);
        std::string cell;
        while (std::getline(lineStream, cell, ','))
        {
            // You have a cell!!!!
        }
    }
}
Also see this question: CSV parser in C++
You can try the Boost Tokenizer library, in particular the Escaped List Separator
If what you're really doing is manipulating a CSV file itself, Nelson's answer makes sense. However, my suspicion is that the CSV is simply an artifact of the problem you're solving. In C++, that probably means you have something like this as your data model:
struct Customer {
    int id;
    std::string first_name;
    std::string last_name;
    struct {
        std::string street;
        std::string unit;
    } address;
    char state[2];
    int zip;
};
Thus, when you're working with a collection of data, it makes sense to have std::vector<Customer> or std::set<Customer>.
With that in mind, think of your CSV handling as two operations:
// if you wanted to go nuts, you could use a forward iterator concept for both of these
class CSVReader {
public:
    CSVReader(const std::string &inputFile);
    bool hasNextLine();
    void readNextLine(std::vector<std::string> &fields);
private:
    /* secrets */
};

class CSVWriter {
public:
    CSVWriter(const std::string &outputFile);
    void writeNextLine(const std::vector<std::string> &fields);
private:
    /* more secrets */
};
void readCustomers(CSVReader &reader, std::vector<Customer> &customers);
void writeCustomers(CSVWriter &writer, const std::vector<Customer> &customers);
Read and write a single row at a time, rather than keeping a complete in-memory representation of the file itself. There are a few obvious benefits:
Your data is represented in a form that makes sense for your problem (customers), rather than the current solution (CSV files).
You can trivially add adapters for other data formats, such as bulk SQL import/export, Excel/OO spreadsheet files, or even an HTML <table> rendering.
Your memory footprint is likely to be smaller (depends on relative sizeof(Customer) vs. the number of bytes in a single row).
CSVReader and CSVWriter can be reused as the basis for an in-memory model (such as Nelson's) without loss of performance or functionality. The converse is not true.
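As an illustration of the row-at-a-time reading described above, here is one way readCustomers could map rows onto the Customer struct. The column order (id, first_name, last_name, street, unit, state, zip) and the use of std::stoi are assumptions, not part of the original interfaces:
#include <string>
#include <vector>

// Hypothetical implementation sketch: assumes each row arrives as
// { id, first_name, last_name, street, unit, state, zip }.
void readCustomers(CSVReader &reader, std::vector<Customer> &customers)
{
    std::vector<std::string> fields;
    while (reader.hasNextLine()) {
        reader.readNextLine(fields);
        if (fields.size() < 7)
            continue; // skip malformed rows

        Customer c{};
        c.id = std::stoi(fields[0]);
        c.first_name = fields[1];
        c.last_name = fields[2];
        c.address.street = fields[3];
        c.address.unit = fields[4];
        // state is a fixed two-character field in the struct
        c.state[0] = fields[5].size() > 0 ? fields[5][0] : ' ';
        c.state[1] = fields[5].size() > 1 ? fields[5][1] : ' ';
        c.zip = std::stoi(fields[6]);
        customers.push_back(c);
    }
}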
I've worked with a lot of CSV files in my time. I'd like to add the advice:
1 - Depending on the source (Excel, etc), commas or tabs may be embedded in a field. Usually, the rule is that they will be 'protected' because the field will be double-quote delimited, as in "Boston, MA 02346".
2 - Some sources will not double-quote delimit all text fields. Other sources will. Others will delimit all fields, even numerics.
3 - Fields containing double-quotes usually get the embedded double quotes doubled up (and the field itself delimited with double quotes, as in "George ""Babe"" Ruth").
4 - Some sources will embed CR/LFs (Excel is one of these!). Sometimes it'll be just a CR. The field will usually be double-quote delimited, but this situation is very difficult to handle.
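To make points 1 to 3 concrete, here is a small sketch of a field splitter that honors double-quoted fields and doubled quotes. It is illustrative only and deliberately ignores point 4 (embedded CR/LF), which has to be handled while reading lines, not while splitting one:
#include <cstddef>
#include <string>
#include <vector>

// Split one CSV line into fields, honoring "quoted, fields" and "" escapes.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields(1);
    bool inQuotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (inQuotes) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                fields.back() += '"'; // doubled quote inside a quoted field
                ++i;
            } else if (c == '"') {
                inQuotes = false; // closing quote
            } else {
                fields.back() += c;
            }
        } else if (c == '"') {
            inQuotes = true; // opening quote
        } else if (c == ',') {
            fields.emplace_back(); // start the next field
        } else {
            fields.back() += c;
        }
    }
    return fields;
}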
This is a good exercise for yourself to work on :)
You should break your library into three parts
Loading the CSV file
Representing the file in memory so that you can modify it and read it
Saving the CSV file back to disk
So you are looking at writing a CSVDocument class that contains:
Load(const char* file);
Save(const char* file);
GetBody
So that you may use your library like this:
CSVDocument doc;
doc.Load("file.csv");

CSVDocumentBody* body = doc.GetBody();

CSVDocumentRow* header = body->GetRow(0);
for (int i = 0; i < header->GetFieldCount(); i++)
{
    CSVDocumentField* col = header->GetField(i);
    cout << col->GetText() << "\t";
}

for (int i = 1; i < body->GetRowCount(); i++) // i = 1 so we skip the header
{
    CSVDocumentRow* row = body->GetRow(i);
    for (int p = 0; p < row->GetFieldCount(); p++)
    {
        cout << row->GetField(p)->GetText() << "\t";
    }
    cout << "\n";
}

body->GetRecord(10)->SetText("hello world");

CSVDocumentRow* lastRow = body->AddRow();
lastRow->AddField()->SetText("Hey there");
lastRow->AddField()->SetText("Hey there column 2");

doc.Save("file.csv"); // note: doc is not a pointer, so use '.' rather than '->'
Which gives us the following interfaces:
class CSVDocument
{
public:
    void Load(const char* file);
    void Save(const char* file);
    CSVDocumentBody* GetBody();
};

class CSVDocumentBody
{
public:
    int GetRowCount();
    CSVDocumentRow* GetRow(int index);
    CSVDocumentRow* AddRow();
};

class CSVDocumentRow
{
public:
    int GetFieldCount();
    CSVDocumentField* GetField(int index);
    CSVDocumentField* AddField(int index);
};

class CSVDocumentField
{
public:
    const char* GetText();
    void SetText(const char* text);
};
Now you just have to fill in the blanks from here :)
Believe me when I say this - investing your time into learning how to make libraries, especially those dealing with the loading, manipulation and saving of data, will not only remove your dependence on the existence of such libraries but will also make you an all-around better programmer.
:)
EDIT
I don't know how much you already know about string manipulation and parsing; so if you get stuck I would be happy to help.
Here is some code you can use. The data from the csv is stored inside an array of rows. Each row is an array of strings. Hope this helps.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>

typedef std::string String;
typedef std::vector<String> CSVRow;
typedef CSVRow::const_iterator CSVRowCI;
typedef std::vector<CSVRow> CSVDatabase;
typedef CSVDatabase::const_iterator CSVDatabaseCI;

void readCSV(std::istream &input, CSVDatabase &db);
void display(const CSVRow&);
void display(const CSVDatabase&);

int main(){
    std::fstream file("file.csv", std::ios::in);
    if(!file.is_open()){
        std::cout << "File not found!\n";
        return 1;
    }
    CSVDatabase db;
    readCSV(file, db);
    display(db);
}

void readCSV(std::istream &input, CSVDatabase &db){
    String csvLine;
    // read every line from the stream
    while( std::getline(input, csvLine) ){
        std::istringstream csvStream(csvLine);
        CSVRow csvRow;
        String csvCol;
        // read every element from the line that is separated by commas
        // and put it into the vector of strings
        while( std::getline(csvStream, csvCol, ',') )
            csvRow.push_back(csvCol);
        db.push_back(csvRow);
    }
}

void display(const CSVRow& row){
    if(!row.size())
        return;
    CSVRowCI i=row.begin();
    std::cout<<*(i++);
    for(;i != row.end();++i)
        std::cout<<','<<*i;
}

void display(const CSVDatabase& db){
    if(!db.size())
        return;
    CSVDatabaseCI i=db.begin();
    for(; i != db.end(); ++i){
        display(*i);
        std::cout<<std::endl;
    }
}
Using boost tokenizer to parse records, see here for more details.
ifstream in(data.c_str()); // 'data' is assumed to be a std::string holding the csv file name
if (!in.is_open()) return 1;

typedef tokenizer< escaped_list_separator<char> > Tokenizer;

vector< string > vec;
string line;

while (getline(in,line))
{
    Tokenizer tok(line);
    vec.assign(tok.begin(),tok.end());

    /// do something with the record
    if (vec.size() < 3) continue;

    copy(vec.begin(), vec.end(),
         ostream_iterator<string>(cout, "|"));

    cout << "\n----------------------" << endl;
}
Look at 'The Practice of Programming' (TPOP) by Kernighan & Pike. It includes an example of parsing CSV files in both C and C++. But it would be worth reading the book even if you don't use the code.
(Previous URL: http://cm.bell-labs.com/cm/cs/tpop/)
I found this interesting approach:
CSV to C structure utility
Quote:
CSVtoC is a program that takes a CSV or comma-separated values file as input and dumps it as a C structure.
Naturally, you can't make changes to the CSV file, but if you just need in-memory read-only access to the data, it could work.

C/C++ reading and writing long strings to files

I have a list of cities that I'm formatting like this:
{town, ...},
{...},
...
Reading and building each town and creating town1, town2, ... works.
The problem is when I output it: the first line {town, ...} works, but the second line crashes.
Any idea why?
I have [region] [town] (an Excel table).
So each region is repeated as many times as there are towns in it.
Each file has one region/town per line.
judete contains each region exactly once:
AB
SD
PC
....
orase contains the towns list.
town1
town2
....
orase-index contains the region of each town
AB
AB
AB
AB
SD
SD
SD
PC
PC
...
I want an output like this: {"town1", "town2", ...}, where each row (e.g., row 5) contains the towns that belong to the region from judete on the same row (row 5).
Here's my code:
#include<stdio.h>
#include<string.h>

char judet[100][100];
char orase[50][900000];
char oras[100], ceva[100];

void main ()
{
    int i=0, nr;
    FILE *judete, *index, *ORASE, *output;

    judete = fopen("judete.txt", "rt");
    index = fopen("orase-index.txt", "rt");
    ORASE = fopen("orase.txt", "rt");
    output = fopen("output.txt", "wt");

    while( !feof(judete) )
    {
        fgets(judet[i], 100, judete);
        i++;
    }
    nr = i;

    char tmp[100];
    int where=0;

    for(i=0;i<nr;i++)
        strcpy(orase[i],"");

    while( !feof(index) )
    {
        fgets(tmp, 100, index);
        for(i=0;i<nr;i++)
        {
            if( strstr(judet[i], tmp) )
            {
                fgets(oras, 100, ORASE);
                strcat(ceva, "\"");
                oras[strlen(oras)-1]='\0';
                strcat(ceva, oras);
                strcat(ceva, "\", ");
                strcat(orase[i], ceva);
                break;
            }
        }
    }

    char out[900000];
    for(i=0;i<nr;i++)
    {
        strcpy(out, "");
        strcat(out, "{");
        strcat(out, orase[i]); //fails here
        fprintf(output, "%s},\n", out);
    }
}
The result I get from running the code is:
Unhandled exception at 0x00D4F7A9 (msvcr110d.dll) in orase-judete.exe: 0xC0000005: Access violation writing location 0x00A90000.
You don't clear the orase array, because your loop
for(i-0;i<nr;i++)
    strcpy(orase[i],"");
by mistake ('-' instead of '=') executes 0 times.
I think you need to start by making up your mind whether you're writing C or C++. You've tagged this with both, but the code looks like it's pure C. While a C++ compiler will accept most C, the result isn't what most would think of as ideal C++.
Since you have tagged it as C++, I'm going to assume you actually want (or all right with) C++ code. Well written C++ code is going to be enough different from your current C code that it's probably easier to start over than try to rewrite the code line by line or anything like that.
The immediate problem I see with doing that, however, is that you haven't really specified what you want as your output. For the moment I'm going to assume you want each line of output to be something like this: "{" <town> "," <town> "}".
If that's the case, I'd start by noting that the output doesn't seem to depend on your judete file at all. The orase and orase-index seem to be entirely adequate. For that, our code can look something like this:
#include <iostream>
#include <string>
#include <iterator>
#include <fstream>
#include <vector>

// a class that overloads `operator>>` to read a line at a time:
class line {
    std::string data;
public:
    friend std::istream &operator>>(std::istream &is, line &l) {
        return std::getline(is, l.data);
    }
    operator std::string() const { return data; }
};

int main() {
    // open the input files:
    std::ifstream town_input("orase.txt");
    std::ifstream region_input("orase-index.txt");

    // create istream_iterator's to read from the input files. Note
    // that these iterate over `line`s, (i.e., objects of the type
    // above, so they use its `operator>>` to read each data item).
    //
    std::istream_iterator<line> regions(region_input),
                                towns(town_input),
                                end;

    // read in the lists of towns and regions:
    std::vector<std::string> town_list {towns, end};
    std::vector<std::string> region_list {regions, end};

    // write out the file of town-name, region-name:
    std::ofstream result("output.txt");

    for (int i=0; i<town_list.size(); i++)
        result << "{" << town_list[i] << "," << region_list[i] << "}\n";
}
Note that since this is C++, you'll typically need to save the source as something.cpp instead of something.c for the compiler to recognize it correctly.
Edit: Based on the new requirements you've given in the comments, you apparently want something closer to this:
#include <iostream>
#include <string>
#include <iterator>
#include <fstream>
#include <vector>
#include <map>
#include <algorithm>

// a class that overloads `operator>>` to read a line at a time:
class line {
    std::string data;
public:
    friend std::istream &operator>>(std::istream &is, line &l) {
        return std::getline(is, l.data);
    }
    operator std::string() const { return data; }
};

int main() {
    // open the input files:
    std::ifstream town_input("orase.txt");
    std::ifstream region_input("orase-index.txt");

    // create istream_iterator's to read from the input files. Note
    // that these iterate over `line`s, (i.e., objects of the type
    // above, so they use its `operator>>` to read each data item).
    //
    std::istream_iterator<line> regions(region_input),
                                towns(town_input),
                                end;

    // read in the lists of towns and regions:
    std::vector<std::string> town_list (towns, end);
    std::vector<std::string> region_list (regions, end);

    // consolidate towns per region:
    std::map<std::string, std::vector<std::string> > consolidated;
    for (int i = 0; i < town_list.size(); i++)
        consolidated[region_list[i]].push_back(town_list[i]);

    // write out towns by region
    std::ofstream output("output.txt");
    for (auto pos = consolidated.begin(); pos != consolidated.end(); ++pos) {
        std::cout << pos->first << ": ";
        std::copy(pos->second.begin(), pos->second.end(),
                  std::ostream_iterator<std::string>(output, "\t"));
        std::cout << "\n";
    }
}
Notice that ceva is never initialized.
Instead of using strcpy to initialize strings, I would recommend using static initialization:
char ceva[100]="";
