c++ split string by delimiter into char array

I have a file with lines in the format:
firstword;secondword;4.0
I need to split the lines by ;, store the first two words in char arrays, and store the number as a double.
In Python, I would just use split(";"), then split("") on the first two indexes of the resulting list then float() on the last index. But I don't know the syntax for doing this in C++.
So far, I'm able to read from the file and store the lines as strings in the studentList array. But I don't know where to begin with extracting the words and numbers from the items in the array. I know I would need to declare new variables to store them in, but I'm not there yet.
I don't want to use vectors for this.
#include <iomanip>
#include <fstream>
#include <string>
#include <stdlib.h>
#include <iostream>
using namespace std;
int main() {
string studentList[4];
ifstream file;
file.open("input.txt");
if(file.is_open()) {
for (int i = 0; i < 4; i++) {
file >> studentList[i];
}
file.close();
}
for(int i = 0; i < 4; i++) {
cout << studentList[i];
}
return 0;
}

You can use std::getline, which accepts a delimiter argument:
#include <string>
#include <sstream>
#include <iostream>
int main() {
std::istringstream file("a;b;1.0\nc;d;2.0");
for (int i = 0; i < 2; i++){
std::string x,y,v;
std::getline(file,x,';');
std::getline(file,y,';');
std::getline(file,v); // default delim is new line
std::cout << x << ' ' << y << ' ' << v << '\n';
}
}
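If the fields really must end up in char arrays and a double, as the question asks, the strings read by std::getline can be copied and converted afterwards. The following is only a sketch: the Record struct, the 32-byte buffer size, and the names copyToArray and parseRecord are assumptions made for illustration, not something from the question.

```cpp
#include <cstddef>    // std::size_t
#include <cstring>    // std::strncpy
#include <exception>  // std::exception
#include <sstream>    // std::istringstream
#include <string>     // std::string, std::getline, std::stod

// Hypothetical record layout: two fixed-size char arrays and a double.
struct Record {
    char first[32];
    char second[32];
    double value;
};

// Copy a std::string into a fixed-size char array, always null-terminating.
// Text longer than the array is silently truncated.
template <std::size_t N>
void copyToArray(const std::string& src, char (&dest)[N]) {
    std::strncpy(dest, src.c_str(), N - 1);
    dest[N - 1] = '\0';
}

// Parse one "firstword;secondword;4.0" line. Returns false on malformed input.
bool parseRecord(const std::string& line, Record& out) {
    std::istringstream in(line);
    std::string first, second, number;
    if (!std::getline(in, first, ';')) return false;
    if (!std::getline(in, second, ';')) return false;
    if (!std::getline(in, number)) return false;
    try {
        out.value = std::stod(number);  // throws if no number can be parsed
    } catch (const std::exception&) {
        return false;
    }
    copyToArray(first, out.first);
    copyToArray(second, out.second);
    return true;
}
```

Calling parseRecord on each line read from the file fills one Record per line; an array of Record plus a count then replaces the array of strings.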

C++ uses the stream classes as its string-handling workhorse. Every kind of transformation is typically designed to work through them. For splitting strings, std::getline() is absolutely the right tool. (And possibly a std::istringstream to help out.)
A few other pointers as well.
Use struct for related information
Here we have a “student” with three related pieces of information:
struct Student {
std::string last_name;
std::string first_name;
double gpa;
};
Notice how one of those items is not a string.
Keep track of the number of items used in an array
Your arrays should have a maximum (allocated) size, plus a separate count of the items used.
constexpr int MAX_STUDENTS = 100;
Student studentList[MAX_STUDENTS];
int num_students = 0;
When adding an item (to the end), remember that in C++ arrays always start with index 0:
if (num_students < MAX_STUDENTS) {
studentList[num_students].first_name = "James";
studentList[num_students].last_name = "Bond";
studentList[num_students].gpa = 4.0;
num_students += 1;
}
You can avoid some of that bookkeeping by using a std::vector:
std::vector <Student> studentList;
studentList.push_back( { "Bond", "James", 4.0 } ); // last_name, first_name, gpa
But as you requested we avoid them, we’ll stick with arrays.
Use a stream extractor function overload to read a struct from stream
The input stream is expected to have student data formatted as a semicolon-delimited record — that is: last name, semicolon, first name, semicolon, gpa, newline.
std::istream & operator >> ( std::istream & ins, Student & student ) {
ins >> std::ws; // skip any leading whitespace
getline( ins, student.last_name, ';' ); // read last_name & eat delimiter
getline( ins, student.first_name, ';' ); // read first_name & eat delimiter
ins >> student.gpa; // read gpa. Does not eat delimiters
ins >> std::ws; // skip all trailing whitespace (including newline)
return ins;
}
Notice how std::getline() was put to use here to read strings terminating with a semicolon. Everything else must be either:
read as a string then converted to the desired type, or
read using the >> operator and have the delimiter specifically read.
For example, if the GPA were not last in our list, we would have to read and discard (“eat”) a semicolon:
char c;
ins >> student.gpa >> c;
if (c != ';') ins.setstate( std::ios::failbit );
Yes, that is kind of long and obnoxious. But it is how C++ streams work.
Fortunately with our current Student structure, we can eat that trailing newline along with all other whitespace.
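The first option in the list above, reading a field as a string and converting it afterwards, might look like the sketch below. It assumes a hypothetical record layout with the GPA in the middle (last name, GPA, first name); the StudentAlt struct is made up for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical layout where the number is NOT the last field: last;gpa;first
struct StudentAlt {
    std::string last_name;
    double gpa = 0.0;
    std::string first_name;
};

std::istream& operator>>(std::istream& ins, StudentAlt& s) {
    std::string gpa_text;
    ins >> std::ws;                        // skip any leading whitespace
    std::getline(ins, s.last_name, ';');   // read last_name & eat delimiter
    std::getline(ins, gpa_text, ';');      // read the number as text & eat delimiter
    std::getline(ins, s.first_name);       // rest of the line
    if (ins) {
        // Convert the text; reject fields with trailing junk such as "3.7x".
        std::istringstream conv(gpa_text);
        if (!(conv >> s.gpa) || !(conv >> std::ws).eof())
            ins.setstate(std::ios::failbit);
    }
    return ins;
}
```

The conversion step costs an extra temporary stream, but it makes the delimiter handling uniform: every field is read with std::getline, and only the conversion differs.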
Now we can easily read a list of students until the stream indicates EOF (or any error):
while (f >> studentList[num_students]) {
num_students += 1;
if (num_students == MAX_STUDENTS) break; // don’t forget to watch your bounds!
}
Use a stream insertion function overload to write
’Nuff said.
std::ostream & operator << ( std::ostream & outs, const Student & student ) {
return outs
<< student.last_name << ";"
<< student.first_name << ";"
<< std::fixed << std::setprecision(1) << student.gpa << "\n";
}
I am personally disinclined to modify stream characteristics on argument streams, and would instead use an intermediary std::ostringstream:
std::ostringstream oss;
oss << std::fixed << std::setprecision(1) << student.gpa;
outs << oss.str() << "\n";
But that is beyond the usual examples, and is often unnecessary. Know your data.
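The reason this matters is that std::fixed and std::setprecision stay in effect on a stream after the statement that set them. A sketch of the intermediary-stream idea wrapped in a helper follows; the name format_gpa is an assumption for illustration.

```cpp
#include <iomanip>  // std::setprecision
#include <ios>      // std::fixed
#include <sstream>  // std::ostringstream
#include <string>   // std::string

// Format a value with one decimal place using a private stream, so the
// caller's stream keeps whatever flags and precision it already had.
std::string format_gpa(double gpa) {
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(1) << gpa;
    return oss.str();
}
```

Inside the insertion operator this would be used as outs << format_gpa(student.gpa) << "\n"; anything written to outs afterwards is formatted as before.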
Either way you can now write the list of students with a simple << in a loop:
for (int n = 0; n < num_students; n++)
f << studentList[n];
Use streams with C++ idioms
You are typing too much. Use C++’s object storage model to your advantage. Curly braces (for compound statements) help tremendously.
While you are at it, name your input files as descriptively as you are allowed.
{
std::ifstream f( "students.txt" );
while (f >> studentList[num_students])
if (++num_students == MAX_STUDENTS)
break;
}
No students will be read if f does not open. Reading will stop once you run out of students (or some error occurs) or you run out of space in the array, whichever comes first. And the file is automatically closed and the f object is destroyed when we hit that final closing brace, which terminates the lexical context containing it.
Include only required headers
Finally, try to include only those headers you actually use. This is something of an acquired skill, alas. When you are beginning, it helps to list the things you are including each header for right alongside the directive.
Putting it all together into a working example
#include <algorithm> // std::sort
#include <fstream> // std::ifstream
#include <iomanip> // std::setprecision
#include <iostream> // std::cin, std::cout, etc
#include <string> // std::string
struct Student {
std::string last_name;
std::string first_name;
double gpa;
};
std::istream & operator >> ( std::istream & ins, Student & student ) {
ins >> std::ws; // skip any leading whitespace
getline( ins, student.last_name, ';' ); // read last_name & eat delimiter
getline( ins, student.first_name, ';' ); // read first_name & eat delimiter
ins >> student.gpa; // read gpa. Does not eat delimiters
ins >> std::ws; // skip all trailing whitespace (including newline)
return ins;
}
std::ostream & operator << ( std::ostream & outs, const Student & student ) {
return outs
<< student.last_name << ";"
<< student.first_name << ";"
<< std::fixed << std::setprecision(1) << student.gpa << "\n";
}
int main() {
constexpr int MAX_STUDENTS = 100;
Student studentList[MAX_STUDENTS];
int num_students = 0;
// Read students from file
std::ifstream f( "students.txt" );
while (f >> studentList[num_students])
if (++num_students == MAX_STUDENTS)
break;
// Sort students by GPA from lowest to highest
std::sort( studentList, studentList+num_students,
[]( const auto & a, const auto & b ) { return a.gpa < b.gpa; } );
// Print students
for(int i = 0; i < num_students; i++) {
std::cout << studentList[i];
}
}
The “students.txt” file contains:
Blackfoot;Lawrence;3.7
Chén;Junfeng;3.8
Gupta;Chaya;4.0
Martin;Anita;3.6
Running the program produces the output:
Martin;Anita;3.6
Blackfoot;Lawrence;3.7
Chén;Junfeng;3.8
Gupta;Chaya;4.0
You can, of course, print the students any way you wish. This example just prints them with the same semicolon-delimited-format as they were input. Here we print them with GPA and surname only:
for (int n = 0; n < num_students; n++)
std::cout << studentList[n].gpa << ": " << studentList[n].last_name << "\n";
Every language has its own idiomatic usage which you should learn to take advantage of.

Related

Parsing text file lines in C++

I have a txt file with data such as following:
regNumber FName Score1 Score2 Score3
385234 John Snow 90.0 56.0 60.8
38345234 Michael Bolton 30.0 26.5
38500234 Tim Cook 40.0 56.5 20.2
1547234 Admin__One 10.0
...
The data is separated only by whitespace.
Now, my issue is that as some of the data is missing, I cannot simply do as following:
ifstream file;
file.open("file.txt");
file >> regNo >> fName >> lName >> score1 >> score2 >> score3;
(I'm not sure if code above is right, but trying to explain the idea)
What I want to do is roughly this:
cout << "Reg Number: ";
cin >> regNo;
cout << "Name: ";
cin >> name;
if(regNo == regNumber && name == fname) {
cout << "Access granted" << endl;
}
This is what I've tried/where I'm at:
ifstream file;
file.open("users.txt");
string line;
while(getline(file, line)) {
stringstream ss(line);
string word;
while(ss >> word) {
cout << word << "\t";
}
cout << " " << endl;
}
I can output the file entirely, my issue is when it comes to picking the parts, e.g. only getting the regNumber or the name.
I would read the whole line in at once and then just substring it (since you suggest that these are fixed-width fields).
Handling the spaces between the words of the names is tricky, but it's apparent from your file that each column starts at a fixed offset. You can use this to extract the information you want. For example, to read the names, you can take the part of the line starting at the offset where FName starts and ending at the offset where Score1 starts. Then you can remove trailing whitespace from the string like this:
string A = "Tim Cook ";
auto index = A.find_last_not_of(' ');
A.erase(index + 1);
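Put together, the fixed-offset idea might look like the sketch below. It assumes the data really is column-aligned with the header line, so that column positions can be found by searching the header for the column titles. The helper names rtrim and column are made up for illustration.

```cpp
#include <string>

// Remove trailing spaces, as in the snippet above.
std::string rtrim(std::string s) {
    auto index = s.find_last_not_of(' ');
    s.erase(index + 1);  // index + 1 wraps to 0 when s is all spaces, erasing everything
    return s;
}

// Extract the text of one column: from where its title starts in the header
// up to where the next column's title starts.
std::string column(const std::string& header, const std::string& line,
                   const std::string& title, const std::string& next_title) {
    auto begin = header.find(title);
    auto end   = header.find(next_title);
    if (begin == std::string::npos || end == std::string::npos ||
        end < begin || begin >= line.size())
        return "";
    return rtrim(line.substr(begin, end - begin));
}
```

With the header line kept in a string, column(header, line, "FName", "Score1") would return the full name including its inner space. If the file is not strictly aligned, this approach does not apply and the line has to be tokenized instead.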
Alright, I can’t sleep and so decided to go bonkers and demonstrate just how tricky input is, especially when you have freeform data. The following code contains plenty of commentary on reading freeform data that may be missing.
#include <ciso646>
#include <deque>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <optional>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>
// Useful Stuff
template <typename T> T& lvalue( T&& arg ) { return arg; }
using strings = std::deque <std::string> ;
auto split( const std::string& s )
{
return strings
(
std::istream_iterator <std::string> ( lvalue( std::istringstream{ s } ) ),
std::istream_iterator <std::string> ()
);
}
template <typename T>
auto string_to( const std::string & s )
{
T value;
std::istringstream ss( s );
return ((ss >> value) and (ss >> std::ws).eof())
? value
: std::optional<T> { };
}
std::string trim( const std::string& s )
{
auto L = s.find_first_not_of( " \f\n\r\t\v" );
if (L == std::string::npos) return ""; // empty or all-whitespace input
auto R = s.find_last_not_of ( " \f\n\r\t\v" ) + 1;
return s.substr( L, R-L );
}
// Each record is stored as a “User”.
// “Users” is a complete dataset of records.
struct User
{
int regNumber;
std::vector <std::string> names;
std::vector <double> scores;
};
using Users = std::vector <User> ;
// This is stuff you would put in the .cpp file, not an .hpp file.
// But since this is a single-file example, it goes here.
namespace detail::Users
{
static const char * FILE_HEADER = "regNumber FName Score1 Score2 Score3\n";
static const int REGNUMBER_WIDTH = 11;
static const int NAMES_TOTAL_WIDTH = 18;
static const int SCORE_WIDTH = 9;
static const int SCORE_PRECISION = 1;
}
// Input is always the hardest part, and provides a WHOLE lot of caveats to deal with.
// Let us take them one at a time.
//
// Each user is a record composed of ONE OR MORE elements on a line of text.
// The elements are:
// (regNumber)? (name)* (score)*
//
// The way we handle this is:
// (1) Read the entire line
// (2) Split the line into substrings
// (3) If the first element is a regNumber, grab it
// (4) Grab any trailing floating point values as scores
// (5) Anything remaining must be names
//
// There are comments in the code below which enable you to produce a hard failure
// if any record is incorrect, however you define that. A “hard fail” sets the fail
// state on the input stream, which will stop all further input on the stream until
// the caller uses the .clear() method on the stream.
//
// The default action is to stop reading records if a failure occurs. This way the
// CALLER can decide whether to clear the error and try to read more records.
//
// Finally, we use decltype liberally to make it easier to modify the User struct
// without having to watch out for type problems with the stream extraction operator.
// Input a single record
std::istream& operator >> ( std::istream& ins, User& user )
{
// // Hard fail helper (named lambda)
// auto failure = [&ins]() -> std::istream&
// {
// ins.setstate( std::ios::failbit );
// return ins;
// };
// You should generally clear your target object when writing stream extraction operators
user = User{};
// Get a single record (line) from file
std::string s;
if (!getline( ins, s )) return ins;
// Split the record into fields
auto fields = split( s );
// Skip (blank lines) and (file headers)
static const strings header = split( detail::Users::FILE_HEADER );
if (fields.empty() or fields == header) return operator >> ( ins, user );
// The optional regNumber must appear first
auto reg_number = string_to <decltype(user.regNumber)> ( fields.front() );
if (reg_number)
{
user.regNumber = *reg_number;
fields.pop_front();
}
// Optional scores must appear last
while (!fields.empty())
{
auto score = string_to <std::remove_reference <decltype(user.scores.front())> ::type> ( fields.back() );
if (!score) break;
user.scores.insert( user.scores.begin(), *score );
fields.pop_back();
}
// if (user.scores.size() > 3) return failure(); // is there a maximum number of scores?
// Any remaining fields are names.
// if (fields.empty()) return failure(); // at least one name required?
// if (fields.size() > 2) return failure(); // maximum of two names?
for (const auto& name : fields)
{
// (You could also check that each name matches a valid regex pattern, etc)
user.names.push_back( name );
}
// If we got this far, all is good. Return the input stream.
return ins;
}
// Input a complete User dataset
std::istream& operator >> ( std::istream& ins, Users& users )
{
// This time, do NOT clear the target object! This permits the caller to read
// multiple files and combine them! The caller is also now responsible to
// provide a new/empty/clear target Users object to avoid combining datasets.
// Read all records
User user;
while (ins >> user) users.push_back( user );
// Return the input stream
return ins;
}
// Output, by comparison, is fabulously easy.
//
// I won’t bother to explain any of this, except to recall that
// the User is stored as a line-object record -- that is, it must
// be terminated by a newline. Hence we output the newline in the
// single User stream insertion operator (output operator) instead
// of the Users output operator.
// Output a single User record
std::ostream& operator << ( std::ostream& outs, const User& user )
{
std::ostringstream userstring;
userstring << std::setw( detail::Users::REGNUMBER_WIDTH ) << std::left << user.regNumber;
std::ostringstream names;
for (const auto& name : user.names) names << name << " ";
userstring << std::setw( detail::Users::NAMES_TOTAL_WIDTH ) << std::left << names.str();
for (auto score : user.scores)
userstring
<< std::left << std::setw( detail::Users::SCORE_WIDTH )
<< std::fixed << std::setprecision( detail::Users::SCORE_PRECISION )
<< score;
return outs << trim( userstring.str() ) << "\n"; // <-- output of newline
}
// Output a complete User dataset
std::ostream& operator << ( std::ostream& outs, const Users& users )
{
outs << detail::Users::FILE_HEADER;
for (const auto& user : users) outs << user;
return outs;
}
int main()
{
// Example input. Notice that any field may be absent.
std::istringstream input(
"regNumber FName Score1 Score2 Score3 \n"
"385234 John Snow 90.0 56.0 60.8 \n"
"38345234 Michael Bolton 30.0 26.5 \n"
"38500234 Tim Cook 40.0 56.5 20.2 \n"
"1547234 Admin__One 10.0 \n"
" \n" // blank line --> skipped
" Jon Bon Jovi \n"
"11111 22.2 \n"
" 33.3 \n"
"4444 \n"
"55 Justin Johnson \n"
);
Users users;
input >> users;
std::cout << users;
}
To compile with MSVC:
cl /EHsc /W4 /Ox /std:c++17 a.cpp
To compile with Clang:
clang++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 a.cpp
To compile with MinGW/GCC/etc use the same as Clang, substituting g++ for clang++, naturally.
As a final note, if you can make your data file much more strict, life will be significantly easier. For example, if you can say that you are always going to use fixed-width fields, you can use Shahriar’s answer or pm100’s answer, which I upvoted.
I would define a Person class.
This knows how to read and write a Person on one line.
#include <array>
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
class Person
{
int regNumber = 0;
std::string FName;
std::array<float,3> score{};
friend std::ostream& operator<<(std::ostream& s, Person const& p)
{
return s << p.regNumber << " " << p.FName << " " << p.score[0] << " " << p.score[1] << " " << p.score[2] << "\n";
}
friend std::istream& operator>>(std::istream& s, Person& p)
{
std::string line;
std::getline(s, line);
bool valid = true;
Person tmp; // Holds value while we check
// Handle Line.
// Handle missing data.
// And update tmp to the correct state.
if (valid) {
// The conversion worked.
// So update the object we are reading into.
p.swap(tmp);
}
else {
// The conversion failed.
// Set the fail state on the stream so we stop reading.
s.setstate(std::ios::failbit);
}
return s;
}
void swap(Person& other) noexcept
{
using std::swap;
swap(regNumber, other.regNumber);
swap(FName, other.FName);
swap(score, other.score);
}
};
Then your main becomes much simpler.
int main()
{
std::ifstream file("Data");
Person person;
while (file >> person)
{
std::cout << person;
}
}
It also becomes easier to handle your second part.
You load each person, then ask the Person object to validate the credentials.
class Person
{
// STUFF From before:
public:
bool validateUser(int id, std::string const& name) const
{
return id == regNumber && name == FName;
}
};
int main()
{
int reg = getUserReg();
std::string name = getUserName();
std::ifstream file("Data");
Person person;
while (file >> person)
{
if (person.validateUser(reg, name))
{
std::cout << "Access Granted\n";
}
}
}
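The "Handle Line" and "Handle missing data" placeholders in operator>> are left open above. One way they might be filled in is sketched below, assuming the first token is the registration number, up to three trailing numeric tokens are scores, and everything in between is the name. The function names toFloat and parsePersonLine, and the choice of 0 for missing scores, are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Try to convert a whole token to float; fails on "Snow" or "12abc".
bool toFloat(const std::string& token, float& out) {
    std::istringstream ss(token);
    return (ss >> out) && (ss >> std::ws).eof();
}

// Parse "regNumber name... score score score" where scores may be missing.
bool parsePersonLine(const std::string& line, int& regNumber,
                     std::string& name, std::array<float, 3>& scores) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string t; in >> t; ) tokens.push_back(t);
    if (tokens.size() < 2) return false;  // need at least a number and a name

    // First token: registration number (must parse fully as an int).
    std::istringstream reg(tokens.front());
    if (!(reg >> regNumber) || !(reg >> std::ws).eof()) return false;

    // Count trailing tokens that are numeric (at most three scores).
    std::size_t end = tokens.size();
    std::size_t count = 0;
    float value = 0;
    while (end > 1 && count < scores.size() && toFloat(tokens[end - 1], value)) {
        --end;
        ++count;
    }

    // Fill scores in file order; missing ones become 0.
    scores.fill(0.0f);
    for (std::size_t i = 0; i < count; ++i)
        toFloat(tokens[end + i], scores[i]);

    // Tokens between the number and the scores form the name.
    name.clear();
    for (std::size_t i = 1; i < end; ++i) {
        if (!name.empty()) name += ' ';
        name += tokens[i];
    }
    return !name.empty();
}
```

Inside operator>>, the result of such a function would set the valid flag, and the parsed values would go into tmp before the swap. A name that itself looks like a number would be misread as a score, which is one reason a stricter file format is easier to handle.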

How to read CSV data to pointers of struct vector in C++?

I want to read CSV data into a vector of structs in C++. This is what I wrote so far. I want to store the iris dataset in a pointer to a vector of structs: std::vector<Csv> *csv = new std::vector<Csv>;
#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
struct Csv{
float a;
float b;
float c;
float d;
std::string e;
};
int main(){
std::string colname;
// Iris csv dataset downloaded from
// https://gist.github.com/curran/a08a1080b88344b0c8a7
std::ifstream *myFile = new std::ifstream("iris.csv");
std::vector<Csv> *csv = new std::vector<Csv>;
std::string line;
// Read the column names
if(myFile->good())
{
// Extract the first line in the file
std::getline(*myFile, line);
// Create a stringstream from line
std::stringstream ss(line);
// Extract each column name
while(std::getline(ss, colname, ',')){
std::cout<<colname<<std::endl;
}
}
// Read data, line by line
while(std::getline(*myFile, line))
{
// Create a stringstream of the current line
std::stringstream ss(line);
}
return 0;
}
I don't know how to implement this part of the code, which has to handle lines containing both floats and a string.
// Read data, line by line
while(std::getline(*myFile, line))
{
// Create a stringstream of the current line
std::stringstream ss(line);
}
Evolution
We start with your program and complete it in your current programming style. Then we analyze your code and refactor it into a more C++-style solution. In the end, we show a modern C++ solution using more OO methods.
First your completed code:
#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
struct Csv {
float a;
float b;
float c;
float d;
std::string e;
};
int main() {
std::string colname;
// Iris csv dataset downloaded from
// https://gist.github.com/curran/a08a1080b88344b0c8a7
std::ifstream* myFile = new std::ifstream("r:\\iris.csv");
std::vector<Csv>* csv = new std::vector<Csv>;
std::string line;
// Read the column names
if (myFile->good())
{
// Extract the first line in the file
std::getline(*myFile, line);
// Create a stringstream from line
std::stringstream ss(line);
// Extract each column name
while (std::getline(ss, colname, ',')) {
std::cout << colname << std::endl;
}
}
// Read data, line by line
while (std::getline(*myFile, line))
{
// Create a stringstream of the current line
std::stringstream ss(line);
// Extract each column
std::string column;
std::vector<std::string> columns{};
while (std::getline(ss, column, ',')) {
columns.push_back(column);
}
// Convert
Csv csvTemp{};
csvTemp.a = std::stod(columns[0]);
csvTemp.b = std::stod(columns[1]);
csvTemp.c = std::stod(columns[2]);
csvTemp.d = std::stod(columns[3]);
csvTemp.e = columns[4];
// Store new row data
csv->push_back(csvTemp);
}
// Show everything
for (const Csv& row : *csv)
std::cout << row.a << '\t' << row.b << '\t' << row.c << '\t' << row.d << '\t' << row.e << '\n';
return 0;
}
Your question about reading the columns from the CSV file can be answered like this:
You need a temporary vector. You then use std::getline to split the data in the std::stringstream and copy the resulting substrings into the vector. After that, we use string conversion functions and assign the results to a temporary Csv struct variable. Once all conversions are done, we append the temporary to the csv vector that holds all row data.
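The conversion step above assumes every line has five well-formed columns; indexing columns[4] on a short line, or converting non-numeric text, would fail at run time. A more defensive variant might look like the following sketch (C++17). The function name parseRow is an assumption for illustration.

```cpp
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Csv {
    float a, b, c, d;
    std::string e;
};

// Split one CSV line on commas and convert it. Returns std::nullopt when the
// line does not have exactly five columns or a number cannot be parsed.
std::optional<Csv> parseRow(const std::string& line) {
    std::istringstream ss(line);
    std::vector<std::string> columns;
    for (std::string column; std::getline(ss, column, ','); )
        columns.push_back(column);
    if (columns.size() != 5) return std::nullopt;
    try {
        Csv row{};
        row.a = std::stof(columns[0]);
        row.b = std::stof(columns[1]);
        row.c = std::stof(columns[2]);
        row.d = std::stof(columns[3]);
        row.e = columns[4];
        return row;
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
}
```

In the reading loop, a line would then be stored only if parseRow returns a value, and skipped or reported otherwise.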
Analysis of the program.
First, and most important: in C++ we do not use raw pointers for owned memory. We should not even use new in most cases. If at all, std::unique_ptr and std::make_unique should be used.
But we do not need dynamic memory allocation on the heap at all. You can simply define the std::vector on the function's stack. Just as in your line std::string colname;, you can define the std::vector and the std::ifstream as normal local variables, for example std::vector<Csv> csv{};. If you need to pass such a variable to another function, pass it by reference; if you really need pointers, use smart pointers.
Next, if you open a file, as in std::ifstream myFile("r:\\iris.csv");, you do not need to test the stream's state with if (myFile->good()). The stream's bool conversion operator is overloaded to tell you whether the stream is usable, so if (myFile) is enough.
Now, next and most important.
The structure of your source file is well known. There is a header with 5 elements, and then each row has 4 floating-point values and, at the end, a string without spaces. This makes life very easy.
If we needed to validate the input, or if there were spaces within a string, we would need to implement other methods. But with this structure, we can use the built-in iostream facilities. The snippet
// Read all data
Csv tmp{};
char comma;
while (myFile >> tmp.a >> comma >> tmp.b >> comma >> tmp.c >> comma >> tmp.d >> comma >> tmp.e)
csv.push_back(std::move(tmp));
will do the trick. Very simple.
So, the refactored solution could look like this:
#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
struct Csv {
float a;
float b;
float c;
float d;
std::string e;
};
int main() {
std::vector<Csv> csv{};
std::ifstream myFile("r:\\iris.csv");
if (myFile) {
if (std::string header{}; std::getline(myFile, header)) std::cout << header << '\n';
// Read all data
Csv tmp{};
char comma;
while (myFile >> tmp.a >> comma >> tmp.b >> comma >> tmp.c >> comma >> tmp.d >> comma >> tmp.e)
csv.push_back(std::move(tmp));
// Show everything
for (const Csv& row : csv)
std::cout << row.a << '\t' << row.b << '\t' << row.c << '\t' << row.d << '\t' << row.e << '\n';
}
return 0;
}
This is already much more compact. But there is more . . .
In the next step, we want to add a more object-oriented approach.
The key is that data, and the methods operating on this data, should be encapsulated in an object / class / struct. Only the Csv struct should know how to read and write its data.
Hence, we overload the extractor and inserter operators for the Csv struct. We use the same approach as before. We just encapsulate the reading and writing in the struct Csv.
After that, the main function will be even more compact and the usage is more logical.
Now we have:
#include <vector>
#include <iostream>
#include <fstream>
#include <string>
struct Csv {
float a;
float b;
float c;
float d;
std::string e;
friend std::istream& operator >> (std::istream& is, Csv& c) {
char comma;
return is >> c.a >> comma >> c.b >> comma >> c.c >> comma >> c.d >> comma >> c.e;
}
friend std::ostream& operator << (std::ostream& os, const Csv& c) {
return os << c.a << '\t' << c.b << '\t' << c.c << '\t' << c.d << '\t' << c.e << '\n';
}
};
int main() {
std::vector<Csv> csv{};
if (std::ifstream myFileStream("r:\\iris.csv"); myFileStream) {
if (std::string header{}; std::getline(myFileStream, header)) std::cout << header << '\n';
// Read all data
Csv tmp{};
while (myFileStream >> tmp)
csv.push_back(std::move(tmp));
// Show everything
for (const Csv& row : csv)
std::cout << row;
}
return 0;
}
OK. Already rather good. But even more is possible.
We can see that the source data has a header and then Csv data.
This can also be modelled as a struct. We call it Iris. And we also add extractor and inserter overloads to encapsulate all IO operations.
Additionally, we now use modern algorithms, regex, and IO iterators. I am not sure if this is too complex now. If you are interested, I can give you further information. But for now, I will just show you the code.
#include <vector>
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <regex>
#include <iterator>
const std::regex re{ "," };
struct Csv {
float a;
float b;
float c;
float d;
std::string e;
// Overload extractor for simple reading of data
friend std::istream& operator >> (std::istream& is, Csv& c) {
char comma;
return is >> c.a >> comma >> c.b >> comma >> c.c >> comma >> c.d >> comma >> c.e;
}
// Ultra simple inserter
friend std::ostream& operator << (std::ostream& os, const Csv& c) {
return os << c.a << "\t\t" << c.b << "\t\t" << c.c << "\t\t" << c.d << "\t\t" << c.e << '\n';
}
};
struct Iris {
// Iris data consists of a header and then Csv data
std::vector<std::string> header{};
std::vector<Csv> csv{};
// Overload extractor for generic reading from streams
friend std::istream& operator >> (std::istream& is, Iris& i) {
// First read header values;
if (std::string line{}; std::getline(is, line))
std::copy(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {}, std::back_inserter(i.header));
// Read all csv data
std::copy(std::istream_iterator<Csv>(is), {}, std::back_inserter(i.csv));
return is;
}
// Simple output. Copy data to stream os
friend std::ostream& operator << (std::ostream& os, const Iris& i) {
std::copy(i.header.begin(), i.header.end(), std::ostream_iterator<std::string>(os, "\t")); os << '\n';
std::copy(i.csv.begin(), i.csv.end(), std::ostream_iterator<Csv>(os));
return os;
}
};
// Driver Code
int main() {
if (std::ifstream myFileStream("r:\\iris.csv"); myFileStream) {
Iris iris{};
// Read all data
myFileStream >> iris;
// Show result
std::cout << iris;
}
return 0;
}
Look at the main function and how easy it is.
If you have questions, then please ask.
Language: C++17
Compiled and tested with MS Visual Studio 2019, community edition
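For readers unfamiliar with the header-splitting line in the Iris extractor, here is a sketch of that technique in isolation. std::sregex_token_iterator with a submatch index of -1 yields the text between matches of the regex rather than the matches themselves. The function name splitOn is an assumption for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split `line` at every match of `separator`. Passing -1 as the last
// argument selects the parts BETWEEN matches instead of the matches.
std::vector<std::string> splitOn(const std::string& line, const std::regex& separator) {
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), separator, -1),
        std::sregex_token_iterator());
}
```

Because the separator is a regex, the same helper could split on patterns such as a comma followed by optional spaces; for a single fixed character, std::getline with a delimiter is usually the cheaper choice.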

Reading .txt file into array of struct

I'm a beginner in programming and I'm trying to read my .txt file into an array of structs. The program should then display the data and sort it, but it only reads the first line, and the loop won't stop until it reaches arraysize.
The file data looks like this:
ID NAME ADDRESS AGE
The Code:
#include <iostream>
#include <fstream>
#include <string>
#include <conio.h>
using namespace std;
struct bio
{
char name[50], address[50];
int id, age;
};
int main()
{
int i = 0, arraysize = 1000;
bio b[arraysize];
fstream data;
data.open("biodata.txt");
while(data.read((char*)&b, sizeof(b[i])))
{
for (i = 1; i < 1000; i++)
{
data >> b[i].id >> b[i].name >> b[i].address >> b[i].age;
}
}
for (i = 0; i < 1000; i++)
{
cout << b[i].id << " " << b[i].name << " " << b[i].address << " " << b[i].age << " " << endl;
}
data.close();
getch();
}
#include <iostream>
#include <fstream>
#include <string>
#define ARRAY_SIZE 1000
#define FILE_NAME "biodata.txt"
using namespace std;
struct Bio
{
int m_id;
string m_name;
string m_address;
int m_age;
};
int main()
{
Bio bio[ARRAY_SIZE];
ifstream data;
data.open(FILE_NAME);
if (!data)
{
cout << "cannot open file " << FILE_NAME << endl;
return 1;
}
int count = 0; // number of records actually read
for (int i = 0; i < ARRAY_SIZE && data.good(); ++i)
{
if (!(data >> bio[i].m_id >> bio[i].m_name >> bio[i].m_address >> bio[i].m_age))
{
break; // stop at the first incomplete or unreadable record
}
++count;
}
for (int i = 0; i < count; ++i)
{
cout << bio[i].m_id << " " << bio[i].m_name << " " << bio[i].m_address << " " << bio[i].m_age << " " << endl;
}
data.close();
}
A few comments:
What is the conio library for? It is not needed here.
Struct names (bio) should start with a capital letter.
Don't use char arrays in C++; you have string for this.
Declare each variable on its own line (not "char name[50], address[50];").
It is better to rename members to m_X.
About your "arraysize": if it is a constant that you choose, define it with #define. If you need the whole file, you don't need it at all. (The same goes for the file name.)
Use ifstream rather than fstream: you only need to read, and you don't want to change your data by mistake.
Check that the file opened successfully.
In your code, the while condition is checked only once, just before the inner for loop.
Check data.good() in your loop condition. It verifies that the stream has not reached EOF and is still readable.
The read() function is for binary files.
It is better to split loading the file and printing the data into two separate functions. I didn't do that here, to stay close to your original structure.
The following is maybe a little complicated for beginners, but since we are talking about C++, we should also look at a more object-oriented approach.
You designed a class called bio. In object-oriented languages, you put all the data for an object, and all the functions that operate on that data, in the class. So you need to add member functions. The idea is that you encapsulate all data in an object. The outside world should not know anything about the details of the class; it just accesses it via member functions. If you want to make changes later, you make them within the member functions of the class, and the rest of the program continues to work.
Additionally, we should definitely use C++ language features. For example, you should use std::string for strings and not plain old C-style char arrays. You should basically never use C-style arrays in C++. Instead, please use STL containers.
So let's design a class with data members and member functions. Since at the moment we just need input and output functionality, we overload the inserter and extractor operators. These operators know about the data and behaviour of the class and will take care of the details.
See the following program:
#include <iostream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <sstream>
#include <string>
struct Bio
{
// Data
unsigned int id{};
std::string name{};
std::string address{};
unsigned int age{};
// Overload extractor operator to read all data
friend std::istream& operator >> (std::istream& is, Bio& b) {
std::string textLine{};
if (std::getline(is, textLine)) {
std::istringstream textLineStream{textLine};
textLineStream >> b.id >> b.name >> b.address >> b.age;
}
return is;
}
// Overload inserter operator to print the data
friend std::ostream& operator << (std::ostream& os, const Bio& b) {
return os << b.id << " " << b.name << " " << b.address << " " << b.age;
}
};
std::istringstream sourceFile{R"(1 John Address1 31
2 Paul Address2 32
3 Ringo Address3 33
4 George Address4 34
)"};
int main()
{
// Define Variable and read complete source file
std::vector<Bio> bio{std::istream_iterator<Bio>(sourceFile), std::istream_iterator<Bio>()};
// Sort the guys by name
std::sort(bio.begin(), bio.end(), [](const Bio& b1, const Bio& b2){ return b1.name < b2.name;});
// Show output on screen
std::copy(bio.begin(),bio.end(),std::ostream_iterator<Bio>(std::cout, "\n"));
return 0;
}
Some comments. On StackOverflow, I cannot use files. So in my example program, I use a std::istringstream instead. But this is also a std::istream. You can use any other std::istream as well. So if you define a std::ifstream to read from a file, then it will work in the same way as the std::istringstream.
And please note: the extractor operator does the whole work of reading the source file. It is encapsulated. No outside function needs to know how it does it.
In the main function, we define a std::vector and use its range constructor to specify where the data comes from. We give it the std::istream_iterator, which iterates over the input data and calls the extractor operator until everything is read.
Then we sort by names and copy the result to the output.
You may notice that the fields in your input data are separated by spaces. In general this does not work for non-atomic data. The name could consist of 2 parts and the address can have a street and a city. For this, CSV (Comma Separated Values) files have been invented.
Please see a more realistic solution below.
#include <iostream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <sstream>
#include <regex>
#include <string>
struct Bio
{
// Data
unsigned int id{};
std::string name{};
std::string address{};
unsigned int age{};
// Overload extractor operator to read all data
friend std::istream& operator >> (std::istream& is, Bio& b) {
std::string line{};
std::regex re{";"};
if (std::getline(is, line)) {
std::vector<std::string> token{std::sregex_token_iterator(line.begin(), line.end(), re, -1), std::sregex_token_iterator()};
if (4 == token.size()) {
b.id = std::stoul(token[0]);
b.name = token[1];
b.address = token[2];
b.age = std::stoul(token[3]);
}
}
return is;
}
// Overload inserter operator to print the data
friend std::ostream& operator << (std::ostream& os, const Bio& b) {
return os << b.id << ", " << b.name << ", " << b.address << ", " << b.age;
}
};
std::istringstream sourceFile{R"(1; John Lenon; Street1 City1; 31
2; Paul McCartney; Street2 City2; 32
3; Ringo Starr; Street3 City3; 33
4; George Harrison; Address4; 34
)"};
int main()
{
// Define Variable and read complete source file
std::vector<Bio> bio{std::istream_iterator<Bio>(sourceFile), std::istream_iterator<Bio>()};
// Sort the guys by name
std::sort(bio.begin(), bio.end(), [](const Bio& b1, const Bio& b2){ return b1.name < b2.name;});
// Show output on screen
std::copy(bio.begin(),bio.end(),std::ostream_iterator<Bio>(std::cout, "\n"));
return 0;
}
We have a new source format and main is unchanged. Just the extractor operator is modified. Here we are using a different iterator to get the source data.

Parsing a huge complicated CSV file using C++

I have a large CSV file which looks like this:
23456, The End is Near, A silly description that makes no sense, http://www.example.com, 45332, 5th July 1998 Sunday, 45.332
That's just one line of the CSV file. There are around 500k of these.
I want to parse this file using C++. The code I started out with is:
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
// open the input csv file containing training data
ifstream inputFile("my.csv");
string line;
while (getline(inputFile, line, ','))
{
istringstream ss(line);
// declaring appropriate variables present in csv file
long unsigned id;
string url, title, description, datetaken;
float val1, val2;
ss >> id >> url >> title >> datetaken >> description >> val1 >> val2;
cout << url << endl;
}
inputFile.close();
}
The problem is that it's not printing out the correct values.
I suspect that it's not able to handle white spaces within a field. So what do you suggest I should do?
Thanks
In this example we have to parse the string using two getline calls. The first, getline(cin, line), gets a line of CSV text using the default newline delimiter. The second, getline(ss, field, ','), uses commas as the delimiter to separate the fields.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
float get_float(const std::string& s) {
std::stringstream ss(s);
float ret;
ss >> ret;
return ret;
}
int get_int(const std::string& s) {
std::stringstream ss(s);
int ret;
ss >> ret;
return ret;
}
int main() {
std::string line;
while (std::getline(std::cin, line)) { // std::cin must be qualified: there is no using-directive here
std::stringstream ss(line);
std::vector<std::string> v;
std::string field;
while(getline(ss, field, ',')) {
std::cout << " " << field;
v.push_back(field);
}
if (v.size() < 7) continue; // skip lines with too few fields instead of indexing out of range
int id = get_int(v[0]);
float f = get_float(v[6]);
std::cout << v[3] << std::endl;
}
}
Using std::istream to read std::strings with the overloaded extraction operator is not going to work well. That operator stops at whitespace, not at commas, so by default it won't pick up where one field ends and the next begins. A quick fix would be to split the line on commas and assign the values to the appropriate fields (instead of using std::istringstream).
NOTE: That is in addition to jrok's point about std::getline
Within the stated constraints, I think I'd do something like this:
#include <locale>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <iterator>
// A ctype that classifies only comma and new-line as "white space":
struct field_reader : std::ctype<char> {
field_reader() : std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
static std::vector<std::ctype_base::mask>
rc(table_size, std::ctype_base::mask());
rc[','] = std::ctype_base::space;
rc['\n'] = std::ctype_base::space;
return &rc[0];
}
};
// A struct to hold one record from the file:
struct record {
std::string key, name, desc, url, zip, date, number;
friend std::istream &operator>>(std::istream &is, record &r) {
return is >> r.key >> r.name >> r.desc >> r.url >> r.zip >> r.date >> r.number;
}
friend std::ostream &operator<<(std::ostream &os, record const &r) {
return os << "key: " << r.key
<< "\nname: " << r.name
<< "\ndesc: " << r.desc
<< "\nurl: " << r.url
<< "\nzip: " << r.zip
<< "\ndate: " << r.date
<< "\nnumber: " << r.number;
}
};
int main() {
std::stringstream input("23456, The End is Near, A silly description that makes no sense, http://www.example.com, 45332, 5th July 1998 Sunday, 45.332");
// use our ctype facet with the stream:
input.imbue(std::locale(std::locale(), new field_reader()));
// read in all our records:
std::istream_iterator<record> in(input), end;
std::vector<record> records{ in, end };
// show what we read:
std::copy(records.begin(), records.end(),
std::ostream_iterator<record>(std::cout, "\n"));
}
This is, beyond a doubt, longer than most of the others -- but it's all broken into small, mostly-reusable pieces. Once you have the other pieces in place, the code to read the data is trivial:
std::vector<record> records{ in, end };
One other point I find compelling: the first time the code compiled, it also ran correctly (and I find that quite routine for this style of programming).
I have just worked out this problem for myself and am willing to share! It may be a little overkill but it shows a working example of how Boost Tokenizer & vectors handle a big problem.
/*
* ALfred Haines Copyleft 2013
* convert csv to sql file
* csv2sql requires that each line is a unique record
*
* This example of file read and the Boost tokenizer
*
* In the spirit of COBOL I do not output until the end
* when all the print lines are ouput at once
* Special thanks to SBHacker for the code to handle linefeeds
*/
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <boost/tokenizer.hpp>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/algorithm/string/replace.hpp>
#include <vector>
namespace io = boost::iostreams;
using boost::tokenizer;
using boost::escaped_list_separator;
typedef tokenizer<escaped_list_separator<char> > so_tokenizer;
using namespace std;
using namespace boost;
vector<string> parser( string );
int main()
{
vector<string> stuff ; // this is the data in a vector
string filename; // this is the input file
string c = ""; // this holds the print line
string sr ;
cout << "Enter filename: " ;
cin >> filename;
//filename = "drwho.csv";
int lastindex = filename.find_last_of("."); // find where the extension begins
string rawname = filename.substr(0, lastindex); // extract the raw name
stuff = parser( filename ); // this gets the data from the file
/** I ask if the user wants a new_index to be created */
cout << "\n\nMySql requires a unique ID field as a Primary Key \n" ;
cout << "If the first field is not unique (no dupicate entries) \nthan you should create a " ;
cout << "New index field for this data.\n" ;
cout << "Not Sure! try no first to maintain data integrity.\n" ;
string ni ;bool invalid_data = true;bool new_index = false ;
do {
cout<<"Should I create a New Index now? (y/n)"<<endl;
cin>>ni;
if ( ni == "y" || ni == "n" ) { invalid_data =false ; }
} while (invalid_data);
cout << "\n" ;
if (ni == "y" )
{
new_index = true ;
sr = rawname.c_str() ; sr.append("_id" ); // new_index field
}
// now make the sql code from the vector stuff
// Create table section
c.append("DROP TABLE IF EXISTS `");
c.append(rawname.c_str() );
c.append("`;");
c.append("\nCREATE TABLE IF NOT EXISTS `");
c.append(rawname.c_str() );
c.append( "` (");
c.append("\n");
if (new_index)
{
c.append( "`");
c.append(sr );
c.append( "` int(10) unsigned NOT NULL,");
c.append("\n");
}
string s = stuff[0];// it is assumed that line zero has fieldnames
int x =0 ; // used to determine if new index is printed
// boost tokenizer code from the Boost website -- tok holds the token
so_tokenizer tok(s, escaped_list_separator<char>('\\', ',', '\"'));
for(so_tokenizer::iterator beg=tok.begin(); beg!=tok.end(); ++beg)
{
x++; // keeps number of fields for later use to eliminate the comma on the last entry
if (x == 1 && new_index == false ) sr = static_cast<string> (*beg) ;
c.append( "`" );
c.append(*beg);
if (x == 1 && new_index == false )
{
c.append( "` int(10) unsigned NOT NULL,");
}
else
{
c.append("` text ,");
}
c.append("\n");
}
c.append("PRIMARY KEY (`");
c.append(sr );
c.append("`)" );
c.append("\n");
c.append( ") ENGINE=InnoDB DEFAULT CHARSET=latin1;");
c.append("\n");
c.append("\n");
// The Create table section is done
// Now make the Insert lines one per line is safer in case you need to split the sql file
for (int w=1; w < stuff.size(); ++w)
{
c.append("INSERT INTO `");
c.append(rawname.c_str() );
c.append("` VALUES ( ");
if (new_index)
{
string String = static_cast<ostringstream*>( &(ostringstream() << w) )->str();
c.append(String);
c.append(" , ");
}
int p = 1 ; // used to eliminate the comma on the last entry
// tokenizer code needs unique name -- stok holds this token
so_tokenizer stok(stuff[w], escaped_list_separator<char>('\\', ',', '\"'));
for(so_tokenizer::iterator beg=stok.begin(); beg!=stok.end(); ++beg)
{
c.append(" '");
string str = static_cast<string> (*beg) ;
boost::replace_all(str, "'", "\\'");
// boost::replace_all(str, "\n", " -- ");
c.append( str);
c.append("' ");
if ( p < x ) c.append(",") ;// we dont want a comma on the last entry
p++ ;
}
c.append( ");\n");
}
// now print the whole thing to an output file
string out_file = rawname.c_str() ;
out_file.append(".sql");
io::stream_buffer<io::file_sink> buf(out_file);
std::ostream out(&buf);
out << c ;
// let the user know that they are done
cout<< "Well if you got here then the data should be in the file " << out_file << "\n" ;
return 0;}
vector<string> parser( string filename )
{
typedef tokenizer< escaped_list_separator<char> > Tokenizer;
escaped_list_separator<char> sep('\\', ',', '\"');
vector<string> stuff ;
string data(filename);
ifstream in(filename.c_str());
string li;
string buffer;
bool inside_quotes(false);
size_t last_quote(0);
while (getline(in,buffer))
{
// --- deal with line breaks in quoted strings
last_quote = buffer.find_first_of('"');
while (last_quote != string::npos)
{
inside_quotes = !inside_quotes;
last_quote = buffer.find_first_of('"',last_quote+1);
}
li.append(buffer);
if (inside_quotes)
{
li.append("\n");
continue;
}
// ---
stuff.push_back(li);
li.clear(); // clear here, next check could fail
}
in.close();
//cout << stuff.size() << endl ;
return stuff ;
}
You are right to suspect that your code is not behaving as desired because of the whitespace within the field values.
If you indeed have "simple" CSV where no field may contain a comma within the field value, then I would step away from the stream operators and perhaps C++ altogether. The example program in the question merely re-orders fields. There is no need to actually interpret or convert the values into their appropriate types (unless validation was also a goal). Reordering alone is super easy to accomplish with awk. For example, the following command would reverse 3 fields found in a simple CSV file.
cat infile | awk -F, '{ print $3","$2","$1 }' > outfile
If the goal is really to use this code snippet as a launchpad for bigger and better ideas ... then I would tokenize the line by searching for commas. The std::string class has a built-in method to find the offsets of specific characters. You can make this approach as elegant or inelegant as you want. The most elegant approaches end up looking something like the boost tokenization code.
The quick-and-dirty approach is just to know your program has N fields and look for the positions of the corresponding N-1 commas. Once you have those positions, it is pretty straightforward to invoke std::string::substr to extract the fields of interest.

How to use C++ to read in a .csv file and output in another form?

I have a .csv file that has 3 rows and 5 columns with values of 0,1,2,3,50, or 100. I saved it from an excel sheet to a .csv file. I am trying to use C++ to read in a .csv file and output the first two column values in the .csv file into a text file based on the last three column values. I am assuming the .csv file looks like
1,1,value,value,value
1,2,value,value,value
1,3,value,value,value
But I couldn't find a whole lot of documentation on the format of .csv files.
I looked at Reading Values from fields in a .csv file? and used some of the code from there.
Here is my code:
#include <iostream>
#include <fstream>
using namespace std;
char separator;
int test_var;
struct Spaxel {
int array1;
int array2;
int red;
int blue_o2;
int blue_o3;
};
Spaxel whole_list [3];
int main()
{
// Reading in the file
ifstream myfile("sample.csv");
Spaxel data;
int n = 0;
cout << data.array1<< endl;
myfile >> data.array1; // using as a test to see if it is working
cout << data.array1<< endl;
while (myfile >> data.array1)
{
// Storing the 5 variable and getting rid of commas
cout<<"here?"<< endl;
// Skip the separator, e.g. comma (',')
myfile >> separator;
// Read in next value.
myfile >> data.array2;
// Skip the separator
myfile >> separator;
// Read in next value.
myfile >> data.red;
// Skip the separator, e.g. comma (',')
myfile >> separator;
// Read in next value.
myfile >> data.blue_o2;
// Skip the separator
myfile >> separator;
// Read in next value.
myfile >> data.blue_o3;
// Ignore the newline, as it is still in the buffer.
myfile.ignore(10000, '\n');
// Storing values in an array to be printed out later into another file
whole_list[n] = data;
cout << whole_list[n].red << endl;
n++;
}
myfile.close();
// Putting contents of whole_list in an output file
//whole_list[0].red = whole_list[0].array1 = whole_list[0].array2 = 1; this was a test and it didn't work
ofstream output("sample_out.txt");
for (int n=0; n<3; n++) {
if (whole_list[n].red == 1)
output << whole_list[n].array1 <<","<< whole_list[n].array2<< endl;
}
return 0;
}
When I run it in Xcode it prints three 0's (from the cout << data.array1<< endl; and cout << data.array1<< endl; in the beginning of the main() and from the return 0) but does not output any file. Apparently the .csv file isn't getting read in correctly and the output file is not getting written correctly. Any suggestions?
Thanks for your time!
There are a couple of problem areas in the code you presented:
Hard coded filename. Running your program in a directory that doesn't have "sample.csv" could cause the ifstream failure you're seeing.
No checking whether myfile opened successfully or not.
Loop can access an out-of-bound index in whole_list if "sample.csv" has more lines.
The refactored code below, while not completely foolproof, corrects many of the issues mentioned. It should get you most of the way there.
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;
struct Spaxel
{
int array1;
int array2;
int red;
int blue_o2;
int blue_o3;
};
ostream& operator << (ostream &os, const Spaxel &rhs)
{
os << rhs.array1
<< ','
<< rhs.array2
<< ','
<< rhs.red
<< ','
<< rhs.blue_o2
<< ','
<< rhs.blue_o3;
return os;
}
istream& operator >> (istream &is, Spaxel &rhs)
{
char delim;
is >> rhs.array1
>> delim
>> rhs.array2
>> delim
>> rhs.red
>> delim
>> rhs.blue_o2
>> delim
>> rhs.blue_o3;
return is;
}
int main(int argc, const char *argv[])
{
if(argc < 2)
{
cout << "Usage: " << argv[0] << " filename\n";
return 1;
}
const char *infilename = argv[argc - 1];
// Reading in the file
ifstream myfile(infilename);
if(!myfile)
{
cerr << "Couldn't open file " << infilename;
return 1;
}
vector<Spaxel> whole_list;
string line;
while( getline(myfile, line) )
{
Spaxel data;
stringstream linestr (line);
linestr >> data;
whole_list.push_back(data);
cout << data << '\n';
}
}
Edit: Just to help clarify some things from the comment.
As you know, main is the entry point of your program, so it isn't something called by your own code. The extra optional parameters, int argc and const char *argv[], are how options and parameters get passed in when you run your program with arguments. The first parameter, argc, indicates how many arguments were passed in. The second, argv, is an array of char * with each element being an argument passed. The first argument, argv[0], is your program name, and so argc is always >= 1.
Say you execute your sample program from the shell:
./sample sample.csv
then argc and argv will have the following:
argc = 2;
argv[0] = "sample"
argv[1] = "sample.csv"
So const char *infilename = argv[argc - 1]; gets the last argument passed in which should be the filename to read in.
Sorry, I am not doing it within a struct, but I hope you will get the idea and resolve your problem.
char separator;
int value1;
int value2;
int value3;
while (myfile >> value1)
{
// Skip the separator, e.g. comma (',')
myfile >> separator;
// Read in next value.
myfile >> value2;
// Skip the separator, e.g. comma (',')
myfile >> separator;
// Read in next value.
myfile >> value3;
// Ignore the newline, as it is still in the buffer.
myfile.ignore(10000, '\n');
}
The above code fragment is not robust but demonstrates the concept of reading from a file, skipping non-numeric separators and processing the end of the line. The code is not optimized either.