Parsing text file lines in C++

I have a txt file with data such as the following:
regNumber FName Score1 Score2 Score3
385234 John Snow 90.0 56.0 60.8
38345234 Michael Bolton 30.0 26.5
38500234 Tim Cook 40.0 56.5 20.2
1547234 Admin__One 10.0
...
The data is separated only by whitespace.
Now, my issue is that since some of the data is missing, I cannot simply do the following:
ifstream file;
file.open("file.txt");
file >> regNo >> fName >> lName >> score1 >> score2 >> score3;
(I'm not sure if the code above is right, but I'm trying to explain the idea)
What I want to do is roughly this:
cout << "Reg Number: ";
cin >> regNo;
cout << "Name: ";
cin >> name;
if(regNo == regNumber && name == fname) {
cout << "Access granted" << endl;
}
This is what I've tried/where I'm at:
ifstream file;
file.open("users.txt");
string line;
while(getline(file, line)) {
stringstream ss(line);
string word;
while(ss >> word) {
cout << word << "\t";
}
cout << " " << endl;
}
I can output the file entirely; my issue is picking out the parts, e.g. getting only the regNumber or the name.

I would read the whole line in at once and then just substring it (since you suggest that these are fixed-width fields).
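A minimal sketch of that idea — the column offsets below are purely illustrative, not taken from the question:
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream file("users.txt");
    std::string line;
    while (std::getline(file, line)) {
        if (line.size() < 29) continue;                               // too short for the assumed columns
        std::string regNo = line.substr(0, 11);                       // assumed regNumber column width
        std::string name  = line.substr(11, 18);                      // assumed name column width
        std::string rest  = line.size() > 29 ? line.substr(29) : "";  // whatever scores are present
        std::cout << regNo << "|" << name << "|" << rest << "\n";
    }
}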

Handling the spaces between the words of the names is tricky, but it's apparent from your file that each column starts at a fixed offset. You can use this to extract the information you want. For example, in order to read the names, you can read the line starting at the offset where FName starts and ending at the offset where Score1 starts. Then you can remove trailing white space from the string like this:
string A = "Tim Cook ";
auto index = A.find_last_not_of(' ');
A.erase(index + 1);
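To complete the picture, extracting the name column itself is one substr call before the trim — assuming, purely for illustration, that FName starts at offset 11 and Score1 starts at offset 29:
// line was read with std::getline(file, line) as in the question's loop,
// and is assumed to be at least 29 characters wide
std::string name = line.substr(11, 29 - 11);   // [FName offset, Score1 offset)
name.erase(name.find_last_not_of(' ') + 1);    // drop the trailing padding, as above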

Alright, I can’t sleep and so decided to go bonkers and demonstrate just how tricky input is, especially when you have freeform data. The following code contains plenty of commentary on reading freeform data that may be missing.
#include <ciso646>
#include <deque>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <optional>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>
// Useful Stuff
template <typename T> T& lvalue( T&& arg ) { return arg; }
using strings = std::deque <std::string> ;
auto split( const std::string& s )
{
return strings
(
std::istream_iterator <std::string> ( lvalue( std::istringstream{ s } ) ),
std::istream_iterator <std::string> ()
);
}
template <typename T>
auto string_to( const std::string & s )
{
T value;
std::istringstream ss( s );
return ((ss >> value) and (ss >> std::ws).eof())
? value
: std::optional<T> { };
}
std::string trim( const std::string& s )
{
auto R = s.find_last_not_of ( " \f\n\r\t\v" ) + 1;
auto L = s.find_first_not_of( " \f\n\r\t\v" );
return s.substr( L, R-L );
}
// Each record is stored as a “User”.
// “Users” is a complete dataset of records.
struct User
{
int regNumber;
std::vector <std::string> names;
std::vector <double> scores;
};
using Users = std::vector <User> ;
// This is stuff you would put in the .cpp file, not an .hpp file.
// But since this is a single-file example, it goes here.
namespace detail::Users
{
static const char * FILE_HEADER = "regNumber FName Score1 Score2 Score3\n";
static const int REGNUMBER_WIDTH = 11;
static const int NAMES_TOTAL_WIDTH = 18;
static const int SCORE_WIDTH = 9;
static const int SCORE_PRECISION = 1;
}
// Input is always the hardest part, and provides a WHOLE lot of caveats to deal with.
// Let us take them one at a time.
//
// Each user is a record composed of ONE OR MORE elements on a line of text.
// The elements are:
// (regNumber)? (name)* (score)*
//
// The way we handle this is:
// (1) Read the entire line
// (2) Split the line into substrings
// (3) If the first element is a regNumber, grab it
// (4) Grab any trailing floating point values as scores
// (5) Anything remaining must be names
//
// There are comments in the code below which enable you to produce a hard failure
// if any record is incorrect, however you define that. A “hard fail” sets the fail
// state on the input stream, which will stop all further input on the stream until
// the caller uses the .clear() method on the stream.
//
// The default action is to stop reading records if a failure occurs. This way the
// CALLER can decide whether to clear the error and try to read more records.
//
// Finally, we use decltype liberally to make it easier to modify the User struct
// without having to watch out for type problems with the stream extraction operator.
// Input a single record
std::istream& operator >> ( std::istream& ins, User& user )
{
// // Hard fail helper (named lambda)
// auto failure = [&ins]() -> std::istream&
// {
// ins.setstate( std::ios::failbit );
// return ins;
// };
// You should generally clear your target object when writing stream extraction operators
user = User{};
// Get a single record (line) from file
std::string s;
if (!getline( ins, s )) return ins;
// Split the record into fields
auto fields = split( s );
// Skip (blank lines) and (file headers)
static const strings header = split( detail::Users::FILE_HEADER );
if (fields.empty() or fields == header) return operator >> ( ins, user );
// The optional regNumber must appear first
auto reg_number = string_to <decltype(user.regNumber)> ( fields.front() );
if (reg_number)
{
user.regNumber = *reg_number;
fields.pop_front();
}
// Optional scores must appear last
while (!fields.empty())
{
auto score = string_to <std::remove_reference <decltype(user.scores.front())> ::type> ( fields.back() );
if (!score) break;
user.scores.insert( user.scores.begin(), *score );
fields.pop_back();
}
// if (user.scores.size() > 3) return failure(); // is there a maximum number of scores?
// Any remaining fields are names.
// if (fields.empty()) return failure(); // at least one name required?
// if (fields.size() > 2) return failure(); // maximum of two names?
for (const auto& name : fields)
{
// (You could also check that each name matches a valid regex pattern, etc)
user.names.push_back( name );
}
// If we got this far, all is good. Return the input stream.
return ins;
}
// Input a complete User dataset
std::istream& operator >> ( std::istream& ins, Users& users )
{
// This time, do NOT clear the target object! This permits the caller to read
// multiple files and combine them! The caller is also now responsible to
// provide a new/empty/clear target Users object to avoid combining datasets.
// Read all records
User user;
while (ins >> user) users.push_back( user );
// Return the input stream
return ins;
}
// Output, by comparison, is fabulously easy.
//
// I won’t bother to explain any of this, except to recall that
// the User is stored as a line-object record -- that is, it must
// be terminated by a newline. Hence we output the newline in the
// single User stream insertion operator (output operator) instead
// of the Users output operator.
// Output a single User record
std::ostream& operator << ( std::ostream& outs, const User& user )
{
std::ostringstream userstring;
userstring << std::setw( detail::Users::REGNUMBER_WIDTH ) << std::left << user.regNumber;
std::ostringstream names;
for (const auto& name : user.names) names << name << " ";
userstring << std::setw( detail::Users::NAMES_TOTAL_WIDTH ) << std::left << names.str();
for (auto score : user.scores)
userstring
<< std::left << std::setw( detail::Users::SCORE_WIDTH )
<< std::fixed << std::setprecision( detail::Users::SCORE_PRECISION )
<< score;
return outs << trim( userstring.str() ) << "\n"; // <-- output of newline
}
// Output a complete User dataset
std::ostream& operator << ( std::ostream& outs, const Users& users )
{
outs << detail::Users::FILE_HEADER;
for (const auto& user : users) outs << user;
return outs;
}
int main()
{
// Example input. Notice that any field may be absent.
std::istringstream input(
"regNumber FName Score1 Score2 Score3 \n"
"385234 John Snow 90.0 56.0 60.8 \n"
"38345234 Michael Bolton 30.0 26.5 \n"
"38500234 Tim Cook 40.0 56.5 20.2 \n"
"1547234 Admin__One 10.0 \n"
" \n" // blank line --> skipped
" Jon Bon Jovi \n"
"11111 22.2 \n"
" 33.3 \n"
"4444 \n"
"55 Justin Johnson \n"
);
Users users;
input >> users;
std::cout << users;
}
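As the comments on the extraction operator note, if you enable the hard-fail checks it is the caller who decides whether to clear the error and keep reading. A rough sketch of such a caller (my own illustration, reusing the input stream from main() above):
// Hypothetical recovery loop: operator>> already consumed the offending
// line with getline(), so clearing the failbit is enough to move on.
Users users;
User  user;
while (input) {
    if (input >> user)        users.push_back( user );
    else if (!input.eof())    input.clear();   // bad record: skip it and keep reading
}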
To compile with MSVC:
cl /EHsc /W4 /Ox /std:c++17 a.cpp
To compile with Clang:
clang++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 a.cpp
To compile with MinGW/GCC/etc use the same as Clang, substituting g++ for clang++, naturally.
As a final note, if you can make your data file much more strict, life will be significantly easier. For example, if you can say that you are always going to use fixed-width fields, you can use Shahriar's answer or pm100's answer, which I upvoted.

I would define a Person class.
This knows how to read and write a Person on one line.
class Person
{
int regNumber;
std::string FName;
std::array<float,3> scope;
friend std::ostream& operator<<(std::ostream& s, Person const& p)
{
return s << p.regNumber << " " << p.FName << " " << p.scope[0] << " " << p.scope[1] << " " << p.scope[2] << "\n";
}
friend std::istream& operator>>(std::istream& s, Person& p)
{
std::string line;
std::getline(s, line);
bool valid = true;
Person tmp; // Holds value while we check
// Handle Line.
// Handle missing data.
// And update tmp to the correct state.
if (valid) {
// The conversion worked.
// So update the object we are reading into.
p.swap(tmp);
}
else {
// The conversion failed.
// Set the stream to bad so we stop reading.
s.setstate(std::ios::badbit);
}
return s;
}
void swap(Person& other) noexcept
{
using std::swap;
swap(regNumber, other.regNumber);
swap(FName, other.FName);
swap(scope, other.scope);
}
};
Then your main becomes much simpler.
int main()
{
std::ifstream file("Data");
Person person;
while (file >> person)
{
std::cout << person;
}
}
It also becomes easier to handle your second part.
You load each person then ask the Person object to validate the credentials.
class Person
{
// STUFF From before:
public:
bool validateUser(int id, std::string const& name) const
{
return id == regNumber && name == FName;
}
};
int main()
{
int reg = getUserReg();
std::string name = getUserName();
std::ifstream file("Data");
Person person;
while (file >> person)
{
if (person.validateUser(reg, name))
{
std::cout << "Access Granted\n";
}
}
}

Related

c++ split string by delimiter into char array

I have a file with lines in the format:
firstword;secondword;4.0
I need to split the lines by ;, store the first two words in char arrays, and store the number as a double.
In Python, I would just use split(";"), take the first two items of the resulting list, and call float() on the last item. But I don't know the syntax for doing this in C++.
So far, I'm able to read from the file and store the lines as strings in the studentList array. But I don't know where to begin with extracting the words and numbers from the items in the array. I know I would need to declare new variables to store them in, but I'm not there yet.
I don't want to use vectors for this.
#include <iomanip>
#include <fstream>
#include <string>
#include <stdlib.h>
#include <iostream>
using namespace std;
int main() {
string studentList[4];
ifstream file;
file.open("input.txt");
if(file.is_open()) {
for (int i = 0; i < 4; i++) {
file >> studentList[i];
}
file.close();
}
for(int i = 0; i < 4; i++) {
cout << studentList[i];
}
return 0;
}
You can use std::getline, which supports a custom delimiter:
#include <string>
#include <sstream>
#include <iostream>
int main() {
std::istringstream file("a;b;1.0\nc;d;2.0");
for (int i = 0; i < 2; i++){
std::string x,y,v;
std::getline(file,x,';');
std::getline(file,y,';');
std::getline(file,v); // default delim is new line
std::cout << x << ' ' << y << ' ' << v << '\n';
}
}
C++ uses the stream classes as its string-handling workhorse. Every kind of transformation is typically designed to work through them. For splitting strings, std::getline() is absolutely the right tool. (And possibly a std::istringstream to help out.)
A few other pointers as well.
Use struct for related information
Here we have a “student” with three related pieces of information:
struct Student {
std::string last_name;
std::string first_name;
double gpa;
};
Notice how one of those items is not a string.
Keep track of the number of items used in an array
Your arrays should have a maximum (allocated) size, plus a separate count of the items used.
constexpr int MAX_STUDENTS = 100;
Student studentList[MAX_STUDENTS];
int num_students = 0;
When adding an item (to the end), remember that in C++ arrays always start with index 0:
if (num_students < MAX_STUDENTS) {
studentList[num_students].first_name = "James";
studentList[num_students].last_name = "Bond";
studentList[num_students].gpa = 4.0;
num_students += 1;
}
You can avoid some of that bookkeeping by using a std::vector:
std::vector <Student> studentList;
studentList.emplace_back( "James", "Bond", 4.0 );
But as you requested we avoid them, we’ll stick with arrays.
Use a stream extractor function overload to read a struct from stream
The input stream is expected to have student data formatted as a semicolon-delimited record — that is: last name, semicolon, first name, semicolon, gpa, newline.
std::istream & operator >> ( std::istream & ins, Student & student ) {
ins >> std::ws; // skip any leading whitespace
getline( ins, student.last_name, ';' ); // read last_name & eat delimiter
getline( ins, student.first_name, ';' ); // read first_name & eat delimiter
ins >> student.gpa; // read gpa. Does not eat delimiters
ins >> std::ws; // skip all trailing whitespace (including newline)
return ins;
}
Notice how std::getline() was put to use here to read strings terminating with a semicolon. Everything else must be either:
read as a string then converted to the desired type, or
read using the >> operator and have the delimiter specifically read.
For example, if the GPA were not last in our list, we would have to read and discard (“eat”) a semicolon:
char c;
ins >> student.gpa >> c;
if (c != ';') ins.setstate( std::ios::failbit );
Yes, that is kind of long and obnoxious. But it is how C++ streams work.
Fortunately with our current Student structure, we can eat that trailing newline along with all other whitespace.
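For completeness, the first option listed above — read the field as a string, then convert it yourself — might look like this for that same hypothetical not-last GPA (a sketch only, not part of the final code below):
std::string gpa_text;
getline( ins, gpa_text, ';' );             // read the field up to the delimiter
try {
    student.gpa = std::stod( gpa_text );   // convert the text ourselves
} catch (...) {
    ins.setstate( std::ios::failbit );     // flag bad input on the stream
}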
Now we can easily read a list of students until the stream indicates EOF (or any error):
while (f >> studentList[num_students]) {
num_students += 1;
if (num_students == MAX_STUDENTS) break; // don’t forget to watch your bounds!
}
Use a stream insertion function overload to write
’Nuff said.
std::ostream & operator << ( std::ostream & outs, const Student & student ) {
return outs
<< student.last_name << ";"
<< student.first_name << ";"
<< std::fixed << std::setprecision(1) << student.gpa << "\n";
}
I am personally disinclined to modify stream characteristics on argument streams, and would instead use an intermediary std::ostringstream:
std::ostringstream oss;
oss << std::fixed << std::setprecision(1) << student.gpa;
outs << oss.str() << "\n";
But that is beyond the usual examples, and is often unnecessary. Know your data.
Either way you can now write the list of students with a simple << in a loop:
for (int n = 0; n < num_students; n++)
f << studentList[n];
Use streams with C++ idioms
You are typing too much. Use C++’s object storage model to your advantage. Curly braces (for compound statements) help tremendously.
While you are at it, name your input files as descriptively as you are allowed.
{
std::ifstream f( "students.txt" );
while (f >> studentList[num_students])
if (++num_students == MAX_STUDENTS)
break;
}
No students will be read if f does not open. Reading will stop once you run out of students (or some error occurs) or you run out of space in the array, whichever comes first. And the file is automatically closed and the f object is destroyed when we hit that final closing brace, which terminates the lexical context containing it.
Include only required headers
Finally, try to include only those headers you actually use. This is something of an acquired skill, alas. When you are beginning, it helps to list what you are including each header for right alongside the directive.
Putting it all together into a working example
#include <algorithm> // std::sort
#include <fstream> // std::ifstream
#include <iomanip> // std::setprecision
#include <iostream> // std::cin, std::cout, etc
#include <string> // std::string
struct Student {
std::string last_name;
std::string first_name;
double gpa;
};
std::istream & operator >> ( std::istream & ins, Student & student ) {
ins >> std::ws; // skip any leading whitespace
getline( ins, student.last_name, ';' ); // read last_name & eat delimiter
getline( ins, student.first_name, ';' ); // read first_name & eat delimiter
ins >> student.gpa; // read gpa. Does not eat delimiters
ins >> std::ws; // skip all trailing whitespace (including newline)
return ins;
}
std::ostream & operator << ( std::ostream & outs, const Student & student ) {
return outs
<< student.last_name << ";"
<< student.first_name << ";"
<< std::fixed << std::setprecision(1) << student.gpa << "\n";
}
int main() {
constexpr int MAX_STUDENTS = 100;
Student studentList[MAX_STUDENTS];
int num_students = 0;
// Read students from file
std::ifstream f( "students.txt" );
while (f >> studentList[num_students])
if (++num_students == MAX_STUDENTS)
break;
// Sort students by GPA from lowest to highest
std::sort( studentList, studentList+num_students,
[]( auto a, auto b ) { return a.gpa < b.gpa; } );
// Print students
for(int i = 0; i < num_students; i++) {
std::cout << studentList[i];
}
}
The “students.txt” file contains:
Blackfoot;Lawrence;3.7
Chén;Junfeng;3.8
Gupta;Chaya;4.0
Martin;Anita;3.6
Running the program produces the output:
Martin;Anita;3.6
Blackfoot;Lawrence;3.7
Chén;Junfeng;3.8
Gupta;Chaya;4.0
You can, of course, print the students any way you wish. This example just prints them with the same semicolon-delimited-format as they were input. Here we print them with GPA and surname only:
for (int n = 0; n < num_students; n++)
std::cout << studentList[n].gpa << ": " << studentList[n].last_name << "\n";
Every language has its own idiomatic usage which you should learn to take advantage of.

Reading a CSV file detecting last field in file

I'm trying to read a CSV file with three fields per record. The very last field is an integer, and I crash in stoi on the last line of the file because there is no trailing newline character; I'm not sure how to detect when I am on the last line. The first two getline statements read the first two fields, and the third getline reads the integer field using '\n' as its delimiter, but that does not work for the very last line of input. Is there any workaround for this?
The field types I am expecting are [ int, string, int ], and the middle field can contain spaces, so I don't think using stringstream for that will be effective.
while (! movieReader.eof() ) { // while we haven't readched end of file
stringstream ss;
getline(movieReader, buffer, ','); // get movie id and convert it to integer
ss << buffer; // converting id from string to integer
ss >> movieid;
getline(movieReader, movieName, ','); // get movie name
getline(movieReader, buffer, '\n');
pubYear = stoi(buffer); // buffer will be an integer, the publish year
auto it = analyze.getMovies().emplace(movieid, Movie(movieid, movieName, pubYear ) );
countMovies++;
}
For reading and writing objects one would idiomatically overload the stream extraction and stream insertion operators:
Sample csv:
1, The Godfather, 1972
2, The Shawshank Redemption, 1994
3, Schindler's List, 1993
4, Raging Bull, 1980
5, Citizen Kane, 1941
Code:
#include <cctype>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
void skip_to(std::istream &is, char delim) // helper function to skip the rest of a line
{
char ch;
while ((ch = is.get()) && is && ch != delim);
}
std::istream& eat_whitespace(std::istream &is) // stream manipulator that eats up whitespace
{
char ch;
while ((ch = is.peek()) && is && std::isspace(static_cast<int>(ch)))
is.get();
return is;
}
class Movie
{
int movieid;
std::string movieName;
int pubYear;
friend std::istream& operator>>(std::istream &is, Movie &movie)
{
Movie temp; // use a temporary to not mess up movie with a half-
char ch; // extracted dataset if we fail to extract some field.
if (!(is >> temp.movieid)) // try to extract the movieid
return is;
if (!(is >> std::skipws >> ch) || ch != ',') { // read the next non white char
is.setstate(std::ios::failbit); // and check it's a comma
return is;
}
is >> eat_whitespace; // skip all whitespace before the movieName
if (!std::getline(is, temp.movieName, ',')) { // read the movieName up to the
return is; // next comma
}
if (!(is >> temp.pubYear)) // extract the pubYear
return is;
skip_to(is, '\n'); // skip the rest of the line (or till eof())
is.clear();
movie = temp; // all went well, assign the temporary
return is;
}
friend std::ostream& operator<<(std::ostream &os, Movie const &movie)
{
os << "Nr. " << movie.movieid << ": \"" << movie.movieName << "\" (" << movie.pubYear << ')';
return os;
}
};
int main()
{
char const * movies_file_name{ "foo.txt" };
std::ifstream is{ movies_file_name };
if (!is.is_open()) {
std::cerr << "Couldn't open \"" << movies_file_name << "\"!\n\n";
return EXIT_FAILURE;
}
std::vector<Movie> movies{ std::istream_iterator<Movie>{is},
std::istream_iterator<Movie>{} };
for (auto const & m : movies)
std::cout << m << '\n';
}
Output:
Nr. 1: "The Godfather" (1972)
Nr. 2: "The Shawshank Redemption" (1994)
Nr. 3: "Schindler's List" (1993)
Nr. 4: "Raging Bull" (1980)
Nr. 5: "Citizen Kane" (1941)

Extraction operator overloading to read from a file stream with multiple data types [duplicate]

Suppose we have the following situation:
A record struct is declared as follows
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Records are stored in a file using the following format:
ID Forename Lastname Age
------------------------------
1267867 John Smith 32
67545 Jane Doe 36
8677453 Gwyneth Miller 56
75543 J. Ross Unusual 23
...
The file should be read in to collect an arbitrary number of the Person records mentioned above:
std::istream& ifs = std::ifstream("SampleInput.txt");
std::vector<Person> persons;
Person actRecord;
while(ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
persons.push_back(actRecord);
}
if(!ifs) {
std::cerr << "Input format error!" << std::endl;
}
Question:
What can I do to read in the separate values, storing them into the fields of the one actRecord variable?
The above code sample ends up with run time errors:
Runtime error time: 0 memory: 3476 signal:-1
stderr: Input format error!
One viable solution is to reorder input fields (if this is possible)
ID Age Forename Lastname
1267867 32 John Smith
67545 36 Jane Doe
8677453 56 Gwyneth Miller
75543 23 J. Ross Unusual
...
and read in the records as follows
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
unsigned int age;
while(ifs >> actRecord.id >> age &&
std::getline(ifs, actRecord.name)) {
actRecord.age = uint8_t(age);
persons.push_back(actRecord);
}
return 0;
}
You have whitespace between the first name and last name. Change your class to have firstname and lastname as separate strings and it should work. The other thing you can do is read two separate variables such as name1 and name2 and assign them as
actRecord.name = name1 + " " + name2;
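A minimal sketch of that second approach, assuming every record has exactly two name words (and reading the age into a temporary unsigned int, since extracting directly into a uint8_t reads a single character — see the note further down):
std::ifstream ifs("SampleInput.txt");
std::vector<Person> persons;
Person actRecord;
std::string name1, name2;
unsigned int age;
while (ifs >> actRecord.id >> name1 >> name2 >> age) {
    actRecord.name = name1 + " " + name2;
    actRecord.age  = static_cast<uint8_t>(age);
    persons.push_back(actRecord);
}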
Here's an implementation of a manipulator I came up with that counts delimiters as each character is extracted. Using the number of delimiters you specify, it will extract words from the input stream. Here's a working demo:
template<class charT>
struct word_inserter_impl {
word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
: str_(str)
, delim_(delim)
, words_(words)
{ }
friend std::basic_istream<charT>&
operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
typename std::basic_istream<charT>::sentry ok(is);
if (ok) {
std::istreambuf_iterator<charT> it(is), end;
std::back_insert_iterator<std::string> dest(wi.str_);
while (it != end && wi.words_) {
if (*it == wi.delim_ && --wi.words_ == 0) {
break;
}
dest++ = *it++;
}
}
return is;
}
private:
std::basic_string<charT>& str_;
charT delim_;
mutable std::size_t words_;
};
template<class charT=char>
word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
return word_inserter_impl<charT>(words, str, delim);
}
Now you can just do:
while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> actRecord.age) {
std::cout << actRecord.id << " " << actRecord.name << " " << actRecord.age << '\n';
}
A solution would be to read the first entry into an ID variable.
Then read in all the other words from the line (just push them into a temporary vector) and construct the name of the individual from all the elements except the last entry, which is the Age.
This would allow you to still have the Age in the last position but be able to deal with names like "J. Ross Unusual".
Update to add some code which illustrates the theory above:
#include <algorithm>
#include <memory>
#include <string>
#include <vector>
#include <iterator>
#include <fstream>
#include <sstream>
#include <iostream>
struct Person {
unsigned int id;
std::string name;
int age;
};
int main()
{
std::fstream ifs("in.txt");
std::vector<Person> persons;
std::string line;
while (std::getline(ifs, line))
{
std::istringstream iss(line);
// first: ID simply read it
Person actRecord;
iss >> actRecord.id;
// next iteration: read in everything
std::string temp;
std::vector<std::string> tempvect;
while(iss >> temp) {
tempvect.push_back(temp);
}
// then: the name; join the vector in a way that avoids a trailing space
// also taking care of people who do not have two names ...
int LAST = 2;
if(tempvect.size() < 2) // only the name and age are in there
{
LAST = 1;
}
std::ostringstream oss;
std::copy(tempvect.begin(), tempvect.end() - LAST,
std::ostream_iterator<std::string>(oss, " "));
// the last element
oss << *(tempvect.end() - LAST);
actRecord.name = oss.str();
// and the age
actRecord.age = std::stoi( *(tempvect.end() - 1) );
persons.push_back(actRecord);
}
for(std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); it++)
{
std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
}
}
Since we can easily split a line on whitespace and we know that the only value that can be separated is the name, a possible solution is to use a deque for each line containing the whitespace separated elements of the line. The id and the age can easily be retrieved from the deque and the remaining elements can be concatenated to retrieve the name:
#include <iostream>
#include <fstream>
#include <deque>
#include <vector>
#include <sstream>
#include <iterator>
#include <string>
#include <algorithm>
#include <utility>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
};
int main(int argc, char* argv[]) {
std::ifstream ifs("SampleInput.txt");
std::vector<Person> records;
std::string line;
while (std::getline(ifs,line)) {
std::istringstream ss(line);
std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});
Person record;
record.id = std::stoi(info.front()); info.pop_front();
record.age = std::stoi(info.back()); info.pop_back();
std::ostringstream name;
std::copy
( info.begin()
, info.end()
, std::ostream_iterator<std::string>(name," "));
record.name = name.str(); record.name.pop_back();
records.push_back(std::move(record));
}
for (auto& record : records) {
std::cout << record.id << " " << record.name << " "
<< static_cast<unsigned int>(record.age) << std::endl;
}
return 0;
}
Another solution is to require certain delimiter characters for a particular field, and provide a special extraction manipulator for this purpose.
Let's suppose we define the double quote " as the delimiter character; the input should then look like this:
1267867 "John Smith" 32
67545 "Jane Doe" 36
8677453 "Gwyneth Miller" 56
75543 "J. Ross Unusual" 23
Generally needed includes:
#include <iostream>
#include <vector>
#include <iomanip>
#include <string>
#include <cctype>
The record declaration:
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Declaration/definition of a proxy class (struct) that supports being used with the std::istream& operator>>(std::istream&, const delim_field_extractor_proxy&) global operator overload:
struct delim_field_extractor_proxy {
delim_field_extractor_proxy
( std::string& field_ref
, char delim = '"'
)
: field_ref_(field_ref), delim_(delim) {}
friend
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy);
void extract_value(std::istream& is) const {
field_ref_.clear();
char input;
bool addChars = false;
while(is) {
is.get(input);
if(is.eof()) {
break;
}
if(input == delim_) {
addChars = !addChars;
if(!addChars) {
break;
}
else {
continue;
}
}
if(addChars) {
field_ref_ += input;
}
}
// consume whitespaces
while(std::isspace(is.peek())) {
is.get();
}
}
std::string& field_ref_;
char delim_;
};
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy) {
extractor_proxy.extract_value(is);
return is;
}
Plumbing everything connected together and instantiating the delim_field_extractor_proxy:
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
int act_age;
while(ifs >> actRecord.id
>> delim_field_extractor_proxy(actRecord.name,'"')
>> act_age) {
actRecord.age = uint8_t(act_age);
persons.push_back(actRecord);
}
for(auto it = persons.begin();
it != persons.end();
++it) {
std::cout << it->id << ", "
<< it->name << ", "
<< int(it->age) << std::endl;
}
return 0;
}
NOTE:
This solution also works well specifying a TAB character (\t) as delimiter, which is useful for parsing standard .csv formats.
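For instance, per the note above, the extraction loop in main() would presumably only need the delimiter argument changed for a tab-delimited file:
while(ifs >> actRecord.id
          >> delim_field_extractor_proxy(actRecord.name, '\t')
          >> act_age) {
    actRecord.age = uint8_t(act_age);
    persons.push_back(actRecord);
}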
What can I do to read in the separate words forming the name into the one actRecord.name variable?
The general answer is: No, you can't do this without additional delimiter specifications and exceptional parsing for the parts forming the intended actRecord.name contents.
This is because a std::string field will be parsed just up to the next occurrence of a whitespace character.
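In other words, with the sample data the name simply gets split at the first space — a quick illustration:
std::istringstream in("75543 J. Ross Unusual 23");
unsigned int id;
std::string name;
in >> id >> name;   // id == 75543, name == "J." -- the rest of the name is still in the stream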
It's noteworthy that some standard formats (like e.g. .csv) may require to support distinguishing blanks (' ') from tab ('\t') or other characters, to delimit certain record fields (which may not be visible at a first glance).
Also note:
To read a uint8_t value as numeric input, you'll have to go through a temporary unsigned int value. Reading directly into an unsigned char (aka uint8_t) will screw up the stream parsing state.
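A short illustration of that workaround, using the ifs / actRecord names from the snippets above:
unsigned int tmp;
if (ifs >> tmp && tmp <= 255)
    actRecord.age = static_cast<uint8_t>(tmp);   // narrow only after a successful numeric read
else
    ifs.setstate(std::ios::failbit);             // not a number, or out of range for uint8_t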
Another attempt at solving the parsing problem.
int main()
{
std::ifstream ifs("test-115.in");
std::vector<Person> persons;
while (true)
{
Person actRecord;
// Read the ID and the first part of the name.
if ( !(ifs >> actRecord.id >> actRecord.name ) )
{
break;
}
// Read the rest of the line.
std::string line;
std::getline(ifs,line);
// Pickup the rest of the name from the rest of the line.
// The last token in the rest of the line is the age.
// All other tokens are part of the name.
// The tokens can be separated by ' ' or '\t'.
size_t pos = 0;
size_t iter1 = 0;
size_t iter2 = 0;
while ( (iter1 = line.find(' ', pos)) != std::string::npos ||
(iter2 = line.find('\t', pos)) != std::string::npos )
{
size_t iter = (iter1 != std::string::npos) ? iter1 : iter2;
actRecord.name += line.substr(pos, (iter - pos + 1));
pos = iter + 1;
// Skip multiple whitespace characters.
while ( isspace(line[pos]) )
{
++pos;
}
}
// Trim the last whitespace from the name.
actRecord.name.erase(actRecord.name.size()-1);
// Extract the age.
// std::stoi returns an integer. We are assuming that
// it will be small enough to fit into an uint8_t.
actRecord.age = std::stoi(line.substr(pos).c_str());
// Debugging aid.. Make sure we have extracted the data correctly.
std::cout << "ID: " << actRecord.id
<< ", name: " << actRecord.name
<< ", age: " << (int)actRecord.age << std::endl;
persons.push_back(actRecord);
}
// If came here before the EOF was reached, there was an
// error in the input file.
if ( !(ifs.eof()) ) {
std::cerr << "Input format error!" << std::endl;
}
}
Seeing such an input file, I think it is not a (new-style) delimited file, but a good old fixed-size-fields one, like Fortran and Cobol programmers used to deal with. So I would parse it like that (note I separated forename and lastname):
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string forename;
std::string lastname;
uint8_t age;
// ...
};
int main() {
std::ifstream ifs("file.txt");
std::vector<Person> persons;
std::string line;
int fieldsize[] = {8, 9, 9, 4};
while(std::getline(ifs, line)) {
Person person;
int start = 0, last, age;
std::stringstream fieldtxt;
fieldtxt.str(line.substr(start, fieldsize[0]));
fieldtxt >> person.id;
start += fieldsize[0];
person.forename=line.substr(start, fieldsize[1]);
last = person.forename.find_last_not_of(' ') + 1;
person.forename.erase(last);
start += fieldsize[1];
person.lastname=line.substr(start, fieldsize[2]);
last = person.lastname.find_last_not_of(' ') + 1;
person.lastname.erase(last);
start += fieldsize[2];
fieldtxt.clear(); // clear any eof/fail state left over from the first field read
fieldtxt.str(line.substr(start, fieldsize[3]));
fieldtxt >> age;
person.age = uint8_t(age);
persons.push_back(person);
}
return 0;
}

Understanding reading txt files in c++

I am trying to understand reading different txt file formats in c++
I am currently trying to read a file formatted like this,
val1 val2 val3
val1 val2 val3
val1 val2 val3
When I read the file in and then cout its contents I only get the first line then a random 0 0 at the end.
I want to save each value into its own variable in a struct.
I am doing this like this,
struct Input{
std::string group;
float total_pay;
unsigned int quantity;
Input(std::string const& groupIn, float const& total_payIn, unsigned int const& quantityIn):
group(groupIn),
total_pay(total_payIn),
quantity(quantityIn)
{}
};
int main(){
std::ifstream infile("input.txt");
std::vector<Input> data;
std::string group;
std::string total_pay;
std::string quantity;
std::getline(infile,group);
std::getline(infile,total_pay);
std::getline(infile,quantity);
while(infile) {
data.push_back(Input(group,atof(total_pay.c_str()),atoi(quantity.c_str())));
std::getline(infile,group);
std::getline(infile,total_pay);
std::getline(infile,quantity);
}
//output
for(Input values : data) {
std::cout << values.group << " " << values.total_pay << " " << values.quantity << '\n';
}
return 0;
}
What is the proper way to read this file in the format I have specified? Do I need to specify to go to the next line after the third value?
Or should this be taking each value and putting them in to the right variable?
std::getline(infile,group);
std::getline(infile,total_pay);
std::getline(infile,quantity);
Your input processing has a number of issues. Your prevalent usage of std::getline in places where it is not needed isn't helping.
In short, per-line validation of input is generally done with a model similar to the following. Note that this requires the class provide a default constructor. We use an input string stream to process a single item from each line of input from the input file. If it were certain there was at most one item per line, we could forego the per-line processing, but it is a potential place for errors, so better safe than sorry. The mantra presented here is commonly used for per-line input validation when reading a stream of objects from a formatted input file, one item per line.
The following code defines the structure as you have it with a few extra pieces, including providing both an input and output stream insertion operator. The result makes the code in main() much more manageable.
#include <algorithm>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <iterator>
struct Input
{
// friends not needed if the members are public, but provided here
// in case you ever do make them protected or private (which you should)
friend std::istream& operator >>(std::istream& inp, Input& item);
friend std::ostream& operator <<(std::ostream& outp, Input const& item);
std::string group;
float total_pay;
unsigned int quantity;
// default constructor. sets up zero-elements
Input() : total_pay(), quantity()
{
}
Input(std::string groupIn, float total_payIn, unsigned int quantityIn)
: group(std::move(groupIn))
, total_pay(total_payIn)
, quantity(quantityIn)
{
}
// you really should be using these for accessors
std::string const& getGroup() const { return group; }
float getTotalPay() const { return total_pay; }
unsigned int getQuantity() const { return quantity; }
};
// global free function for extracting an Input item from an input stream
std::istream& operator >>(std::istream& inp, Input& item)
{
return (inp >> item.group >> item.total_pay >> item.quantity);
}
// global operator for inserting to a stream
std::ostream& operator <<(std::ostream& outp, Input const& item)
{
outp << item.getGroup() << ' '
<< item.getTotalPay() << ' '
<< item.getQuantity();
return outp;
}
int main()
{
std::ifstream infile("input.txt");
if (!infile)
{
std::cerr << "Failed to open input file" << '\n';
exit(EXIT_FAILURE);
}
// one line per item enforced.
std::vector<Input> data;
std::string line;
while (std::getline(infile, line))
{
std::istringstream iss(line);
Input inp;
if (iss >> inp) // calls our extaction operator >>
data.emplace_back(inp);
else
std::cerr << "Invalid input line: " << line << '\n';
}
// dump all of them to stdout. calls our insertion operator <<
std::copy(data.begin(), data.end(),
std::ostream_iterator<Input>(std::cout,"\n"));
return 0;
}
Provided the input is properly formatted, values like this:
group total quantity
group total quantity
will parse successfully. Conversely, if this happens:
group total quantity
group quantity
group total quantity
total quantity
the extractions of the second and fourth items will fail, and an appropriate warning will be issued on std::cerr. This is the reason for using the std::istringstream intermediate stream object wrapping extraction of a single line per item.
Best of luck, and I hope it helps you out.
Check this solution. It has no error checks, but it does convert the fields to their proper types.
#include<iostream>
#include<sstream>
#include<string>
using namespace std;
int main()
{
string line="v1 2.2 3";//lets say you read a line to this var...
string group;
float total_pay;
unsigned int quantity;
//we split the line to the 3 fields
istringstream s(line);
s>>group>>total_pay>>quantity;
//print for test
cout<<group<<endl<<total_pay<<endl<<quantity<<endl;
return 0;
}

Parsing a huge complicated CSV file using C++

I have a large CSV file which looks like this:
23456, The End is Near, A silly description that makes no sense, http://www.example.com, 45332, 5th July 1998 Sunday, 45.332
That's just one line of the CSV file. There are around 500k of these.
I want to parse this file using C++. The code I started out with is:
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
// open the input csv file containing training data
ifstream inputFile("my.csv");
string line;
while (getline(inputFile, line, ','))
{
istringstream ss(line);
// declaring appropriate variables present in csv file
long unsigned id;
string url, title, description, datetaken;
float val1, val2;
ss >> id >> url >> title >> datetaken >> description >> val1 >> val2;
cout << url << endl;
}
inputFile.close();
}
The problem is that it's not printing out the correct values.
I suspect that it's not able to handle white spaces within a field. So what do you suggest I should do?
Thanks
In this example we have to parse the string using two getline calls. The first, getline(cin, line), gets a line of CSV text using the default newline delimiter. The second, getline(ss, field, ','), uses the comma delimiter to separate the fields.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
float get_float(const std::string& s) {
std::stringstream ss(s);
float ret;
ss >> ret;
return ret;
}
int get_int(const std::string& s) {
std::stringstream ss(s);
int ret;
ss >> ret;
return ret;
}
int main() {
std::string line;
while (std::getline(std::cin, line)) {
std::stringstream ss(line);
std::vector<std::string> v;
std::string field;
while(getline(ss, field, ',')) {
std::cout << " " << field;
v.push_back(field);
}
int id = get_int(v[0]);
float f = get_float(v[6]);
std::cout << v[3] << std::endl;
}
}
Using std::istream to read std::strings with the overloaded extraction operator is not going to work well. The entire line is a string, so it won't pick up that there is a change in fields by default. A quick fix would be to split the line on commas and assign the values to the appropriate fields (instead of using std::istringstream).
NOTE: That is in addition to jrok's point about std::getline
Within the stated constraints, I think I'd do something like this:
#include <algorithm>
#include <locale>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <iterator>
// A ctype that classifies only comma and new-line as "white space":
struct field_reader : std::ctype<char> {
field_reader() : std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
static std::vector<std::ctype_base::mask>
rc(table_size, std::ctype_base::mask());
rc[','] = std::ctype_base::space;
rc['\n'] = std::ctype_base::space;
return &rc[0];
}
};
// A struct to hold one record from the file:
struct record {
std::string key, name, desc, url, zip, date, number;
friend std::istream &operator>>(std::istream &is, record &r) {
return is >> r.key >> r.name >> r.desc >> r.url >> r.zip >> r.date >> r.number;
}
friend std::ostream &operator<<(std::ostream &os, record const &r) {
return os << "key: " << r.key
<< "\nname: " << r.name
<< "\ndesc: " << r.desc
<< "\nurl: " << r.url
<< "\nzip: " << r.zip
<< "\ndate: " << r.date
<< "\nnumber: " << r.number;
}
};
int main() {
std::stringstream input("23456, The End is Near, A silly description that makes no sense, http://www.example.com, 45332, 5th July 1998 Sunday, 45.332");
// use our ctype facet with the stream:
input.imbue(std::locale(std::locale(), new field_reader()));
// read in all our records:
std::istream_iterator<record> in(input), end;
std::vector<record> records{ in, end };
// show what we read:
std::copy(records.begin(), records.end(),
std::ostream_iterator<record>(std::cout, "\n"));
}
This is, beyond a doubt, longer than most of the others -- but it's all broken into small, mostly-reusable pieces. Once you have the other pieces in place, the code to read the data is trivial:
std::vector<record> records{ in, end };
One other point I find compelling: the first time the code compiled, it also ran correctly (and I find that quite routine for this style of programming).
I have just worked out this problem for myself and am willing to share! It may be a little overkill but it shows a working example of how Boost Tokenizer & vectors handle a big problem.
/*
* ALfred Haines Copyleft 2013
* convert csv to sql file
* csv2sql requires that each line is a unique record
*
* This is an example of file reading and the Boost tokenizer
*
* In the spirit of COBOL I do not output until the end
* when all the print lines are output at once
* Special thanks to SBHacker for the code to handle linefeeds
*/
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/algorithm/string/replace.hpp>
namespace io = boost::iostreams;
using boost::tokenizer;
using boost::escaped_list_separator;
typedef tokenizer<escaped_list_separator<char> > so_tokenizer;
using namespace std;
using namespace boost;
vector<string> parser( string );
int main()
{
vector<string> stuff ; // this is the data in a vector
string filename; // this is the input file
string c = ""; // this holds the print line
string sr ;
cout << "Enter filename: " ;
cin >> filename;
//filename = "drwho.csv";
int lastindex = filename.find_last_of("."); // find where the extension begins
string rawname = filename.substr(0, lastindex); // extract the raw name
stuff = parser( filename ); // this gets the data from the file
/** I ask if the user wants a new_index to be created */
cout << "\n\nMySql requires a unique ID field as a Primary Key \n" ;
cout << "If the first field is not unique (no dupicate entries) \nthan you should create a " ;
cout << "New index field for this data.\n" ;
cout << "Not Sure! try no first to maintain data integrity.\n" ;
string ni ;bool invalid_data = true;bool new_index = false ;
do {
cout<<"Should I create a New Index now? (y/n)"<<endl;
cin>>ni;
if ( ni == "y" || ni == "n" ) { invalid_data =false ; }
} while (invalid_data);
cout << "\n" ;
if (ni == "y" )
{
new_index = true ;
sr = rawname.c_str() ; sr.append("_id" ); // new_index field
}
// now make the sql code from the vector stuff
// Create table section
c.append("DROP TABLE IF EXISTS `");
c.append(rawname.c_str() );
c.append("`;");
c.append("\nCREATE TABLE IF NOT EXISTS `");
c.append(rawname.c_str() );
c.append( "` (");
c.append("\n");
if (new_index)
{
c.append( "`");
c.append(sr );
c.append( "` int(10) unsigned NOT NULL,");
c.append("\n");
}
string s = stuff[0];// it is assumed that line zero has fieldnames
int x =0 ; // used to determine if new index is printed
// boost tokenizer code from the Boost website -- tok holds the token
so_tokenizer tok(s, escaped_list_separator<char>('\\', ',', '\"'));
for(so_tokenizer::iterator beg=tok.begin(); beg!=tok.end(); ++beg)
{
x++; // keeps number of fields for later use to eliminate the comma on the last entry
if (x == 1 && new_index == false ) sr = static_cast<string> (*beg) ;
c.append( "`" );
c.append(*beg);
if (x == 1 && new_index == false )
{
c.append( "` int(10) unsigned NOT NULL,");
}
else
{
c.append("` text ,");
}
c.append("\n");
}
c.append("PRIMARY KEY (`");
c.append(sr );
c.append("`)" );
c.append("\n");
c.append( ") ENGINE=InnoDB DEFAULT CHARSET=latin1;");
c.append("\n");
c.append("\n");
// The Create table section is done
// Now make the Insert lines one per line is safer in case you need to split the sql file
for (int w=1; w < stuff.size(); ++w)
{
c.append("INSERT INTO `");
c.append(rawname.c_str() );
c.append("` VALUES ( ");
if (new_index)
{
string String = static_cast<ostringstream*>( &(ostringstream() << w) )->str();
c.append(String);
c.append(" , ");
}
int p = 1 ; // used to eliminate the comma on the last entry
// tokenizer code needs unique name -- stok holds this token
so_tokenizer stok(stuff[w], escaped_list_separator<char>('\\', ',', '\"'));
for(so_tokenizer::iterator beg=stok.begin(); beg!=stok.end(); ++beg)
{
c.append(" '");
string str = static_cast<string> (*beg) ;
boost::replace_all(str, "'", "\\'");
// boost::replace_all(str, "\n", " -- ");
c.append( str);
c.append("' ");
if ( p < x ) c.append(",") ;// we dont want a comma on the last entry
p++ ;
}
c.append( ");\n");
}
// now print the whole thing to an output file
string out_file = rawname.c_str() ;
out_file.append(".sql");
io::stream_buffer<io::file_sink> buf(out_file);
std::ostream out(&buf);
out << c ;
// let the user know that they are done
cout<< "Well if you got here then the data should be in the file " << out_file << "\n" ;
return 0;}
vector<string> parser( string filename )
{
typedef tokenizer< escaped_list_separator<char> > Tokenizer;
escaped_list_separator<char> sep('\\', ',', '\"');
vector<string> stuff ;
string data(filename);
ifstream in(filename.c_str());
string li;
string buffer;
bool inside_quotes(false);
size_t last_quote(0);
while (getline(in,buffer))
{
// --- deal with line breaks in quoted strings
last_quote = buffer.find_first_of('"');
while (last_quote != string::npos)
{
inside_quotes = !inside_quotes;
last_quote = buffer.find_first_of('"',last_quote+1);
}
li.append(buffer);
if (inside_quotes)
{
li.append("\n");
continue;
}
// ---
stuff.push_back(li);
li.clear(); // clear here, next check could fail
}
in.close();
//cout << stuff.size() << endl ;
return stuff ;
}
You are right to suspect that your code is not behaving as desired because of the whitespace within the field values.
If you indeed have "simple" CSV where no field may contain a comma within the field value, then I would step away from the stream operators and perhaps C++ altogether. The example program in the question merely re-orders fields. There is no need to actually interpret or convert the values into their appropriate types (unless validation was also a goal). Reordering alone is super easy to accomplish with awk. For example, the following command would reverse 3 fields found in a simple CSV file.
cat infile | awk -F, '{ print $3","$2","$1 }' > outfile
If the goal is really to use this code snippet as a launchpad for bigger and better ideas ... then I would tokenize the line by searching for commas. The std::string class has a built-in method to find the offsets of specific characters. You can make this approach as elegant or inelegant as you want. The most elegant approaches end up looking something like the boost tokenization code.
The quick-and-dirty approach is just to know your program has N fields and look for the positions of the corresponding N-1 commas. Once you have those positions, it is pretty straightforward to invoke std::string::substr to extract the fields of interest.
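A quick-and-dirty sketch of that, assuming exactly the seven-field layout from the question, no commas inside any field, and a ", " (comma plus space) separator:
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::string line =
        "23456, The End is Near, A silly description that makes no sense, "
        "http://www.example.com, 45332, 5th July 1998 Sunday, 45.332";
    std::vector<std::string> fields;
    std::size_t start = 0, comma;
    while ((comma = line.find(',', start)) != std::string::npos) {
        fields.push_back(line.substr(start, comma - start));
        start = comma + 2;                     // skip the comma and the assumed following space
    }
    fields.push_back(line.substr(start));      // last field has no trailing comma
    long   id   = std::stol(fields[0]);        // convert the numeric columns
    double val2 = std::stod(fields[6]);
    std::cout << fields[3] << " " << id << " " << val2 << "\n";   // url, id, val2
}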