Reading a CSV file detecting last field in file - c++

I'm trying to read a CSV file with three fields per line, the last of which is an integer. My code crashes in stoi on the last line of the file, because that line has no trailing newline character, and I don't know how to detect that I'm on the last line. The first two getline calls read the first two fields. The third getline reads what should be an integer and uses '\n' as its delimiter, but this does not work for the very last line of input. Is there a workaround for this?
The field types I expect are [ int, string, int ]. The middle field can contain spaces, so I don't think stringstream extraction will work for it.
while (! movieReader.eof() ) { // while we haven't reached end of file
stringstream ss;
getline(movieReader, buffer, ','); // get movie id and convert it to integer
ss << buffer; // converting id from string to integer
ss >> movieid;
getline(movieReader, movieName, ','); // get movie name
getline(movieReader, buffer, '\n');
pubYear = stoi(buffer); // buffer will be an integer, the publish year
auto it = analyze.getMovies().emplace(movieid, Movie(movieid, movieName, pubYear ) );
countMovies++;
}

For reading and writing objects one would idiomatically overload the stream extraction and stream insertion operators:
Sample csv:
1, The Godfather, 1972
2, The Shawshank Redemption, 1994
3, Schindler's List, 1993
4, Raging Bull, 1980
5, Citizen Kane, 1941
Code:
#include <cctype>   // std::isspace
#include <cstdlib>  // EXIT_FAILURE
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
void skip_to(std::istream &is, char delim) // helper function to skip the rest of a line
{
char ch;
while ((ch = is.get()) && is && ch != delim);
}
std::istream& eat_whitespace(std::istream &is) // stream manipulator that eats up whitespace
{
char ch;
while ((ch = is.peek()) && is && std::isspace(static_cast<unsigned char>(ch))) // cast avoids undefined behaviour for negative char values
is.get();
return is;
}
class Movie
{
int movieid;
std::string movieName;
int pubYear;
friend std::istream& operator>>(std::istream &is, Movie &movie)
{
Movie temp; // use a temporary to not mess up movie with a half-
char ch; // extracted dataset if we fail to extract some field.
if (!(is >> temp.movieid)) // try to extract the movieid
return is;
if (!(is >> std::skipws >> ch) || ch != ',') { // read the next non white char
is.setstate(std::ios::failbit); // and check it's a comma
return is;
}
is >> eat_whitespace; // skip all whitespace before the movieName
if (!std::getline(is, temp.movieName, ',')) { // read the movieName up to the
return is; // next comma
}
if (!(is >> temp.pubYear)) // extract the pubYear
return is;
skip_to(is, '\n'); // skip the rest of the line (or till eof())
is.clear();
movie = temp; // all went well, assign the temporary
return is;
}
friend std::ostream& operator<<(std::ostream &os, Movie const &movie)
{
os << "Nr. " << movie.movieid << ": \"" << movie.movieName << "\" (" << movie.pubYear << ')';
return os;
}
};
int main()
{
char const * movies_file_name{ "foo.txt" };
std::ifstream is{ movies_file_name };
if (!is.is_open()) {
std::cerr << "Couldn't open \"" << movies_file_name << "\"!\n\n";
return EXIT_FAILURE;
}
std::vector<Movie> movies{ std::istream_iterator<Movie>{is},
std::istream_iterator<Movie>{} };
for (auto const & m : movies)
std::cout << m << '\n';
}
Output:
Nr. 1: "The Godfather" (1972)
Nr. 2: "The Shawshank Redemption" (1994)
Nr. 3: "Schindler's List" (1993)
Nr. 4: "Raging Bull" (1980)
Nr. 5: "Citizen Kane" (1941)
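As an alternative to overloading operator>>, the crash on the last line can also be avoided by letting std::getline drive the loop and splitting each line afterwards. The following is only a minimal sketch under stated assumptions: MovieRow, parse_movie_row and read_movies are hypothetical names (the question's Movie class is not shown), and leading spaces in the name field are kept as they are.

```cpp
#include <cassert>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for illustration only.
struct MovieRow {
    int id = 0;
    std::string name;
    int year = 0;
};

// Parse one "id,name,year" line. Returns false on malformed input instead of throwing.
bool parse_movie_row(const std::string& line, MovieRow& out)
{
    std::istringstream fields(line);
    std::string id_text, year_text;
    MovieRow row;
    if (!std::getline(fields, id_text, ',')) return false;
    if (!std::getline(fields, row.name, ',')) return false;
    if (!std::getline(fields, year_text)) return false;   // rest of the line
    try {
        row.id = std::stoi(id_text);
        row.year = std::stoi(year_text);
    } catch (const std::exception&) {   // std::invalid_argument or std::out_of_range
        return false;
    }
    out = row;
    return true;
}

// Read every well-formed row. The loop ends when getline finds nothing more to
// read, whether or not the last line ends with '\n'.
std::vector<MovieRow> read_movies(std::istream& in)
{
    std::vector<MovieRow> rows;
    std::string line;
    while (std::getline(in, line)) {
        MovieRow row;
        if (parse_movie_row(line, row)) rows.push_back(row);
    }
    return rows;
}

// Small self-check: the last line has no trailing newline and is still read.
void read_movies_self_check()
{
    std::istringstream in("1,The Godfather,1972\n2,Raging Bull,1980");
    const std::vector<MovieRow> rows = read_movies(in);
    assert(rows.size() == 2);
    assert(rows.back().name == "Raging Bull" && rows.back().year == 1980);
}
```

Because the loop condition is the getline call itself, there is no extra iteration after the last record, which is what the eof() test in the question allows.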

Related

Parsing text file lines in C++

I have a txt file with data such as the following:
regNumber FName Score1 Score2 Score3
385234 John Snow 90.0 56.0 60.8
38345234 Michael Bolton 30.0 26.5
38500234 Tim Cook 40.0 56.5 20.2
1547234 Admin__One 10.0
...
The data is separated only by whitespace.
Now, my issue is that, since some of the data is missing, I cannot simply do the following:
ifstream file;
file.open("file.txt");
file >> regNo >> fName >> lName >> score1 >> score2 >> score3;
(I'm not sure if code above is right, but trying to explain the idea)
What I want to do is roughly this:
cout << "Reg Number: ";
cin >> regNo;
cout << "Name: ";
cin >> name;
if(regNo == regNumber && name == fname) {
cout << "Access granted" << endl;
}
This is what I've tried/where I'm at:
ifstream file;
file.open("users.txt");
string line;
while(getline(file, line)) {
stringstream ss(line);
string word;
while(ss >> word) {
cout << word << "\t";
}
cout << " " << endl;
}
I can output the file entirely, my issue is when it comes to picking the parts, e.g. only getting the regNumber or the name.
I would read the whole line in at once and then just substring it (since you suggest that these are fixed width fields)
Handling the spaces between the words of the names is tricky, but it's apparent from your file that each column starts at a fixed offset. You can use this to extract the information you want. For example, in order to read the names, you can read the line starting at the offset where FName starts and ending at the offset where Score1 starts. Then you can remove trailing whitespace from the string like this:
string A = "Tim Cook ";
auto index = A.find_last_not_of(' ');
A.erase(index + 1);
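The offset-based extraction described above could be wrapped in a small helper. This is only a sketch: fixed_field is a hypothetical name, and the column positions used in the self-check are made up for illustration, so the real offsets would have to be measured from the actual file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the text of a fixed-width column with surrounding spaces removed.
// 'start' and 'width' are assumed column positions, not values taken from the question.
std::string fixed_field(const std::string& line, std::size_t start, std::size_t width)
{
    if (start >= line.size()) return {};            // column missing on a short line
    std::string text = line.substr(start, width);   // substr clamps width to the end of the line
    const auto first = text.find_first_not_of(' ');
    if (first == std::string::npos) return {};      // column present but blank
    const auto last = text.find_last_not_of(' ');
    return text.substr(first, last - first + 1);
}

// Small self-check with a line built from known column widths (10, 14, then the score).
void fixed_field_self_check()
{
    const std::string line = std::string("385234") + std::string(4, ' ')
                           + "John Snow" + std::string(5, ' ') + "90.0";
    assert(fixed_field(line, 0, 10) == "385234");
    assert(fixed_field(line, 10, 14) == "John Snow");
    assert(fixed_field(line, 24, 8) == "90.0");
    assert(fixed_field(line, 40, 8).empty());       // missing score column
}
```

A missing score then simply shows up as an empty string, which the caller can treat as "no value" before converting the other columns to numbers.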
Alright, I can’t sleep and so decided to go bonkers and demonstrate just how tricky input is, especially when you have freeform data. The following code contains plenty of commentary on reading freeform data that may be missing.
#include <ciso646>
#include <deque>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <optional>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>
// Useful Stuff
template <typename T> T& lvalue( T&& arg ) { return arg; }
using strings = std::deque <std::string> ;
auto split( const std::string& s )
{
return strings
(
std::istream_iterator <std::string> ( lvalue( std::istringstream{ s } ) ),
std::istream_iterator <std::string> ()
);
}
template <typename T>
auto string_to( const std::string & s )
{
T value;
std::istringstream ss( s );
return ((ss >> value) and (ss >> std::ws).eof())
? value
: std::optional<T> { };
}
std::string trim( const std::string& s )
{
auto L = s.find_first_not_of( " \f\n\r\t\v" );
if (L == std::string::npos) return {}; // empty or all whitespace: substr( npos, ... ) would throw
auto R = s.find_last_not_of ( " \f\n\r\t\v" ) + 1;
return s.substr( L, R-L );
}
// Each record is stored as a “User”.
// “Users” is a complete dataset of records.
struct User
{
int regNumber;
std::vector <std::string> names;
std::vector <double> scores;
};
using Users = std::vector <User> ;
// This is stuff you would put in the .cpp file, not an .hpp file.
// But since this is a single-file example, it goes here.
namespace detail::Users
{
static const char * FILE_HEADER = "regNumber FName Score1 Score2 Score3\n";
static const int REGNUMBER_WIDTH = 11;
static const int NAMES_TOTAL_WIDTH = 18;
static const int SCORE_WIDTH = 9;
static const int SCORE_PRECISION = 1;
}
// Input is always the hardest part, and provides a WHOLE lot of caveats to deal with.
// Let us take them one at a time.
//
// Each user is a record composed of ONE OR MORE elements on a line of text.
// The elements are:
// (regNumber)? (name)* (score)*
//
// The way we handle this is:
// (1) Read the entire line
// (2) Split the line into substrings
// (3) If the first element is a regNumber, grab it
// (4) Grab any trailing floating point values as scores
// (5) Anything remaining must be names
//
// There are comments in the code below which enable you to produce a hard failure
// if any record is incorrect, however you define that. A “hard fail” sets the fail
// state on the input stream, which will stop all further input on the stream until
// the caller uses the .clear() method on the stream.
//
// The default action is to stop reading records if a failure occurs. This way the
// CALLER can decide whether to clear the error and try to read more records.
//
// Finally, we use decltype liberally to make it easier to modify the User struct
// without having to watch out for type problems with the stream extraction operator.
// Input a single record
std::istream& operator >> ( std::istream& ins, User& user )
{
// // Hard fail helper (named lambda)
// auto failure = [&ins]() -> std::istream&
// {
// ins.setstate( std::ios::failbit );
// return ins;
// };
// You should generally clear your target object when writing stream extraction operators
user = User{};
// Get a single record (line) from file
std::string s;
if (!getline( ins, s )) return ins;
// Split the record into fields
auto fields = split( s );
// Skip (blank lines) and (file headers)
static const strings header = split( detail::Users::FILE_HEADER );
if (fields.empty() or fields == header) return operator >> ( ins, user );
// The optional regNumber must appear first
auto reg_number = string_to <decltype(user.regNumber)> ( fields.front() );
if (reg_number)
{
user.regNumber = *reg_number;
fields.pop_front();
}
// Optional scores must appear last
while (!fields.empty())
{
auto score = string_to <std::remove_reference <decltype(user.scores.front())> ::type> ( fields.back() );
if (!score) break;
user.scores.insert( user.scores.begin(), *score );
fields.pop_back();
}
// if (user.scores.size() > 3) return failure(); // is there a maximum number of scores?
// Any remaining fields are names.
// if (fields.empty()) return failure(); // at least one name required?
// if (fields.size() > 2) return failure(); // maximum of two names?
for (const auto& name : fields)
{
// (You could also check that each name matches a valid regex pattern, etc)
user.names.push_back( name );
}
// If we got this far, all is good. Return the input stream.
return ins;
}
// Input a complete User dataset
std::istream& operator >> ( std::istream& ins, Users& users )
{
// This time, do NOT clear the target object! This permits the caller to read
// multiple files and combine them! The caller is also now responsible to
// provide a new/empty/clear target Users object to avoid combining datasets.
// Read all records
User user;
while (ins >> user) users.push_back( user );
// Return the input stream
return ins;
}
// Output, by comparison, is fabulously easy.
//
// I won’t bother to explain any of this, except to recall that
// the User is stored as a line-object record -- that is, it must
// be terminated by a newline. Hence we output the newline in the
// single User stream insertion operator (output operator) instead
// of the Users output operator.
// Output a single User record
std::ostream& operator << ( std::ostream& outs, const User& user )
{
std::ostringstream userstring;
userstring << std::setw( detail::Users::REGNUMBER_WIDTH ) << std::left << user.regNumber;
std::ostringstream names;
for (const auto& name : user.names) names << name << " ";
userstring << std::setw( detail::Users::NAMES_TOTAL_WIDTH ) << std::left << names.str();
for (auto score : user.scores)
userstring
<< std::left << std::setw( detail::Users::SCORE_WIDTH )
<< std::fixed << std::setprecision( detail::Users::SCORE_PRECISION )
<< score;
return outs << trim( userstring.str() ) << "\n"; // <-- output of newline
}
// Output a complete User dataset
std::ostream& operator << ( std::ostream& outs, const Users& users )
{
outs << detail::Users::FILE_HEADER;
for (const auto& user : users) outs << user;
return outs;
}
int main()
{
// Example input. Notice that any field may be absent.
std::istringstream input(
"regNumber FName Score1 Score2 Score3 \n"
"385234 John Snow 90.0 56.0 60.8 \n"
"38345234 Michael Bolton 30.0 26.5 \n"
"38500234 Tim Cook 40.0 56.5 20.2 \n"
"1547234 Admin__One 10.0 \n"
" \n" // blank line --> skipped
" Jon Bon Jovi \n"
"11111 22.2 \n"
" 33.3 \n"
"4444 \n"
"55 Justin Johnson \n"
);
Users users;
input >> users;
std::cout << users;
}
To compile with MSVC:
cl /EHsc /W4 /Ox /std:c++17 a.cpp
To compile with Clang:
clang++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 a.cpp
To compile with MinGW/GCC/etc use the same as Clang, substituting g++ for clang++, naturally.
As a final note, if you can make your data file much more strict, life will be significantly easier. For example, if you can say that you are always going to use fixed-width fields, you can use Shahriar’s answer or pm100’s answer, which I upvoted.
I would define a Person class.
This knows how to read and write a Person on one line.
class Person
{
int regNumber;
std::string FName;
std::array<float,3> scope;
friend std::ostream& operator<<(std::ostream& s, Person const& p)
{
return s << p.regNumber << " " << p.FName << " " << p.scope[0] << " " << p.scope[1] << " " << p.scope[2] << "\n";
}
friend std::istream& operator>>(std::istream& s, Person& p)
{
std::string line;
if (!std::getline(s, line)) return s; // nothing left to read
bool valid = true;
Person tmp; // Holds value while we check
// Handle Line.
// Handle missing data.
// And update tmp to the correct state.
if (valid) {
// The conversion worked.
// So update the object we are reading into.
p.swap(tmp);
}
else {
// The conversion failed.
// Put the stream into a failed state so we stop reading.
s.setstate(std::ios::failbit);
}
return s;
}
void swap(Person& other) noexcept
{
using std::swap;
swap(regNumber, other.regNumber);
swap(FName, other.FName);
swap(scope, other.scope);
}
};
Then your main becomes much simpler.
int main()
{
std::ifstream file("Data");
Person person;
while (file >> person)
{
std::cout << person;
}
}
It also becomes easier to handle your second part.
You load each person, then ask the Person object to validate the credentials.
class Person
{
// STUFF From before:
public:
bool validateUser(int id, std::string const& name) const
{
return id == regNumber && name == FName;
}
};
int main()
{
int reg = getUserReg();
std::string name = getUserName();
std::ifstream file("Data");
Person person;
while (file >> person)
{
if (person.validateUser(reg, name))
{
std::cout << "Access Granted\n";
}
}
}
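The "Handle Line" and "Handle missing data" placeholders in the operator>> above are left open. One possible way to fill them in is sketched below, assuming that the first token is the registration number, that up to three trailing numeric tokens are scores, and that everything in between is the name. ParsedPerson, to_float and parse_person are hypothetical names used only for this illustration, and a name that happens to look like a number would be misread as a score.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain struct standing in for the Person class above.
struct ParsedPerson {
    int regNumber = 0;
    std::string name;
    std::array<float, 3> scores{};   // missing scores stay at 0
};

// True if the whole token is a number such as "56.5".
bool to_float(const std::string& token, float& value)
{
    try {
        std::size_t used = 0;
        value = std::stof(token, &used);
        return used == token.size();
    } catch (const std::exception&) {
        return false;
    }
}

bool parse_person(const std::string& line, ParsedPerson& out)
{
    std::istringstream in(line);
    ParsedPerson p;
    if (!(in >> p.regNumber)) return false;          // header or blank line

    std::vector<std::string> tokens;
    for (std::string t; in >> t; ) tokens.push_back(t);

    // Peel numeric tokens off the end; at most three of them are scores.
    std::vector<float> found;
    float f = 0.0f;
    while (!tokens.empty() && found.size() < p.scores.size() && to_float(tokens.back(), f)) {
        found.insert(found.begin(), f);
        tokens.pop_back();
    }
    for (std::size_t i = 0; i < found.size(); ++i) p.scores[i] = found[i];

    // Whatever is left is the name.
    for (const auto& t : tokens) {
        if (!p.name.empty()) p.name += ' ';
        p.name += t;
    }
    if (p.name.empty()) return false;                // a name is required in this sketch
    out = p;                                         // only touch the target on success
    return true;
}

void parse_person_self_check()
{
    ParsedPerson p;
    assert(parse_person("38345234 Michael Bolton 30.0 26.5", p));
    assert(p.regNumber == 38345234 && p.name == "Michael Bolton");
    assert(p.scores[0] == 30.0f && p.scores[1] == 26.5f && p.scores[2] == 0.0f);
}
```

Inside the operator>> the result of such a function would decide between the swap and the setstate branch.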

Extraction operator overloading to read from a file stream with multiple data types [duplicate]

Suppose we have the following situation:
A record struct is declared as follows
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Records are stored in a file using the following format:
ID Forename Lastname Age
------------------------------
1267867 John Smith 32
67545 Jane Doe 36
8677453 Gwyneth Miller 56
75543 J. Ross Unusual 23
...
The file should be read in to collect an arbitrary number of the Person records mentioned above:
std::ifstream ifs("SampleInput.txt");
std::vector<Person> persons;
Person actRecord;
while(ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
persons.push_back(actRecord);
}
if(!ifs) {
std::cerr << "Input format error!" << std::endl;
}
Question:
What can I do to read the separate words that form the name into the single actRecord.name field, while still reading the other values into their own fields?
The above code sample ends up with run time errors:
Runtime error time: 0 memory: 3476 signal:-1
stderr: Input format error!
One viable solution is to reorder input fields (if this is possible)
ID Age Forename Lastname
1267867 32 John Smith
67545 36 Jane Doe
8677453 56 Gwyneth Miller
75543 23 J. Ross Unusual
...
and read in the records as follows
#include <cstdint>  // uint8_t
#include <iostream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
unsigned int age;
while(ifs >> actRecord.id >> age &&
std::getline(ifs, actRecord.name)) {
actRecord.age = uint8_t(age);
persons.push_back(actRecord);
}
return 0;
}
You have whitespace between firstname and lastname. Change your class to have firstname and lastname as separate strings and it should work. The other thing you can do is to read in two separate variables such as name1 and name2 and assign it as
actRecord.name = name1 + " " + name2;
Here's an implementation of a manipulator I came up with that counts delimiters as it extracts characters. It extracts words from the input stream until it has seen the number of delimiters you specify.
template<class charT>
struct word_inserter_impl {
word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
: str_(str)
, delim_(delim)
, words_(words)
{ }
friend std::basic_istream<charT>&
operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
typename std::basic_istream<charT>::sentry ok(is);
if (ok) {
wi.str_.clear(); // start empty so that repeated extractions do not accumulate
std::istreambuf_iterator<charT> it(is), end;
std::back_insert_iterator<std::basic_string<charT>> dest(wi.str_);
while (it != end && wi.words_) {
if (*it == wi.delim_ && --wi.words_ == 0) {
break;
}
dest++ = *it++;
}
}
return is;
}
private:
std::basic_string<charT>& str_;
charT delim_;
mutable std::size_t words_;
};
template<class charT=char>
word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
return word_inserter_impl<charT>(words, str, delim);
}
Now you can just do:
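unsigned int age; // read the age through a wider type: extracting into a uint8_t reads a single character
while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> age) {
actRecord.age = static_cast<uint8_t>(age);
std::cout << actRecord.id << " " << actRecord.name << " " << age << '\n';
}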
A solution would be to read in the first entry into an ID variable.
Then read in all the other words from the line (just push them in a temporary vector) and construct the name of the individual with all the elements, except the last entry which is the Age.
This would allow you to still have the Age on the last position but be able to deal with name like "J. Ross Unusual".
Update to add some code which illustrates the theory above:
#include <memory>
#include <string>
#include <vector>
#include <iterator>
#include <fstream>
#include <sstream>
#include <iostream>
struct Person {
unsigned int id;
std::string name;
int age;
};
int main()
{
std::fstream ifs("in.txt");
std::vector<Person> persons;
std::string line;
while (std::getline(ifs, line))
{
std::istringstream iss(line);
// first: ID simply read it
Person actRecord;
iss >> actRecord.id;
// next iteration: read in everything
std::string temp;
std::vector<std::string> tempvect;
while(iss >> temp) {
tempvect.push_back(temp);
}
if(tempvect.empty()) continue; // skip blank, header or otherwise malformed lines
// then: the name, let's join the vector in a way to not to get a trailing space
// also taking care of people who do not have two names ...
int LAST = 2;
if(tempvect.size() < 2) // only the name and age are in there
{
LAST = 1;
}
std::ostringstream oss;
std::copy(tempvect.begin(), tempvect.end() - LAST,
std::ostream_iterator<std::string>(oss, " "));
// the last element
oss << *(tempvect.end() - LAST);
actRecord.name = oss.str();
// and the age
actRecord.age = std::stoi( *(tempvect.end() - 1) );
persons.push_back(actRecord);
}
for(std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); it++)
{
std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
}
}
Since we can easily split a line on whitespace and we know that the only value that can be separated is the name, a possible solution is to use a deque for each line containing the whitespace separated elements of the line. The id and the age can easily be retrieved from the deque and the remaining elements can be concatenated to retrieve the name:
#include <iostream>
#include <fstream>
#include <deque>
#include <vector>
#include <sstream>
#include <iterator>
#include <string>
#include <algorithm>
#include <utility>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
};
int main(int argc, char* argv[]) {
std::ifstream ifs("SampleInput.txt");
std::vector<Person> records;
std::string line;
while (std::getline(ifs,line)) {
std::istringstream ss(line);
std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});
Person record;
record.id = std::stoi(info.front()); info.pop_front();
record.age = std::stoi(info.back()); info.pop_back();
std::ostringstream name;
std::copy
( info.begin()
, info.end()
, std::ostream_iterator<std::string>(name," "));
record.name = name.str(); record.name.pop_back();
records.push_back(std::move(record));
}
for (auto& record : records) {
std::cout << record.id << " " << record.name << " "
<< static_cast<unsigned int>(record.age) << std::endl;
}
return 0;
}
Another solution is to require certain delimiter characters for a particular field, and provide a special extraction manipulator for this purpose.
Let's suppose we define the delimiter character ", and the input should look like this:
1267867 "John Smith" 32
67545 "Jane Doe" 36
8677453 "Gwyneth Miller" 56
75543 "J. Ross Unusual" 23
Generally needed includes:
#include <cctype>   // std::isspace
#include <cstdint>  // uint8_t
#include <iostream>
#include <string>
#include <vector>
#include <iomanip>
The record declaration:
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Declaration/definition of a proxy class (struct) that supports being used with the std::istream& operator>>(std::istream&, const delim_field_extractor_proxy&) global operator overload:
struct delim_field_extractor_proxy {
delim_field_extractor_proxy
( std::string& field_ref
, char delim = '"'
)
: field_ref_(field_ref), delim_(delim) {}
friend
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy);
void extract_value(std::istream& is) const {
field_ref_.clear();
char input;
bool addChars = false;
while(is) {
is.get(input);
if(is.eof()) {
break;
}
if(input == delim_) {
addChars = !addChars;
if(!addChars) {
break;
}
else {
continue;
}
}
if(addChars) {
field_ref_ += input;
}
}
// consume whitespaces
while(std::isspace(is.peek())) {
is.get();
}
}
std::string& field_ref_;
char delim_;
};
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy) {
extractor_proxy.extract_value(is);
return is;
}
Plumbing everything together and instantiating the delim_field_extractor_proxy:
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
int act_age;
while(ifs >> actRecord.id
>> delim_field_extractor_proxy(actRecord.name,'"')
>> act_age) {
actRecord.age = uint8_t(act_age);
persons.push_back(actRecord);
}
for(auto it = persons.begin();
it != persons.end();
++it) {
std::cout << it->id << ", "
<< it->name << ", "
<< int(it->age) << std::endl;
}
return 0;
}
NOTE:
This solution also works well when a TAB character (\t) is specified as the delimiter, which is useful when parsing standard .csv formats.
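For the double-quote delimited input shown above, the standard library's std::quoted manipulator from <iomanip> (available since C++14) does similar work to the hand-written proxy. The sketch below is an alternative to the proxy, not a replacement prescribed by the answer; QuotedPerson is a hypothetical name that mirrors the Person struct.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type mirroring the Person struct above.
struct QuotedPerson {
    unsigned int id = 0;
    std::string name;
    std::uint8_t age = 0;
};

// Extract one record of the form: 75543 "J. Ross Unusual" 23
std::istream& operator>>(std::istream& is, QuotedPerson& p)
{
    QuotedPerson tmp;
    unsigned int age = 0;      // read through a wider type, then narrow
    if (is >> tmp.id >> std::quoted(tmp.name) >> age) {
        tmp.age = static_cast<std::uint8_t>(age);
        p = tmp;               // only modify the target on success
    }
    return is;
}

void quoted_self_check()
{
    std::istringstream in("75543 \"J. Ross Unusual\" 23\n67545 \"Jane Doe\" 36\n");
    QuotedPerson p;
    assert((in >> p) && p.id == 75543 && p.name == "J. Ross Unusual" && p.age == 23);
    assert((in >> p) && p.id == 67545 && p.name == "Jane Doe" && p.age == 36);
    assert(!(in >> p));        // no more records
}
```

std::quoted only handles the double quote (or another single quote character passed as its second argument) together with an escape character, so it does not cover the TAB-delimited case from the note.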
What can I do to read in the separate words forming the name into the one actRecord.name variable?
The general answer is: No, you can't do this without additional delimiter specifications and exceptional parsing for the parts forming the intended actRecord.name contents.
This is because a std::string field will be parsed just up to the next occurrence of a whitespace character.
It's noteworthy that some standard formats (e.g. .csv) may require distinguishing blanks (' ') from tabs ('\t') or other characters to delimit certain record fields (which may not be visible at first glance).
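To illustrate the distinction between blanks and tabs: when the fields are separated by a single known character, std::getline with that character as its delimiter keeps embedded blanks intact. The helper below is a minimal sketch with a hypothetical name, split_on, and it does not handle quoting or escaping.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a line on one delimiter character. Spaces inside a field are preserved.
// Note: a trailing empty field (a line ending in the delimiter) is not reported.
std::vector<std::string> split_on(const std::string& line, char delim)
{
    std::vector<std::string> fields;
    std::istringstream in(line);
    for (std::string field; std::getline(in, field, delim); )
        fields.push_back(field);
    return fields;
}

void split_on_self_check()
{
    const auto fields = split_on("75543\tJ. Ross Unusual\t23", '\t');
    assert(fields.size() == 3);
    assert(fields[1] == "J. Ross Unusual");
    assert(std::stoi(fields[2]) == 23);
}
```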
Also note:
To read a uint8_t value as numeric input, you'll have to go through a temporary unsigned int value. Reading directly into an unsigned char (aka uint8_t) extracts a single character rather than a number and will screw up the stream parsing state.
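The detour through a wider type can be packaged in a small function. This is a sketch under the assumption that values above 255 should count as a format error; read_uint8 is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>

// Read a small number into a std::uint8_t by going through a wider type.
// Sets failbit if the value does not fit, so callers can test the stream as usual.
std::istream& read_uint8(std::istream& in, std::uint8_t& value)
{
    unsigned int wide = 0;
    if (in >> wide) {
        if (wide > 255u) in.setstate(std::ios::failbit);
        else value = static_cast<std::uint8_t>(wide);
    }
    return in;
}

void read_uint8_self_check()
{
    std::istringstream in("32 300");
    std::uint8_t age = 0;
    assert(read_uint8(in, age) && age == 32);    // numeric value, not the character '3'
    assert(!read_uint8(in, age) && age == 32);   // 300 is out of range, age is left unchanged
}
```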
Another attempt at solving the parsing problem.
int main()
{
std::ifstream ifs("test-115.in");
std::vector<Person> persons;
while (true)
{
Person actRecord;
// Read the ID and the first part of the name.
if ( !(ifs >> actRecord.id >> actRecord.name ) )
{
break;
}
// Read the rest of the line.
std::string line;
std::getline(ifs,line);
// Pickup the rest of the name from the rest of the line.
// The last token in the rest of the line is the age.
// All other tokens are part of the name.
// The tokens can be separated by ' ' or '\t'.
size_t pos = 0;
size_t iter1 = 0;
size_t iter2 = 0;
while ( (iter1 = line.find(' ', pos)) != std::string::npos ||
(iter2 = line.find('\t', pos)) != std::string::npos )
{
size_t iter = (iter1 != std::string::npos) ? iter1 : iter2;
actRecord.name += line.substr(pos, (iter - pos + 1));
pos = iter + 1;
// Skip multiple whitespace characters.
while ( isspace(line[pos]) )
{
++pos;
}
}
// Trim the last whitespace from the name.
actRecord.name.erase(actRecord.name.size()-1);
// Extract the age.
// std::stoi returns an integer. We are assuming that
// it will be small enough to fit into an uint8_t.
actRecord.age = std::stoi(line.substr(pos).c_str());
// Debugging aid.. Make sure we have extracted the data correctly.
std::cout << "ID: " << actRecord.id
<< ", name: " << actRecord.name
<< ", age: " << (int)actRecord.age << std::endl;
persons.push_back(actRecord);
}
// If came here before the EOF was reached, there was an
// error in the input file.
if ( !(ifs.eof()) ) {
std::cerr << "Input format error!" << std::endl;
}
}
When seeing such an input file, I think it is not a (new way) delimited file, but a good old fixed size fields one, like Fortran and Cobol programmers used to deal with. So I would parse it like that (note I separated forename and lastname) :
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string forename;
std::string lastname;
uint8_t age;
// ...
};
int main() {
std::ifstream ifs("file.txt");
std::vector<Person> persons;
std::string line;
int fieldsize[] = {8, 9, 9, 4};
while(std::getline(ifs, line)) {
Person person;
int field = 0, start=0, last;
std::stringstream fieldtxt;
fieldtxt.str(line.substr(start, fieldsize[0]));
fieldtxt >> person.id;
start += fieldsize[0];
person.forename=line.substr(start, fieldsize[1]);
last = person.forename.find_last_not_of(' ') + 1;
person.forename.erase(last);
start += fieldsize[1];
person.lastname=line.substr(start, fieldsize[2]);
last = person.lastname.find_last_not_of(' ') + 1;
person.lastname.erase(last);
start += fieldsize[2];
// Parse the age through a wider type; extracting into a uint8_t would read a single character.
std::istringstream agetxt(line.substr(start, fieldsize[3]));
unsigned int age = 0;
agetxt >> age;
person.age = static_cast<uint8_t>(age);
persons.push_back(person);
}
return 0;
}

How to read .txt file with condition [duplicate]

Suppose we have the following situation:
A record struct is declared as follows
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Records are stored in a file using the following format:
ID Forename Lastname Age
------------------------------
1267867 John Smith 32
67545 Jane Doe 36
8677453 Gwyneth Miller 56
75543 J. Ross Unusual 23
...
The file should be read in to collect an arbitrary number of the Person records mentioned above:
std::istream& ifs = std::ifstream("SampleInput.txt");
std::vector<Person> persons;
Person actRecord;
while(ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
persons.push_back(actRecord);
}
if(!ifs) {
std::err << "Input format error!" << std::endl;
}
Question:
What can I do to read in the separate values storing their values into the one actRecord variables' fields?
The above code sample ends up with run time errors:
Runtime error time: 0 memory: 3476 signal:-1
stderr: Input format error!
One viable solution is to reorder input fields (if this is possible)
ID Age Forename Lastname
1267867 32 John Smith
67545 36 Jane Doe
8677453 56 Gwyneth Miller
75543 23 J. Ross Unusual
...
and read in the records as follows
#include <iostream>
#include <vector>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
unsigned int age;
while(ifs >> actRecord.id >> age &&
std::getline(ifs, actRecord.name)) {
actRecord.age = uint8_t(age);
persons.push_back(actRecord);
}
return 0;
}
You have whitespace between firstname and lastname. Change your class to have firstname and lastname as separate strings and it should work. The other thing you can do is to read in two separate variables such as name1 and name2 and assign it as
actRecord.name = name1 + " " + name2;
Here's an implementation of a manipulator I came up with that counts the delimiter through each extracted character. Using the number of delimiters you specify, it will extract words from the input stream. Here's a working demo.
template<class charT>
struct word_inserter_impl {
word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
: str_(str)
, delim_(delim)
, words_(words)
{ }
friend std::basic_istream<charT>&
operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
typename std::basic_istream<charT>::sentry ok(is);
if (ok) {
std::istreambuf_iterator<charT> it(is), end;
std::back_insert_iterator<std::string> dest(wi.str_);
while (it != end && wi.words_) {
if (*it == wi.delim_ && --wi.words_ == 0) {
break;
}
dest++ = *it++;
}
}
return is;
}
private:
std::basic_string<charT>& str_;
charT delim_;
mutable std::size_t words_;
};
template<class charT=char>
word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
return word_inserter_impl<charT>(words, str, delim);
}
Now you can just do:
while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> actRecord.age) {
std::cout << actRecord.id << " " << actRecord.name << " " << actRecord.age << '\n';
}
Live Demo
A solution would be to read in the first entry into an ID variable.
Then read in all the other words from the line (just push them in a temporary vector) and construct the name of the individual with all the elements, except the last entry which is the Age.
This would allow you to still have the Age on the last position but be able to deal with name like "J. Ross Unusual".
Update to add some code which illustrates the theory above:
#include <memory>
#include <string>
#include <vector>
#include <iterator>
#include <fstream>
#include <sstream>
#include <iostream>
struct Person {
unsigned int id;
std::string name;
int age;
};
int main()
{
std::fstream ifs("in.txt");
std::vector<Person> persons;
std::string line;
while (std::getline(ifs, line))
{
std::istringstream iss(line);
// first: ID simply read it
Person actRecord;
iss >> actRecord.id;
// next iteration: read in everything
std::string temp;
std::vector<std::string> tempvect;
while(iss >> temp) {
tempvect.push_back(temp);
}
// then: the name, let's join the vector in a way to not to get a trailing space
// also taking care of people who do not have two names ...
int LAST = 2;
if(tempvect.size() < 2) // only the name and age are in there
{
LAST = 1;
}
std::ostringstream oss;
std::copy(tempvect.begin(), tempvect.end() - LAST,
std::ostream_iterator<std::string>(oss, " "));
// the last element
oss << *(tempvect.end() - LAST);
actRecord.name = oss.str();
// and the age
actRecord.age = std::stoi( *(tempvect.end() - 1) );
persons.push_back(actRecord);
}
for(std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); it++)
{
std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
}
}
Since we can easily split a line on whitespace and we know that the only value that can be separated is the name, a possible solution is to use a deque for each line containing the whitespace separated elements of the line. The id and the age can easily be retrieved from the deque and the remaining elements can be concatenated to retrieve the name:
#include <iostream>
#include <fstream>
#include <deque>
#include <vector>
#include <sstream>
#include <iterator>
#include <string>
#include <algorithm>
#include <utility>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
};
int main(int argc, char* argv[]) {
std::ifstream ifs("SampleInput.txt");
std::vector<Person> records;
std::string line;
while (std::getline(ifs,line)) {
std::istringstream ss(line);
std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});
Person record;
record.id = std::stoi(info.front()); info.pop_front();
record.age = std::stoi(info.back()); info.pop_back();
std::ostringstream name;
std::copy
( info.begin()
, info.end()
, std::ostream_iterator<std::string>(name," "));
record.name = name.str(); record.name.pop_back();
records.push_back(std::move(record));
}
for (auto& record : records) {
std::cout << record.id << " " << record.name << " "
<< static_cast<unsigned int>(record.age) << std::endl;
}
return 0;
}
Another solution is to require certain delimiter characters for a particular field, and provide a special extraction manipulator for this purpose.
Let's suppose we define the delimiter character ", and the input should look like this:
1267867 "John Smith" 32
67545 "Jane Doe" 36
8677453 "Gwyneth Miller" 56
75543 "J. Ross Unusual" 23
Generally needed includes:
#include <iostream>
#include <vector>
#include <string>
#include <cctype> // std::isspace
#include <cstdint> // uint8_t
The record declaration:
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Declaration/definition of a proxy class (struct) that supports being used with the std::istream& operator>>(std::istream&, const delim_field_extractor_proxy&) global operator overload:
struct delim_field_extractor_proxy {
delim_field_extractor_proxy
( std::string& field_ref
, char delim = '"'
)
: field_ref_(field_ref), delim_(delim) {}
friend
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy);
void extract_value(std::istream& is) const {
field_ref_.clear();
char input;
bool addChars = false;
while(is) {
is.get(input);
if(is.eof()) {
break;
}
if(input == delim_) {
addChars = !addChars;
if(!addChars) {
break;
}
else {
continue;
}
}
if(addChars) {
field_ref_ += input;
}
}
// consume whitespaces
while(std::isspace(is.peek())) {
is.get();
}
}
std::string& field_ref_;
char delim_;
};
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy) {
extractor_proxy.extract_value(is);
return is;
}
Plumbing everything together and instantiating the delim_field_extractor_proxy:
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
int act_age;
while(ifs >> actRecord.id
>> delim_field_extractor_proxy(actRecord.name,'"')
>> act_age) {
actRecord.age = uint8_t(act_age);
persons.push_back(actRecord);
}
for(auto it = persons.begin();
it != persons.end();
++it) {
std::cout << it->id << ", "
<< it->name << ", "
<< int(it->age) << std::endl;
}
return 0;
}
NOTE:
This solution also works when a TAB character (\t) is specified as the delimiter, which is useful for parsing tab-delimited .csv-style files.
What can I do to read in the separate words forming the name into the one actRecord.name variable?
The general answer is: No, you can't do this without additional delimiter specifications and exceptional parsing for the parts forming the intended actRecord.name contents.
This is because a std::string field will be parsed just up to the next occurrence of a whitespace character.
It's noteworthy that some standard formats (e.g. .csv) may require distinguishing blanks (' ') from tabs ('\t') or other characters to delimit certain record fields (which may not be visible at first glance).
Also note:
To read a uint8_t value as numeric input, you'll have to go through a temporary unsigned int value. Reading directly into an unsigned char (which is what uint8_t usually is) extracts a single character instead of a number and throws off the parsing of everything that follows.
Another attempt at solving the parsing problem.
int main()
{
std::ifstream ifs("test-115.in");
std::vector<Person> persons;
while (true)
{
Person actRecord;
// Read the ID and the first part of the name.
if ( !(ifs >> actRecord.id >> actRecord.name ) )
{
break;
}
// Read the rest of the line.
std::string line;
std::getline(ifs,line);
// Pickup the rest of the name from the rest of the line.
// The last token in the rest of the line is the age.
// All other tokens are part of the name.
// The tokens can be separated by ' ' or '\t'.
size_t pos = 0;
size_t iter1 = 0;
size_t iter2 = 0;
while ( (iter1 = line.find(' ', pos)) != std::string::npos ||
(iter2 = line.find('\t', pos)) != std::string::npos )
{
size_t iter = (iter1 != std::string::npos) ? iter1 : iter2;
actRecord.name += line.substr(pos, (iter - pos + 1));
pos = iter + 1;
// Skip multiple whitespace characters.
while ( isspace(line[pos]) )
{
++pos;
}
}
// Trim the last whitespace from the name.
actRecord.name.erase(actRecord.name.size()-1);
// Extract the age.
// std::stoi returns an integer. We are assuming that
// it will be small enough to fit into an uint8_t.
actRecord.age = std::stoi(line.substr(pos).c_str());
// Debugging aid.. Make sure we have extracted the data correctly.
std::cout << "ID: " << actRecord.id
<< ", name: " << actRecord.name
<< ", age: " << (int)actRecord.age << std::endl;
persons.push_back(actRecord);
}
// If came here before the EOF was reached, there was an
// error in the input file.
if ( !(ifs.eof()) ) {
std::cerr << "Input format error!" << std::endl;
}
}
When I see such an input file, I think it is not a delimited file but a good old fixed-width one, like the ones Fortran and COBOL programmers used to deal with. So I would parse it like this (note that I separated forename and lastname):
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <cstdint> // uint8_t
struct Person {
unsigned int id;
std::string forename;
std::string lastname;
uint8_t age;
// ...
};
int main() {
std::ifstream ifs("file.txt"); // a non-const reference cannot bind to a temporary stream
std::vector<Person> persons;
std::string line;
int fieldsize[] = {8, 9, 9, 4};
while(std::getline(ifs, line)) {
Person person;
int start = 0, last;
std::stringstream fieldtxt;
fieldtxt.str(line.substr(start, fieldsize[0]));
fieldtxt >> person.id;
start += fieldsize[0];
person.forename=line.substr(start, fieldsize[1]);
last = person.forename.find_last_not_of(' ') + 1;
person.forename.erase(last);
start += fieldsize[1];
person.lastname=line.substr(start, fieldsize[2]);
last = person.lastname.find_last_not_of(' ') + 1;
person.lastname.erase(last);
start += fieldsize[2];
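int age = 0; // read the age as a number; extracting into a uint8_t would read a single character
fieldtxt.clear(); // reset any eofbit/failbit left over from reading the id
fieldtxt.str(line.substr(start, fieldsize[3]));
fieldtxt >> age;
person.age = static_cast<uint8_t>(age);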
persons.push_back(person);
}
return 0;
}

C++ Read file line by line then split each line using the delimiter

I have searched and got a good grasp of the solution from a previous post, but I am still a bit stuck.
My problem is that instead of "randomstring TAB number TAB number NL",
my data is "number (space colon space) number (space colon space) sentence".
I've edited the code below but still can't get it to work 100%, because the parameters getline takes are (stream, string, delimiter) and the delimiter is a single character.
For some reason, it also only gets the first word of the sentence and not the rest.
Previous post
I want to read a txt file line by line and after reading each line, I want to split the line according to the tab "\t" and add each part to an element in a struct.
my struct is 1*char and 2*int
struct myStruct
{
char chr;
int v1;
int v2;
};
where chr can contain more than one character.
A line should be something like:
randomstring TAB number TAB number NL
SOLUTION
std::ifstream file("plop");
std::string line;
while(std::getline(file, line))
{
std::stringstream linestream(line);
std::string data;
int val1;
int val2;
// If you have truly tab delimited data use getline() with third parameter.
// If your data is just white space separated data
// then the operator >> will do (it reads a space separated word into a string).
std::getline(linestream, data, '\t'); // read up-to the first tab (discard tab).
// Read the integers using the operator >>
linestream >> val1 >> val2;
}
On the following line, the data variable will end up holding the complete line, because your lines contain no tab for std::getline to stop at. linestream is then fully consumed, so any further reads from it will not yield anything.
std::getline(linestream, data, '\t'); // read up-to the first tab (discard tab).
Instead, you can just work on the line like this
while (std::getline(file, line))
{
int token1 = std::stoi(line.substr(0, line.find(" : ")));
line.erase(0, line.find(" : ") + 3);
int token2 = std::stoi(line.substr(0, line.find(" : ")));
line.erase(0, line.find(" : ") + 3);
std::string token3 = line;
}
What exactly is your problem?
#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>
#include <sstream>
#include <vector>
struct Data
{
int n1;
int n2;
std::string sequence;
};
std::ostream& operator<<(std::ostream& ostr, const Data& data)
{
ostr << "{" << data.n1 << "," << data.n2 << ",\"" << data.sequence << "\"}";
return ostr;
}
std::string& ltrim(std::string& s, const char* t = " \t")
{
s.erase(0, s.find_first_not_of(t));
return s;
}
std::string& rtrim(std::string& s, const char* t = " \t")
{
s.erase(s.find_last_not_of(t) + 1);
return s;
}
std::string& trim(std::string& s, const char* t = " \t")
{
return ltrim(rtrim(s, t), t);
}
int main()
{
std::string file_content{
"1\t1\t\n"
"2\t2\tsecond sequence\t\n"
"3\t3\tthird sequence\n"};
std::istringstream file_stream{file_content};
std::string line;
std::vector<Data> content;
while(std::getline(file_stream, line))
{
std::istringstream line_stream{line};
Data data{};
if(!(line_stream >> data.n1 >> data.n2))
{
std::cout << "Failed to parse line (numbers):\n" << line << "\n";
break;
}
auto numbers_end = line_stream.tellg();
if(numbers_end == -1)
{
std::cout << "Failed to parse line (sequence):\n" << line << "\n";
break;
}
data.sequence = line.substr(numbers_end);
trim(data.sequence);
content.push_back(std::move(data));
}
std::copy(content.cbegin(), content.cend(),
std::ostream_iterator<Data>(std::cout, "\n"));
}

Why does reading a record struct fields from std::istream fail, and how can I fix it?

Suppose we have the following situation:
A record struct is declared as follows
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Records are stored in a file using the following format:
ID Forename Lastname Age
------------------------------
1267867 John Smith 32
67545 Jane Doe 36
8677453 Gwyneth Miller 56
75543 J. Ross Unusual 23
...
The file should be read in to collect an arbitrary number of the Person records mentioned above:
std::ifstream ifs("SampleInput.txt");
std::vector<Person> persons;
Person actRecord;
while(ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
persons.push_back(actRecord);
}
if(!ifs) {
std::cerr << "Input format error!" << std::endl;
}
Question:
What can I do to read the separate values into the fields of the one actRecord variable?
The above code sample ends up with run time errors:
Runtime error time: 0 memory: 3476 signal:-1
stderr: Input format error!
One viable solution is to reorder input fields (if this is possible)
ID Age Forename Lastname
1267867 32 John Smith
67545 36 Jane Doe
8677453 56 Gwyneth Miller
75543 23 J. Ross Unusual
...
and read in the records as follows
#include <iostream>
#include <vector>
#include <string>
#include <cstdint> // uint8_t
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
unsigned int age;
while(ifs >> actRecord.id >> age &&
std::getline(ifs >> std::ws, actRecord.name)) { // std::ws drops the blanks in front of the name
actRecord.age = uint8_t(age);
persons.push_back(actRecord);
}
return 0;
}
You have whitespace between firstname and lastname. Change your class to have firstname and lastname as separate strings and it should work. The other thing you can do is to read in two separate variables such as name1 and name2 and assign it as
actRecord.name = name1 + " " + name2;
Here's an implementation of a manipulator I came up with that counts the delimiters as it extracts each character. It extracts words from the input stream until it has seen the number of delimiters you specify.
template<class charT>
struct word_inserter_impl {
word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
: str_(str)
, delim_(delim)
, words_(words)
{ }
friend std::basic_istream<charT>&
operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
typename std::basic_istream<charT>::sentry ok(is);
if (ok) {
wi.str_.clear(); // start from an empty field so repeated reads do not accumulate
std::istreambuf_iterator<charT> it(is), end;
std::back_insert_iterator<std::basic_string<charT>> dest(wi.str_);
while (it != end && wi.words_) {
if (*it == wi.delim_ && --wi.words_ == 0) {
break;
}
dest++ = *it++;
}
}
return is;
}
private:
std::basic_string<charT>& str_;
charT delim_;
mutable std::size_t words_;
};
template<class charT=char>
word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
return word_inserter_impl<charT>(words, str, delim);
}
Now you can just do:
unsigned int age; // read the age as a number, not as a single character
while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> age) {
actRecord.age = static_cast<uint8_t>(age);
std::cout << actRecord.id << " " << actRecord.name << " " << age << '\n';
}
A solution would be to read in the first entry into an ID variable.
Then read in all the other words from the line (just push them in a temporary vector) and construct the name of the individual with all the elements, except the last entry which is the Age.
This would allow you to still have the Age on the last position but be able to deal with name like "J. Ross Unusual".
Update to add some code which illustrates the theory above:
#include <memory>
#include <string>
#include <vector>
#include <iterator>
#include <fstream>
#include <sstream>
#include <iostream>
struct Person {
unsigned int id;
std::string name;
int age;
};
int main()
{
std::ifstream ifs("in.txt");
std::vector<Person> persons;
std::string line;
while (std::getline(ifs, line))
{
std::istringstream iss(line);
// first: ID simply read it
Person actRecord;
iss >> actRecord.id;
// next iteration: read in everything
std::string temp;
std::vector<std::string> tempvect;
while(iss >> temp) {
tempvect.push_back(temp);
}
// a valid line has at least one name token plus the age; skip anything else
if(tempvect.size() < 2) {
continue;
}
// then: the name, let's join the tokens without producing a trailing space
const int LAST = 2; // offset from end() of the last name token
std::ostringstream oss;
std::copy(tempvect.begin(), tempvect.end() - LAST,
std::ostream_iterator<std::string>(oss, " "));
// the last element
oss << *(tempvect.end() - LAST);
actRecord.name = oss.str();
// and the age
actRecord.age = std::stoi( *(tempvect.end() - 1) );
persons.push_back(actRecord);
}
for(std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); it++)
{
std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
}
}
Since we can easily split a line on whitespace and we know that the only value that can be separated is the name, a possible solution is to use a deque for each line containing the whitespace separated elements of the line. The id and the age can easily be retrieved from the deque and the remaining elements can be concatenated to retrieve the name:
#include <iostream>
#include <fstream>
#include <deque>
#include <vector>
#include <sstream>
#include <iterator>
#include <string>
#include <algorithm>
#include <utility>
#include <cstdint> // uint8_t
struct Person {
unsigned int id;
std::string name;
uint8_t age;
};
int main(int argc, char* argv[]) {
std::ifstream ifs("SampleInput.txt");
std::vector<Person> records;
std::string line;
while (std::getline(ifs,line)) {
std::istringstream ss(line);
std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});
if (info.size() < 3) continue; // need at least id, one name token and age
Person record;
record.id = std::stoul(info.front()); info.pop_front();
record.age = std::stoi(info.back()); info.pop_back();
std::ostringstream name;
std::copy
( info.begin()
, info.end()
, std::ostream_iterator<std::string>(name," "));
record.name = name.str(); record.name.pop_back();
records.push_back(std::move(record));
}
for (auto& record : records) {
std::cout << record.id << " " << record.name << " "
<< static_cast<unsigned int>(record.age) << std::endl;
}
return 0;
}
Another solution is to require certain delimiter characters for a particular field, and provide a special extraction manipulator for this purpose.
Let's suppose we define the delimiter character ", and the input should look like this:
1267867 "John Smith" 32
67545 "Jane Doe" 36
8677453 "Gwyneth Miller" 56
75543 "J. Ross Unusual" 23
Generally needed includes:
#include <iostream>
#include <vector>
#include <string>
#include <cctype> // std::isspace
#include <cstdint> // uint8_t
The record declaration:
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Declaration/definition of a proxy class (struct) that supports being used with the std::istream& operator>>(std::istream&, const delim_field_extractor_proxy&) global operator overload:
struct delim_field_extractor_proxy {
delim_field_extractor_proxy
( std::string& field_ref
, char delim = '"'
)
: field_ref_(field_ref), delim_(delim) {}
friend
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy);
void extract_value(std::istream& is) const {
field_ref_.clear();
char input;
bool addChars = false;
while(is) {
is.get(input);
if(is.eof()) {
break;
}
if(input == delim_) {
addChars = !addChars;
if(!addChars) {
break;
}
else {
continue;
}
}
if(addChars) {
field_ref_ += input;
}
}
// consume whitespaces
while(std::isspace(is.peek())) {
is.get();
}
}
std::string& field_ref_;
char delim_;
};
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy) {
extractor_proxy.extract_value(is);
return is;
}
Plumbing everything together and instantiating the delim_field_extractor_proxy:
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
int act_age;
while(ifs >> actRecord.id
>> delim_field_extractor_proxy(actRecord.name,'"')
>> act_age) {
actRecord.age = uint8_t(act_age);
persons.push_back(actRecord);
}
for(auto it = persons.begin();
it != persons.end();
++it) {
std::cout << it->id << ", "
<< it->name << ", "
<< int(it->age) << std::endl;
}
return 0;
}
NOTE:
This solution also works when a TAB character (\t) is specified as the delimiter, which is useful for parsing tab-delimited .csv-style files.
What can I do to read in the separate words forming the name into the one actRecord.name variable?
The general answer is: No, you can't do this without additional delimiter specifications and exceptional parsing for the parts forming the intended actRecord.name contents.
This is because a std::string field will be parsed just up to the next occurrence of a whitespace character.
It's noteworthy that some standard formats (e.g. .csv) may require distinguishing blanks (' ') from tabs ('\t') or other characters to delimit certain record fields (which may not be visible at first glance).
Also note:
To read a uint8_t value as numeric input, you'll have to go through a temporary unsigned int value. Reading directly into an unsigned char (which is what uint8_t usually is) extracts a single character instead of a number and throws off the parsing of everything that follows.
Another attempt at solving the parsing problem.
int main()
{
std::ifstream ifs("test-115.in");
std::vector<Person> persons;
while (true)
{
Person actRecord;
// Read the ID and the first part of the name.
if ( !(ifs >> actRecord.id >> actRecord.name ) )
{
break;
}
// Read the rest of the line.
std::string line;
std::getline(ifs,line);
// Pickup the rest of the name from the rest of the line.
// The last token in the rest of the line is the age.
// All other tokens are part of the name.
// The tokens can be separated by ' ' or '\t'.
size_t pos = 0;
size_t iter1 = 0;
size_t iter2 = 0;
while ( (iter1 = line.find(' ', pos)) != std::string::npos ||
(iter2 = line.find('\t', pos)) != std::string::npos )
{
size_t iter = (iter1 != std::string::npos) ? iter1 : iter2;
actRecord.name += line.substr(pos, (iter - pos + 1));
pos = iter + 1;
// Skip multiple whitespace characters.
while ( isspace(line[pos]) )
{
++pos;
}
}
// Trim the last whitespace from the name.
actRecord.name.erase(actRecord.name.size()-1);
// Extract the age.
// std::stoi returns an integer. We are assuming that
// it will be small enough to fit into an uint8_t.
actRecord.age = std::stoi(line.substr(pos).c_str());
// Debugging aid.. Make sure we have extracted the data correctly.
std::cout << "ID: " << actRecord.id
<< ", name: " << actRecord.name
<< ", age: " << (int)actRecord.age << std::endl;
persons.push_back(actRecord);
}
// If came here before the EOF was reached, there was an
// error in the input file.
if ( !(ifs.eof()) ) {
std::cerr << "Input format error!" << std::endl;
}
}
When I see such an input file, I think it is not a delimited file but a good old fixed-width one, like the ones Fortran and COBOL programmers used to deal with. So I would parse it like this (note that I separated forename and lastname):
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <cstdint> // uint8_t
struct Person {
unsigned int id;
std::string forename;
std::string lastname;
uint8_t age;
// ...
};
int main() {
std::ifstream ifs("file.txt"); // a non-const reference cannot bind to a temporary stream
std::vector<Person> persons;
std::string line;
int fieldsize[] = {8, 9, 9, 4};
while(std::getline(ifs, line)) {
Person person;
int start = 0, last;
std::stringstream fieldtxt;
fieldtxt.str(line.substr(start, fieldsize[0]));
fieldtxt >> person.id;
start += fieldsize[0];
person.forename=line.substr(start, fieldsize[1]);
last = person.forename.find_last_not_of(' ') + 1;
person.forename.erase(last);
start += fieldsize[1];
person.lastname=line.substr(start, fieldsize[2]);
last = person.lastname.find_last_not_of(' ') + 1;
person.lastname.erase(last);
start += fieldsize[2];
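int age = 0; // read the age as a number; extracting into a uint8_t would read a single character
fieldtxt.clear(); // reset any eofbit/failbit left over from reading the id
fieldtxt.str(line.substr(start, fieldsize[3]));
fieldtxt >> age;
person.age = static_cast<uint8_t>(age);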
persons.push_back(person);
}
return 0;
}