How to read a .txt file with a condition [duplicate] - C++

Suppose we have the following situation:
A record struct is declared as follows:
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Records are stored in a file using the following format:
ID Forename Lastname Age
------------------------------
1267867 John Smith 32
67545 Jane Doe 36
8677453 Gwyneth Miller 56
75543 J. Ross Unusual 23
...
The file should be read to collect an arbitrary number of the Person records described above:
std::ifstream ifs("SampleInput.txt");
std::vector<Person> persons;
Person actRecord;
while(ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
persons.push_back(actRecord);
}
if(!ifs) {
std::cerr << "Input format error!" << std::endl;
}
Question:
How can I read the separate values and store them into the fields of the single actRecord variable?
The above code fails at run time with the following output:
Runtime error time: 0 memory: 3476 signal:-1
stderr: Input format error!

One viable solution is to reorder the input fields (if that is possible):
ID Age Forename Lastname
1267867 32 John Smith
67545 36 Jane Doe
8677453 56 Gwyneth Miller
75543 23 J. Ross Unusual
...
and read the records as follows:
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
unsigned int age; // read the age as a number, not as a single uint8_t character
// std::ws skips the blank between the age and the name, so it isn't stored in the name
while(ifs >> actRecord.id >> age &&
std::getline(ifs >> std::ws, actRecord.name)) {
actRecord.age = uint8_t(age);
persons.push_back(actRecord);
}
return 0;
}
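To try the reordered format without a file, here is a minimal sketch of the same loop that reads from any std::istream, so it can be fed a std::istringstream. The helper name read_reordered is hypothetical; the parsing is the one shown above (two numbers, then the rest of the line as the name).

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    std::uint8_t age;
};

// Hypothetical helper: reads "ID Age Forename Lastname..." records; the name
// is whatever remains on the line after the two numbers.
std::vector<Person> read_reordered(std::istream& is) {
    std::vector<Person> persons;
    Person actRecord;
    unsigned int age; // wider type so the age is parsed as a number
    // std::ws drops the blank(s) between the age and the name
    while (is >> actRecord.id >> age && std::getline(is >> std::ws, actRecord.name)) {
        actRecord.age = static_cast<std::uint8_t>(age);
        persons.push_back(actRecord);
    }
    return persons;
}
```

Because the name is read last with std::getline, names with any number of words (such as "J. Ross Unusual") are handled without special cases.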

There is whitespace between the first name and the last name. Change your struct to hold firstname and lastname as separate strings and it should work, as long as every name consists of exactly two words. The other thing you can do is to read into two separate variables such as name1 and name2 and assign them as
actRecord.name = name1 + " " + name2;
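A minimal sketch of the name1/name2 idea, assuming every name has exactly two words. The helper name read_two_word_names is hypothetical; the age goes through an unsigned int so it is parsed as a number.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    std::uint8_t age;
};

// Hypothetical helper: parses "ID Forename Lastname Age" records,
// assuming every name consists of exactly two words.
std::vector<Person> read_two_word_names(std::istream& is) {
    std::vector<Person> persons;
    Person actRecord;
    std::string name1, name2;
    unsigned int age; // wider type so the age is parsed as a number
    while (is >> actRecord.id >> name1 >> name2 >> age) {
        actRecord.name = name1 + " " + name2;
        actRecord.age = static_cast<std::uint8_t>(age);
        persons.push_back(actRecord);
    }
    return persons;
}
```

A three-word name such as "J. Ross Unusual" makes the age extraction fail ("Unusual" is not a number), which ends the loop at that record.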

Here's an implementation of a manipulator I came up with that counts delimiter occurrences while extracting characters. It extracts as many words from the input stream as you specify; note that the number of words has to be known in advance, so a name like "J. Ross Unusual" needs a count of 3.
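#include <cstddef>
#include <ios>
#include <istream>
#include <iterator>
#include <string>
template<class charT>
struct word_inserter_impl {
word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
: str_(str)
, delim_(delim)
, words_(words)
{ }
friend std::basic_istream<charT>&
operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
// the sentry skips leading whitespace and checks the stream state
typename std::basic_istream<charT>::sentry ok(is);
if (ok) {
// start with an empty string instead of appending to the previous record's name
wi.str_.clear();
std::istreambuf_iterator<charT> it(is), end;
std::back_insert_iterator<std::basic_string<charT>> dest(wi.str_);
// copy characters until the requested number of delimiters has been seen
while (it != end && wi.words_) {
if (*it == wi.delim_ && --wi.words_ == 0) {
break;
}
dest++ = *it++;
}
// report running out of input like the standard extractors do
if (it == end) {
is.setstate(std::ios_base::eofbit);
}
}
return is;
}
private:
std::basic_string<charT>& str_;
charT delim_;
mutable std::size_t words_;
};
template<class charT=char>
word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
return word_inserter_impl<charT>(words, str, delim);
}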
Now you can just do:
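unsigned int age; // read the age as a number: extracting into uint8_t would read one character
while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> age) {
actRecord.age = uint8_t(age);
std::cout << actRecord.id << " " << actRecord.name << " " << age << '\n';
}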

One solution is to read the first entry into the ID variable.
Then read all the other words of the line into a temporary vector, and build the person's name from all of its elements except the last one, which is the age.
This lets you keep the age in the last position while still handling names like "J. Ross Unusual".
Update: here is some code that illustrates the idea above:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string name;
int age;
};
int main()
{
std::ifstream ifs("in.txt");
std::vector<Person> persons;
std::string line;
while (std::getline(ifs, line))
{
std::istringstream iss(line);
// first: read the ID; skip lines (e.g. the header) that don't start with one
Person actRecord;
if (!(iss >> actRecord.id)) {
continue;
}
// next: read every remaining word of the line
std::string temp;
std::vector<std::string> tempvect;
while (iss >> temp) {
tempvect.push_back(temp);
}
// we need at least one name word plus the age
if (tempvect.size() < 2) {
continue;
}
// then: join all words except the last one (the age) without a trailing space;
// this also works for people with only one name or with more than two
std::ostringstream oss;
std::copy(tempvect.begin(), tempvect.end() - 2,
std::ostream_iterator<std::string>(oss, " "));
// the last word of the name
oss << *(tempvect.end() - 2);
actRecord.name = oss.str();
// and finally the age
actRecord.age = std::stoi(tempvect.back());
persons.push_back(actRecord);
}
for (std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); ++it)
{
std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
}
}

Since we can easily split a line on whitespace, and we know that the name is the only value that can contain whitespace, a possible solution is to use a deque holding the whitespace-separated elements of each line. The id and the age can easily be taken from the front and the back of the deque, and the remaining elements can be concatenated to rebuild the name:
#include <algorithm>
#include <cstdint>
#include <deque>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
};
int main() {
std::ifstream ifs("SampleInput.txt");
std::vector<Person> records;
std::string line;
while (std::getline(ifs,line)) {
std::istringstream ss(line);
std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});
// a record needs at least an ID, one name word and an age
if (info.size() < 3) {
continue;
}
Person record;
try {
record.id = static_cast<unsigned int>(std::stoul(info.front()));
record.age = static_cast<uint8_t>(std::stoi(info.back()));
} catch (const std::exception&) {
continue; // not numeric: e.g. the "ID Forename Lastname Age" header
}
info.pop_front();
info.pop_back();
// join the remaining words with single blanks, then drop the trailing one
std::ostringstream name;
std::copy
( info.begin()
, info.end()
, std::ostream_iterator<std::string>(name," "));
record.name = name.str(); record.name.pop_back();
records.push_back(std::move(record));
}
for (auto& record : records) {
std::cout << record.id << " " << record.name << " "
<< static_cast<unsigned int>(record.age) << std::endl;
}
return 0;
}

Another solution is to require certain delimiter characters for a particular field, and provide a special extraction manipulator for this purpose.
Let's suppose we choose " as the delimiter character; the input should then look like this:
1267867 "John Smith" 32
67545 "Jane Doe" 36
8677453 "Gwyneth Miller" 56
75543 "J. Ross Unusual" 23
Generally needed includes:
#include <cctype>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
The record declaration:
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Declaration/definition of a proxy class (struct) that supports being used with the std::istream& operator>>(std::istream&, const delim_field_extractor_proxy&) global operator overload:
struct delim_field_extractor_proxy {
delim_field_extractor_proxy
( std::string& field_ref
, char delim = '"'
)
: field_ref_(field_ref), delim_(delim) {}
friend
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy);
void extract_value(std::istream& is) const {
field_ref_.clear();
char input;
bool addChars = false;
while(is) {
is.get(input);
if(is.eof()) {
break;
}
if(input == delim_) {
addChars = !addChars;
if(!addChars) {
break;
}
else {
continue;
}
}
if(addChars) {
field_ref_ += input;
}
}
// consume whitespaces
while(std::isspace(is.peek())) {
is.get();
}
}
std::string& field_ref_;
char delim_;
};
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy) {
extractor_proxy.extract_value(is);
return is;
}
Plumbing everything together and instantiating the delim_field_extractor_proxy:
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
int act_age;
while(ifs >> actRecord.id
>> delim_field_extractor_proxy(actRecord.name,'"')
>> act_age) {
actRecord.age = uint8_t(act_age);
persons.push_back(actRecord);
}
for(auto it = persons.begin();
it != persons.end();
++it) {
std::cout << it->id << ", "
<< it->name << ", "
<< int(it->age) << std::endl;
}
return 0;
}
NOTE:
This solution also works well when specifying a TAB character ('\t') as the delimiter, which is useful for parsing tab-separated formats (such as some .csv variants).
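For the quoted-field format specifically, the standard library already offers a manipulator: std::quoted from <iomanip> (C++14 and later). This is a swapped-in alternative to the proxy above, not part of the original answer; the helper name read_quoted is hypothetical. On input, std::quoted removes the surrounding '"' characters, keeps the blanks between them, and treats a backslash as an escape for an embedded quote.

```cpp
#include <cstdint>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    std::uint8_t age;
};

// Hypothetical helper: reads records such as  1267867 "John Smith" 32
std::vector<Person> read_quoted(std::istream& is) {
    std::vector<Person> persons;
    Person actRecord;
    unsigned int age; // wider type so the age is parsed as a number
    while (is >> actRecord.id >> std::quoted(actRecord.name) >> age) {
        actRecord.age = static_cast<std::uint8_t>(age);
        persons.push_back(actRecord);
    }
    return persons;
}
```

std::quoted also accepts a custom delimiter and escape character as its second and third arguments, e.g. std::quoted(actRecord.name, '\'').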

What can I do to read in the separate words forming the name into the one actRecord.name variable?
The general answer is: you can't do this without additional delimiter specifications and special parsing for the parts that form the intended actRecord.name contents.
This is because extracting into a std::string field reads only up to the next whitespace character.
It's noteworthy that some standard formats (e.g. .csv) may require distinguishing blanks (' ') from tabs ('\t') or other characters used to delimit certain record fields (which may not be visible at first glance).
Also note:
To read a uint8_t value as numeric input, you have to go through a temporary unsigned int variable. Extracting directly into an unsigned char (which is what uint8_t usually is) reads a single character instead of a number, which breaks the subsequent parsing.
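Both effects can be observed directly with a std::istringstream. The sketch below is a hypothetical demo (the names Extracted, extract_like_question, age_direct and age_via_uint are made up for illustration) that applies the question's extraction chain to one record line, and compares extracting "32" into a uint8_t with going through an unsigned int. It assumes uint8_t is a typedef for unsigned char, as on common platforms.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical demo type: what the question's extraction chain stores.
struct Extracted {
    unsigned int id = 0;
    std::string name;
    std::uint8_t age = 0;
    bool ok = false;
};

// Mimics `ifs >> id >> name >> age` from the question on a single line.
Extracted extract_like_question(const std::string& line) {
    std::istringstream in(line);
    Extracted e;
    // >> into std::string stops at the first blank; >> into uint8_t
    // (an unsigned char on common platforms) stores one character.
    e.ok = static_cast<bool>(in >> e.id >> e.name >> e.age);
    return e;
}

// Extracting "32" directly into uint8_t stores the character '3'.
std::uint8_t age_direct(const std::string& text) {
    std::istringstream in(text);
    std::uint8_t age = 0;
    in >> age;
    return age;
}

// Going through a temporary unsigned int stores the number 32.
std::uint8_t age_via_uint(const std::string& text) {
    std::istringstream in(text);
    unsigned int tmp = 0;
    in >> tmp;
    return static_cast<std::uint8_t>(tmp);
}
```

For "1267867 John Smith 32" all three extractions succeed, but name holds only "John" and age holds the character 'S'; the next loop iteration then tries to read "mith" as an ID and the stream fails.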

Another attempt at solving the parsing problem.
#include <cctype>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
};
int main()
{
std::ifstream ifs("test-115.in");
std::vector<Person> persons;
while (true)
{
Person actRecord;
// Read the ID and the first part of the name.
if ( !(ifs >> actRecord.id >> actRecord.name ) )
{
break;
}
// Read the rest of the line and strip trailing whitespace
// (including a '\r' left over from Windows line endings).
std::string line;
std::getline(ifs,line);
line.erase(line.find_last_not_of(" \t\r") + 1);
// Nothing left means there is no age after the name: format error.
if ( line.empty() )
{
break;
}
// Pick up the rest of the name from the rest of the line.
// The last token in the rest of the line is the age.
// All other tokens are part of the name.
// The tokens can be separated by ' ' or '\t'.
size_t pos = 0;
size_t iter = 0;
while ( (iter = line.find_first_of(" \t", pos)) != std::string::npos )
{
// Append the token followed by a single blank.
actRecord.name += line.substr(pos, iter - pos);
actRecord.name += ' ';
pos = iter + 1;
// Skip multiple whitespace characters.
while ( pos < line.size() && std::isspace(static_cast<unsigned char>(line[pos])) )
{
++pos;
}
}
// Trim the last blank from the name.
actRecord.name.pop_back();
// Extract the age.
// std::stoi returns an int. We are assuming that
// it will be small enough to fit into a uint8_t.
actRecord.age = static_cast<uint8_t>(std::stoi(line.substr(pos)));
// Debugging aid: make sure we have extracted the data correctly.
std::cout << "ID: " << actRecord.id
<< ", name: " << actRecord.name
<< ", age: " << (int)actRecord.age << std::endl;
persons.push_back(actRecord);
}
// If we got here before the EOF was reached, there was an
// error in the input file.
if ( !(ifs.eof()) ) {
std::cerr << "Input format error!" << std::endl;
}
}
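A shorter variant of the same idea, sketched under the assumption that the rest of the line always ends with the age token: instead of walking forward token by token, peel off the last token with std::string::find_last_of. The helper name split_name_age is hypothetical; blanks inside the name are kept as they appear in the input rather than normalized.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: splits the text that follows the ID into the name and
// the age text, treating the last blank- or tab-separated token as the age.
// Returns false if there is no separator, i.e. no age after the name.
bool split_name_age(std::string rest, std::string& name, std::string& age) {
    const char* blanks = " \t\r";
    // trim leading and trailing blanks
    const std::size_t first = rest.find_first_not_of(blanks);
    if (first == std::string::npos) return false;
    rest = rest.substr(first, rest.find_last_not_of(blanks) - first + 1);
    // the age is everything after the last separator
    const std::size_t sep = rest.find_last_of(blanks);
    if (sep == std::string::npos) return false;
    age = rest.substr(sep + 1);
    // the name is everything before it, without the trailing separators
    name = rest.substr(0, rest.find_last_not_of(blanks, sep) + 1);
    return true;
}
```

In the loop above it would replace the token walk: after std::getline(ifs, line), call split_name_age(line, ...) and convert the age text with std::stoi.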

Looking at such an input file, I don't think it is a (modern) delimited file, but a good old fixed-width field file, like the ones Fortran and COBOL programmers used to deal with. So I would parse it like this (note that I separated forename and lastname):
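#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string forename;
std::string lastname;
uint8_t age;
// ...
};
int main() {
std::ifstream ifs("file.txt");
std::vector<Person> persons;
std::string line;
// Widths of the ID, forename, lastname and age columns (must match the file layout)
const std::size_t fieldsize[] = {8, 9, 9, 4};
const std::size_t agestart = fieldsize[0] + fieldsize[1] + fieldsize[2];
while(std::getline(ifs, line)) {
if (line.size() <= agestart) {
continue; // too short to hold all four fields
}
Person person;
std::size_t start = 0;
// ID: parse the number found in the first column
std::istringstream idtxt(line.substr(start, fieldsize[0]));
if (!(idtxt >> person.id)) {
continue; // header or separator line
}
start += fieldsize[0];
// Forename: take the column and trim the trailing padding
person.forename = line.substr(start, fieldsize[1]);
person.forename.erase(person.forename.find_last_not_of(' ') + 1);
start += fieldsize[1];
// Lastname: same treatment
person.lastname = line.substr(start, fieldsize[2]);
person.lastname.erase(person.lastname.find_last_not_of(' ') + 1);
start += fieldsize[2];
// Age: read into an unsigned int first; extracting into uint8_t would read one character
std::istringstream agetxt(line.substr(start, fieldsize[3]));
unsigned int age;
if (!(agetxt >> age)) {
continue;
}
person.age = static_cast<uint8_t>(age);
persons.push_back(person);
}
return 0;
}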
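The per-column slicing and trimming can be factored into one helper. This is a minimal sketch, not part of the original answer: the name fixed_field is hypothetical, and it also strips leading padding and tolerates columns that run past the end of a short line instead of throwing.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the column [start, start + width) of a
// fixed-width record line with surrounding blanks removed. Columns that lie
// (partly) beyond the end of a short line are truncated instead of throwing.
std::string fixed_field(const std::string& line, std::size_t start, std::size_t width) {
    if (start >= line.size()) return std::string();
    std::string field = line.substr(start, width);
    const std::size_t first = field.find_first_not_of(' ');
    if (first == std::string::npos) return std::string();
    return field.substr(first, field.find_last_not_of(' ') - first + 1);
}
```

With the widths {8, 9, 9, 4} used above, the columns start at offsets 0, 8, 17 and 26, so person.forename = fixed_field(line, 8, 9) replaces the substr/erase pair.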


Why does reading a record struct fields from std::istream fail, and how can I fix it?

Suppose we have the following situation:
A record struct is declared as follows
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Records are stored in a file using the following format:
ID Forename Lastname Age
------------------------------
1267867 John Smith 32
67545 Jane Doe 36
8677453 Gwyneth Miller 56
75543 J. Ross Unusual 23
...
The file should be read in to collect an arbitrary number of the Person records mentioned above:
std::istream& ifs = std::ifstream("SampleInput.txt");
std::vector<Person> persons;
Person actRecord;
while(ifs >> actRecord.id >> actRecord.name >> actRecord.age) {
persons.push_back(actRecord);
}
if(!ifs) {
std::err << "Input format error!" << std::endl;
}
Question:
What can I do to read in the separate values storing their values into the one actRecord variables' fields?
The above code sample ends up with run time errors:
Runtime error time: 0 memory: 3476 signal:-1
stderr: Input format error!
One viable solution is to reorder input fields (if this is possible)
ID Age Forename Lastname
1267867 32 John Smith
67545 36 Jane Doe
8677453 56 Gwyneth Miller
75543 23 J. Ross Unusual
...
and read in the records as follows
#include <iostream>
#include <vector>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
unsigned int age;
while(ifs >> actRecord.id >> age &&
std::getline(ifs, actRecord.name)) {
actRecord.age = uint8_t(age);
persons.push_back(actRecord);
}
return 0;
}
You have whitespace between firstname and lastname. Change your class to have firstname and lastname as separate strings and it should work. The other thing you can do is to read in two separate variables such as name1 and name2 and assign it as
actRecord.name = name1 + " " + name2;
Here's an implementation of a manipulator I came up with that counts the delimiter through each extracted character. Using the number of delimiters you specify, it will extract words from the input stream. Here's a working demo.
template<class charT>
struct word_inserter_impl {
word_inserter_impl(std::size_t words, std::basic_string<charT>& str, charT delim)
: str_(str)
, delim_(delim)
, words_(words)
{ }
friend std::basic_istream<charT>&
operator>>(std::basic_istream<charT>& is, const word_inserter_impl<charT>& wi) {
typename std::basic_istream<charT>::sentry ok(is);
if (ok) {
std::istreambuf_iterator<charT> it(is), end;
std::back_insert_iterator<std::string> dest(wi.str_);
while (it != end && wi.words_) {
if (*it == wi.delim_ && --wi.words_ == 0) {
break;
}
dest++ = *it++;
}
}
return is;
}
private:
std::basic_string<charT>& str_;
charT delim_;
mutable std::size_t words_;
};
template<class charT=char>
word_inserter_impl<charT> word_inserter(std::size_t words, std::basic_string<charT>& str, charT delim = charT(' ')) {
return word_inserter_impl<charT>(words, str, delim);
}
Now you can just do:
while (ifs >> actRecord.id >> word_inserter(2, actRecord.name) >> actRecord.age) {
std::cout << actRecord.id << " " << actRecord.name << " " << actRecord.age << '\n';
}
Live Demo
A solution would be to read in the first entry into an ID variable.
Then read in all the other words from the line (just push them in a temporary vector) and construct the name of the individual with all the elements, except the last entry which is the Age.
This would allow you to still have the Age on the last position but be able to deal with name like "J. Ross Unusual".
Update to add some code which illustrates the theory above:
#include <memory>
#include <string>
#include <vector>
#include <iterator>
#include <fstream>
#include <sstream>
#include <iostream>
struct Person {
unsigned int id;
std::string name;
int age;
};
int main()
{
std::fstream ifs("in.txt");
std::vector<Person> persons;
std::string line;
while (std::getline(ifs, line))
{
std::istringstream iss(line);
// first: ID simply read it
Person actRecord;
iss >> actRecord.id;
// next iteration: read in everything
std::string temp;
std::vector<std::string> tempvect;
while(iss >> temp) {
tempvect.push_back(temp);
}
// then: the name, let's join the vector in a way to not to get a trailing space
// also taking care of people who do not have two names ...
int LAST = 2;
if(tempvect.size() < 2) // only the name and age are in there
{
LAST = 1;
}
std::ostringstream oss;
std::copy(tempvect.begin(), tempvect.end() - LAST,
std::ostream_iterator<std::string>(oss, " "));
// the last element
oss << *(tempvect.end() - LAST);
actRecord.name = oss.str();
// and the age
actRecord.age = std::stoi( *(tempvect.end() - 1) );
persons.push_back(actRecord);
}
for(std::vector<Person>::const_iterator it = persons.begin(); it != persons.end(); it++)
{
std::cout << it->id << ":" << it->name << ":" << it->age << std::endl;
}
}
Since we can easily split a line on whitespace and we know that the only value that can be separated is the name, a possible solution is to use a deque for each line containing the whitespace separated elements of the line. The id and the age can easily be retrieved from the deque and the remaining elements can be concatenated to retrieve the name:
#include <iostream>
#include <fstream>
#include <deque>
#include <vector>
#include <sstream>
#include <iterator>
#include <string>
#include <algorithm>
#include <utility>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
};
int main(int argc, char* argv[]) {
std::ifstream ifs("SampleInput.txt");
std::vector<Person> records;
std::string line;
while (std::getline(ifs,line)) {
std::istringstream ss(line);
std::deque<std::string> info(std::istream_iterator<std::string>(ss), {});
Person record;
record.id = std::stoi(info.front()); info.pop_front();
record.age = std::stoi(info.back()); info.pop_back();
std::ostringstream name;
std::copy
( info.begin()
, info.end()
, std::ostream_iterator<std::string>(name," "));
record.name = name.str(); record.name.pop_back();
records.push_back(std::move(record));
}
for (auto& record : records) {
std::cout << record.id << " " << record.name << " "
<< static_cast<unsigned int>(record.age) << std::endl;
}
return 0;
}
Another solution is to require certain delimiter characters for a particular field, and provide a special extraction manipulator for this purpose.
Let's suppose we define the delimiter character ", and the input should look like this:
1267867 "John Smith" 32
67545 "Jane Doe" 36
8677453 "Gwyneth Miller" 56
75543 "J. Ross Unusual" 23
Generally needed includes:
#include <iostream>
#include <vector>
#include <iomanip>
The record declaration:
struct Person {
unsigned int id;
std::string name;
uint8_t age;
// ...
};
Declaration/definition of a proxy class (struct) that supports being used with the std::istream& operator>>(std::istream&, const delim_field_extractor_proxy&) global operator overload:
struct delim_field_extractor_proxy {
delim_field_extractor_proxy
( std::string& field_ref
, char delim = '"'
)
: field_ref_(field_ref), delim_(delim) {}
friend
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy);
void extract_value(std::istream& is) const {
field_ref_.clear();
char input;
bool addChars = false;
while(is) {
is.get(input);
if(is.eof()) {
break;
}
if(input == delim_) {
addChars = !addChars;
if(!addChars) {
break;
}
else {
continue;
}
}
if(addChars) {
field_ref_ += input;
}
}
// consume whitespaces
while(std::isspace(is.peek())) {
is.get();
}
}
std::string& field_ref_;
char delim_;
};
std::istream& operator>>
( std::istream& is
, const delim_field_extractor_proxy& extractor_proxy) {
extractor_proxy.extract_value(is);
return is;
}
Plumbing everything connected together and instantiating the delim_field_extractor_proxy:
int main() {
std::istream& ifs = std::cin; // Open file alternatively
std::vector<Person> persons;
Person actRecord;
int act_age;
while(ifs >> actRecord.id
>> delim_field_extractor_proxy(actRecord.name,'"')
>> act_age) {
actRecord.age = uint8_t(act_age);
persons.push_back(actRecord);
}
for(auto it = persons.begin();
it != persons.end();
++it) {
std::cout << it->id << ", "
<< it->name << ", "
<< int(it->age) << std::endl;
}
return 0;
}
NOTE:
This solution also works well when a TAB character (\t) is specified as the delimiter, which is useful for parsing tab-separated formats.
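For the double-quote case specifically, C++14 added std::quoted to <iomanip>, which performs this kind of delimited extraction out of the box (it also understands backslash escapes inside the quotes). A minimal sketch, assuming a C++14 compiler and the same record layout as above; read_quoted is a hypothetical helper name, not part of the answer:

```cpp
#include <cstdint>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    std::uint8_t age;
};

// Reads records like: 1267867 "John Smith" 32
// std::quoted strips the surrounding quotes and keeps the inner blanks.
std::vector<Person> read_quoted(std::istream& is) {
    std::vector<Person> persons;
    Person rec;
    unsigned int age = 0;  // read the age via unsigned int, not uint8_t
    while (is >> rec.id >> std::quoted(rec.name) >> age) {
        rec.age = static_cast<std::uint8_t>(age);
        persons.push_back(rec);
    }
    return persons;
}

// Convenience overload for text that is already in memory.
std::vector<Person> read_quoted(const std::string& text) {
    std::istringstream is(text);
    return read_quoted(is);
}
```

With std::cin or a std::ifstream, the stream overload can be called directly.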
What can I do to read in the separate words forming the name into the one actRecord.name variable?
The general answer is: you can't do this without additional delimiter specifications and special parsing for the parts that form the intended actRecord.name contents.
This is because extracting into a std::string field with operator>> stops at the next occurrence of a whitespace character.
It's noteworthy that some standard formats (e.g. .csv) may require distinguishing blanks (' ') from tabs ('\t') or other characters used to delimit certain record fields (which may not be visible at first glance).
Also note:
To read a uint8_t value as numeric input, you'll have to go through a temporary unsigned int value. Extracting directly into an unsigned char (which is what uint8_t usually is) reads a single character instead of a number, which throws off the stream parsing.
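A small sketch of the difference, assuming uint8_t is an alias for unsigned char (true on all mainstream platforms). read_age_directly and read_age are hypothetical helper names, and the range check in read_age is an addition that the note above does not mention:

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Extracting straight into a uint8_t picks the unsigned char overload,
// which reads a single character: "32" yields '3' (51 in ASCII), not 32.
std::uint8_t read_age_directly(const std::string& text) {
    std::istringstream in(text);
    std::uint8_t age = 0;
    in >> age;
    return age;
}

// Going through a temporary unsigned int parses the whole number.
// Values that do not fit into a uint8_t are rejected.
bool read_age(std::istream& in, std::uint8_t& age) {
    unsigned int tmp = 0;
    if (!(in >> tmp) || tmp > 255) return false;
    age = static_cast<std::uint8_t>(tmp);
    return true;
}
```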
Another attempt at solving the parsing problem.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string name;
uint8_t age;
};
int main()
{
// Assumes the file has no header line.
std::ifstream ifs("test-115.in");
std::vector<Person> persons;
while (true)
{
Person actRecord;
// Read the ID and the first part of the name.
if ( !(ifs >> actRecord.id >> actRecord.name ) )
{
break;
}
// Read the rest of the line and drop trailing blanks, tabs and '\r',
// so that the last token really is the age.
std::string line;
std::getline(ifs,line);
line.erase(line.find_last_not_of(" \t\r") + 1);
// Pick up the rest of the name from the rest of the line.
// The last token in the rest of the line is the age.
// All other tokens are part of the name.
// The tokens can be separated by ' ' or '\t'; find_first_of()
// finds whichever of the two comes first.
size_t pos = line.find_first_not_of(" \t");
size_t iter = 0;
while ( pos != std::string::npos &&
(iter = line.find_first_of(" \t", pos)) != std::string::npos )
{
// A token followed by more text belongs to the name.
actRecord.name += ' ' + line.substr(pos, iter - pos);
// Skip multiple whitespace characters.
pos = line.find_first_not_of(" \t", iter);
}
// No token left for the age: treat it as a format error.
if ( pos == std::string::npos )
{
break;
}
// Extract the age.
// std::stoi returns an int (and throws if the token is not a number).
// We are assuming that it will be small enough to fit into a uint8_t.
actRecord.age = static_cast<uint8_t>(std::stoi(line.substr(pos)));
// Debugging aid. Make sure we have extracted the data correctly.
std::cout << "ID: " << actRecord.id
<< ", name: " << actRecord.name
<< ", age: " << (int)actRecord.age << std::endl;
persons.push_back(actRecord);
}
// If we got here before EOF was reached, there was an
// error in the input file.
if ( !(ifs.eof()) ) {
std::cerr << "Input format error!" << std::endl;
}
}
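The same rule (the last token is the age, every token between the ID and the age belongs to the name) can also be sketched with a std::istringstream, which splits on any whitespace, blanks and tabs alike. parse_person_line is a hypothetical helper; unlike the loop above, it also rejects lines that have no numeric age or an age above 255 instead of stopping or throwing:

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    unsigned int id;
    std::string name;
    std::uint8_t age;
};

// Parses "ID word word ... AGE": the last token is the age and every token
// in between is joined with single blanks to form the name.
bool parse_person_line(const std::string& line, Person& out) {
    std::istringstream tokens(line);
    if (!(tokens >> out.id)) return false;         // header or garbage line
    std::vector<std::string> words;
    std::string word;
    while (tokens >> word) words.push_back(word);  // >> splits on blanks and tabs
    if (words.size() < 2) return false;            // need a name part and an age
    std::istringstream age_in(words.back());
    unsigned int age = 0;
    if (!(age_in >> age) || age > 255) return false;
    words.pop_back();
    out.name.clear();
    for (const std::string& w : words) {
        if (!out.name.empty()) out.name += ' ';
        out.name += w;
    }
    out.age = static_cast<std::uint8_t>(age);
    return true;
}
```

Reading the file then becomes a std::getline loop that calls parse_person_line for each line.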
Looking at such an input file, I don't think it is a (modern) delimited file, but a good old fixed-width-field one, like the ones Fortran and COBOL programmers used to deal with. So I would parse it like this (note that I separated forename and lastname):
#include <cstdint>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
struct Person {
unsigned int id;
std::string forename;
std::string lastname;
uint8_t age;
// ...
};
int main() {
std::ifstream ifs("file.txt");
std::vector<Person> persons;
std::string line;
const size_t fieldsize[] = {8, 9, 9, 4};
while(std::getline(ifs, line)) {
// Pad short lines with blanks (longer ones are cut to the record width),
// so that substr() never runs past the end.
line.resize(fieldsize[0] + fieldsize[1] + fieldsize[2] + fieldsize[3], ' ');
Person person;
size_t start = 0, last;
std::stringstream fieldtxt;
fieldtxt.str(line.substr(start, fieldsize[0]));
// Lines without a numeric ID (the header and the "-----" line) are skipped.
if (!(fieldtxt >> person.id)) continue;
start += fieldsize[0];
person.forename=line.substr(start, fieldsize[1]);
last = person.forename.find_last_not_of(' ') + 1;
person.forename.erase(last);
start += fieldsize[1];
person.lastname=line.substr(start, fieldsize[2]);
last = person.lastname.find_last_not_of(' ') + 1;
person.lastname.erase(last);
start += fieldsize[2];
// Reset the stream state before reusing the stringstream.
fieldtxt.clear();
fieldtxt.str(line.substr(start, fieldsize[3]));
unsigned int age = 0; // read the number via unsigned int, not uint8_t
fieldtxt >> age;
person.age = static_cast<uint8_t>(age);
persons.push_back(person);
}
return 0;
}
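If more columns are involved, the cut-and-trim steps can be folded into a loop over the width table. A sketch with a hypothetical split_fixed helper; it assumes blank-padded columns and, unlike a bare substr(), returns empty fields for lines that are too short instead of throwing std::out_of_range:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Cuts a line into consecutive fixed-width fields and strips the blank
// padding on both sides. A line that is too short yields empty fields.
std::vector<std::string> split_fixed(const std::string& line,
                                     const std::vector<std::size_t>& widths) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (std::size_t w : widths) {
        std::string f = start < line.size() ? line.substr(start, w) : std::string();
        f.erase(0, f.find_first_not_of(' '));  // all blanks: npos erases everything
        f.erase(f.find_last_not_of(' ') + 1);  // npos + 1 == 0 on an empty string
        fields.push_back(f);
        start += w;
    }
    return fields;
}
```

For the layout above, it would be called with the widths {8, 9, 9, 4}, and the numeric fields converted afterwards.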

C++ reading data from text file

I have the following data:
$GPVTG,,T,,M,0.00,N,0.0,K,A*13
I need to read the data; however, some of the fields between the commas are empty, so I am not sure how I should read it.
Also, how do I select only the GPVTG lines from a group of data? For example:
GPVTG,,T,,M
GPGGA,184945.00
GPRMC,18494
GPVTG,,T,,M,0
GPGGA,184946.000,3409
I have tried using:
/* read data line */
fgets(gpsString,100,gpsHandle);
char type[10] = "GPVTG";
sscanf(gpsString," %GPVTG", &type);
if (strcmp(gpsString, "GPTVG") == 0){
printf("%s\n",gpsString);
}
That's what I'd do:
#include <iostream>
#include <vector>
#include <sstream>
#include <fstream>
#include <string>
using namespace std;
vector<string> &split(const string &s, char delim, vector<string> &elems) {
stringstream ss(s);
string item;
while (getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
vector<string> split(const string &s, char delim) {
vector<string> elems;
split(s, delim, elems);
return elems;
}
int main()
{
ifstream ifs("file.txt");
string data_string;
while ( getline( ifs, data_string ) )
{
//erase the leading '$' character, if there is one
if ( !data_string.empty() && data_string[0] == '$' ) data_string.erase( data_string.begin() );
//now put all the data into an array:
vector<string> data_array = split ( data_string, ',' );
if ( !data_array.empty() && data_array[0] == "GPVTG" )
{
//do whatever you want with that data entry
cout << data_string << '\n';
}
}
return 0;
}
That should handle your task. Empty fields between commas end up as empty "" strings in the vector. Ask if you need anything else.
Credit for the split functions goes to an answer to the question "Split a string in C++?".
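Note that a getline-based split does not produce a field after a trailing comma ("a,b," yields two fields). If that matters, a find-based split keeps every field. A sketch with hypothetical split_all and is_gpvtg helpers; is_gpvtg accepts lines with or without the leading '$', since the sample lines in the question come both ways:

```cpp
#include <string>
#include <vector>

// Splits on every delimiter and keeps all empty fields, including one
// after a trailing delimiter. Always returns at least one element.
std::vector<std::string> split_all(const std::string& s, char delim) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type end = s.find(delim, start);
        if (end == std::string::npos) {
            fields.push_back(s.substr(start));
            return fields;
        }
        fields.push_back(s.substr(start, end - start));
        start = end + 1;
    }
}

// True if the line is a GPVTG sentence, with or without the leading '$'.
bool is_gpvtg(const std::string& line) {
    std::string body = (!line.empty() && line[0] == '$') ? line.substr(1) : line;
    return split_all(body, ',')[0] == "GPVTG";
}
```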
How about this:
#include <istream>
#include <sstream>
#include <string>
class CSVInputStream {
public:
CSVInputStream(std::istream& ist) : input(ist) {}
std::istream& input;
};
// Strings: take everything up to the next comma as-is.
CSVInputStream& operator>>(CSVInputStream& in, std::string& target) {
if (!in.input) return in;
std::getline(in.input, target , ',');
return in;
}
// Other types: take everything up to the next comma and convert it.
template <typename T>
CSVInputStream& operator>>(CSVInputStream& in, T& target) {
if (!in.input) return in;
std::string line;
std::getline(in.input, line , ',');
std::stringstream translator;
translator << line;
translator >> target;
return in;
}
//--------------------------------------------------------------------
// Usage follows, perhaps in another file
//--------------------------------------------------------------------
#include <fstream>
#include <iostream>
int main() {
std::ifstream file;
file.open("testcsv.csv");
CSVInputStream input(file);
// Fields of $GPVTG,,T,,M,0.00,N,0.0,K,A*13:
// true track, 'T', magnetic track, 'M', speed in knots, 'N', speed in km/h, 'K'
std::string sentence_type;
double track_made_good = 0; // initialized: an empty field assigns nothing
char code = ' ';
double unused = 0;
double speed_knots = 0;
char speed_unit_knots = ' ';
double speed_kmh = 0;
char speed_unit_kmh = ' ';
input >> sentence_type >> track_made_good >> code;
input >> unused >> unused; // skip the magnetic track and its 'M'
input >> speed_knots >> speed_unit_knots;
input >> speed_kmh >> speed_unit_kmh;
std::cout << sentence_type << " - " << track_made_good << " - ";
std::cout << speed_kmh << " " << speed_unit_kmh << " - ";
std::cout << speed_knots << " " << speed_unit_knots << std::endl;
}
This separates splitting the input at commas from converting the values, and can be reused for most other comma-separated data.
If you want to use C++-style code based on fstream:
std::ifstream fin;
fin.open(input); // input holds the name of the file
std::string str;
while (std::getline(fin, str)) { // read one line at a time
// process str here
}

how to extract data from a txt file which is separated by |

I am looking for a way to extract data from a txt file in which each record is on its own row and the columns are separated by |.
Here's an example:
12|john bravo|123 kings street
15|marry jane|321 kings street
Previously I separated the fields with spaces, like this:
12 john kingstreet
15 marry kingstreet
But that causes a problem as soon as I add a last name to a name or an address containing spaces, e.g. john bravo.
So I decided to separate the columns using |.
This is how I extract the data:
struct PERSON{
int id;
string name;
string address;
};
//extract
int main(){
PERSON data[2];
ifstream uFile("people.txt");
int i = 0;
while(uFile >> data[i].id >> data[i].name >> data[i].address){
i++;
}
return 0;
}
So how do I extract the data if the columns are separated by |?
Use getline() twice:
first, read each line with the default separator (newline); second, split each line from the first step using '|' as the separator. A stringstream can be used to pass the line's data on to the second getline().
@edward The code below is modified from yours, and I think @P0W58's answer is better.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;
const int length = 2;
struct PERSON
{
int id;
string name;
string address;
};
//extract
int main()
{
PERSON data[length];
ifstream fin("people.txt");
int i = 0;
// Stop at the end of the file or when the array is full.
while(i < length)
{
string segment;
if (!getline(fin, segment))
break;
stringstream transporter;
transporter << segment;
string idString;
getline(transporter, idString, '|');
getline(transporter, data[i].name, '|');
getline(transporter, data[i].address, '|');
stringstream idStream;
idStream << idString;
idStream >> data[i].id;
i++;
}
// Print only the records that were actually read.
for (int j = 0; j < i; j++)
cout << data[j].id << '+' << data[j].name << '+'
<< data[j].address << endl;
return 0;
}
To read into a struct, I'd overload operator>> and then parse the text as mentioned in one of the other answers.
Something like this:
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
//...
struct PERSON{
int id;
std::string name;
std::string address;
friend std::istream& operator >>(std::istream& is, PERSON& p)
{
std::string s;
std::getline(is, s); //Read a line (a Windows-format file read on Linux keeps a trailing '\r')
std::stringstream ss(s);
std::getline(ss, s, '|'); //id
p.id = std::atoi(s.c_str());
std::getline(ss, p.name, '|'); // name
std::getline(ss, p.address, '|'); //address
return is ;
}
};
And then you can do:
std::ifstream fin("input.txt");
PERSON p1;
while (fin >> p1) {
std::cout << p1.id << ' ' << p1.name << '\n';
}
You can overload operator<< for output too.
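A sketch of both directions, written as free functions rather than friends (the members are public anyway). It assumes one id|name|address record per line; resetting p before filling it in is an addition that keeps a short line from leaving stale fields behind:

```cpp
#include <cstdlib>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct PERSON {
    int id;
    std::string name;
    std::string address;
};

// Reads one "id|name|address" line into p.
std::istream& operator>>(std::istream& is, PERSON& p) {
    std::string line;
    if (!std::getline(is, line)) return is;
    p = PERSON();                     // clear fields a short line would not overwrite
    std::stringstream ss(line);
    std::string id;
    std::getline(ss, id, '|');
    p.id = std::atoi(id.c_str());
    std::getline(ss, p.name, '|');
    std::getline(ss, p.address);      // the rest of the line
    return is;
}

// Writes p back in the same "id|name|address" format, one record per line.
std::ostream& operator<<(std::ostream& os, const PERSON& p) {
    return os << p.id << '|' << p.name << '|' << p.address << '\n';
}
```

Because the output format matches the input format, a file read with operator>> can be written back unchanged with operator<<.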
Use boost::tokenizer, or std::string::find_first_of like this:
// code example
string s = "12|john bravo|123 kings street";
string delimiters = "|";
size_t current;
size_t next = -1; // wraps around to the largest size_t, so the first current is 0
do
{
current = next + 1;
next = s.find_first_of( delimiters, current );
cout << s.substr( current, next - current ) << endl;
}
while (next != string::npos);
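Wrapped into a reusable function, the same find_first_of/substr loop might look like this (tokenize is a hypothetical name). Empty pieces between adjacent delimiters are kept, and any character listed in delimiters splits:

```cpp
#include <string>
#include <vector>

// Splits s at every character that appears in delimiters.
std::vector<std::string> tokenize(const std::string& s, const std::string& delimiters) {
    std::vector<std::string> tokens;
    std::string::size_type current = 0;
    std::string::size_type next = 0;
    do {
        next = s.find_first_of(delimiters, current);
        // With next == npos, the length npos - current simply runs to the end.
        tokens.push_back(s.substr(current, next - current));
        current = next + 1;
    } while (next != std::string::npos);
    return tokens;
}
```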

how to manipulate a txt file using C++ STL (homework)

I have a txt file that contains a name, ID number, mobile number, and location on each comma-separated line.
Example:
Robby, 7890,7788992356, 123 westminister
tom, 8820, 77882345, 124 kingston road
My task is to:
Look up all of an employee's information by name.
Look up all of an employee's information by ID.
Add the information of an employee.
Update the information of an employee.
So far I have read the file and stored the information in a vector.
For the tasks:
1) Look up all of an employee's information by name: I will iterate over the vector and print the information of the lines containing the name. I will be able to do that.
2) Similarly, I will look for the ID in the text and print the information about that employee.
But I am clueless about points 3 and 4.
Here is my code:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void filter_text( vector<string> *words, string name)
{
vector<string>::iterator startIt = words->begin();
vector<string>::iterator endIt = words->end();
if( !name.size() )
{
std::cout << " no word to find for an empty string\n";
return;
}
while( startIt != endIt)
{
// print every line that contains the name somewhere
if( (*startIt).find(name) != string::npos )
std::cout <<" the name is " << *startIt << endl;
startIt++;
}
}
int main()
{
// to read a text file
std::string file_name;
std::cout << " please enter the file name to parse" ;
std::cin >> file_name;
//open text file for input
ifstream infile(file_name.c_str(), ios::in) ;
if(! infile)
{
std::cerr <<" failed to open file\n";
exit(-1);
}
vector<string> *lines_of_text = new vector<string>;
string textline;
while(getline(infile, textline, '\n'))
{
std::cout <<" line text:" << textline <<std::endl;
lines_of_text->push_back(textline);
}
filter_text( lines_of_text, "tony");
return 0;
}
#include <string>
#include <iostream>
#include <vector>
#include <stdexcept>
#include <fstream>
struct bird {
std::string name;
int weight;
int height;
};
bird& find_bird_by_name(std::vector<bird>& birds, const std::string& name) {
for(unsigned int i=0; i<birds.size(); ++i) {
if (birds[i].name == name)
return birds[i];
}
throw std::runtime_error("BIRD NOT FOUND");
}
bird& find_bird_by_weight(std::vector<bird>& birds, int weight) {
for(unsigned int i=0; i<birds.size(); ++i) {
if (birds[i].weight == weight)
return birds[i];
}
throw std::runtime_error("BIRD NOT FOUND");
}
int main() {
std::ifstream infile("birds.txt");
char comma;
bird newbird;
std::vector<bird> birds;
//load in all the birds; getline reads the name up to the comma, so "Crow, 5, 10" works
while (std::getline(infile >> std::ws, newbird.name, ',') && infile >> newbird.weight >> comma >> newbird.height)
birds.push_back(newbird);
//find bird by name
bird& namebird = find_bird_by_name(birds, "Crow");
std::cout << "found " << namebird.name << '\n';
//find bird by weight
bird& weightbird = find_bird_by_weight(birds, 10);
std::cout << "found " << weightbird.name << '\n';
//add a bird
std::cout << "Bird name: ";
std::cin >> newbird.name;
std::cout << "Bird weight: ";
std::cin >> newbird.weight;
std::cout << "Bird height: ";
std::cin >> newbird.height;
birds.push_back(newbird);
//update a bird
bird& editbird = find_bird_by_name(birds, "Raven");
editbird.weight = 1000000;
return 0;
}
Obviously not employees, because that would make your homework too easy.
So, first off, I don't think you should store the information in a vector of strings. This kind of task really calls for a struct such as
struct employee {
int id;
std::string name;
std::string address;
//... more info
};
and for storing the employee instances in a
std::vector<employee>
You see, with your strategy of storing whole lines, searching for "westminister" would net me Robby, as his line of text does include this substring, but his name isn't westminister at all. Storing the data in a vector of employee structs eliminates this problem, and it makes the whole thing a lot more, well, structured.
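With the records in a std::vector<employee>, the lookups for tasks 1 and 2 become exact field comparisons, and task 4 (update) is just an assignment through the result. A sketch using std::find_if; find_by_name and find_by_id are hypothetical helpers, and the returned pointer is invalidated if the vector later reallocates (for example after a push_back):

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct employee {
    int id;
    std::string name;
    std::string address;
};

// Returns a pointer to the first employee with exactly this name, or nullptr.
employee* find_by_name(std::vector<employee>& staff, const std::string& name) {
    auto it = std::find_if(staff.begin(), staff.end(),
                           [&](const employee& e) { return e.name == name; });
    return it == staff.end() ? nullptr : &*it;
}

// Returns a pointer to the employee with this ID, or nullptr.
employee* find_by_id(std::vector<employee>& staff, int id) {
    auto it = std::find_if(staff.begin(), staff.end(),
                           [&](const employee& e) { return e.id == id; });
    return it == staff.end() ? nullptr : &*it;
}
```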
Of course you'd need to actually parse the file to get the info into the vector. I'd suggest using a strategy like:
std::vector<employee> employees;
std::string textline;
while(getline(infile, textline, '\n')) {
employee oneEmp;
std::stringstream l(textline); // needs #include <sstream>
getline(l, oneEmp.name, ','); //extract his name using getline
l >> oneEmp.id; //extract his id
//extract other fields from the stringstream as necessary
employees.push_back(oneEmp);
}
As for adding information: when the user enters the data, just store it in your employees vector. When you need to update the file, you can simply overwrite the original data file by opening it for writing and dumping all the data there. This is obviously a rather wasteful strategy, but it's fine for a school assignment (I suppose it's a school assignment).
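A sketch of the overwrite step, assuming the employee struct from above (id, name, address; the mobile number is left out, just as in that struct) and a "name, id, address" line layout; save_employees and write_employees are hypothetical helpers:

```cpp
#include <fstream>
#include <ostream>
#include <string>
#include <vector>

struct employee {
    int id;
    std::string name;
    std::string address;
};

// Writes every employee as one "name, id, address" line.
void write_employees(std::ostream& out, const std::vector<employee>& staff) {
    for (const employee& e : staff)
        out << e.name << ", " << e.id << ", " << e.address << '\n';
}

// Overwrites the data file with the current contents of the vector.
bool save_employees(const std::string& path, const std::vector<employee>& staff) {
    std::ofstream out(path, std::ios::trunc);  // discards the old file contents
    if (!out) return false;
    write_employees(out, staff);
    return static_cast<bool>(out);
}
```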
Start by splitting the CSV line into separate fields and then populate a struct with this data.
E.g.:
struct Employee
{
std::string name;
std::string id_number;
std::string mobilenumber;
std::string location;
};
std::vector<Employee> employees; // Note: you don't need a pointer
Look at the string methods find_first_of, substr and friends.
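A sketch of that split for the four-field layout from the question, using find_first_of and substr; trim and parse_employee are hypothetical helpers, and keeping every field as a std::string follows the struct above:

```cpp
#include <string>
#include <vector>

struct Employee {
    std::string name;
    std::string id_number;
    std::string mobilenumber;
    std::string location;
};

// Removes leading and trailing blanks, e.g. " 7890" -> "7890".
std::string trim(const std::string& s) {
    std::string::size_type first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    std::string::size_type last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Splits "name, id, mobile, location" at the commas.
// Returns false if the line does not have exactly four fields.
bool parse_employee(const std::string& line, Employee& out) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type comma = line.find_first_of(',', start);
        fields.push_back(trim(line.substr(start, comma - start)));
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    if (fields.size() != 4) return false;
    out = Employee{fields[0], fields[1], fields[2], fields[3]};
    return true;
}
```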