Parsing a huge complicated CSV file using C++


I have a large CSV file which looks like this:
23456, The End is Near, A silly description that makes no sense, http://www.example.com, 45332, 5th July 1998 Sunday, 45.332
That's just one line of the CSV file. There are around 500k of these.
I want to parse this file using C++. The code I started out with is:
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
// open the input csv file containing training data
ifstream inputFile("my.csv");
string line;
while (getline(inputFile, line, ','))
{
istringstream ss(line);
// declaring appropriate variables present in csv file
long unsigned id;
string url, title, description, datetaken;
float val1, val2;
ss >> id >> url >> title >> datetaken >> description >> val1 >> val2;
cout << url << endl;
}
inputFile.close();
}
The problem is that it's not printing out the correct values.
I suspect that it's not able to handle white spaces within a field. So what do you suggest I should do?
Thanks

In this example we parse the input with two calls to getline. The first, getline(cin, line), reads one line of CSV text using the default newline delimiter. The second, getline(ss, field, ','), uses a comma as the delimiter to separate the fields.
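#include <iostream>
#include <sstream>
#include <string>
#include <vector>
// Convert a single field to float (0 if the conversion fails)
float get_float(const std::string& s) {
std::stringstream ss(s);
float ret = 0.0f;
ss >> ret;
return ret;
}
// Convert a single field to int (0 if the conversion fails)
int get_int(const std::string& s) {
std::stringstream ss(s);
int ret = 0;
ss >> ret;
return ret;
}
int main() {
std::string line;
// First getline: read one whole CSV line (default '\n' delimiter)
while (std::getline(std::cin, line)) {
std::stringstream ss(line);
std::vector<std::string> v;
std::string field;
// Second getline: split that line into fields at each comma
while (std::getline(ss, field, ',')) {
v.push_back(field);
}
if (v.size() < 7) continue; // skip lines that do not have all seven fields
int id = get_int(v[0]);
float f = get_float(v[6]);
std::cout << id << ":" << v[3] << ":" << f << std::endl;
}
}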

Using std::istream's overloaded extraction operator (>>) to read std::strings is not going to work well here. For std::string, operator>> stops at whitespace, not at commas, so it has no idea where one field ends and the next begins; a field such as "The End is Near" is read as four separate words. A quick fix is to split the line on commas and assign the values to the appropriate fields (instead of extracting them with std::istringstream).
NOTE: That is in addition to jrok's point about std::getline.
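A minimal sketch of that quick fix, assuming the seven fields of the sample line and that no field contains an embedded comma. The struct `Row`, its member names, and the helpers `trim_spaces` and `parse_row` are hypothetical names chosen for illustration. Text fields are copied after trimming the space that follows each comma; only the numeric fields go through a stream, one field at a time.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for the seven fields in the sample line.
struct Row {
    unsigned long key = 0;
    std::string name, desc, url;
    long zip = 0;
    std::string date;
    float number = 0.0f;
};

// Drop the spaces that follow each comma (and any trailing ones).
std::string trim_spaces(const std::string& s) {
    const auto first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Split one line on commas and assign each piece to its field.
// Returns false if the line does not have exactly seven fields
// or if a numeric field does not convert.
bool parse_row(const std::string& line, Row& row) {
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ','))
        fields.push_back(trim_spaces(field));
    if (fields.size() != 7) return false;

    row.name = fields[1];
    row.desc = fields[2];
    row.url  = fields[3];
    row.date = fields[5];

    // Numeric fields are still converted with a stream, one field at a time.
    std::istringstream key(fields[0]), zip(fields[4]), number(fields[6]);
    return (key >> row.key) && (zip >> row.zip) && (number >> row.number);
}
```

Because each field is handled separately, the spaces inside "The End is Near" no longer matter.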

Within the stated constraints, I think I'd do something like this:
#include <algorithm> // std::copy
#include <locale>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <iterator>
// A ctype that classifies only comma and new-line as "white space":
struct field_reader : std::ctype<char> {
field_reader() : std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
static std::vector<std::ctype_base::mask>
rc(table_size, std::ctype_base::mask());
rc[','] = std::ctype_base::space;
rc['\n'] = std::ctype_base::space;
return &rc[0];
}
};
// A struct to hold one record from the file:
struct record {
std::string key, name, desc, url, zip, date, number;
friend std::istream &operator>>(std::istream &is, record &r) {
return is >> r.key >> r.name >> r.desc >> r.url >> r.zip >> r.date >> r.number;
}
friend std::ostream &operator<<(std::ostream &os, record const &r) {
return os << "key: " << r.key
<< "\nname: " << r.name
<< "\ndesc: " << r.desc
<< "\nurl: " << r.url
<< "\nzip: " << r.zip
<< "\ndate: " << r.date
<< "\nnumber: " << r.number;
}
};
int main() {
std::stringstream input("23456, The End is Near, A silly description that makes no sense, http://www.example.com, 45332, 5th July 1998 Sunday, 45.332");
// use our ctype facet with the stream:
input.imbue(std::locale(std::locale(), new field_reader()));
// read in all our records:
std::istream_iterator<record> in(input), end;
std::vector<record> records{ in, end };
// show what we read:
std::copy(records.begin(), records.end(),
std::ostream_iterator<record>(std::cout, "\n"));
}
This is, beyond a doubt, longer than most of the others -- but it's all broken into small, mostly-reusable pieces. Once you have the other pieces in place, the code to read the data is trivial:
std::vector<record> records{ in, end };
One other point I find compelling: the first time the code compiled, it also ran correctly (and I find that quite routine for this style of programming).

I have just worked out this problem for myself and am willing to share! It may be a little overkill but it shows a working example of how Boost Tokenizer & vectors handle a big problem.
/*
* ALfred Haines Copyleft 2013
* convert csv to sql file
* csv2sql requires that each line is a unique record
*
* This is an example of file reading and the Boost tokenizer
*
* In the spirit of COBOL I do not output until the end
* when all the print lines are output at once
* Special thanks to SBHacker for the code to handle linefeeds
*/
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/algorithm/string/replace.hpp>
#include <vector>
namespace io = boost::iostreams;
using boost::tokenizer;
using boost::escaped_list_separator;
typedef tokenizer<escaped_list_separator<char> > so_tokenizer;
using namespace std;
using namespace boost;
vector<string> parser( string );
int main()
{
vector<string> stuff ; // this is the data in a vector
string filename; // this is the input file
string c = ""; // this holds the print line
string sr ;
cout << "Enter filename: " ;
cin >> filename;
//filename = "drwho.csv";
string::size_type lastindex = filename.find_last_of("."); // find where the extension begins
string rawname = filename.substr(0, lastindex); // extract the raw name
stuff = parser( filename ); // this gets the data from the file
if (stuff.empty()) { cout << "No data found in " << filename << "\n"; return 1; }
/** I ask if the user wants a new_index to be created */
cout << "\n\nMySql requires a unique ID field as a Primary Key \n" ;
cout << "If the first field is not unique (i.e. it has duplicate entries) \nthen you should create a " ;
cout << "new index field for this data.\n" ;
cout << "Not sure? Try 'n' first to maintain data integrity.\n" ;
string ni ;bool invalid_data = true;bool new_index = false ;
do {
cout<<"Should I create a New Index now? (y/n)"<<endl;
cin>>ni;
if ( ni == "y" || ni == "n" ) { invalid_data =false ; }
} while (invalid_data);
cout << "\n" ;
if (ni == "y" )
{
new_index = true ;
sr = rawname.c_str() ; sr.append("_id" ); // new_index field
}
// now make the sql code from the vector stuff
// Create table section
c.append("DROP TABLE IF EXISTS `");
c.append(rawname.c_str() );
c.append("`;");
c.append("\nCREATE TABLE IF NOT EXISTS `");
c.append(rawname.c_str() );
c.append( "` (");
c.append("\n");
if (new_index)
{
c.append( "`");
c.append(sr );
c.append( "` int(10) unsigned NOT NULL,");
c.append("\n");
}
string s = stuff[0];// it is assumed that line zero has fieldnames
int x =0 ; // used to determine if new index is printed
// boost tokenizer code from the Boost website -- tok holds the token
so_tokenizer tok(s, escaped_list_separator<char>('\\', ',', '\"'));
for(so_tokenizer::iterator beg=tok.begin(); beg!=tok.end(); ++beg)
{
x++; // keeps number of fields for later use to eliminate the comma on the last entry
if (x == 1 && new_index == false ) sr = static_cast<string> (*beg) ;
c.append( "`" );
c.append(*beg);
if (x == 1 && new_index == false )
{
c.append( "` int(10) unsigned NOT NULL,");
}
else
{
c.append("` text ,");
}
c.append("\n");
}
c.append("PRIMARY KEY (`");
c.append(sr );
c.append("`)" );
c.append("\n");
c.append( ") ENGINE=InnoDB DEFAULT CHARSET=latin1;");
c.append("\n");
c.append("\n");
// The Create table section is done
// Now make the Insert lines one per line is safer in case you need to split the sql file
for (size_t w=1; w < stuff.size(); ++w)
{
c.append("INSERT INTO `");
c.append(rawname.c_str() );
c.append("` VALUES ( ");
if (new_index)
{
c.append(to_string(w)); // the row number becomes the new index value
c.append(" , ");
}
int p = 1 ; // used to eliminate the comma on the last entry
// tokenizer code needs unique name -- stok holds this token
so_tokenizer stok(stuff[w], escaped_list_separator<char>('\\', ',', '\"'));
for(so_tokenizer::iterator beg=stok.begin(); beg!=stok.end(); ++beg)
{
c.append(" '");
string str = static_cast<string> (*beg) ;
boost::replace_all(str, "'", "\\'");
// boost::replace_all(str, "\n", " -- ");
c.append( str);
c.append("' ");
if ( p < x ) c.append(",") ;// we dont want a comma on the last entry
p++ ;
}
c.append( ");\n");
}
// now print the whole thing to an output file
string out_file = rawname.c_str() ;
out_file.append(".sql");
io::stream_buffer<io::file_sink> buf(out_file);
std::ostream out(&buf);
out << c ;
// let the user know that they are done
cout<< "Well if you got here then the data should be in the file " << out_file << "\n" ;
return 0;}
vector<string> parser( string filename )
{
vector<string> stuff ;
ifstream in(filename.c_str());
string li;
string buffer;
bool inside_quotes(false);
size_t last_quote(0);
while (getline(in,buffer))
{
// --- deal with line breaks in quoted strings
last_quote = buffer.find_first_of('"');
while (last_quote != string::npos)
{
inside_quotes = !inside_quotes;
last_quote = buffer.find_first_of('"',last_quote+1);
}
li.append(buffer);
if (inside_quotes)
{
li.append("\n");
continue;
}
// ---
stuff.push_back(li);
li.clear(); // clear here, next check could fail
}
in.close();
//cout << stuff.size() << endl ;
return stuff ;
}

You are right to suspect that your code is not behaving as desired because of the whitespace within the field values.
If you indeed have "simple" CSV, where no field may contain a comma within its value, then I would step away from the stream operators and perhaps from C++ altogether. The example program in the question merely reorders fields. There is no need to actually interpret or convert the values into their appropriate types (unless validation is also a goal). Reordering alone is super easy to accomplish with awk. For example, the following command prints the first three fields of each line of a simple CSV file in reverse order:
cat infile | awk -F, '{ print $3","$2","$1 }' > outfile
If the goal is really to use this code snippet as a launchpad for bigger and better ideas, then I would tokenize the line by searching for commas. The std::string class has a built-in method, find, that returns the offset of a specific character. You can make this approach as elegant or inelegant as you want; the most elegant approaches end up looking something like the Boost tokenizer code.
The quick-and-dirty approach is simply to know that your program has N fields and look for the positions of the corresponding N-1 commas. Once you have those positions, it is straightforward to call std::string::substr to extract the fields of interest.
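A quick-and-dirty sketch of the comma-position approach, assuming simple CSV with no quoted fields or embedded commas. The function name `split_on_commas` is hypothetical; it relies only on `std::string::find` and `std::string::substr`.

```cpp
#include <string>
#include <vector>

// Split a simple CSV line (no quoting) into fields: find each comma with
// std::string::find and copy the text between commas with substr.
std::vector<std::string> split_on_commas(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const auto comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start)); // last field
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}
```

Unlike std::getline with a ',' delimiter, this keeps a trailing empty field ("a," yields two fields). The space after each comma is kept as part of the next field, so you may still want to trim it.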

Related

c++ split string by delimiter into char array

I have a file with lines in the format:
firstword;secondword;4.0
I need to split the lines by ;, store the first two words in char arrays, and store the number as a double.
In Python, I would just use split(";"), then split("") on the first two indexes of the resulting list then float() on the last index. But I don't know the syntax for doing this in C++.
So far, I'm able to read from the file and store the lines as strings in the studentList array. But I don't know where to begin with extracting the words and numbers from the items in the array. I know I would need to declare new variables to store them in, but I'm not there yet.
I don't want to use vectors for this.
#include <iomanip>
#include <fstream>
#include <string>
#include <stdlib.h>
#include <iostream>
using namespace std;
int main() {
string studentList[4];
ifstream file;
file.open("input.txt");
if(file.is_open()) {
for (int i = 0; i < 4; i++) {
file >> studentList[i];
}
file.close();
}
for(int i = 0; i < 4; i++) {
cout << studentList[i];
}
return 0;
}

You can use std::getline, which supports a delimiter argument:
#include <string>
#include <sstream>
#include <iostream>
int main() {
std::istringstream file("a;b;1.0\nc;d;2.0");
for (int i = 0; i < 2; i++){
std::string x,y,v;
std::getline(file,x,';');
std::getline(file,y,';');
std::getline(file,v); // default delim is new line
std::cout << x << ' ' << y << ' ' << v << '\n';
}
}
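Since the question asks for char arrays rather than std::string, here is a sketch using the member function std::istream::getline(char*, std::streamsize, char), which writes into a fixed-size buffer and stops at the delimiter. The buffer size of 32 and the names `Entry` and `read_entry` are assumptions for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record: two words stored in char arrays, plus a number.
struct Entry {
    char first[32];
    char second[32];
    double value;
};

// Read one "firstword;secondword;4.0" line into an Entry.
// istream::getline(buf, size, delim) stores at most size-1 characters,
// null-terminates the buffer, and consumes (but does not store) the delimiter.
// If a word does not fit in the buffer, the stream's failbit is set.
bool read_entry(std::istream& in, Entry& e) {
    if (!in.getline(e.first, sizeof e.first, ';')) return false;
    if (!in.getline(e.second, sizeof e.second, ';')) return false;
    std::string rest;
    if (!std::getline(in, rest)) return false; // remainder of the line
    std::istringstream num(rest);
    return static_cast<bool>(num >> e.value);
}
```

Calling `read_entry(file, e)` in a loop until it returns false reads one line per call; no std::vector is involved.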

C++ uses the stream class as its string-handling workhorse. Every kind of transformation is typically designed to work through them. For splitting strings, std::getline() is absolutely the right tool. (And possibly a std::istringstream to help out.)
A few other pointers as well.
Use struct for related information
Here we have a “student” with three related pieces of information:
struct Student {
std::string last_name;
std::string first_name;
double gpa;
};
Notice how one of those items is not a string.
Keep track of the number of items used in an array
Your arrays should have a maximum (allocated) size, plus a separate count of the items used.
constexpr int MAX_STUDENTS = 100;
Student studentList[MAX_STUDENTS];
int num_students = 0;
When adding an item (to the end), remember that in C++ arrays always start with index 0:
if (num_students < MAX_STUDENTS) {
studentList[num_students].first_name = "James";
studentList[num_students].last_name = "Bond";
studentList[num_students].gpa = 4.0;
num_students += 1;
}
You can avoid some of that bookkeeping by using a std::vector:
std::vector<Student> studentList;
studentList.push_back( { "Bond", "James", 4.0 } ); // last_name, first_name, gpa
But as you requested we avoid them, we’ll stick with arrays.
Use a stream extractor function overload to read a struct from stream
The input stream is expected to have student data formatted as a semicolon-delimited record — that is: last name, semicolon, first name, semicolon, gpa, newline.
std::istream & operator >> ( std::istream & ins, Student & student ) {
ins >> std::ws; // skip any leading whitespace
getline( ins, student.last_name, ';' ); // read last_name & eat delimiter
getline( ins, student.first_name, ';' ); // read first_name & eat delimiter
ins >> student.gpa; // read gpa. Does not eat delimiters
ins >> std::ws; // skip all trailing whitespace (including newline)
return ins;
}
Notice how std::getline() was put to use here to read strings terminating with a semicolon. Everything else must be either:
read as a string then converted to the desired type, or
read using the >> operator and have the delimiter specifically read.
For example, if the GPA were not last in our list, we would have to read and discard (“eat”) a semicolon:
char c;
ins >> student.gpa >> c;
if (c != ';') ins.setstate( std::ios::failbit );
Yes, that is kind of long and obnoxious. But it is how C++ streams work.
Fortunately with our current Student structure, we can eat that trailing newline along with all other whitespace.
Now we can easily read a list of students until the stream indicates EOF (or any error):
while (f >> studentList[num_students]) {
num_students += 1;
if (num_students == MAX_STUDENTS) break; // don’t forget to watch your bounds!
}
Use a stream insertion function overload to write
’Nuff said.
std::ostream & operator << ( std::ostream & outs, const Student & student ) {
return outs
<< student.last_name << ";"
<< student.first_name << ";"
<< std::fixed << std::setprecision(1) << student.gpa << "\n";
}
I am personally disinclined to modify stream characteristics on argument streams, and would instead use an intermediary std::ostringstream:
std::ostringstream oss;
oss << std::fixed << std::setprecision(1) << student.gpa;
outs << oss.str() << "\n";
But that is beyond the usual examples, and is often unnecessary. Know your data.
Either way you can now write the list of students with a simple << in a loop:
for (int n = 0; n < num_students; n++)
f << studentList[n];
Use streams with C++ idioms
You are typing too much. Use C++’s object storage model to your advantage. Curly braces (for compound statements) help tremendously.
While you are at it, name your input files as descriptively as you are allowed.
{
std::ifstream f( "students.txt" );
while (f >> studentList[num_students])
if (++num_students == MAX_STUDENTS)
break;
}
No students will be read if f does not open. Reading will stop once you run out of students (or some error occurs) or you run out of space in the array, whichever comes first. And the file is automatically closed and the f object is destroyed when we hit that final closing brace, which terminates the lexical context containing it.
Include only required headers
Finally, try to include only those headers you actually use. This is something of an acquired skill, alas. When you are starting out, it helps to note, right alongside each #include directive, what you are including that header for.
Putting it all together into a working example
#include <algorithm> // std::sort
#include <fstream> // std::ifstream
#include <iomanip> // std::setprecision
#include <iostream> // std::cin, std::cout, etc
#include <string> // std::string
struct Student {
std::string last_name;
std::string first_name;
double gpa;
};
std::istream & operator >> ( std::istream & ins, Student & student ) {
ins >> std::ws; // skip any leading whitespace
getline( ins, student.last_name, ';' ); // read last_name & eat delimiter
getline( ins, student.first_name, ';' ); // read first_name & eat delimiter
ins >> student.gpa; // read gpa. Does not eat delimiters
ins >> std::ws; // skip all trailing whitespace (including newline)
return ins;
}
std::ostream & operator << ( std::ostream & outs, const Student & student ) {
return outs
<< student.last_name << ";"
<< student.first_name << ";"
<< std::fixed << std::setprecision(1) << student.gpa << "\n";
}
int main() {
constexpr int MAX_STUDENTS = 100;
Student studentList[MAX_STUDENTS];
int num_students = 0;
// Read students from file
std::ifstream f( "students.txt" );
while (f >> studentList[num_students])
if (++num_students == MAX_STUDENTS)
break;
// Sort students by GPA from lowest to highest
std::sort( studentList, studentList+num_students,
[]( const auto& a, const auto& b ) { return a.gpa < b.gpa; } );
// Print students
for(int i = 0; i < num_students; i++) {
std::cout << studentList[i];
}
}
The “students.txt” file contains:
Blackfoot;Lawrence;3.7
Chén;Junfeng;3.8
Gupta;Chaya;4.0
Martin;Anita;3.6
Running the program produces the output:
Martin;Anita;3.6
Blackfoot;Lawrence;3.7
Chén;Junfeng;3.8
Gupta;Chaya;4.0
You can, of course, print the students any way you wish. This example just prints them in the same semicolon-delimited format in which they were input. Here we print them with GPA and surname only:
for (int n = 0; n < num_students; n++)
std::cout << studentList[n].gpa << ": " << studentList[n].last_name << "\n";
Every language has its own idiomatic usage which you should learn to take advantage of.

Parsing text file lines in C++

I have a txt file with data such as following:
regNumber FName Score1 Score2 Score3
385234 John Snow 90.0 56.0 60.8
38345234 Michael Bolton 30.0 26.5
38500234 Tim Cook 40.0 56.5 20.2
1547234 Admin__One 10.0
...
The data is separated only by whitespace.
Now, my issue is that as some of the data is missing, I cannot simply do as following:
ifstream file;
file.open("file.txt")
file >> regNo >> fName >> lName >> score1 >> score2 >> score3
(I'm not sure if code above is right, but trying to explain the idea)
What I want to do is roughly this:
cout << "Reg Number: ";
cin >> regNo;
cout << "Name: ";
cin >> name;
if(regNo == regNumber && name == fname) {
cout << "Access granted" << endl;
}
This is what I've tried/where I'm at:
ifstream file;
file.open("users.txt");
string line;
while(getline(file, line)) {
stringstream ss(line);
string word;
while(ss >> word) {
cout << word << "\t";
}
cout << " " << endl;
}
I can output the file entirely, my issue is when it comes to picking the parts, e.g. only getting the regNumber or the name.

I would read the whole line in at once and then just take substrings of it (since you suggest that these are fixed-width fields).

Handling the spaces between the words of the names is tricky, but it's apparent from your file that each column starts at a fixed offset. You can use this to extract the information you want. For example, to read the names, you can read the line starting at the offset where FName starts and ending at the offset where Score1 starts. Then you can remove trailing whitespace from the string like this:
string A = "Tim Cook ";
auto index = A.find_last_not_of(' ');
A.erase(index + 1);
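Putting the substring and trimming ideas together, a sketch that cuts each line at fixed column offsets and strips the padding. The offsets (names starting at column 11, scores at column 29) are assumptions for illustration only; measure the real ones from your file's header line. `FixedRow`, `column`, `rtrim`, and `parse_fixed` are hypothetical names.

```cpp
#include <string>

// Assumed (hypothetical) layout: regNumber in columns [0, 11),
// names in [11, 29), scores from column 29 to the end of the line.
constexpr std::string::size_type NAME_START = 11;
constexpr std::string::size_type SCORE_START = 29;

// Strip trailing spaces, as in the snippet above.
std::string rtrim(std::string s) {
    s.erase(s.find_last_not_of(' ') + 1); // npos + 1 == 0 clears an all-space string
    return s;
}

// Columns [from, to) of line; shorter (or empty) if the line ends early.
std::string column(const std::string& line, std::string::size_type from,
                   std::string::size_type to) {
    if (from >= line.size()) return "";
    return rtrim(line.substr(from, to - from));
}

struct FixedRow {
    std::string regNumber, name, scores;
};

FixedRow parse_fixed(const std::string& line) {
    return FixedRow{column(line, 0, NAME_START),
                    column(line, NAME_START, SCORE_START),
                    column(line, SCORE_START, std::string::npos)};
}
```

Short lines such as "1547234" simply yield empty name and score columns, which is one way to cope with missing data when the columns really are fixed-width.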

Alright, I can’t sleep and so decided to go bonkers and demonstrate just how tricky input is, especially when you have freeform data. The following code contains plenty of commentary on reading freeform data that may be missing.
#include <ciso646>
#include <deque>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <optional>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>
// Useful Stuff
template <typename T> T& lvalue( T&& arg ) { return arg; }
using strings = std::deque <std::string> ;
auto split( const std::string& s )
{
return strings
(
std::istream_iterator <std::string> ( lvalue( std::istringstream{ s } ) ),
std::istream_iterator <std::string> ()
);
}
template <typename T>
auto string_to( const std::string & s )
{
T value;
std::istringstream ss( s );
return ((ss >> value) and (ss >> std::ws).eof())
? value
: std::optional<T> { };
}
std::string trim( const std::string& s )
{
auto L = s.find_first_not_of( " \f\n\r\t\v" );
if (L == std::string::npos) return ""; // empty or all whitespace
auto R = s.find_last_not_of ( " \f\n\r\t\v" ) + 1;
return s.substr( L, R-L );
}
// Each record is stored as a “User”.
// “Users” is a complete dataset of records.
struct User
{
int regNumber;
std::vector <std::string> names;
std::vector <double> scores;
};
using Users = std::vector <User> ;
// This is stuff you would put in the .cpp file, not an .hpp file.
// But since this is a single-file example, it goes here.
namespace detail::Users
{
static const char * FILE_HEADER = "regNumber FName Score1 Score2 Score3\n";
static const int REGNUMBER_WIDTH = 11;
static const int NAMES_TOTAL_WIDTH = 18;
static const int SCORE_WIDTH = 9;
static const int SCORE_PRECISION = 1;
}
// Input is always the hardest part, and provides a WHOLE lot of caveats to deal with.
// Let us take them one at a time.
//
// Each user is a record composed of ONE OR MORE elements on a line of text.
// The elements are:
// (regNumber)? (name)* (score)*
//
// The way we handle this is:
// (1) Read the entire line
// (2) Split the line into substrings
// (3) If the first element is a regNumber, grab it
// (4) Grab any trailing floating point values as scores
// (5) Anything remaining must be names
//
// There are comments in the code below which enable you to produce a hard failure
// if any record is incorrect, however you define that. A “hard fail” sets the fail
// state on the input stream, which will stop all further input on the stream until
// the caller uses the .clear() method on the stream.
//
// The default action is to stop reading records if a failure occurs. This way the
// CALLER can decide whether to clear the error and try to read more records.
//
// Finally, we use decltype liberally to make it easier to modify the User struct
// without having to watch out for type problems with the stream extraction operator.
// Input a single record
std::istream& operator >> ( std::istream& ins, User& user )
{
// // Hard fail helper (named lambda)
// auto failure = [&ins]() -> std::istream&
// {
// ins.setstate( std::ios::failbit );
// return ins;
// };
// You should generally clear your target object when writing stream extraction operators
user = User{};
// Get a single record (line) from file
std::string s;
if (!getline( ins, s )) return ins;
// Split the record into fields
auto fields = split( s );
// Skip (blank lines) and (file headers)
static const strings header = split( detail::Users::FILE_HEADER );
if (fields.empty() or fields == header) return operator >> ( ins, user );
// The optional regNumber must appear first
auto reg_number = string_to <decltype(user.regNumber)> ( fields.front() );
if (reg_number)
{
user.regNumber = *reg_number;
fields.pop_front();
}
// Optional scores must appear last
while (!fields.empty())
{
auto score = string_to <std::remove_reference <decltype(user.scores.front())> ::type> ( fields.back() );
if (!score) break;
user.scores.insert( user.scores.begin(), *score );
fields.pop_back();
}
// if (user.scores.size() > 3) return failure(); // is there a maximum number of scores?
// Any remaining fields are names.
// if (fields.empty()) return failure(); // at least one name required?
// if (fields.size() > 2) return failure(); // maximum of two names?
for (const auto& name : fields)
{
// (You could also check that each name matches a valid regex pattern, etc)
user.names.push_back( name );
}
// If we got this far, all is good. Return the input stream.
return ins;
}
// Input a complete User dataset
std::istream& operator >> ( std::istream& ins, Users& users )
{
// This time, do NOT clear the target object! This permits the caller to read
// multiple files and combine them! The caller is also now responsible to
// provide a new/empty/clear target Users object to avoid combining datasets.
// Read all records
User user;
while (ins >> user) users.push_back( user );
// Return the input stream
return ins;
}
// Output, by comparison, is fabulously easy.
//
// I won’t bother to explain any of this, except to recall that
// the User is stored as a line-object record -- that is, it must
// be terminated by a newline. Hence we output the newline in the
// single User stream insertion operator (output operator) instead
// of the Users output operator.
// Output a single User record
std::ostream& operator << ( std::ostream& outs, const User& user )
{
std::ostringstream userstring;
userstring << std::setw( detail::Users::REGNUMBER_WIDTH ) << std::left << user.regNumber;
std::ostringstream names;
for (const auto& name : user.names) names << name << " ";
userstring << std::setw( detail::Users::NAMES_TOTAL_WIDTH ) << std::left << names.str();
for (auto score : user.scores)
userstring
<< std::left << std::setw( detail::Users::SCORE_WIDTH )
<< std::fixed << std::setprecision( detail::Users::SCORE_PRECISION )
<< score;
return outs << trim( userstring.str() ) << "\n"; // <-- output of newline
}
// Output a complete User dataset
std::ostream& operator << ( std::ostream& outs, const Users& users )
{
outs << detail::Users::FILE_HEADER;
for (const auto& user : users) outs << user;
return outs;
}
int main()
{
// Example input. Notice that any field may be absent.
std::istringstream input(
"regNumber FName Score1 Score2 Score3 \n"
"385234 John Snow 90.0 56.0 60.8 \n"
"38345234 Michael Bolton 30.0 26.5 \n"
"38500234 Tim Cook 40.0 56.5 20.2 \n"
"1547234 Admin__One 10.0 \n"
" \n" // blank line --> skipped
" Jon Bon Jovi \n"
"11111 22.2 \n"
" 33.3 \n"
"4444 \n"
"55 Justin Johnson \n"
);
Users users;
input >> users;
std::cout << users;
}
To compile with MSVC:
cl /EHsc /W4 /Ox /std:c++17 a.cpp
To compile with Clang:
clang++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 a.cpp
To compile with MinGW/GCC/etc use the same as Clang, substituting g++ for clang++, naturally.
As a final note, if you can make your data file much stricter, life will be significantly easier. For example, if you can guarantee that you will always use fixed-width fields, you can use Shahriar’s answer or pm100’s answer, which I upvoted.

I would define a Person class.
This knows how to read and write a Person on one line.
#include <array>
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
class Person
{
int regNumber = 0;
std::string FName;
std::array<float,3> scope{};
friend std::ostream& operator<<(std::ostream& s, Person const& p)
{
return s << p.regNumber << " " << p.FName << " " << p.scope[0] << " " << p.scope[1] << " " << p.scope[2] << "\n";
}
friend std::istream& operator>>(std::istream& s, Person& p)
{
std::string line;
if (!std::getline(s, line)) {
return s; // nothing left to read
}
bool valid = true;
Person tmp; // Holds value while we check
// Handle Line.
// Handle missing data.
// And update tmp (and valid) to the correct state.
if (valid) {
// The conversion worked.
// So update the object we are reading into.
p.swap(tmp);
}
else {
// The conversion failed.
// Set the stream to failed so we stop reading.
s.setstate(std::ios::failbit);
}
return s;
}
void swap(Person& other) noexcept
{
using std::swap;
swap(regNumber, other.regNumber);
swap(FName, other.FName);
swap(scope, other.scope);
}
};
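One possible way to fill in the "Handle Line / Handle missing data" step, sketched as a free function so it can be tried on its own. It assumes the first token is the regNumber, that tokens which parse completely as numbers are scores (at most three, missing ones left at 0), and that everything else belongs to the name. `PersonData` and `parse_person_line` are hypothetical names, not part of the class above.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical plain-data version of the Person fields.
struct PersonData {
    int regNumber = 0;
    std::string FName;               // all name words, space-separated
    std::array<float, 3> scores{};   // missing scores stay 0
    std::size_t scoreCount = 0;
};

// Parse one line such as "38345234 Michael Bolton 30.0 26.5".
// Returns false (leaving out unchanged) for lines that do not start with
// a number, such as the header, or that contain more than three scores.
bool parse_person_line(const std::string& line, PersonData& out) {
    std::istringstream ls(line);
    PersonData tmp;
    if (!(ls >> tmp.regNumber)) return false;
    std::string word;
    while (ls >> word) {
        std::istringstream num(word);
        float f;
        if ((num >> f) && num.eof()) {        // the whole word is a number
            if (tmp.scoreCount == tmp.scores.size()) return false;
            tmp.scores[tmp.scoreCount++] = f;
        } else {                              // otherwise it is part of the name
            if (!tmp.FName.empty()) tmp.FName += ' ';
            tmp.FName += word;
        }
    }
    out = tmp;
    return true;
}
```

Inside operator>>, `valid` would then be the result of such a parse. Note that the file's header line fails the regNumber check, so it would have to be skipped (for example with one std::getline) before the reading loop.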
Then your main becomes much simpler.
int main()
{
std::ifstream file("Data");
Person person;
while (file >> person)
{
std::cout << person;
}
}
It also becomes easier to handle your second part.
You load each person, then ask the Person object to validate the credentials.
class Person
{
// STUFF From before:
public:
bool validateUser(int id, std::string const& name) const
{
return id == regNumber && name == FName;
}
};
int main()
{
int reg = getUserReg();
std::string name = getUserName();
std::ifstream file("Data");
Person person;
while (file >> person)
{
if (person.validateUser(reg, name))
{
std::cout << "Access Granted\n";
}
}
}

how to display text file in c++?

I want to display the text file in my c++ program but nothing appears and the program just ended. I am using struct here. I previously used this kind of method, but now I am not sure why it isn't working. I hope someone could help me. Thanks a lot.
struct Records{
int ID;
string desc;
string supplier;
double price;
int quantity;
int rop;
string category;
string uom;
}record[50];
void inventory() {
int ID, quantity, rop;
string desc, supplier, category, uom;
double price;
ifstream file("sample inventory.txt");
if (file.fail()) {
cout << "Error opening records file." <<endl;
exit(1);
}
int i = 0;
while(! file.eof()){
file >> ID >> desc >> supplier >> price >> quantity >> rop >> category >> uom;
record[i].ID = ID;
record[i].desc = desc;
record[i].supplier = supplier;
record[i].price = price;
record[i].quantity = quantity;
record[i].rop = rop;
record[i].category = category;
record[i].uom = uom;
i++;
}
for (int a = 0; a < 15; a++) {
cout << "\n\t";
cout.width(10); cout << left << record[a].ID;
cout.width(10); cout << left << record[a].desc;
cout.width(10); cout << left << record[a].supplier;
cout.width(10); cout << left << record[a].price;
cout.width(10); cout << left << record[a].quantity;
cout.width(10); cout << left << record[a].rop;
cout.width(10); cout << left << record[a].category;
cout.width(10); cout << left << record[a].uom << endl;
}
file.close();
}
Here is the txt file:

Here are a couple of things you should consider.
Declare the variables as you need them. Don’t declare them at the top of your function. It makes the code more readable.
Use the file’s full path to avoid confusion. For instance "c:/temp/sample inventory.txt".
if ( ! file ) is shorter than if (file.fail()).
To read data in a loop, use the actual read as a condition while( file >> ID >>... ). This would have revealed the cause of your problem.
Read about the setw manipulator.
file's destructor will close the stream - you don't need to call close()
Your file format consists of a header and data. You do not read the header. You are trying to directly read the data. You try to match the header against various data types: strings, integers, floats; but the header is entirely made of words. Your attempt will invalidate the stream and all subsequent reading attempts will fail. So, first discard the header – you may use getline.
Some columns contain data consisting of more than one word. file >> supplier reads one word, not two or more. So you will get "Mongol", not "Mongol Inc." Your data format needs a separator between columns. Otherwise you won’t be able to tell where the column ends. If you add a separator, again, you may use getline to read fields.
The CATEGORY column is empty. Trying to read it will result in reading from a different column. Adding a separator will also solve the empty category column problem.
This is how your first rows would look if you used a comma as the separator:
ID,PROD DESC,SUPPLIER,PRICE,QTY,ROP,CATEGORY,UOM
001,Pencil,Mongol Inc.,8,200,5,,pcs
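A sketch of reading that comma-separated format: discard the header with one std::getline, then split each data line at the commas with std::getline, which also turns the empty CATEGORY column into an empty string. `Item` and `read_items` are hypothetical names; the column order follows the header above, and std::stoi/std::stod will throw if a numeric column is malformed.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record mirroring the header:
// ID,PROD DESC,SUPPLIER,PRICE,QTY,ROP,CATEGORY,UOM
struct Item {
    int id = 0;
    std::string desc, supplier;
    double price = 0.0;
    int quantity = 0, rop = 0;
    std::string category, uom;
};

std::vector<Item> read_items(std::istream& in) {
    std::vector<Item> items;
    std::string line;
    std::getline(in, line);                  // discard the header line
    while (std::getline(in, line)) {
        if (line.empty()) continue;          // tolerate blank lines
        std::istringstream row(line);
        std::string id, price, qty, rop;
        Item it;
        std::getline(row, id, ',');
        std::getline(row, it.desc, ',');     // may contain spaces
        std::getline(row, it.supplier, ','); // "Mongol Inc." stays whole
        std::getline(row, price, ',');
        std::getline(row, qty, ',');
        std::getline(row, rop, ',');
        std::getline(row, it.category, ','); // may be empty
        std::getline(row, it.uom);           // rest of the line
        it.id = std::stoi(id);               // "001" -> 1
        it.price = std::stod(price);
        it.quantity = std::stoi(qty);
        it.rop = std::stoi(rop);
        items.push_back(it);
    }
    return items;
}
```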
A different format solution would be to define a string as zero or more characters enclosed in quotes:
001 "Pencil" "Mongol Inc." 8 200 5 "" "pcs"
and take advantage of the std::quoted manipulator from <iomanip>, available since C++14 (note the empty category string):
const int max_records_count = 50;
Record records[max_records_count];
istream& read_record(istream& is, Record& r) // returns the read record in r
{
return is >> r.ID >> quoted(r.desc) >> quoted(r.supplier) >> r.price >> r.quantity >> r.rop >> quoted(r.category) >> quoted(r.uom);
}
istream& read_inventory(istream& is, int& i) // returns the number of read records in i
{
//...
for (i = 0; i < max_records_count && read_record(is, records[i]); ++i)
; // no operation
return is;
}
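A self-contained version of the std::quoted idea could look like the sketch below. The Record field types are assumptions based on the question's columns, and it collects the records in a std::vector instead of the fixed-size array, so there is no max_records_count limit.

```cpp
#include <iomanip>   // std::quoted (C++14)
#include <istream>
#include <sstream>   // std::istringstream, handy for testing
#include <string>
#include <vector>

// Field names follow the question; the types are assumed.
struct Record {
    int ID{};
    std::string desc;
    std::string supplier;
    double price{};
    int quantity{};
    int rop{};
    std::string category;
    std::string uom;
};

// Reads one record; quoted strings may contain spaces or be empty ("").
std::istream& read_record(std::istream& is, Record& r) {
    return is >> r.ID >> std::quoted(r.desc) >> std::quoted(r.supplier)
              >> r.price >> r.quantity >> r.rop
              >> std::quoted(r.category) >> std::quoted(r.uom);
}

// Reads records until an extraction fails (end of file or bad data).
std::vector<Record> read_inventory(std::istream& is) {
    std::vector<Record> records;
    Record r;
    while (read_record(is, r))
        records.push_back(r);
    return records;
}
```

Any header line would still have to be discarded with getline before calling read_inventory.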

Unfortunately, your text file is not a typical CSV file delimited by a character such as a comma. The entries in each line seem to be separated by tabs, but this is only my guess. In any case, the structure of the source file makes it harder to read.
Additionally, the file has a header. When you read the first line and try to read the word "ID" into an int variable, the conversion fails. The stream's failbit is set, and from then on every further input operation on this stream does nothing: it ignores all your further requests to read something.
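The effect of the header on formatted input can be shown in isolation. This is a sketch with hypothetical helper names; the file contents are passed in as a string because the original file is not shown.

```cpp
#include <sstream>
#include <string>

// Reads the first ID immediately, like the question's code.
// Returns false when the extraction fails.
bool read_first_id_directly(const std::string& fileContents, int& id) {
    std::istringstream file(fileContents);
    return static_cast<bool>(file >> id);  // "ID" is not a number: failbit is set
}

// Discards the header line with getline first, then reads the ID.
bool read_first_id_after_header(const std::string& fileContents, int& id) {
    std::istringstream file(fileContents);
    std::string header;
    std::getline(file, header);            // throw the header line away
    return static_cast<bool>(file >> id);
}
```

With contents such as "ID PROD DESC SUPPLIER\n001 Pencil Mongol Inc.\n", the first function fails while the second reads 1.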
A further difficulty is that some data fields contain spaces, but the formatted extraction operator >> stops at the first whitespace character, so you may read only half of a field.
Solution: you must first read the header line, then the data rows.
Next, you must find out whether the file is really tab-separated. Sometimes tabs are converted to spaces; in that case we would need to reconstruct the start position of each field in a record.
In any case, you need to read a complete line and then split it into parts.
For the first solution approach, I assume tab separated fields.
One of many possible examples:
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
#include <iomanip>
const std::string fileName{"r:\\sample inventory.txt"};
struct Record {
int ID;
std::string desc;
std::string supplier;
double price;
int quantity;
int rop;
std::string category;
std::string uom;
};
using Database = std::vector<Record>;
int main() {
// Open the source text file with inventory data and check, if it could be opened
if (std::ifstream ifs{ fileName }; ifs) {
// Here we will store all data
Database database{};
// Read the first header line and throw it away
std::string line{};
std::string header{};
if (std::getline(ifs, header)) {
// Now read all lines containing record data
while (std::getline(ifs, line)) {
// Now, we read a line and can split it into substrings. Assuming the tab as delimiter
// To be able to extract data from the text file, we put the line into a std::istringstream
std::istringstream iss{ line };
// One Record
Record record{};
std::string field{};
// Read fields and put in record
if (std::getline(iss, field, '\t')) record.ID = std::stoi(field);
if (std::getline(iss, field, '\t')) record.desc = field;
if (std::getline(iss, field, '\t')) record.supplier = field;
if (std::getline(iss, field, '\t')) record.price = std::stod(field);
if (std::getline(iss, field, '\t')) record.quantity = std::stoi(field);
if (std::getline(iss, field, '\t')) record.rop = std::stoi(field);
if (std::getline(iss, field, '\t')) record.category = field;
if (std::getline(iss, field)) record.uom = field;
database.push_back(record);
}
// Now we read the complete database
// Show some debug output.
std::cout << "\n\nDatabase:\n\n\n";
// Show all records
for (const Record& r : database)
std::cout << std::left << std::setw(7) << r.ID << std::setw(20) << r.desc
<< std::setw(20) << r.supplier << std::setw(8) << r.price << std::setw(7)
<< r.quantity << std::setw(8) << r.rop << std::setw(20) << r.category << std::setw(8) << r.uom << '\n';
}
}
else std::cerr << "\nError: Could not open source file '" << fileName << "'\n\n";
}
But to be honest, this relies on many assumptions, and tab handling is notoriously error-prone.
So let us try a second approach and extract the data according to the positions of the column names in the header line. We check where each header name starts and use this information to split every data line into substrings.
We use a list of field descriptors (the expected header names) and determine each field's start position and width from the header line.
Example:
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
#include <iomanip>
#include <array>
const std::string fileName{"r:\\sample inventory.txt"};
struct Record {
int ID;
std::string desc;
std::string supplier;
double price;
int quantity;
int rop;
std::string category;
std::string uom;
};
constexpr size_t NumberOfFieldsInRecord = 8u;
using Database = std::vector<Record>;
int main() {
// Open the source text file with inventory data and check, if it could be opened
if (std::ifstream ifs{ fileName }; ifs) {
// Here we will store all data
Database database{};
// Read the header line; we use it to find the column positions
std::string line{};
std::string header{};
if (std::getline(ifs, header)) {
// Analyse the header
// We have 8 elements in one record. We will store the positions of header items
std::array<size_t, NumberOfFieldsInRecord> startPosition{};
std::array<size_t, NumberOfFieldsInRecord> fieldWidth{};
const std::array<std::string, NumberOfFieldsInRecord> expectedHeaderNames{ "ID","PROD DESC","SUPPLIER","PRICE","QTY","ROP","CATEGORY","UOM"};
for (size_t k{}; k < NumberOfFieldsInRecord; ++k)
startPosition[k] = header.find(expectedHeaderNames[k]);
for (size_t k{ 1 }; k < NumberOfFieldsInRecord; ++k)
fieldWidth[k - 1] = startPosition[k] - startPosition[k - 1];
fieldWidth[NumberOfFieldsInRecord - 1] = header.length() - startPosition[NumberOfFieldsInRecord - 1];
// Now read all lines containing record data
while (std::getline(ifs, line)) {
// Skip blank lines: substr would throw std::out_of_range on them
if (line.empty()) continue;
// Now we have read a line and can split it into substrings, based on position and field width
// One Record
Record record{};
std::string field{};
// Read fields and put in record
field = line.substr(startPosition[0], fieldWidth[0]); record.ID = std::stoi(field);
field = line.substr(startPosition[1], fieldWidth[1]); record.desc = field;
field = line.substr(startPosition[2], fieldWidth[2]); record.supplier = field;
field = line.substr(startPosition[3], fieldWidth[3]); record.price = std::stod(field);
field = line.substr(startPosition[4], fieldWidth[4]); record.quantity = std::stoi(field);
field = line.substr(startPosition[5], fieldWidth[5]); record.rop = std::stoi(field);
field = line.substr(startPosition[6], fieldWidth[6]); record.category = field;
field = line.substr(startPosition[7], fieldWidth[7]); record.uom = field;
database.push_back(record);
}
// Now we read the complete database
// Show some debug output.
std::cout << "\n\nDatabase:\n\n\n";
// Header
for (size_t k{}; k < NumberOfFieldsInRecord; ++k)
std::cout << std::left << std::setw(fieldWidth[k]) << expectedHeaderNames[k];
std::cout << '\n';
// Show all records
for (const Record& r : database)
std::cout << std::left << std::setw(fieldWidth[0]) << r.ID << std::setw(fieldWidth[1]) << r.desc
<< std::setw(fieldWidth[2]) << r.supplier << std::setw(fieldWidth[3]) << r.price << std::setw(fieldWidth[4])
<< r.quantity << std::setw(fieldWidth[5]) << r.rop << std::setw(fieldWidth[6]) << r.category << std::setw(fieldWidth[7]) << r.uom << '\n';
}
}
else std::cerr << "\nError: Could not open source file '" << fileName << "'\n\n";
}
But this is still not all.
We should wrap all functions belonging to a record into the struct Record, and do the same for the database. In particular, we want to overload the extractor (>>) and inserter (<<) operators; then input and output become very simple.
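As a rough sketch of that idea, assuming the tab-separated layout of the first approach, the operators could be defined as friends inside Record. This is one possible shape, not the author's final version; malformed lines set the stream's failbit instead of throwing.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

struct Record {
    int ID{};
    std::string desc;
    std::string supplier;
    double price{};
    int quantity{};
    int rop{};
    std::string category;
    std::string uom;

    // Extractor: reads one complete line and splits it at tabs (assumed separator).
    friend std::istream& operator>>(std::istream& is, Record& r) {
        std::string line;
        if (!std::getline(is, line)) return is;
        std::istringstream iss(line);
        std::string id, price, quantity, rop;
        Record tmp;
        bool ok = std::getline(iss, id, '\t') && std::getline(iss, tmp.desc, '\t') &&
                  std::getline(iss, tmp.supplier, '\t') && std::getline(iss, price, '\t') &&
                  std::getline(iss, quantity, '\t') && std::getline(iss, rop, '\t') &&
                  std::getline(iss, tmp.category, '\t') && std::getline(iss, tmp.uom);
        if (ok) {
            try {
                tmp.ID = std::stoi(id);
                tmp.price = std::stod(price);
                tmp.quantity = std::stoi(quantity);
                tmp.rop = std::stoi(rop);
            } catch (const std::logic_error&) {  // invalid_argument or out_of_range
                ok = false;
            }
        }
        if (ok) r = tmp;
        else is.setstate(std::ios_base::failbit);
        return is;
    }

    // Inserter: writes the record back in the same tab-separated layout.
    friend std::ostream& operator<<(std::ostream& os, const Record& r) {
        return os << r.ID << '\t' << r.desc << '\t' << r.supplier << '\t' << r.price << '\t'
                  << r.quantity << '\t' << r.rop << '\t' << r.category << '\t' << r.uom;
    }
};
```

After discarding the header with getline, reading the database reduces to while (ifs >> record) database.push_back(record);. Note that a blank line counts as malformed here and ends the loop.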
We will save this for later . . .
If you can give more and better information regarding the structure of the source file, then I will update my answer.

Split a sentence (string) at the spaces [duplicate]

This question already has answers here:
How do I iterate over the words of a string?
(84 answers)
Closed 5 years ago.
I am trying to split a single string, with spaces, into three separate strings. For example, I have one string (str1). The user inputs any 3 words such as
"Hey it's me" or "It's hot out".
From there, I need to write a function that will take this string (str1) and divide it up into three different strings. So that (taking the first example) it will then say:
Hey (is the first part of the string)
it's (is the second part of the string)
me (is the third part of the string)
I'm having difficulty deciding which manipulation I should use to split the string at the spaces.
This is the code I have so far, which is just how the user will enter input. I am looking for the most basic way to accomplish this WITHOUT using istringstream! Using only basic string manipulation such as find(), substr().
** I am looking to create a separate function to perform the breaking up of string ** I figured out how to get the first section of input with this code:
cout << "Enter a string" << endl;
getline(cin, one);
position = str1.find(' ', position);
first_section = str1.substr(0, position);
But now I have no idea how to get the second section or the third section of the string to be divided up into their own string. I was thinking a for loop maybe?? Not sure.
#include <iostream>
#include <string>
using namespace std;
int main() {
string str1;
cout << "Enter three words: ";
getline(cin, str1);
while(cin) {
cout << "Original string: " << str1 << endl;
cin >> str1;
}
return 0;
}

I'm having difficulty deciding which manipulation I should use to split the string at the spaces.
Use a std::istringstream from str1.
Read each of the tokens from the std::istringstream.
// No need to use a while loop unless you wish to do the same
// thing for multiple lines.
// while(cin) {
cout << "Original string: " << str1 << endl;
std::istringstream stream(str1);
std::string token1;
std::string token2;
std::string token3;
stream >> token1 >> token2 >> token3;
// Use the tokens any way you wish
// }
If you wish to do the same thing for multiple lines of input, use:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
string str1;
cout << "Enter three words: ";
while(getline(cin, str1))
{
cout << "Original string: " << str1 << endl;
std::istringstream stream(str1);
std::string token1;
std::string token2;
std::string token3;
stream >> token1 >> token2 >> token3;
// Use the tokens any way you wish
// Prompt the user for another line
cout << "Enter three words: ";
}
}

Perhaps the most basic solution is to use what is already inside your loop, operator>>, to read a single word. For example:
cin >> word1; // extracts the first word
cin >> word2; // extracts the second word
getline(cin, line); // extracts the rest of the line
You can use the result or return value of these expressions to check success:
#include <string>
#include <iostream>
int main(void) {
std::string word1, word2, line;
// No double-! needed: each operand of && is converted to bool automatically
bool success = std::cin >> word1 && std::cin >> word2
&& std::getline(std::cin, line);
if (success) { std::cout << "GOOD NEWS!" << std::endl; }
else { std::cout << "bad news :(" << std::endl; }
return 0;
}
Alternatively, in such a string I would expect two spaces. My suggestion would be to use string::find to locate the first and second spaces like so:
size_t first_position = str1.find(' ', 0);
You should probably check this against string::npos as an opportunity to handle errors. Following that:
size_t second_position = str1.find(' ', first_position + 1);
Do the same error check here. After that, it is straightforward to use string::substr to split the string into sections. Note that the second argument of substr is a length, not an end position:
string first_section = str1.substr(0, first_position)
, second_section = str1.substr(first_position + 1, second_position - first_position - 1)
, third_section = str1.substr(second_position + 1);
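Since the question asks for a separate function that uses only find() and substr(), the steps above can be wrapped up as in this sketch. The name split_three and its out-parameter signature are hypothetical choices for illustration.

```cpp
#include <string>

// Splits `str` at its first two spaces into three parts, using only find() and substr().
// Returns false (leaving the outputs untouched) if there are fewer than two spaces.
bool split_three(const std::string& str, std::string& first,
                 std::string& second, std::string& third) {
    const std::string::size_type first_space = str.find(' ');
    if (first_space == std::string::npos) return false;
    const std::string::size_type second_space = str.find(' ', first_space + 1);
    if (second_space == std::string::npos) return false;
    first  = str.substr(0, first_space);
    // substr takes (position, length), not (begin, end)
    second = str.substr(first_space + 1, second_space - first_space - 1);
    third  = str.substr(second_space + 1);  // everything after the second space
    return true;
}
```

For "Hey it's me" this yields "Hey", "it's" and "me". Consecutive spaces would produce an empty middle part, which this sketch does not try to handle.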

I have this Utility class that has a bunch of methods for string manipulation. I will show the class function for splitting strings with a delimiter. The class has a private constructor, so you cannot create an instance of it; all the methods are static.
Utility.h
#ifndef UTILITY_H
#define UTILITY_H
#include <string>
#include <vector>
class Utility {
public:
static std::vector<std::string> splitString( const std::string& strStringToSplit,
const std::string& strDelimiter,
const bool keepEmpty = true );
private:
Utility(); // private: the class is not meant to be instantiated
};
#endif // UTILITY_H
Utility.cpp
#include "Utility.h"
#include <algorithm> // std::search
std::vector<std::string> Utility::splitString( const std::string& strStringToSplit,
const std::string& strDelimiter,
const bool keepEmpty ) {
std::vector<std::string> vResult;
if ( strDelimiter.empty() ) {
vResult.push_back( strStringToSplit );
return vResult;
}
std::string::const_iterator itSubStrStart = strStringToSplit.begin(), itSubStrEnd;
while ( true ) {
itSubStrEnd = std::search( itSubStrStart, strStringToSplit.end(), strDelimiter.begin(), strDelimiter.end() );
std::string strTemp( itSubStrStart, itSubStrEnd );
if ( keepEmpty || !strTemp.empty() ) {
vResult.push_back( strTemp );
}
if ( itSubStrEnd == strStringToSplit.end() ) {
break;
}
itSubStrStart = itSubStrEnd + strDelimiter.size();
}
return vResult;
}
Main.cpp -- Usage
#include <iostream>
#include <string>
#include <vector>
#include "Utility.h"
int main() {
std::string myString( "Hello World How Are You Today" );
std::vector<std::string> vStrings = Utility::splitString( myString, " " );
// Check Vector Of Strings
for ( unsigned n = 0; n < vStrings.size(); ++n ) {
std::cout << vStrings[n] << " ";
}
std::cout << std::endl;
// The Delimiter is also not restricted to just a single character
std::string myString2( "Hello, World, How, Are, You, Today" );
// Clear Out Vector
vStrings.clear();
vStrings = Utility::splitString( myString2, ", " ); // Delimiter = Comma & Space
// Test Vector Again
for ( unsigned n = 0; n < vStrings.size(); ++n ) {
std::cout << vStrings[n] << " ";
}
std::cout << std::endl;
return 0;
}

Reading in a simple text file & writing it to another text file with leading blanks and blank lines removed

I want to keep this code, but I am wondering whether there is a way to remove the blanks inside the while loop while I read in the file.
I am having a ton of problems with removing blanks. I do not have a good understanding of reading files into my program, so this has been very difficult for me. Can anybody tell me where I am making my mistakes?
#include <iostream>
#include <cassert>
#include <string>
#include <fstream>
#include <cstdio>
using namespace std;
int main (void)
{
int i=0;
int current=0;
int len;
int ch;
string s1;
string s2;
ifstream fileIn;
cout << "Enter name of file: ";
cin >> s1;
fileIn.open(s1.data() );
assert(fileIn.is_open() );
while (!(fileIn.eof() ) )
{ ch=fileIn.get();
s1.insert(i,1,ch);
s1.end();
i++;}
cout << s1;
len=s1.length();
cout << len;
while (current < len-1)
{
if (!(s1[current] == ' ' && s1[current + 1] == ' ') &&
!(s1[current] == '\n' && s1[current + 1] == '\n')
)
{
s2.append(s1[current]);
}
current++;
}
return 0;
}

There are a number of things that I would do differently. Without going into details, here is what I propose. It requires C++11 (pass -std=c++11 to the compiler if you are using gcc or clang):
#include <algorithm>
#include <cctype>
#include <fstream>
#include <functional>
#include <iostream>
#include <locale>
#include <string>
using namespace std;
// trim from left
static string ltrim(string s) {
s.erase(s.begin(), find_if(s.begin(), s.end(), [](char c) { return !isblank(static_cast<unsigned char>(c)); } ));
return s;
}
int main() {
string file_name;
cout << "Please enter the file name: " << flush;
cin >> file_name;
ifstream in(file_name);
if (!in.good()) {
cout << "Failed to open file \"" << file_name << "\"" << endl;
return 1;
}
string buffer;
while (getline(in, buffer)) {
buffer = ltrim(buffer);
if (!buffer.empty()) {
cout << buffer << '\n'; // <-- or write into a file as you need
}
}
return 0;
}
The title says you want to remove only the leading spaces, but in answer to my question you said you also want to remove the trailing spaces at the end of each line. If so, use trim() instead of ltrim(). The necessary functions are:
// trim from left
static string ltrim(string s) {
s.erase(s.begin(), find_if(s.begin(), s.end(), [](char c) { return !isblank(static_cast<unsigned char>(c)); } ));
return s;
}
// trim from right
static string rtrim(string s) {
s.erase(find_if(s.rbegin(), s.rend(), [](char c) { return !isblank(static_cast<unsigned char>(c)); }).base(), s.end());
return s;
}
// trim from both left and right
static string trim(string s) {
return ltrim(rtrim(s));
}
There are other, most likely faster trim implementations. See, for example: What's the best way to trim std::string?
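Since the question's title asks for writing the result to another text file, the same loop can target any std::ostream. The sketch below is one way to do that; the function names are hypothetical, and it treats only spaces and tabs as blanks (a trailing '\r' from Windows line endings would be kept).

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Space or tab? The cast avoids undefined behaviour for negative char values.
static bool is_blank_char(char c) {
    return std::isblank(static_cast<unsigned char>(c)) != 0;
}

// Removes leading and trailing spaces/tabs.
static std::string trim(std::string s) {
    auto not_blank = [](char c) { return !is_blank_char(c); };
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_blank));
    s.erase(std::find_if(s.rbegin(), s.rend(), not_blank).base(), s.end());
    return s;
}

// Copies `in` to `out` line by line, trimming each line and dropping
// lines that are empty after trimming. Pass a std::ofstream as `out`
// to write into another file.
void copy_without_blanks(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (!line.empty()) out << line << '\n';
    }
}

// Convenience wrapper for in-memory text.
std::string remove_blanks(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    copy_without_blanks(in, out);
    return out.str();
}
```

Writing to a file would then be std::ifstream in(inputName); std::ofstream out(outputName); copy_without_blanks(in, out);, with both names chosen by the caller.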

The standard library already has most of the functionality you want, so I'd do my best to rely on that to do most of the job.
Copying some data with a specified subset removed is what std::remove_copy_if is supposed to do, so I'd use it for the main loop:
std::remove_copy_if(std::istream_iterator<line>(std::cin),
std::istream_iterator<line>(),
std::ostream_iterator<std::string>(std::cout, "\n"),
[](std::string const &s){return s.empty(); });
So, given an appropriate definition of a line, that will copy lines with any empty ones removed.
Our next step is to define a line class that removes leading white-space when we extract one from a stream, and can be converted to a string. For that, I'd "cheat" a little. When we extract a character from a stream like mystream >> mychar;, it automatically skips any leading white-space. I'd use that by reading a char, then putting it back into the stream (see note 1), so the stream starts at the first non-whitespace character. Then I'd use getline to read the rest of the line.
1. Reading a character, then immediately putting it back into the stream is probably unusual enough to merit either a comment, or being put into a function with a descriptive name like skip_leading_blanks:
void skip_leading_blanks(std::istream &is){
char ch;
// operator>> skips whitespace; put the character back only if one was actually read
if (is >> ch)
is.putback(ch);
}
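Putting the pieces together, a complete line class for the std::remove_copy_if loop might look like the sketch below. The class layout and the remove_blank_lines wrapper are assumptions for illustration. Because operator>> for a char also skips newlines, fully blank lines are consumed by skip_leading_blanks before getline ever sees them.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Skips leading whitespace, including newlines, then puts back the first non-blank char.
void skip_leading_blanks(std::istream& is) {
    char ch;
    if (is >> ch) is.putback(ch);
}

// A line of text with its leading whitespace removed; converts to std::string.
class line {
    std::string data;
public:
    friend std::istream& operator>>(std::istream& is, line& l) {
        skip_leading_blanks(is);
        return std::getline(is, l.data);
    }
    operator std::string() const { return data; }
};

// The main loop from above: copy lines, leaving out empty ones.
void copy_nonblank_lines(std::istream& in, std::ostream& out) {
    std::remove_copy_if(std::istream_iterator<line>(in),
                        std::istream_iterator<line>(),
                        std::ostream_iterator<std::string>(out, "\n"),
                        [](std::string const& s) { return s.empty(); });
}

// Convenience wrapper for in-memory text.
std::string remove_blank_lines(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    copy_nonblank_lines(in, out);
    return out.str();
}
```

Trailing whitespace on a line is kept, since getline reads the rest of the line unchanged.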
