Read selected columns using fstream - c++

I am using the following code to read my data from a file, but I only want to capture a few of the many columns in the file. Is there a better way of doing this than my current approach?
void Data::read_simulated (const string &filepath)
{
ifstream data_out (filepath.c_str());
if (!data_out)
cout<<"Failed to open"<<endl;
else
{
string id_p,age_p, dim_p, my_p, mcf_p, mcp_p, mcl_p, bw_p, bcs_p;
string dummy_line, g;
getline(data_out, dummy_line);
while(data_out>>age_p>>g>>g>>g>>g>>g>>g>>g>>bcs_p>>g>>g>>my_p>>g>>g>>bw_p>>g>>g>>dim_p>>g>>g>>g>>g>>g>>g>>g>>g>>g>>g>>g>>g>>g)
{
//s.cow_id.push_back(get_number(id_p));
if (get_number(age_p)>=1424.0 &&get_number(age_p)<=1733.0)
{
age_pre.push_back(age_p);
dim_pre.push_back(dim_p);
my_pre.push_back(my_p);
//mcf_obs.push_back(get_number(mcf_p));
// mcp_obs.push_back(get_number(mcp_p));
//mcl_obs.push_back(get_number(mcl_p));
bw_pre.push_back(bw_p);
bcs_pre.push_back(bcs_p);
}
}
data_out.close();
}
}

If the columns are aligned with spaces, so that each column starts at a fixed offset within the line (a multiple of the column width plus a constant), you could use the std::istream::ignore or std::istream::seekg functions to skip some columns.
If that's not the case, at least make your code prettier by using this function:
std::istream &skip_row(std::istream &is, unsigned int count)
{
std::string s;
while(count-- && is >> s) {}
return is;
}
You could make it a template to accept various types, or you could overload an operator>> for a class to get a different syntax than this:
data_out >> age_p && skip_row(data_out, 5) && data_out >> bcs_p >> ...
A naive approach is to read every field of a line into a std::vector<std::string> and then index it, but this will cost some performance because of the extra memory allocations.
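The vector-and-index approach mentioned last can be sketched as follows. This is only a minimal illustration: the helper name pick_columns and its parameters are hypothetical, and it assumes whitespace-separated fields with zero-based column positions.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one whitespace-separated line and keep only
// the fields whose zero-based positions are listed in `wanted`.
// Returns false if the line has too few fields for one of the positions.
bool pick_columns(const std::string& line,
                  const std::vector<std::size_t>& wanted,
                  std::vector<std::string>& out)
{
    std::istringstream iss(line);
    std::vector<std::string> fields;
    std::string token;
    while (iss >> token)
        fields.push_back(token);

    out.clear();
    for (std::size_t i = 0; i < wanted.size(); ++i) {
        if (wanted[i] >= fields.size())
            return false;
        out.push_back(fields[wanted[i]]);
    }
    return true;
}
```

Counting the extractions in the question's >> chain, the wanted positions there would be 0, 8, 11, 14 and 17 (age, bcs, my, bw and dim). Each line would be fetched with std::getline and passed to the helper.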

Related

Find number of items in .txt file

I'm looking for a way to find the number of items in a .txt file.
The file structure is as follows:
students.txt pricem 1441912123
house.pdf jatkins 1442000124
users.txt kevin_tomlinson 1442001032
accounts.mdb kevin_tomlinson 1442210121
vacation.jpg smitty83 1442300125
calendar.cpp burtons 1442588012
The result should be 18 in this example since there are 18 separate "words" in this file.
I need that value so I can iterate through the items and assign them to an array of structures (maybe there's a way to accomplish both of these steps together?):
// my structure
struct AccessRecord
{
string filename;
string username;
long timestamp;
};
// new instance of AccessRecord
// max possible records: 500
AccessRecord logRecords[500];
// while file has content
while (!fin.eof())
{
// loop through file until end
// max possible records: 500
for (int i = 0; i < 500; i++) // need to figure out how to iterate
{
fin >> logRecords[i].filename
>> logRecords[i].username
>> logRecords[i].timestamp;
}
}
Which will then be written to the screen.
So the question is, how do I find the count? Or is there a better way?
You know that each line contains a string, a string and a long, so you can iterate with:
std::vector<AccessRecord> logs;
std::string fname, uname;
long tstamp;
while(fin >> fname >> uname >> tstamp) {
logs.push_back(AccessRecord(fname, uname, tstamp));
//To avoid copies, use: (thanks @Rakete1111!)
//logs.emplace_back(std::move(fname), std::move(uname), tstamp);
}
This is assuming you've created a constructor for your struct like:
AccessRecord(std::string f, std::string u, long t)
: filename(f), username(u), timestamp(t) { }
Notice that I'm using an std::vector here instead of an array so that we don't even have to worry about the number of items, since the vector will resize itself dynamically!
You should overload operator>> for your structure:
struct AccessRecord
{
string filename;
string username;
long timestamp;
friend std::istream& operator>>(std::istream& input, AccessRecord& ar);
};
std::istream& operator>>(std::istream& input, AccessRecord& ar)
{
input >> ar.filename;
input >> ar.username;
input >> ar.timestamp;
return input;
}
This allows you to simplify your input function:
AccessRecord ar;
std::vector<AccessRecord> logs;
//...
while (fin >> ar)
{
logs.push_back(ar);
}
Usually, if you are accessing an object's data members directly from outside the class or structure, something is wrong. Search the internet for "data hiding", "c++ encapsulation" and "c++ loose coupling".
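Putting the pieces from the answers above together, a self-contained sketch might look like the following. The helper name read_records is hypothetical; it takes any std::istream, so the same code works for an std::ifstream or a string stream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct AccessRecord
{
    std::string filename;
    std::string username;
    long timestamp;
};

// Extract one record: two words followed by a number
std::istream& operator>>(std::istream& input, AccessRecord& ar)
{
    return input >> ar.filename >> ar.username >> ar.timestamp;
}

// Hypothetical helper: keep reading records until an extraction fails
std::vector<AccessRecord> read_records(std::istream& input)
{
    std::vector<AccessRecord> logs;
    AccessRecord ar;
    while (input >> ar)
        logs.push_back(ar);
    return logs;
}
```

The record count is then logs.size(), and the number of separate "words" the question asks about is three times that, so no counting pass over the file is needed.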

How to read pieces of string into a class array C++

I have an array of dvd from a Video class I created
Video dvd[10];
each video has the property,
class Video {
string _title;
string _genre;
int _available;
int _holds;
public:
Video(string title, string genre, int available, int holds);
Video();
void print();
void read(istream & is, Video dvd);
int holds();
void restock(int num);
string getTitle();
~Video();
};
I'm trying to fill up this array with data from my text file where each info such as the title and genre is separated by a comma
Legend of the seeker, Fantasy/Adventure, 3, 2
Mindy Project, Comedy, 10, 3
Orange is the new black, Drama/Comedy, 10, 9
I've tried using getline(in, line, ',') but my brain halts when it's time to insert each line into the dvd array.
I also created a read method to read each word separated by whitespace, but I figured that's not what I really want.
I also tried to read a line with getline, store the line in a string and split it from there, but I get confused along the way.
I can get the strings I need from each line; my confusion is in how to insert them into my class array in the while loop, especially when I can only read one word at a time.
I need help on what approach I should follow to tackle this problem.
My code
#include <iostream>
#include <fstream>
#include <cassert>
#include <vector>
#define MAX 10
using namespace std;
class Video {
string _title;
string _genre;
int _available;
int _holds;
public:
Video(string title, string genre, int available, int holds);
Video();
void print();
void read(istream & is, Video dvd);
int holds();
void restock(int num);
string getTitle();
~Video();
};
Video::Video(string title, string genre, int available, int holds){
_title = title;
_genre = genre;
_available = available;
_holds = holds;
}
void Video::read (istream & is, Video dvd)
{
is >> _title >> _genre >> _available>>_holds;
dvd = Video(_title,_genre,_available,_holds);
}
int Video::holds(){
return _holds;
}
void Video::restock(int num){
_available += 5;
}
string Video::getTitle(){
return _title;
}
Video::Video(){
}
void Video::print(){
cout<<"Video title: " <<_title<<"\n"<<
"Genre: "<<_genre<<"\n"<<
"Available: " <<_available<<"\n"<<
"Holds: " <<_holds<<endl;
}
Video::~Video(){
cout<<"DESTRUCTOR ACTIVATED"<<endl;
}
int main(int params, char **argv){
string line;
int index = 0;
vector<string> tokens;
//Video dvd = Video("23 Jump Street", "comedy", 10, 3);
//dvd.print();
Video dvd[MAX];
dvd[0].holds();
ifstream in("input.txt");
/*while (getline(in, line, ',')) {
tokens.push_back(line);
}
for (int i = 0; i < 40; ++i)
{
cout<<tokens[i]<<endl;
}*/
if(!in.fail()){
while (getline(in, line)) {
dvd[index].read(in, dvd[index]);
/*cout<<line<<endl;
token = line;
while (getline(line, token, ',')){
}
cout<<"LINE CUT#####"<<endl;
cout<<line<<endl;
cout<<"TOKEN CUT#####"<<endl;*/
//dvd[index] =
index++;
}
}else{
cout<<"Invalid file"<<endl;
}
for (int i = 0; i < MAX; ++i)
{
dvd[i].print();
}
}
First, I would change the Video::read function into an overload of operator >>. This will allow the Video class to be used as simply as any other type when an input stream is being used.
Also, the way you implemented read as a non-static member function returning a void is not intuitive and very clunky to use. How would you write the loop, and at the same time detect that you've reached the end of file (imagine if there are only 3 items to read -- how would you know to not try to read a fourth item)? The better, intuitive, and frankly, de-facto way to do this in C++ is to overload the >> operator.
(At the end, I show how to write a read function that uses the overloaded >>)
class Video
{
//...
public:
friend std::istream& operator >> (std::istream& is, Video& vid);
//..
};
I won't go over why this should be a friend function, as that is easy to look up in existing material on how to overload >>.
So we need to implement this function. Here is an implementation that reads in a single line, and copies the information to the passed-in vid:
std::istream& operator >> (std::istream& is, Video& vid)
{
std::string line;
std::string theTitle, theGenre, theAvail, theHolds;
// First, we read the entire line
if (std::getline(is, line))
{
// Now we copy the line into a string stream and break
// down the individual items
std::istringstream iss(line);
// the fields are, in order: title, genre, available, and holds
std::getline(iss, theTitle, ',');
std::getline(iss, theGenre, ',');
std::getline(iss, theAvail, ',');
std::getline(iss, theHolds, ',');
// now we can create a Video and copy it to vid
vid = Video(theTitle, theGenre,
std::stoi(theAvail), // need to change to integer
std::stoi(theHolds)); // same here
}
return is; // return the input stream
}
Note how vid is a reference parameter, not passed by value. Your read function, if you were to keep it, would need to make the same change.
What we did above is that we read the entire line in first using the "outer" call to std::getline. Once we have the line as a string, we break down that string by using an std::istringstream and delimiting each item on the comma using an "inner" set of getline calls that works on the istringstream. Then we simply create a temporary Video from the information we retrieved from the istringstream and copy it to vid.
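One detail worth noting with the sample data: the fields are separated by a comma followed by a space, so splitting on ',' alone leaves a leading space on every field after the first (std::stoi skips it, but the genre string keeps it). A minimal sketch of splitting and trimming, with the hypothetical helper names trim and split_csv_line, could be:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing spaces and tabs
std::string trim(const std::string& s)
{
    const std::string ws = " \t";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: split a line on commas, trimming each field
std::vector<std::string> split_csv_line(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream iss(line);
    std::string field;
    while (std::getline(iss, field, ','))
        fields.push_back(trim(field));
    return fields;
}
```

This assumes that no title or genre contains a comma itself; quoted fields would need a real CSV parser.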
Here is a main function that now reads into a maximum of 10 items:
int main()
{
Video dvd[10];
int i = 0;
while (i < 10 && std::cin >> dvd[i])
{
dvd[i].print();
++i;
}
}
So if you look at the loop, all we did is 1) make sure we don't go over 10 items, and 2) just use cin >> dvd[i], which looks just like your everyday usage of >> when inputting an item. This is the magic of the overloaded >> for Video.
Here is a live example, using your data.
If you plan to keep the read function, then it would be easier if you changed the return type to bool that returns true if the item was read or false otherwise, and just calls the operator >>.
Here is an example:
bool Video::read(std::istream & is, Video& dvd)
{
// report success only if the extraction itself succeeded
return static_cast<bool>(is >> dvd);
}
And here is the main function:
int main()
{
Video dvd[10];
int i = 0;
while (i < 10 && dvd[i].read(std::cin, dvd[i]))
{
dvd[i].print();
++i;
}
}
However, I still say that making Video::read a non-static member makes the code in main clunky.

Remove entire rows with missing values c++

I am reading data with several variables using the code below. Currently, when the program encounters a missing value (represented in the data by the string "NA"), it changes it to zero. Instead, I would like to know how to remove the entire row when the program encounters "NA". I have looked for the same question, but the answers are all for R, not C++. Any advice would be appreciated. Thanks
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
struct Data {
vector<double> cow_id;
vector<double> age_obs;
vector<double> dim_obs;
vector<double> my_obs;
vector<double> mcf_obs;
vector<double> mcp_obs;
vector<double> mcl_obs;
vector<double> bw_obs;
vector<double> bcs_obs;
double get_number (string value)
{
if (value == "NA")
{return 0.0;}
else
{
istringstream iss (value);
double val;
iss>>val;
return val;
}
}
void read_input (const string filepath)
{
ifstream data_in (filepath.c_str());
if (!data_in)
{cout<<"Failed to open"<<endl;}
else
{
// Read tokens as strings.
string id, age, dim, my, mcf, mcp, mcl, bw, bcs;
string dummy_line;
getline(data_in, dummy_line);
string line;
while (data_in >> id >> age >> dim >> my >> mcf >> mcp >> mcl >> bw >> bcs)
{
// Get the number from the string and add to the vectors.
cow_id.push_back(get_number(id));
age_obs.push_back(get_number(age));
dim_obs.push_back(get_number(dim));
my_obs.push_back(get_number(my));
mcf_obs.push_back(get_number(mcf));
mcp_obs.push_back(get_number(mcp));
mcl_obs.push_back(get_number(mcl));
bw_obs.push_back(get_number(bw));
bcs_obs.push_back(get_number(bcs));
}
data_in.close();
}
size_t size=age_obs.size();
for (size_t i=0; i<size; i++)
{
cout<<cow_id[i]<<'\t'<<age_obs[i]<<'\t'<<dim_obs[i]<<'\t'<<my_obs[i] <<'\t'<<mcf_obs[i]<<'\t'<<mcp_obs[i]<<'\t'<<mcl_obs[i]<<'\t'<<bw_obs[i] <<'\t'<<bcs_obs[i]<<endl;
}
}
};
int main()
{
Data input;
input.read_input("C:\\Data\\C++\\learncpp\\data.txt");
}
Let's talk tables here.
Tables are containers of records (rows). The data you are capturing from your input file is already organized into records. So the obvious model is to use a structure that matches your file's data records.
struct Record
{
unsigned int cow_id;
unsigned int age_obs;
unsigned int dim_obs;
// ...
};
Your table could be represented as:
std::vector<Record> my_table;
So to remove a record from the table, you can use the std::vector::erase() method. Easy. Also, you can use the std::find() function to search the table.
Let's relieve some reader's headaches with your present code by introducing a concept of the record loading its members from the file.
Reading a record from a file is best performed by overloading the stream extraction operator>>:
struct Record
{
//...
friend std::istream& operator>>(std::istream& input, Record& r);
};
std::istream&
operator>>(std::istream& input, Record& r)
{
std::string record_text;
std::getline(input, record_text);
// Extract a field from the record text and check for NA,
// Assign fields of r to those values:
r.cow_id = value;
// Etc.
return input;
}
With the overloaded operator, your input looks like:
Record r;
while (input_file >> r)
{
my_table.push_back(r);
}
Elegant and simple (reducing injection of defects).
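For the original question (dropping rows that contain "NA"), one possible sketch is to parse each line separately and keep only the lines that parse completely. The layout is an assumption taken from the question's code: nine whitespace-separated numeric fields per row. The names parse_record and read_complete_rows are hypothetical, and the fields are stored in an array here only to keep the example short.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed layout: nine numeric fields per row, "NA" marks a missing value
struct Record
{
    double fields[9];
};

// Hypothetical helper: parse one line. Returns false if a field is "NA",
// is not a number, or the line has fewer than nine fields.
bool parse_record(const std::string& line, Record& r)
{
    std::istringstream iss(line);
    std::string token;
    for (std::size_t i = 0; i < 9; ++i) {
        if (!(iss >> token) || token == "NA")
            return false;
        std::istringstream field(token);
        if (!(field >> r.fields[i]))
            return false;
    }
    return true;
}

// Read every line, keeping only the rows that parse completely
std::vector<Record> read_complete_rows(std::istream& input)
{
    std::vector<Record> table;
    std::string line;
    Record r;
    while (std::getline(input, line)) {
        if (parse_record(line, r))
            table.push_back(r);
    }
    return table;
}
```

Because incomplete rows are never stored, nothing has to be erased afterwards. A non-numeric header line is rejected by the same check.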

Getting the nth line of a text file in C++

I need to read the nth line of a text file (e.g. textfile.findline(0) would find the first line of the text file loaded with ifstream textfile). Is this possible?
I don't need to put the contents of the file in an array/vector; I just need to assign a specific line of the text file to a variable (specifically an int).
P.S. I am looking for the simplest solution that would not require me to use any big external library (e.g. Boost)
Thanks in advance.
How about this?
std::string ReadNthLine(const std::string& filename, int N)
{
std::ifstream in(filename.c_str());
std::string s;
//for performance
s.reserve(some_reasonable_max_line_length);
//skip N lines
for(int i = 0; i < N; ++i)
std::getline(in, s);
std::getline(in,s);
return s;
}
If you want to read the start of the nth line, you can use std::istream::ignore to skip over the first n-1 lines, then read from the next line to assign to the variable.
template<typename T>
void readNthLine(istream& in, int n, T& value) {
for (int i = 0; i < n-1; ++i) {
in.ignore(numeric_limits<streamsize>::max(), '\n');
}
in >> value;
}
Armen's solution is the correct answer, but I thought I'd throw out an alternative, based on jweyrich's caching idea. For better or for worse, this reads in the entire file at construction, but only saves the newline positions (doesn't store the entire file, so it plays nice with massive files.) Then you can simply call ReadNthLine, and it will immediately jump to that line, and read in the one line you want. On the other hand, this is only optimal if you want to get only a fraction of the lines at a time, and the line numbers are not known at compile time.
class TextFile {
std::ifstream file_stream;
std::vector<std::ifstream::streampos> linebegins;
TextFile& operator=(TextFile& b) = delete;
public:
TextFile(std::string filename)
:file_stream(filename)
{
//this chunk stolen from Armen's,
std::string s;
//for performance
s.reserve(some_reasonable_max_line_length);
while(file_stream) {
linebegins.push_back(file_stream.tellg());
std::getline(file_stream, s);
}
}
TextFile(TextFile&& b)
:file_stream(std::move(b.file_stream)),
linebegins(std::move(b.linebegins))
{}
TextFile& operator=(TextFile&& b)
{
file_stream = std::move(b.file_stream);
linebegins = std::move(b.linebegins);
return *this;
}
std::string ReadNthLine(int N) {
if (N >= linebegins.size()-1)
throw std::runtime_error("File doesn't have that many lines!");
std::string s;
// clear EOF and error flags
file_stream.clear();
file_stream.seekg(linebegins[N]);
std::getline(file_stream, s);
return s;
}
};
It's certainly possible. There are (n-1) '\n' characters preceding the nth line. Read lines until you reach the one you're looking for. You can do this on the fly without storing anything except the current line being considered.
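A minimal sketch of this on-the-fly approach, which also reports when the file is too short, might look like the following. The names read_nth_line and read_nth_int are hypothetical, and n is assumed to be 1-based here.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: store the n-th line (1-based) in `out`.
// Returns false if n is 0 or the stream has fewer than n lines.
bool read_nth_line(std::istream& in, unsigned n, std::string& out)
{
    if (n == 0)
        return false;
    std::string line;
    for (unsigned i = 0; i < n; ++i) {
        if (!std::getline(in, line))
            return false;
    }
    out = line;
    return true;
}

// Hypothetical helper: read the n-th line and convert it to an int
bool read_nth_int(std::istream& in, unsigned n, int& value)
{
    std::string line;
    if (!read_nth_line(in, n, line))
        return false;
    std::istringstream iss(line);
    return static_cast<bool>(iss >> value);
}
```

Only the current line is held in memory, and the caller can tell a missing line apart from an empty one through the return value.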

reading and writing a vector of structs to file

I've read a few posts on Stack Overflow and a number of other sites about writing vectors to files. I've implemented what I feel is working, but I'm having some troubles. One of the data members in the struct is a class string, and when reading the vector back in, that data is lost. Also, after writing the first iteration, additional iterations cause a malloc error. How can I modify the code below to achieve my desired ability to save the vector to a file, then read it back in when the program launches again? Currently, the read is done in the constructor, write in destructor, of a class whose only data member is the vector, but has methods to manipulate that vector.
Here is the gist of my read / write methods. Assuming vector<element> elements...
Read:
ifstream infile;
infile.open("data.dat", ios::in | ios::binary);
infile.seekg (0, ios::end);
elements.resize(infile.tellg()/sizeof(element));
infile.seekg (0, ios::beg);
infile.read( (char *) &elements[0], elements.capacity()*sizeof(element));
infile.close();
Write:
ofstream outfile;
outfile.open("data.dat", ios::out | ios::binary | ios_base::trunc);
elements.resize(elements.size());
outfile.write( (char *) &elements[0], elements.size() * sizeof(element));
outfile.close();
Struct element:
struct element {
int id;
string test;
int other;
};
In C++, objects generally cannot be read from and written to disk byte-for-byte like that. In particular, your struct element contains a string, which is a non-POD data type, so copying its raw bytes does not copy the text it manages.
A thought experiment might help clarify this. Your code assumes that all your element values are the same size. What would happen if one of the string test values was longer than what you've assumed? How would your code know what size to use when reading and writing to disk?
You will want to read about serialization for more information about how to handle this.
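The thought experiment can be made concrete with a small sketch. The helper raw_bytes is hypothetical; it simply reports how many bytes a raw write() of one element would copy.

```cpp
#include <cstddef>
#include <string>

struct element {
    int id;
    std::string test;
    int other;
};

// Hypothetical helper: the number of bytes a raw write() of `e` would copy.
// This is sizeof(element), a compile-time constant that does not depend on
// how long the text in e.test is.
std::size_t raw_bytes(const element& e)
{
    return sizeof e;
}
```

An element holding a one-character string and one holding ten thousand characters report the same size, so a raw write of the second one cannot possibly contain all of its text.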
Your code assumes all the relevant data exists directly inside the vector, whereas strings are fixed-size objects holding pointers that can address their variable-sized content on the heap. You're basically saving the pointers and not the text. You should write some string serialisation code, for example:
bool write_string(std::ostream& os, const std::string& s)
{
size_t n = s.size();
// length in ASCII, then a '|' terminator, then the raw characters
return (os << n << '|') && os.write(s.data(), n);
}
Then you can write serialisation routines for your struct. There are a few design options:
- many people like to declare Binary_IStream / Binary_OStream types that can house a std::ostream, but being a distinct type can be used to create a separate set of serialisation routines ala:
operator<<(Binary_OStream& os, const Some_Class&);
Or, you can just abandon the usual streaming notation when dealing with binary serialisation, and use function call notation instead. Obviously, it's nice to let the same code correctly output both binary serialisation and human-readable serialisation, so the operator-based approach is appealing.
If you serialise numbers, you need to decide whether to do so in a binary format or ASCII. With a pure binary format, where portability is required (even between 32-bit and 64-bit compiles on the same OS), you may need to make some effort to encode and use type size metadata (e.g. int32_t or int64_t?) as well as endianness (e.g. consider network byte order and ntohl()-family functions). With ASCII you can avoid some of those considerations, but it's variable length and can be slower to write/read. Below, I arbitrarily use ASCII with a '|' terminator for numbers.
bool write_element(std::ostream& os, const element& e)
{
return (os << e.id << '|') && write_string(os, e.test) && (os << e.other << '|');
}
And then for your vector:
os << elements.size() << '|';
for (std::vector<element>::const_iterator i = elements.begin();
i != elements.end(); ++i)
write_element(os, *i);
To read this back:
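std::vector<element> elements;
size_t n;
char c;
// the count was written as ASCII followed by '|', so consume the terminator too
if ((is >> n >> c) && c == '|')
for (size_t i = 0; i < n; ++i)
{
element e;
if (!read_element(is, e))
return false; // fail
elements.push_back(e);
}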
...which needs...
bool read_element(std::istream& is, element& e)
{
char c;
return (is >> e.id >> c) && c == '|' &&
read_string(is, e.test) &&
(is >> e.other >> c) && c == '|';
}
...and...
bool read_string(std::istream& is, std::string& s)
{
size_t n;
char c;
if ((is >> n >> c) && c == '|')
{
s.resize(n);
// &s[0] is writable before C++17; the stream needs an explicit conversion to bool
return static_cast<bool>(is.read(&s[0], n));
}
return false;
}
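As a rough check of the scheme, the routines can be combined into a self-contained round trip. This is a sketch under the same assumptions as above (ASCII numbers terminated by '|', strings stored as a length followed by raw characters). The wrappers write_all and read_all are hypothetical names.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct element {
    int id;
    std::string test;
    int other;
};

// Length in ASCII, a '|' terminator, then the raw characters
bool write_string(std::ostream& os, const std::string& s)
{
    return (os << s.size() << '|') &&
           os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

bool read_string(std::istream& is, std::string& s)
{
    std::size_t n;
    char c;
    if (!((is >> n >> c) && c == '|'))
        return false;
    s.resize(n);
    if (n == 0)
        return true;  // nothing more to read for an empty string
    return static_cast<bool>(is.read(&s[0], static_cast<std::streamsize>(n)));
}

bool write_element(std::ostream& os, const element& e)
{
    return (os << e.id << '|') && write_string(os, e.test) &&
           (os << e.other << '|');
}

bool read_element(std::istream& is, element& e)
{
    char c;
    return (is >> e.id >> c) && c == '|' &&
           read_string(is, e.test) &&
           (is >> e.other >> c) && c == '|';
}

// Hypothetical wrapper: element count, then each element in order
bool write_all(std::ostream& os, const std::vector<element>& v)
{
    if (!(os << v.size() << '|'))
        return false;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (!write_element(os, v[i]))
            return false;
    return true;
}

// Hypothetical wrapper: read the count, then that many elements
bool read_all(std::istream& is, std::vector<element>& v)
{
    std::size_t n;
    char c;
    if (!((is >> n >> c) && c == '|'))
        return false;
    v.clear();
    for (std::size_t i = 0; i < n; ++i) {
        element e;
        if (!read_element(is, e))
            return false;
        v.push_back(e);
    }
    return true;
}
```

Because each string is read back by its stored length, the text may itself contain spaces or '|' characters. With real files, opening the streams with ios::binary keeps the raw characters from being translated.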