Reading comma-separated data from a string into a struct, ignoring certain fields - C++

About the data:
Fields are separated by a comma ','.
There is always a fixed number of fields (in my case 33).
Fields are always in the same order.
The fields are of different types, represented in ASCII; the types are known to me.
Some fields are always empty, and some are always to be ignored (not useful data).
Example of data:
AnIdentifier,1,G,2014-10-01,11:00,,,3,7555,1,14535,1 .. etc
I am wondering what the best way is to read this data into a struct in a C++ way. I used sscanf earlier, but the format parameter of the call got rather big and messy.
I also tried splitting the string into tokens in a vector, before setting the values in the struct one by one with values from the vector, but then I didn't have the nice automatic data conversion.
I saw an example of overloading the >> operator in the struct:
struct MyDataStruct
{
std::string Id;
char D;
char S;
std::string Date; // Preferably these could be combined
std::string Time; // to a time_t?
bool Running;
int Runtime;
...
...
friend std::istream &operator>>(std::istream &is, MyDataStruct &ds) {
// a friend function has no implicit this, so members are accessed through ds
return is >> ds.Id >> ds.D >> ds.S >> ds.Date >> ds.Time >> ds.Running >> ds.Runtime ...
}
};
My C++ knowledge is limited (I'm used to C#), so I started to struggle when there were fields I wanted to ignore while reading into the struct. Say there are 5 fields in the stream is after Time that I wish to skip before reading into Running.
And say I wished to represent the timestamp by a time_t instead of the two strings Date and Time (the way they are represented in the received data), could this be accomplished directly in the operator overload while reading into the struct?
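A minimal sketch of such an operator>>, under a hypothetical field layout: Id, D, S, Date, Time, then four fields to skip (the two empty ones and two unused ones), then Running and Runtime, with everything after that ignored. It consumes each comma explicitly, skips unwanted fields with istream::ignore, and parses Date and Time straight from the stream into a std::tm with std::get_time (C++11), converting the result to time_t with std::mktime, which treats it as local time. The struct is an illustrative subset of the real 33 fields, and skipFields is a hypothetical helper.

```cpp
#include <ctime>
#include <iomanip>
#include <istream>
#include <limits>
#include <string>

// Illustrative subset of the 33 fields; names follow the question's struct.
struct MyDataStruct {
    std::string Id;
    char D = '\0';
    char S = '\0';
    std::time_t Timestamp = 0;  // Date and Time combined
    bool Running = false;
    int Runtime = 0;
};

// Discards the next n comma-terminated fields (empty ones included).
std::istream& skipFields(std::istream& is, int n) {
    for (int i = 0; i < n; ++i)
        is.ignore(std::numeric_limits<std::streamsize>::max(), ',');
    return is;
}

// Assumed layout: Id,D,S,YYYY-MM-DD,HH:MM,<4 unused fields>,Running,Runtime,<rest unused>
std::istream& operator>>(std::istream& is, MyDataStruct& ds) {
    char comma = '\0';  // receives each separator that is consumed explicitly
    std::getline(is, ds.Id, ',');
    is >> ds.D >> comma >> ds.S >> comma;

    // Parse "2014-10-01,11:00" straight from the stream into a std::tm;
    // the ',' in the format string must match the ',' in the input.
    std::tm when{};
    is >> std::get_time(&when, "%Y-%m-%d,%H:%M") >> comma;
    if (is) {
        when.tm_isdst = -1;                 // let mktime decide about daylight saving time
        ds.Timestamp = std::mktime(&when);  // treats the fields as local time
    }

    skipFields(is, 4);
    is >> ds.Running >> comma >> ds.Runtime;  // bool accepts "0" or "1"
    is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // drop the rest of the record
    return is;
}
```

With the visible part of the sample line (AnIdentifier,1,G,2014-10-01,11:00,,,3,7555,1,14535,1), this assumed layout gives Id "AnIdentifier", D '1', S 'G', Running true and Runtime 14535. Skipping more or fewer fields is just a different count passed to skipFields.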

Related

integers, chars and floating points in structs

So, I'm having some issues with my C++ code. I have the following code, but so far I can't get most of the data stored in the structured data type.
//structured data declaration
struct item
{
int itemCode;
char description[20];
float price;
};
And the code that reads the data looks like this:
cout << setprecision(2) << fixed << showpoint;
ofstream salesFile ("Sales.txt");
ifstream stockFile ("Stock.txt");
for (counter = 0; counter < 9; counter++)
{
stockFile >> instock[counter].itemCode;
stockFile.getline (instock[counter].description, 20);
stockFile >> instock[counter].price;
}
The output should have looked like:
1234 "description here" 999.99
Quantity X
And this was the output:
1234 0.00
Quantity 5
If you have a file format that is of the form (for one entry)
1234
description here
999.99
(across multiple lines), then the explanation is simple.
The reading code in your loop, which does
stockFile >> instock[counter].itemCode;
stockFile.getline (instock[counter].description, 20);
stockFile >> instock[counter].price;
works in this sequence:
1. instock[counter].itemCode receives the value 1234. However (and this is important to understand), the newline after the 1234 is still waiting in the stream to be read.
2. The call of getline() encounters that newline and returns immediately, so instock[counter].description contains the empty string "".
3. The expression stockFile >> instock[counter].price then encounters the d in description. This cannot be interpreted as a floating-point value, so the extraction fails and sets failbit on stockFile.
Since C++11, a failed numeric extraction stores 0 in its target, which explains the 0.00 in your output. And because stockFile is now in a failed state, every later read in the loop fails as well.
The real problem is that you are mixing styles of input on the one stream. In this case, mixing usage of streaming operators >> with use of line-oriented input (getline()). As per my description of the sequence above, different styles of input interact in different ways, because (as in this case) they behave differently when encountering a newline.
Some people will just tell you to skip over the newline after reading instock[counter].itemCode. That advice is flawed, since it doesn't cope well with changes (e.g. what happens if the file format changes to include an additional field on another line? What happens if the file isn't "quite" in the expected format for some reason?).
The more general solution is to avoid mixing styles of input on the one stream. A common way would be to use getline() to read all data from the stream (i.e. not use >> to interact directly with stockFile). Then interpret/parse each string to find the information needed.
Incidentally, rather than using arrays of char to hold a string, try using the standard std::string (from standard header <string>). This has the advantage that std::string can adjust its length as needed. std::getline() also has an overload that can happily read to an std::string. Once data is read from your stream as an std::string, it can be interpreted as needed.
There are many ways of interpreting a string (e.g. to extract integral values from it). I'll leave finding an approach for that as an exercise - you will learn more by doing it yourself.

Read from a file in C++

4;
Spadina;76 156
Bathurst;121 291
Keele;70 61
Bay;158 158
This is what the file contains. I need to read it and save the values into variables.
The 4 is for dynamic memory allocation: it means there are 4 stations.
Spadina, Bathurst, etc. are station names. The first number, which comes right after the station name, is the number of student passes, and the second number is the number of adult passes.
So basically I have 4 variables:
int numberOfStation;
int studentPass;
int adultPass;
string stationName;
I have spent 4 hours but still cannot read the file and save its contents into the variables.
Thank you.
A possible solution is to read every line with e.g. std::getline, then parse each such line string. You'll use the appropriate methods of std::string to search inside it (with find) and split it (with substr). You might also access individual characters in that string using at or the [] operator of std::string; alternatively, you might parse each line (or relevant parts of it) using std::istringstream (I am not sure it is appropriate in your case). To turn the number substrings into integers you might be interested in std::stoi (std::to_string does the opposite conversion).
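A minimal sketch of the find/substr route for a single line such as "Spadina;76 156", assuming the name ends at the first ';' and the two counts are separated by a space. parseStationLine is a hypothetical helper; std::stoi does the string-to-int conversion and throws std::invalid_argument on non-numeric text. Because the search for the space starts after the ';', station names that themselves contain spaces still work.

```cpp
#include <string>

// Splits "name;students adults" with find/substr and converts the counts with std::stoi.
// Returns false if the ';' or the space between the two numbers is missing.
bool parseStationLine(const std::string& line, std::string& name,
                      int& studentPass, int& adultPass) {
    const std::string::size_type semi = line.find(';');
    if (semi == std::string::npos)
        return false;
    const std::string::size_type space = line.find(' ', semi + 1);
    if (space == std::string::npos)
        return false;
    name = line.substr(0, semi);
    studentPass = std::stoi(line.substr(semi + 1, space - semi - 1));  // may throw on bad digits
    adultPass = std::stoi(line.substr(space + 1));
    return true;
}
```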
An important thing is to define exactly (not only through examples) the possible acceptable inputs. You could for example use some EBNF to formalize that. You should probably care about character encoding (try first by assuming a simple single-byte encoding like ASCII, then later consider UTF-8 if your system uses it).
For example, can the station names (I guess you talk about subway stations) contain digits, or spaces, or underscores, or commas, etc.... ? Could they be French names like Hôtel de Ville (a metro station in central Paris) or Saint-Paul or Bourg-La-Reine (where I am), or Russian names like Молодёжная in Moscow? (I guess that station names in Tokyo or in Jerusalem might be even funnier to parse).
BTW, explicitly entering the number of entries (like your initial 4) is very user-unfriendly. You could have some lexical conventions and e.g. use some tagging or separators.
Finally, you might want to keep the information for every station. Then you'll probably need to define some struct or class (not simply four scalar variables). At that point your program is becoming more interesting!
First, group your variables in a struct:
struct MyStruct
{
int studentPass;
int adultPass;
string stationName;
};
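Now read the number of entries from the file and allocate the array dynamically:
MyStruct *p;
int N;
s >> N;
s.ignore(numeric_limits<streamsize>::max(), '\n'); // skip the ';' and the newline after the count
p = new MyStruct[N]; // remember to delete[] p when done
Then, in a for loop, read the station name with getline using ';' as the delimiter, and read the two ints with >>:
for (int i = 0; i < N; i++)
{
getline(s >> ws, p[i].stationName, ';'); // ws drops the newline left by the previous line
s >> p[i].studentPass >> p[i].adultPass;
}
where s is an input stream, e.g. a std::ifstream opened on the file (std::ios::in is the default mode for std::ifstream).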
I recommend that you create a struct to hold your stations:
struct station{
string _stationName;
int _studentPass;
int _adultPass;
};
Then create an operator to use with your struct (props to Sly_TheKing for the getline idea):
std::istream& operator>>(std::istream& is, station& rhs){
getline(is, rhs._stationName, ';');
is >> rhs._studentPass >> rhs._adultPass >> ws;
return is;
}
Say that your ifstream is called foo. You can read these into a vector like this:
foo.ignore(numeric_limits<streamsize>::max(), '\n');
vector<station> bar{istream_iterator<station>(foo), istream_iterator<station>()};

C++: Reading a multiline file with lines of arbitrary length and format without using a stringstream

I have an input stream with the following lines:
# <int> <int>
<some_data_type> <some_data_type> <some_data_type> ..... <some_data_type>
<some_data_type_1> <some_data_type_2> <some_data_type_3> <some_data_type_1> <some_data_type_2> <some_data_type_3> .... <some_data_type_1> <some_data_type_2> <some_data_type_3>
In the above stream all three lines are different and have to be parsed differently. Currently, I am using a reading method as follows:
void reader( std::istream & is, DataStructure & d ){
std::string line;
getline(is,line);
std::stringstream s(line);
//parse line 1
getline(is,line);
std::stringstream line2(line);
//parse line 2
getline(is,line);
std::stringstream line3(line);
//parse line 3
}
Now the idea is not to make use of std::stringstream at all, as a line can be arbitrarily large and we do not want to load everything into memory twice. So, it would be better if it was possible to read from the input stream directly into the user-given data structure d.
An idea is to make use of std::istream_iterator but unfortunately the different lines have different parsing needs. For example, in the last line, three elements from the stream together constitute a single data element.
The only idea that seems plausible to me at this moment is to handle the stream buffer directly. It would be great if anyone could recommend a better way of doing this.
NOTE: I cannot make use of an intermediate data structure like std::stringstream. It is essential to read from the stream directly into the user-provided data structure.
EDIT: Please note we are only allowed a single pass over the file.
"Now the idea is not to make use of std::stringstream at all, as a line can be arbitrarily large and we do not want to load everything into memory twice. So, it would be better if it was possible to read from the input stream directly into the user-given data structure d."
Olaf explained the extraction operator in another answer, but then the question added new requirements:
(1) This will only work for the first line, where it is known there is a fixed number of elements.
and
(2) Unfortunately, I have no discriminator beyond my knowledge that each instance of the data structure needs to be instantiated with information stored in three different lines. All three lines have different lengths and different data elements. Also, I cannot change the format.
plus
(3) All information is treated as unsigned integer.
The next issue is that we don't know what the data structure actually is; given what has come before, it appears to be dynamic in some fashion. Because the data can be treated as unsigned int, we can still use the extraction operator, but read into a dynamic member:
vector<unsigned int> myUInts;
unsigned int currentUInt;
...
inFile >> currentUInt;
myUInts.push_back(currentUInt);
But then the issue of where to stop comes into play. Is it at the end of the first line, or the third? If you need to read an arbitrary number of unsigned ints while still checking for a newline, then you will need to process whitespace as well:
inFile.unsetf(ios_base::skipws);
How you actually handle that is beyond what I can say at the moment without some clearer requirements. But I would guess it will be in the form:
inFile >> myMember;
int next = inFile.peek(); // peek() returns an int so that it can also signal EOF
//skip whitespace and check for new line
//Repeat until data structure filled, and repeat for each data structure.
Then do not use std::getline() at all. Define an istream operator for your types and use these directly
std::istream &operator >>(std::istream &f, DataStructure &d)
{
f >> d.member1 >> d.member2 >> ...;
return f;
}
void reader(std::istream & is, DataStructure &d)
{
is >> d;
}
There's no need to fiddle with an std::istream_iterator or to manipulate the stream buffer directly.

Is it possible to manipulate some text with a user-defined I/O manipulator?

Is there a (clean) way to manipulate some text from std::cin before inserting it into a std::string, so that the following would work:
cin >> setw(80) >> Uppercase >> mystring;
where mystring is std::string (I don't want to use any wrappers for strings).
Uppercase is a manipulator. I think it needs to act on the chars in the buffer directly (never mind for now exactly what counts as uppercase versus lowercase). Such a manipulator seems difficult to implement in a clean way, as user-defined manipulators, as far as I know, are only used to conveniently change or combine some predetermined format flags.
(Non-extended) manipulators usually only set flags and data which the extractors afterwards read and react to. (That is what xalloc, iword, and pword are for.) What you could, obviously, do, is to write something analogous to std::get_money:
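#include <istream>
#include <string>
#include <boost/algorithm/string/case_conv.hpp> // boost::to_upper

// Holds a reference to the target string, in the style of std::get_money
struct uppercasify {
uppercasify(std::string &s) : ref(s) {}
uppercasify(const uppercasify &other) : ref(other.ref) {}
std::string &ref;
};
std::istream &operator>>(std::istream &is, uppercasify uc) { // or &&uc in C++11
is >> uc.ref;
boost::to_upper(uc.ref);
return is;
}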
cin >> setw(80) >> uppercasify(mystring);
Alternatively, cin >> uppercase could return not a reference to cin, but an instantiation of some (template) wrapper class uppercase_istream, with the corresponding overload for operator>>. I don't think having a manipulator modify the underlying stream buffer's contents is a good idea.
If you're desperate enough, I guess you could also imbue a hand-crafted locale resulting in uppercasing strings. I don't think I'd let anything like that go through a code review, though – it's simply just waiting to surprise and bite the next person working on the code.
You may want to check out boost iostreams. Its framework allows defining filters which can manipulate the stream. http://www.boost.org/doc/libs/1_49_0/libs/iostreams/doc/index.html

Read numbers and convert them into doubles?

Okay, so I have a fairly annoying problem: one of the applications we use, hdp, dumps HDF values to a text file.
So basically we have a text file consisting of this:
-8684 -8683 -8681 -8680 -8678 -8676 -8674 -8672 -8670 -8668 -8666
-8664 -8662 -8660 -8657 -8655 -8653 -8650 <trim... 62,000 more rows>
Each of these represent a double:
E.g.:
-8684 = -86.84
We know the values will be between -180 and 180. But we also have to process around 65,000 rows of this, so time is kind of important.
What's the best way to deal with this? (I can't use Boost or any of the other libraries, due to internal standards.)
As you wish, as an answer instead... :)
Can't you just use standard iostream?
double val; cin >> val; val /= 100;
rinse, repeat 62000*11 times
I think I'd do the job a bit differently. I'd create a small sorta-proxy class to handle reading a value and converting it to a double:
class fixed_point {
double val = 0.0;
public:
std::istream &read(std::istream &is) {
is >> val; val /= 100.0; return is;
}
operator double() const { return val; } // const: istream_iterator yields a const reference
friend std::istream &operator>>(std::istream &is, fixed_point &f) {
return f.read(is);
}
};
Using that, I could read my data a bit more cleanly. For example:
std::vector<double> my_data;
std::copy(std::istream_iterator<fixed_point>(infile),
std::istream_iterator<fixed_point>(),
std::back_inserter(my_data));
The important point here is that operator>> doesn't just read raw data, but extracts real information (in a format you can use) from that raw data.
There are other ways to do the job as well. For example, you could also create a derivative of std::num_get that parses doubles from that file's format. This is probably the way that's theoretically correct, but the documentation for this part of the library is mostly pretty poor, so I have a hard time advising it.