Overloading >> using istream - c++

So I am trying to overload the >> operator, but in this case the input is a null-terminated string. How do I make sure the user only inputs as many characters as fit in my dynamically allocated char[] named data? I know I could make a very large temporary char[] and use a for loop to copy the characters in, but I want to do it without a very large char[]. I have this code for now, but I know it doesn't work because it takes no account of the length my class allows.
std::istream & operator>>(std::istream & is, String346 & objIn) {
using std::istream;
is >> objIn.data;
return is;
}

The C++ language has no provision for preventing the user from entering more characters than fit in your char array. There may be some operating system-specific resources available to you, such as limiting the maximum number of characters in a text entry field, but that's outside the scope of C++.
When reading from a std::istream, your code must be prepared to handle and deal with input that does not fit your criteria. Throw an exception, exit the program after printing an error message, or read up to the maximum number of characters you can accept and ignore the extra -- in whatever manner makes sense to you. It's entirely up to you.

std::istream::get() has an overload that allows you to limit size of the input. You still need to deal with the remaining input one way or another though.

Related

Is getline the same as gets?

Okay so my question is simple..
We all know how bad gets is in C, and hence the advice is to use fgets.
In C++ we use std::string s and std::getline(std::cin, s). My question is: does getline() have the same boundary-checking issue as gets()?
If yes, then char input[100] with cin.getline(input, sizeof(input)); will work for a char array, but when using a string, can I write this?
std::string s; and cin.getline(s, s.capacity()); Is this appropriate, or should I write something else?
No, getline does not have the same issues as gets. The function has a reference to the string, and so can call the string's size and capacity member functions for boundary checking purposes. However, it doesn't need to do that, because it also has access to the string's resizing member functions, such as push_back, resize or operator+=, which will handle boundary checking automatically, reallocating when necessary.
get() leaves the delimiter in the queue, so you can treat it as part of the next input. getline() discards it, so the next input starts just after it.
If you are talking about the newline character from console input, it makes perfect sense to discard it, but if we consider input from a file, you can use the beginning of the next field as the "delimiter".

scanf on an istream object

NOTE: I've seen the post What is the cin analougus of scanf formatted input? before asking the question and the post doesn't solve my problem here. The post seeks for C++-way to do it, but as I mentioned already, it is inconvenient to just use C++-way to do it sometimes and I have clear examples for that.
I am trying to read data from an istream object, and sometimes it is inconvenient to just use C++-style ways such as operator>>, e.g. the data are in special form 123:456 so you have to imbue to make ':' as space (which is very hacky, as opposed to %d:%d in scanf), or 00123 where you want to read as string and convert decimal instead of octal (as opposed to %d in scanf), and possibly many other cases.
The reason I chose istream as interface is because it can be derived and therefore more flexible. For example, we can create in-memory streams, or some customized streams that generated on the fly, etc. C-style FILE*, on the other hand, is very limited, at least in a standard-compliant way, on creating customized streams.
So my question is: is there a way to do scanf-like data extraction on an istream object? I think fscanf internally reads character by character from FILE* using fgetc, while istream also provides such an interface. So it is possible by just copying and pasting the code of fscanf and replacing the FILE* with the istream object, but that's very hacky. Is there a smarter and cleaner way, or is there some existing work on this?
Thanks.
You should never, under any circumstances, use scanf or its relatives for anything, for three reasons:
Many format strings, including for instance all the simple uses of %s, are just as dangerous as gets.
It is almost impossible to recover from malformed input, because scanf does not tell you how far in characters into the input it got when it hit something unexpected.
Numeric overflow triggers undefined behavior: yes, that means scanf is allowed to crash the entire program if a numeric field in the input has too many digits.
Prior to C++11, the C++ specification defined istream formatted input of numbers in terms of scanf, which means that last objection is very likely to apply to them as well! (In C++11 the specification is changed to use strto* instead and to do something predictable if that detects overflow.)
What you should do instead is: read entire lines of input into std::string objects with getline, hand-code logic to split them up into fields (I don't remember off the top of my head what the C++-string equivalent of strsep is, but I'm sure it exists) and then convert numeric strings to machine numbers with the strtol/strtod family of functions.
I cannot emphasize this enough: THE ONLY 100% RELIABLE WAY TO CONVERT STRINGS TO NUMBERS IN C OR C++, unless you are lucky enough to have a C++ runtime that is already C++11-conformant in this regard, IS WITH THE strto* FUNCTIONS, and you must use them correctly:
errno = 0;
result = strtoX(s, &ends, 10); // omit 10 for floats
if (s == ends || *ends || errno)
parse_error();
(The OpenBSD manpages, linked above, explain why you have to do this fairly convoluted thing.)
(If you're clever, you can use ends and some manual logic to skip that colon, instead of strsep.)
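Here is one way the ends trick could look for the 123:456 format from the question. parse_pair is a made-up helper name; the checks are the same three as above (nothing converted, unexpected trailing character, errno set), applied once per number.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses "<int>:<int>" using strtol only.
// Returns false and leaves first/second untouched on any error.
bool parse_pair(const std::string& text, long& first, long& second) {
    const char* s = text.c_str();
    char* ends = nullptr;

    errno = 0;
    const long a = std::strtol(s, &ends, 10);
    // No digits, something other than ':' after the number, or overflow.
    if (s == ends || *ends != ':' || errno != 0) return false;

    s = ends + 1;  // step over the colon
    errno = 0;
    const long b = std::strtol(s, &ends, 10);
    // The second number must run to the end of the string.
    if (s == ends || *ends != '\0' || errno != 0) return false;

    first = a;
    second = b;
    return true;
}
```

Because the base is fixed at 10, 00123 is read as decimal 123 rather than octal. Note that strtol skips leading whitespace and accepts a sign, so this sketch also accepts input such as "12: -3".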
I do not recommend mixing C++ input/output and C input/output. Not that they are really incompatible, but they could just plain interoperate wrong.
For example Oracle docs recommend not to mix it http://www.oracle.com/technetwork/articles/servers-storage-dev/mixingcandcpluspluscode-305840.html
But no one stops you from reading data into the buffer and parsing it with standard c functions like sscanf.
...
string curString;
int a, b;
...
std::getline(inputStream, curString);
int sscanfResult = sscanf(curString.c_str(), "%d:%d", &a, &b);
if (2 != sscanfResult)
throw "error";
...
But it won't help in some situations when your stream is just one long contiguous sequence of symbols(like some string turned into memory stream).
Making your own fscanf from scratch or porting(?) the original CRT function actually isn't the worst possible idea. Just make sure you have tested it thoroughly(low level custom char manipulation was always a source of pain in C).
I've never really tried Boost.Spirit, and such parsing infrastructure could be overkill for your project. But Boost libraries are usually well tested and designed. You could at least try it.
Based on @tmyklebu's comment, I implemented streamScanf which wraps istream as FILE* via fopencookie: https://github.com/likan999/codejam/blob/master/Common/StreamScanf.cpp

What are the guidelines regarding parsing with iostreams?

I found myself writing a lot of parsing code lately (mostly custom formats, but it isn't really relevant).
To enhance reusability, I chose to base my parsing functions on i/o streams so that I can use them with things like boost::lexical_cast<>.
I however realized I have never read anywhere anything about how to do that properly.
To illustrate my question, lets consider I have three classes Foo, Bar and FooBar:
A Foo is represented by data in the following format: string(<number>, <number>).
A Bar is represented by data in the following format: string[<number>].
A FooBar is kind-of a variant type that can hold either a Foo or a Bar.
Now let's say I wrote an operator>>() for my Foo type:
istream& operator>>(istream& is, Foo& foo)
{
char c1, c2, c3;
is >> foo.m_string >> c1 >> foo.m_x >> c2 >> std::ws >> foo.m_y >> c3;
if ((c1 != '(') || (c2 != ',') || (c3 != ')'))
{
is.setstate(std::ios_base::failbit);
}
return is;
}
The parsing goes fine for valid data. But if the data is invalid:
foo might be partially modified;
Some data in the input stream was read and is thus no longer available to further calls to is.
Also, I wrote another operator>>() for my FooBar type:
istream& operator>>(istream& is, FooBar& foobar)
{
Foo foo;
if (is >> foo)
{
foobar = foo;
}
else
{
is.clear();
Bar bar;
if (is >> bar)
{
foobar = bar;
}
}
return is;
}
But obviously it doesn't work because if is >> foo fails, some data has already been read and is no longer available for the call to is >> bar.
So here are my questions:
Where is my mistake here ?
Should one write his calls to operator>> to leave the initial data still available after a failure ? If so, how can I do that efficiently ?
If not, is there a way to "store" (and restore) the complete status of an input stream: state and data ?
What differences are they between failbit and badbit ? When should we use one or the other ?
Is there any online reference (or a book) that explains deeply how to deal with iostreams ? not just the basic stuff: the complete error handling.
Thank you very much.
Personally, I think these are reasonable questions and I remember very well that I struggled with them myself. So here we go:
Where is my mistake here ?
I wouldn't call it a mistake but you probably want to make sure you don't have to back off from what you have read. That is, I would implement three versions of the input functions. Depending on how complex the decoding of a specific type is I might not even share the code because it might be just a small piece anyway. If it is more than a line or two, I probably would share the code. That is, in your example I would have an extractor for FooBar which essentially reads the Foo or the Bar members and initializes objects correspondingly. Alternatively, I would read the leading part and then call a shared implementation extracting the common data.
Let's do this exercise because there are a few things which may be a complication. From your description of the format it isn't clear to me if the "string" and what follows the string are delimited e.g. by a whitespace (space, tab, etc.). If not, you can't just read a std::string: the default behavior for them is to read until the next whitespace. There are ways to tweak the stream into considering characters as whitespace (using std::ctype<char>) but I'll just assume that there is space. In this case, the extractor for Foo could look like this (note, all code is entirely untested):
std::istream& read_data(std::istream& is, Foo& foo, std::string& s) {
Foo tmp(s);
if (is >> get_char<'('> >> tmp.m_x >> get_char<','> >> tmp.m_y >> get_char<')'>)
std::swap(tmp, foo);
return is;
}
std::istream& operator>>(std::istream& is, Foo& foo)
{
std::string s;
return read_data(is >> s, foo, s);
}
The idea is that read_data() reads the part of a Foo which is different from Bar when reading a FooBar. A similar approach would be used for Bar but I omit this. The more interesting bit is the use of this funny get_char() function template. This is something called a manipulator and is just a function taking a stream reference as argument and returning a stream reference. Since we have different characters we want to read and compare against, I made it a template but you can have one function per character as well. I'm just too lazy to type it out:
template <char Expect>
std::istream& get_char(std::istream& in) {
char c;
if (in >> c && c != Expect) {
in.setstate(std::ios_base::failbit);
}
return in;
}
What looks a bit weird about my code is that there are few checks if things worked. That is because the stream would just set std::ios_base::failbit when reading a member failed and I don't really have to bother myself. The only case where there is actually special logic added is in get_char() to deal with expecting a specific character. Similarly there is no skipping of whitespace characters (i.e. use of std::ws) going on: all the input functions are formatted input functions and these skip whitespace by default (you can turn this off by using e.g. in >> std::noskipws, but then lots of things won't work).
With a similar implementation for reading a Bar, reading a FooBar would look something like this:
std::istream& operator>> (std::istream& in, FooBar& foobar) {
std::string s;
if (in >> s) {
switch ((in >> std::ws).peek()) {
case '(': { Foo foo; read_data(in, foo, s); foobar = foo; break; }
case '[': { Bar bar; read_data(in, bar, s); foobar = bar; break; }
default: in.setstate(std::ios_base::failbit);
}
}
return in;
}
This code uses an unformatted input function, peek(), which just looks at the next character. It either returns the next character or it returns std::char_traits<char>::eof() if it fails. So, if there is either an opening parenthesis or an opening bracket we have read_data() take over. Otherwise we always fail. That solves the immediate problem. On to distributing information...
Should one write his calls to operator>> to leave the initial data still available after a failure ?
The general answer is: no. If you failed to read, something went wrong and you give up. This might mean that you need to work harder to avoid failing, though. If you really need to back off from the position you were at to parse your data, you might want to read data first into a std::string using std::getline() and then analyze this string. Use of std::getline() assumes that there is a distinct character to stop at. The default is newline (hence the name) but you can use other characters as well:
std::getline(in, str, '!');
This would stop at the next exclamation mark and store all characters up to it in str. It would also extract the termination character but it wouldn't store it. This makes it interesting sometimes when you read the last line of a file which may not have a newline: std::getline() succeeds if it can read at least one character. If you need to know if the last character in a file is a newline, you can test if the stream reached EOF:
if (std::getline(in, str) && in.eof()) { std::cout << "file not ending in newline\n"; }
If so, how can I do that efficiently ?
Streams are by their very nature single pass: you receive each character just once and if you skip over one you consume it. Thus, you typically want to structure your data in a way such that you don't have to backtrack. That said, this isn't always possible and most streams actually have a buffer under the hood to which characters can be returned. Since streams can be implemented by a user there is no guarantee that characters can be returned. Even for the standard streams there isn't really a guarantee.
If you want to return a character, you have to put back exactly the character you extracted:
char c;
if (in >> c && c != 'a')
in.putback(c);
if (in >> c && c != 'b')
in.unget();
The latter function has slightly better performance because it doesn't have to check that the character is indeed the one which was extracted. It also has less chances to fail. Theoretically, you can put back as many characters as you want but most streams won't support more than a few in all cases: if there is a buffer, the standard library takes care of "ungetting" all characters until the start of the buffer is reached. If another character is returned, it calls the virtual function std::streambuf::pbackfail() which may or may not make more buffer space available. In the stream buffers I have implemented it will typically just fail, i.e. I typically don't override this function.
If not, is there a way to "store" (and restore) the complete status of an input stream: state and data ?
If you mean to entirely restore the state you were at, including the characters, the answer is: sure there is. ...but no easy way. For example, you could implement a filtering stream buffer and put back characters as described above to restore the sequence to be read (or support seeking or explicitly setting a mark in the stream). For some streams you can use seeking but not all streams support this. For example, std::cin typically doesn't support seeking.
Restoring the characters is only half the story, though. The other stuff you want to restore are the state flags and any formatting data. In fact, if the stream went into a failed or even bad state you need to clear the state flags before the stream will do most operations (although I think the formatting stuff can be reset anyway):
std::istream fmt(0); // doesn't have a default constructor: create an invalid stream
fmt.copyfmt(in); // save the current format settings
// use in
in.copyfmt(fmt); // restore the original format settings
The function copyfmt() copies all fields associated with the stream which are related to formatting. These are:
the locale
the fmtflags
the information storage iword() and pword()
the stream's events
the exceptions
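the fill character, width, precision, and tied stream (but not the stream's state flags or its stream buffer, which copyfmt() leaves unchanged)
If you don't know about most of them don't worry: most stuff you probably won't care about. Well, until you need it but by then you have hopefully acquired some documentation and read about it (or asked and got a good response).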
What differences are they between failbit and badbit ? When should we use one or the other ?
Finally a short and simple one:
failbit is set when formatting errors are detected, e.g. a number is expected but the character 'T' is found.
badbit is set when something goes wrong in the stream's infrastructure. For example, when the stream buffer isn't set (as in the stream fmt above) the stream has std::ios_base::badbit set. The other reason is if an exception is thrown (and caught by way of the exceptions() mask; by default all exceptions are caught).
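Both cases are easy to reproduce. The three function names in this sketch are invented for the illustration; describe_state just reports the most severe flag that is set.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: names the most severe state flag that is set.
std::string describe_state(const std::ios& s) {
    if (s.bad()) return "bad";
    if (s.fail()) return "fail";
    if (s.eof()) return "eof";
    return "good";
}

// Formatting error: the text is not a number, so failbit is set.
std::string state_after_reading_int(const std::string& text) {
    std::istringstream in(text);
    int n = 0;
    in >> n;
    return describe_state(in);
}

// Infrastructure error: a stream without a stream buffer starts out with badbit.
std::string state_without_buffer() {
    std::istream in(nullptr);
    return describe_state(in);
}
```

state_after_reading_int("T") gives "fail", state_after_reading_int("42 x") gives "good", and state_without_buffer() gives "bad".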
Is there any online reference (or a book) that explains deeply how to deal with iostreams ? not just the basic stuff: the complete error handling.
Ah, yes, glad you asked. You probably want to get Nicolai Josuttis's "The C++ Standard Library". I know that this book describes all the details because I contributed to writing it. If you really want to know everything about IOStreams and locales you want Angelika Langer & Klaus Kreft's "IOStreams and Locales". In case you wonder where I got the information from originally: this was Steve Teale's "IOStreams". I don't know if this book is still in print, and it lacks a lot of the stuff which was introduced during standardization. Since I implemented my own version of IOStreams (and locales) I know about the extensions as well, though.
So here are my questions:
Q: Where is my mistake here ?
I would not call your technique a mistake. It is absolutely fine.
When you read data from a stream you normally already know the objects coming off that stream. If the objects have multiple interpretations, then that also needs to be encoded into the stream (or you need to be able to roll back the stream).
Q: Should one write his calls to operator>> to leave the initial data still available after a failure?
Failure state should be there only if something really bad went wrong.
In your case if you are expecting a foobar (that has two representations) you have a choice:
Mark the type of object that is coming in the stream with some prefix data.
In the foobar parsing section use tellg() and seekg() to restore the stream position.
Try:
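std::streampos point = is.tellg();
if (is >> foo)
{
foobar = foo;
}
else
{
is.clear(); // clear failbit first: seekg() does nothing on a failed stream
is.seekg(point); // go back to where the Foo attempt started
// ... then try to read a Bar, as in the question
}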
Q: If so, how can I do that efficiently ?
I prefer method 1 where you know the type on the stream.
Method two can be used when this is unknowable.
Q: If not, is there a way to "store" (and restore) the complete status of an input stream: state and data ?
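Yes, but it requires two calls:
std::ios_base::iostate state = stream.rdstate(); // save the state flags
std::istream holder(0); // no default constructor: create a stream without a buffer
holder.copyfmt(stream); // save the format settings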
Q: What differences are they between failbit and badbit ?
From the documentation to the call fail():
failbit: is generally set by an input operation when the error was related to the internal logic of the operation itself, so other operations on the stream may be possible.
badbit: is generally set when the error involves the loss of integrity of the stream, which is likely to persist even if a different operation is performed on the stream. badbit can be checked independently by calling member function bad.
Q: When should we use one or the other ?
You should be setting failbit.
This means that your operation failed. If you know how it failed then you can reset and try again.
badbit is when you accidentally mash internal members of the stream or do something so bad that the stream object itself is completely forked.
When you serialize your FooBar you should have a flag indicating which one it is, which will be the "header" for your write/read.
When you read it back, you read the flag then read in the appropriate datatype.
And yes, it is safest to read first into a temporary object then move the data. You can sometimes optimise this with a swap() function.

parse an unknown size string

I am trying to read an unknown size string from a text file and I used this code :
ifstream inp_file;
char line[1000] ;
inp_file.getline(line, 1000);
but I don't like it because it has a limit (even though I know it's very hard to exceed this limit). I want to implement better code which reallocates according to the size of the incoming string.
The following are some of the available options:
istream& getline ( istream& is, string& str, char delim );
istream& getline ( istream& is, string& str );
One of the usual idioms for reading unknown-size inputs is to read a chunk of known size inside a loop, check for the presence of more input (i.e. verify that you are not at the end of the line/file/region of interest), and extend the size of your buffer. While the getline primitives may be appropriate for you, this is a very general pattern for many tasks in languages where allocation of storage is left up to the programmer.
Maybe you could look at using re2c which is a flexible scanner for parsing the input stream? In that way you can pull in any sized input line without having to know in advance... for example using a regex notation
^.+$
once captured by re2c you can then determine how much memory to allocate...
Have a look on memory-mapped files in boost::iostreams.
Maybe it's too late to answer now, but just for documentation purposes, another way to read an unknown sized line would be to use a wrapper function. In this function, you use fgets() using a local buffer.
1. Set last character in the buffer to '\0'
2. Call fgets()
3. Check the last character and see if it's still '\0'
4. If it's not '\0' and it's not '\n', implies not finished reading a line yet. Allocate a new buffer and copy the data into this new buffer and go back to step (1) above.
5. If there is already an allocated buffer, call realloc() to make it bigger. Otherwise, you are done. Return the data in an allocated buffer.
This was a tip given in my algorithms lecture.

Overloading operator>> to a char buffer in C++ - can I tell the stream length?

I'm on a custom C++ crash course. I've known the basics for many years, but I'm currently trying to refresh my memory and learn more. To that end, as my second task (after writing a stack class based on linked lists), I'm writing my own string class.
It's gone pretty smoothly until now; I want to overload operator>> that I can do stuff like cin >> my_string;.
The problem is that I don't know how to read the istream properly (or perhaps the problem is that I don't know streams...). I tried a while (!stream.eof()) loop that .read()s 128 bytes at a time, but as one might expect, it stops only on EOF. I want it to read to a newline, like you get with cin >> to a std::string.
My string class has an alloc(size_t new_size) function that (re)allocates memory, and an append(const char *) function that does that part, but I obviously need to know the amount of memory to allocate before I can write to the buffer.
Any advice on how to implement this? I tried getting the istream length with seekg() and tellg(), to no avail (it returns -1), and as I said looping until EOF (doesn't stop reading at a newline) reading one chunk at a time.
To read characters from the stream until the end of line use a loop.
char c;
while(istr.get(c) && c != '\n')
{
// Append 'c' to the end of your string.
}
// If you want to put the '\n' back onto the stream
// use istr.unget() here
// But I think it's safe to say that dropping the '\n' is fine.
If you run out of room reallocate your buffer with a bigger size.
Copy the data across and continue. No need to be fancy for a learning project.
You can use std::cin.getline(buffer, buffer_size);
then you will need to check for bad, eof and fail flags:
std::cin.bad(), std::cin.eof(), std::cin.fail()
unless bad or eof were set, fail flag being set usually indicates buffer overflow, so you should reallocate your buffer and continue reading into the new buffer after calling std::cin.clear()
A side note: in the standard library, operator>> for an istream is overloaded either as a member of the stream or (as for char*) as a non-member function. It may be wiser to provide a non-member overload for your type instead of overloading the operator inside your class.
Check Jerry Coffin's answer to this question.
The first method he used is very simple (just a helper class) and allow you to write your input in a std::vector<std::string> where each element of the vector represents a line of the original input.
That really makes things easy when it comes to processing afterwards!