Overloading >> operator for dynamic c string class - c++

I need to overload the >> operator (as used with cin) for my C-string class. I have overloaded the operator before, but I don't understand how to do this dynamically, without knowing the size beforehand to create the C string.
This is for homework and I must not use the string class. I also have to use dynamic allocation.
This is what I have so far... I know it's probably very poorly written, forgive me I'm a beginner.
istream& operator>> (istream& is, MyString& s1)
{
MyString temp;
int size = 0;
int i = 0;
int j = 0;
while (isspace(temp.data[i]) == true) {
is.get(temp.data[i]);
i++;
}
while (isspace(temp.data[j]) != true) {
size++;
temp.grow(size);
is >> temp.data[j];
j++;
}
return is;
}

Without seeing exactly how your MyString class is implemented, we can only speculate about how best to implement streaming into it, but typically you should implement your custom operator>> something like this:
istream& operator>> (istream& is, MyString& str)
{
istream::sentry s(is, false); // prepare the stream for input (flush output, skip leading whitespaces, error checking, etc)
if (s) // is the stream ready?
{
// clear str as needed
streamsize N = is.width();
if (N == 0) N = ... ; // set to max size of str, or numeric_limits<size_t>::max()
char ch;
while (is.get(ch)) // while not EOF or failure
{
// append ch to str, growing its capacity as needed
if (--N == 0) break; // max width reached?
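istream::int_type next = is.peek(); // look at the next character without extracting it
if (next == istream::traits_type::eof()) break; // EOF reached?
if (isspace(istream::traits_type::to_char_type(next), is.getloc())) break; // trailing whitespace detected?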
}
}
is.width(0); // reset effect of std::setw()
return is;
}
The STL's built-in operator>> implementation for std::string is a little bit more complicated (use of traits and facets, direct access to the istream read buffer, etc.), but this is the gist of it, based on the following information from CppReference.com:
operator<<,>>(std::basic_string)
template <class CharT, class Traits, class Allocator>
std::basic_istream<CharT, Traits>&
operator>>(std::basic_istream<CharT, Traits>& is,
std::basic_string<CharT, Traits, Allocator>& str);
Behaves as a FormattedInputFunction. After constructing and checking the sentry object, which may skip leading whitespace, first clears str with str.erase(), then reads characters from is and appends them to str as if by str.append(1, c), until one of the following conditions becomes true:
- N characters are read, where N is is.width() if is.width() > 0, otherwise N is str.max_size()
- the end-of-file condition occurs in the stream is
- std::isspace(c,is.getloc()) is true for the next character c in is (this whitespace character remains in the input stream).
If no characters are extracted then std::ios::failbit is set on is, which may throw std::ios_base::failure.
Finally, calls is.width(0) to cancel the effects of std::setw, if any.

Related

Reading a iostream until a string delimiter is found

I currently have a function that reads from a stream until a predefined stream-stopper is found. The only way I could currently get it up and running is by using std::getline and having one character followed by a newline (in my case char(3)) as my stream-stopper.
std::string readuntil(std::istream& in) {
std::string text;
std::getline(in, text, char(3));
return text;
}
Is there any way to achieve the same but with a longer string as my stream-stopper? I don't mind it having to be followed by a newline, but I want my delimiter to be a random string of some size so that the probability of it occurring by chance in the stream is very low.
Any idea how to achieve this?
I assume that your requirements are:
- a function taking an istream ref and a string as parameters
- the string is a delimiter and the function must return a string containing all the characters that arrived before it
- the stream must be positioned immediately after the delimiter for further processing.
AFAIK, neither the C++ nor the C standard library contain a function for that. I would just:
- read until the last character of the delimiter into a temporary string
- accumulate that in a global string
- repeat the two actions above if the global string does not end with the delimiter
- optionally remove the delimiter from the end of the global string
- return the global string
A possible C++ implementation is:
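std::string readuntil(std::istream& in, const std::string& delimiter) {
    std::string cr;
    if (delimiter.empty()) return cr; // nothing to search for
    const char delim = delimiter.back(); // last character of the delimiter
    const size_t sz = delimiter.size();
    std::string temp;
    while (std::getline(in, temp, delim)) {
        cr += temp;
        if (in.eof()) break; // input ended before delim was seen again
        cr += delim; // getline consumed delim, so put it back into the result
        const size_t tot = cr.size();
        if (tot >= sz && cr.compare(tot - sz, sz, delimiter) == 0)
            return cr.substr(0, tot - sz); // or return cr; if you want to keep the delimiter
    }
    return cr; // delimiter not found: return everything that was read
}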

How to cleanly extract a string delimited string from an istream in c++

I am trying to extract a string from an istream using strings as delimiters, yet I haven't found any istream operations that behave like the string operations find() or substr().
Here is an example istream content:
delim_oneFUUBARdelim_two
and my goal is to get FUUBAR into a string with as few workarounds as possible.
My current solution was to copy all istream content into a string using this solution for it and then extracting using string operations. Is there a way to avoid this unnecessary copying and only read as much from the istream as needed to preserve all content after the delimited string in case there are more to be found in similar fashion?
You can easily create a type that will consume the expected separator or delimiter:
struct Text
{
std::string t_;
};
std::istream& operator>>(std::istream& is, Text& t)
{
is >> std::skipws;
for (char c: t.t_)
{
if (is.peek() != c)
{
is.setstate(std::ios::failbit);
break;
}
is.get(); // throw away known-matching char
}
return is;
}
This suffices when the previous stream extraction naturally stops without consuming the delimiter (e.g. an int extraction followed by a delimiter that doesn't start with a digit), which will typically be the case unless the previous extraction is of a std::string. Single-character delimiters can be specified to getline, but say your delimiter is "</block>" and the stream contains "<black>metallic</black></block>42" - you'd want something to extract "<black>metallic</black>" into a string, throw away the "</block>" delimiter, and leave the "42" on the stream:
struct Until_Delim {
Until_Delim(std::string& s, std::string delim) : s_(s), delim_(delim) { }
std::string& s_;
std::string delim_;
};
std::istream& operator>>(std::istream& is, const Until_Delim& ud)
{
std::istream::sentry sentry(is);
size_t in_delim = 0;
for (char c = is.get(); is; c = is.get())
{
if (c == ud.delim_[in_delim])
{
if (++in_delim == ud.delim_.size())
break;
continue;
}
if (in_delim) // was part-way into delimiter match...
{
ud.s_.append(ud.delim_, 0, in_delim);
in_delim = 0;
}
ud.s_ += c;
}
// may need to trim trailing whitespace...
if (is.flags() & std::ios_base::skipws)
while (!ud.s_.empty() && std::isspace(ud.s_.back()))
ud.s_.pop_back();
return is;
}
This can then be used as in:
string a_string;
if (some_stream >> Until_Delim(a_string, "</block>") >> whatevers_after)
...
This notation might seem a bit hackish, but there's precedent in Standard Library's std::quoted().
Standard streams are equipped with locales that can do classification, namely the std::ctype<> facet. We can use this facet to ignore() characters in a stream while a certain classification is not present in the next available character. Here's a working example:
#include <iostream>
#include <locale>
#include <sstream>
using mask = std::ctype_base::mask;
template<mask m>
void scan_classification(std::istream& is)
{
auto& ctype = std::use_facet<std::ctype<char>>(is.getloc());
while (is.peek() != std::char_traits<char>::eof() && !ctype.is(m, is.peek()))
is.ignore();
}
int main()
{
std::istringstream iss("some_string_delimiter3.1415another_string");
double d;
scan_classification<std::ctype_base::digit>(iss);
if (iss >> d)
std::cout << d; // prints 3.1415
}

Writing stream `operator>>` for serial number?

I have a serial number class of the following form:
class SerialNumber { ... }
and I want to write the operator>> for it:
istream& operator>>(istream& i, SerialNumber& s)
{
???
return i;
}
The serial numbers are always 19 characters long and start with a hex digit.
I am unsure whether I should istream.read 19 characters, since the input may include leading whitespace.
Or should I read with i >> std::string and then check that it is 19 characters long? Reading a std::string skips whitespace (is there a standard way to implement that?). Further, if I read a std::string, it may begin with a valid 19-character serial number followed by more characters, in which case I have "over-read" the input.
Update:
inline istream& operator>>(istream& is, SerialNumber& id)
{
ostringstream os;
is >> ws;
for (int i = 0; i < 19; i++)
{
char c;
is >> c;
os << c;
}
id = DecodeId(os.str());
return is;
}
Partially sanitized version of Dietmar Kühl's code:
istream& operator>> (istream& in, SerialNumber& sn)
{
constexpr size_t n = 19;
istream::sentry se(in);
if (!se)
return in;
istreambuf_iterator<char> it(in.rdbuf()), end;
if (it == end || !isxdigit(*it))
{
in.setstate(ios_base::failbit);
return in;
}
string s(n,'?');
for (size_t i = 0; it != end && i < n && !isspace(static_cast<unsigned char>(*it)); ++i)
s[i] = *it++;
sn = DecodeId(s);
if (failed to decode)
in.setstate(ios_base::failbit);
return in;
}
The standard formatted input functions always follow the same pattern:
- They start off by constructing a std::istream::sentry object, which handles any skipping of leading whitespace depending on the setting of the std::ios_base::skipws formatting flag.
- If reading the value fails in any way, the target value is left unchanged and std::ios_base::failbit gets set.
- Characters are consumed up to the first character which fails to match the format.
That is, the input function would look something like this:
std::istream& operator>> (std::istream& in, SerialNumber& s) {
std::istream::sentry kerberos(in);
if (kerberos) {
std::istreambuf_iterator<char> it(in.rdbuf()), end;
char buffer[20] = {};
int i(0);
if (it != end && std::isxdigit(static_cast<unsigned char>(*it))) {
for (; it != end && i != 19
&& !std::isspace(static_cast<unsigned char>(*it)); ++i) {
buffer[i] = *it++;
}
}
if (i == 19) {
SerialNumber(buffer).swap(s);
}
else {
in.setstate(std::ios_base::failbit);
}
}
return in;
}
You should do it one step at a time:
- If you want to always skip whitespace, then start by doing i >> std::ws, since the stream may not have the skipws flag set. Otherwise, let the user decide whether to skip whitespace or not, and set the stream error bit when reading a whitespace character.
- Read the first char and see if it's a hexadecimal digit. If it's not, then set the stream error bit.
- Read the remaining 18 characters, and as soon as you find a character that does not meet the serial number format, set the stream error bit.
You should disable skipws for this, otherwise you will get valid results from characters separated by whitespace. If you do, then make sure to restore the skipws flag when exiting the function (which may happen via an exception when setting the error bit, if exceptions are enabled on the stream).

reading and writing a vector of structs to file

I've read a few posts on Stack Overflow and a number of other sites about writing vectors to files. I've implemented what I feel is working, but I'm having some troubles. One of the data members in the struct is a class string, and when reading the vector back in, that data is lost. Also, after writing the first iteration, additional iterations cause a malloc error. How can I modify the code below to achieve my desired ability to save the vector to a file, then read it back in when the program launches again? Currently, the read is done in the constructor, write in destructor, of a class whose only data member is the vector, but has methods to manipulate that vector.
Here is the gist of my read / write methods. Assuming vector<element> elements...
Read:
ifstream infile;
infile.open("data.dat", ios::in | ios::binary);
infile.seekg (0, ios::end);
elements.resize(infile.tellg()/sizeof(element));
infile.seekg (0, ios::beg);
infile.read( (char *) &elements[0], elements.capacity()*sizeof(element));
infile.close();
Write:
ofstream outfile;
outfile.open("data.dat", ios::out | ios::binary | ios_base::trunc);
elements.resize(elements.size());
outfile.write( (char *) &elements[0], elements.size() * sizeof(element));
outfile.close();
Struct element:
struct element {
int id;
string test;
int other;
};
In C++, memory cannot generally be read from or written to disk directly like that. In particular, your struct element contains a string, which is a non-POD data type, and therefore cannot be copied byte for byte.
A thought experiment might help clarify this. Your code assumes that all your element values are the same size. What would happen if one of the string test values was longer than what you've assumed? How would your code know what size to use when reading and writing to disk?
You will want to read about serialization for more information about how to handle this.
Your code assumes all the relevant data exists directly inside the vector, whereas strings are fixed-size objects holding pointers that can address their variable-sized content on the heap. You're basically saving the pointers and not the text. You should write some string serialisation code, for example:
bool write_string(std::ostream& os, const std::string& s)
{
    // length as ASCII digits, then '|', then the raw characters
    return (os << s.size() << '|') && os.write(s.data(), s.size());
}
Then you can write serialisation routines for your struct. There are a few design options:
- many people like to declare Binary_IStream / Binary_OStream types that can house a std::ostream, but being a distinct type can be used to create a separate set of serialisation routines ala:
operator<<(Binary_OStream& os, const Some_Class&);
Or, you can just abandon the usual streaming notation when dealing with binary serialisation, and use function call notation instead. Obviously, it's nice to let the same code correctly output both binary serialisation and human-readable serialisation, so the operator-based approach is appealing.
If you serialise numbers, you need to decide whether to do so in a binary format or ASCII. With a pure binary format, where portability is required (even between 32-bit and 64-bit compiles on the same OS), you may need to make some effort to encode and use type size metadata (e.g. int32_t or int64_t?) as well as endianness (e.g. consider network byte order and ntohl()-family functions). With ASCII you can avoid some of those considerations, but it's variable length and can be slower to write/read. Below, I arbitrarily use ASCII with a '|' terminator for numbers.
bool write_element(std::ostream& os, const element& e)
{
return (os << e.id << '|') && write_string(os, e.test) && (os << e.other << '|');
}
And then for your vector:
os << elements.size() << '|';
for (std::vector<element>::const_iterator i = elements.begin();
i != elements.end(); ++i)
write_element(os, *i);
To read this back:
std::vector<element> elements;
size_t n;
char c;
if ((is >> n >> c) && c == '|') // the element count is followed by its '|' terminator
for (size_t i = 0; i < n; ++i)
{
element e;
if (!read_element(is, e))
return false; // fail
elements.push_back(e);
}
...which needs...
bool read_element(std::istream& is, element& e)
{
char c;
return (is >> e.id >> c) && c == '|' &&
read_string(is, e.test) &&
(is >> e.other >> c) && c == '|';
}
...and...
bool read_string(std::istream& is, std::string& s)
{
size_t n;
char c;
if ((is >> n >> c) && c == '|')
{
s.resize(n);
return static_cast<bool>(is.read(&s[0], n)); // &s[0] is writable in every C++ standard since C++11
}
return false;
}

Overriding operator>> for a Strings class

I just have a quick question. I need to override the operator >> for a custom String class and I can't quite figure out how to do it.
I know that this code works, because it was my original method of solving the problem:
istream& operator>>(istream &is, String &s) {
char data[ String::BUFF_INC ]; //BUFF_INC is predefined
is >> data;
delete &s;
s = data;
return s;
}
However, according to the spec (this is a homework assignment), I need to read in the characters 1 at a time to manually check for whitespace and ensure that the string isn't too big for data[]. So I changed my code to the following:
istream& operator>>(istream &is, String &s) {
char data[ String::BUFF_INC ];
int idx = 0;
data[ 0 ] = is.get();
while( (data[ idx ] != *String::WHITESPACE) && !is.ios::fail() ) {
++idx;
is.get();
data[ idx ] = s[ idx ];
}
return is;
}
When this new code is executed however it just gets stuck in a loop of user input. So how do I use is.get() to read in the data character by character but not wait for more user input? Or should I perhaps be using something other than .get()?
You don't seem to be doing anything with the character you get from the stream:
istream& operator>>(istream &is, String &s) {
char data[ String::BUFF_INC ];
int idx = 0;
data[ 0 ] = is.get();
while( (data[ idx ] != *String::WHITESPACE) && !is.ios::fail() ) {
++idx;
is.get(); // you don't do anything with this
data[ idx ] = s[ idx ]; // you're copying the string into the buffer
}
return is;
}
So it checks whether the string s contains a whitespace, not whether you read a whitespace from the stream.
Try:
istream& operator>>(istream &is, String &s)
{
std::string buffer;
is >> buffer; // This reads 1 white space separated word.
s.data = buffer.c_str();
return is;
}
Commenting on your original code:
istream& operator>>(istream &is, String &s)
{
char data[ String::BUFF_INC ];
is >> data; // Will work. But prone to buffer overflow.
delete s; // This line is definitely wrong.
// s is not a pointer so I don't know what deleting it would do.
s = data; // Assume assignment operator is defined.
// for your class that accepts a C-String
return s;
}
Using the second version as a base:
istream& operator>>(istream &is, String &s)
{
std::vector<char> data;
char first;
// Must ignore all the white space before the word
for(first = is.get(); String::isWhiteSpace(first) && is; first = is.get())
{}
// If we found a non space first character
if (is && !String::isWhiteSpace(first))
{
data.push_back(first);
}
// Now get values while white space is false
char next;
while( !String::isWhiteSpace(next = is.get()) && is)
{
// Note we test the condition of the stream in the loop
// This is because is.get() may fail (with eof() or bad()).
// So we test it after each get.
//
// Normally you would use >> operator but that ignores spaces.
data.push_back(next);
}
// Now assign it to your String object
data.push_back('\0');
s.data = data;
return is;
}