I'm currently writing a compiler front end for personal education on the topic and I've run into a problem concerning the way I handle BNF definition in C++ through operator overloading.
Currently my setup is as follows:
Rule.h:
class Rule
{
public:
ChainRule operator>>(Rule& right);
OrRule operator|(Rule& right);
KleeneRule operator*();
OptionalRule operator+();
virtual bool parse(TokenList::iterator& begin, TokenList::iterator end) = 0;
};
Rule.cpp:
ChainRule Rule::operator>>(Rule& right) {
return ChainRule(this, &right);
}
OrRule Rule::operator|(Rule& right) {
return OrRule(this, &right);
}
KleeneRule Rule::operator*() {
return KleeneRule(this);
}
OptionalRule Rule::operator+() {
return OptionalRule(this);
}
ChainRule, OrRule, KleeneRule, OptionalRule and EmptyRule are defined trivially like so:
class ChainRule : public Rule
{
private:
Rule* next;
Rule* _this;
public:
ChainRule();
ChainRule(Rule* _this, Rule* right);
bool parse(TokenList::iterator& begin, TokenList::iterator end) override;
};
Each subclass of Rule obviously defines a reasonable implementation of parse(). Using these classes I can define my grammar as follows:
OrRule assignment_exp = logical_or_exp
| unary_exp >> StringRule("=") >> assignment_exp
;
Now here's the problem: Each overloaded operator returns a new object by value. This means that whenever I use operator>> or operator| from the Rule class, these pointers will be garbage once I return from the call to operator>> or operator| since the stack has been cleaned up and the objects are gone.
Neither can I use pass by value in the constructors for my Rule subclasses since this would not allow me to define recursive grammars.
So I'm left with no option of passing objects by value, and no option of passing objects by pointers. Can anyone point me to a solution that would not force me to define my grammar like so?
StringRule s = StringRule("=");
OrRule assignment_exp;
ChainRule temp1 = s >> assignment_exp;
ChainRule temp2 = unary_exp >> temp1;
assignment_exp = logical_or_exp | temp2;
P.S. I am aware of the various parser generators and Boost.Spirit, but my goal is to write my own parser.
You can allocate the return objects on the heap (via a factory) and return them as references. The factory can keep track of them so you won't leak. As far as syntax is concerned it will work the same as when you return them by value.
You can get around this problem by replacing your Rule* (which have the problem that you cannot overload operators for them) with wrapper objects. I.e. ChainRule would contain RuleRef next instead of Rule * next, etc., and all the operators would be defined for RuleRef. RuleRef would simply contain a Rule*, and be constructable from a Rule*. To make memory handling easier, you could inherit from a smart pointer class.
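A minimal sketch of the wrapper idea, assuming std::shared_ptr (C++11) handles ownership. TokenList is reduced to a vector of strings here, and lit() is a hypothetical helper that is not part of the question's code. The operators are free functions on RuleRef, so every intermediate rule lives on the heap and stays alive for as long as some handle refers to it.

```cpp
#include <memory>
#include <string>
#include <vector>

typedef std::vector<std::string> TokenList;  // simplified stand-in for the question's token list

struct Rule {
    virtual ~Rule() {}
    virtual bool parse(TokenList::iterator& begin, TokenList::iterator end) = 0;
};

// Value-type handle: cheap to copy, keeps the wrapped rule alive.
class RuleRef {
public:
    explicit RuleRef(Rule* rule) : rule_(rule) {}
    bool parse(TokenList::iterator& begin, TokenList::iterator end) const {
        return rule_->parse(begin, end);
    }
private:
    std::shared_ptr<Rule> rule_;
};

// Matches exactly one literal token.
struct StringRule : Rule {
    explicit StringRule(const std::string& text) : text_(text) {}
    bool parse(TokenList::iterator& begin, TokenList::iterator end) override {
        if (begin == end || *begin != text_) return false;
        ++begin;
        return true;
    }
    std::string text_;
};

// Sequence: left then right; restores the position if either part fails.
struct ChainRule : Rule {
    ChainRule(RuleRef left, RuleRef right) : left_(left), right_(right) {}
    bool parse(TokenList::iterator& begin, TokenList::iterator end) override {
        TokenList::iterator start = begin;
        if (left_.parse(begin, end) && right_.parse(begin, end)) return true;
        begin = start;
        return false;
    }
    RuleRef left_, right_;
};

// Alternative: try left, then right.
struct OrRule : Rule {
    OrRule(RuleRef left, RuleRef right) : left_(left), right_(right) {}
    bool parse(TokenList::iterator& begin, TokenList::iterator end) override {
        return left_.parse(begin, end) || right_.parse(begin, end);
    }
    RuleRef left_, right_;
};

// Hypothetical helper to build a literal rule.
inline RuleRef lit(const std::string& text) { return RuleRef(new StringRule(text)); }

// The operators work on handles, so no rule ever points at a dead temporary.
inline RuleRef operator>>(RuleRef left, RuleRef right) { return RuleRef(new ChainRule(left, right)); }
inline RuleRef operator|(RuleRef left, RuleRef right) { return RuleRef(new OrRule(left, right)); }
```

With this, a rule such as (lit("a") >> lit("=")) | lit("b") can be written in one expression. A recursive rule would still need some placeholder that is filled in later; that part is not shown here.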
In C++11, what is the best way to provide two versions of a method, one to modify the object itself and one to return a modified copy?
For example, consider a string class which has the "append(string)" method. Sometimes you might want to use append to modify your existing string object, sometimes you might want to keep your string object the same and create a copy.
Of course, I could just implement the first version and manually create a new object every time I need one, but that adds multiple temporary variables and lines of code to my project.
If it is still not clear what I am trying to do:
String s1("xy");
String s2 = s1.appendCopy("z");
s1.appendThis("w");
// s1 == "xyw"
// s2 == "xyz"
In Ruby there is a concept (or rather, a naming convention) which says for such methods, there are two variants: append (creates a new String) and append! (modifies this object)
C++ does not have something like this, so I would be stuck with ugly method names like "appendCopy".
Is there a good way to implement what I am trying to do?
So far, the best idea I had would be to make the modifying versions class members and the copying/immutable versions static methods which take the object to work on as a const argument.
There is actually a guideline, expressed by Herb Sutter in GotW #84:
Prefer non-member non-friend functions.
In your specific case, append (in-place) requires modifying the existing string so is well-suited to be a class-method, while append (copying) does not, so (following the guideline) should not be a class-method.
Thus:
void std::string::append(std::string const&);
inline std::string append(std::string left, std::string const& right) {
left.append(right);
return left;
}
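As a small runnable sketch of that split, using std::string as it exists today: the free function below is called appended, a hypothetical name chosen only so it reads differently from the member. It takes its left operand by value, so the caller's string is copied (or moved from, if it is a temporary) and never modified.

```cpp
#include <string>

// Non-member, non-friend: works purely through the public interface.
// `left` is a by-value parameter, so it is already the copy we want to return.
inline std::string appended(std::string left, const std::string& right) {
    left.append(right);  // the in-place member does the real work on the local copy
    return left;
}
```

Calling appended(s1, "z") leaves s1 untouched, while s1.append("w") modifies it in place, which matches the s1/s2 example from the question.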
By popular request, here are two overloads that can be used to optimize performance. First the member version, which may reuse its argument's buffer:
void std::string::append(std::string&& other) {
size_t const result_size = this->size() + other.size();
if (this->capacity() < result_size) {
if (other.capacity() >= result_size) {
swap(*this, other);
this->prepend(other);
return;
}
// grow buffer
}
// append
}
And second the free-function that may reuse its right-hand buffer:
inline std::string append(std::string const& left, std::string&& right) {
right.prepend(left);
return right;
}
Note: I am not exactly sure there are not ambiguous overloads manifesting. I believe there should not be...
You can also overload a single accessor on constness, which gives you a read-only version and a modifying version under one name:
class A{
public:
    // this will get the property
    const dataType& PropertyName() const { return m_Property; }
    // this will set the property
    dataType& PropertyName() { return m_Property; }
private:
    dataType m_Property;
};
int main()
{
    A a;
    a.PropertyName() = someValueOfType_dataType; // set
    someOtherValueOfType_dataType = a.PropertyName(); // get
}
I have a simple class that I am storing in a vector as pointers. I want to use a find on the vector but it is failing to find my object. Upon debugging it doesn't seem to call the == operator I've provided. I can 'see' the object in the debugger so I know it's there. The code below even uses a copy of the first item in the list, but still fails. The only way I can make it pass is to use MergeLine* mlt = *LineList.begin(), which shows me that it is comparing the pointers and not using my equality operator at all.
class MergeLine {
public:
std::string linename;
int StartIndex;
double StartValue;
double FidStart;
int Length;
bool operator < (const MergeLine &ml) const {return FidStart < ml.FidStart;}
bool operator == (const MergeLine &ml) const {
return linename.compare( ml.linename) == 0;}
};
class OtherClass{
public:
std::vector<MergeLine*>LineList;
std::vector<MergeLine*>::iterator LL_iter;
void DoSomething( std::string linename){
// this is the original version that returned LineList.end()
// MergeLine * mlt
// mlt->linename = linename;
// this version doesn't work either (I thought it would for sure!)
MergeLine *mlt =new MergeLine(*LineList.front());
LL_iter = std::find(LineList.begin(), LineList.end(), mlt);
if (LL_iter == LineList.end()) {
throw(Exception("line not found in LineList : " + mlt->linename));
}
MergeLine * ml = *LL_iter;
}
};
cheers,
Marc
Since your container contains pointers and not objects, the comparison will be between the pointers. The only way the pointers will be equal is when they point to the exact same object. As you've noticed the comparison operator for the objects themselves will never be called.
You can use std::find_if and pass it a comparison object to use.
class MergeLineCompare
{
MergeLine * m_p;
public:
MergeLineCompare(MergeLine * p) : m_p(p)
{
}
bool operator()(MergeLine * p)
{
return *p == *m_p;
}
};
LL_iter = std::find_if(LineList.begin(), LineList.end(), MergeLineCompare(mlt));
I think what you really want is to use std::find_if like this:
struct MergeLineNameCompare
{
    std::string searchname;
    // the constructor name must match the struct name
    MergeLineNameCompare(const std::string &name) : searchname(name)
    {
    }
    bool operator()(const MergeLine * line) const
    {
        return searchname.compare( line->linename ) == 0;
    }
};
LL_iter = std::find_if(LineList.begin(), LineList.end(), MergeLineNameCompare(linename) );
The operator == (no matter which form) is better saved for real comparison of equality.
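If C++11 is available, the same predicate can be written inline as a lambda instead of a separate functor. The sketch below assumes a trimmed-down MergeLine with only two of the original members; findByName is a hypothetical helper name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct MergeLine {  // trimmed version of the class from the question
    std::string linename;
    double FidStart;
};

// Returns an iterator to the first element whose linename matches,
// or lines.end() if there is none. The lambda dereferences each stored
// pointer, so the names are compared rather than the addresses.
inline std::vector<MergeLine*>::const_iterator
findByName(const std::vector<MergeLine*>& lines, const std::string& name) {
    return std::find_if(lines.begin(), lines.end(),
                        [&name](const MergeLine* line) { return line->linename == name; });
}
```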
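Operator overloading can't work on pointers alone: an overloaded operator must take at least one operand of class or enumeration type, so you cannot redefine == for two MergeLine* values.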
Bjarne Stroustrup :-
References were introduced primarily to support operator overloading.
C passes every function argument by value, and where passing an object
by value would be inefficient or inappropriate the user can pass a
pointer. This strategy doesn’t work where operator overloading is
used. In that case, notational convenience is essential so that a user
cannot be expected to insert address-of operators if the objects are
large.
So, maybe not the best option, but you could store the objects by value:
std::vector<MergeLine>LineList;
std::vector<MergeLine>::iterator LL_iter;
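A short sketch of why storing values helps, again with a trimmed-down MergeLine: std::find compares elements with ==, and for a vector of objects that is the class's own operator==. containsLine is a hypothetical helper name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct MergeLine {  // trimmed version of the class from the question
    std::string linename;
    double FidStart;
    bool operator==(const MergeLine& other) const { return linename == other.linename; }
};

// With elements stored by value, std::find calls MergeLine::operator==.
inline bool containsLine(const std::vector<MergeLine>& lines, const MergeLine& wanted) {
    return std::find(lines.begin(), lines.end(), wanted) != lines.end();
}
```

Note that two lines with the same name but different FidStart compare equal here, because operator== only looks at linename, just as in the question.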
It is said that the arrow operator is applied recursively. But when I try to execute the following code, it prints gibberish when it is supposed to print 4.
class dummy
{
public:
int *p;
int operator->()
{
return 4;
}
};
class screen
{
public:
dummy *p;
screen(dummy *pp): p(pp){}
dummy* operator->()
{
return p;
}
};
int main()
{
dummy *d = new dummy;
screen s(d);
cout<<s->p;
delete d;
}
What Stanley meant by “recursive” is just that the operator is applied to every returned object until the returned type is a pointer.
Which happens here on the first try: screen::operator -> returns a pointer. Thus this is the last call to an operator -> that the compiler attempts. It then resolves the right-hand side of the operator (p) by looking up a member in the returned pointee type (dummy) with that name.
Whenever the compiler finds the syntax aᵢ->b in code, it essentially applies the following algorithm:
1. Is aᵢ of pointer type? If so, resolve member b of *aᵢ and evaluate (*aᵢ).b.
2. Else, try to resolve aᵢ::operator ->.
   On success, set aᵢ₊₁ = aᵢ::operator ->(). Go to 1.
   On failure, emit a compile error.
I’m hard-pressed to come up with a short, meaningful example where a chain of operator -> invocations even makes sense. Probably the only real use is when you write a smart pointer class.
However, the following toy example at least compiles and yields a number. But I wouldn’t advise actually writing such code. It breaks encapsulation and makes kittens cry.
#include <iostream>
struct size {
int width;
int height;
size() : width(640), height(480) { }
};
struct metrics {
size s;
size const* operator ->() const {
return &s;
}
};
struct screen {
metrics m;
metrics operator ->() const {
return m;
}
};
int main() {
screen s;
std::cout << s->width << "\n";
}
C++ Primer (5th edition) formulates it as follows on page 570:
The arrow operator never loses its fundamental meaning of member access. When we overload arrow, we change the object from which arrow fetches the specified member. We cannot change the fact that arrow fetches a member.
The deal is that once screen::operator->() returns a pointer (dummy*), the recursion stops because the built-in (default) -> is used on that pointer. If you want recursion you should return dummy or dummy& from screen::operator->().
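A minimal sketch of that change. It assumes dummy::operator-> returns a pointer to a small struct (called payload here, a hypothetical name) rather than an int as in the question, because the chain must eventually end in a pointer for the member lookup to work.

```cpp
struct payload {
    int value;
};

struct dummy {
    payload* p;
    // Returns a raw pointer, so the chain of operator-> calls ends here.
    payload* operator->() const { return p; }
};

struct screen {
    dummy* d;
    // Returns a class type (by reference), so the compiler applies -> again.
    dummy& operator->() const { return *d; }
};

inline int readValue(const screen& s) {
    // Expands to s.operator->().operator->()->value
    return s->value;
}
```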
I'm working on a problem which requires me to use the STL linked list class to represent polynomials. I've made a good start on getting the class definition, however I'm a little confused as to where to go next (novice programmer - please excuse my potential ignorance).
class Polynomial
{
public:
Polynomial(); //Default constructor
Polynomial(pair<double,int>); //Specified constructor
void add(Polynomial);
Polynomial multiply(Polynomial);
void print();
private:
list<int> order_terms;
list<double> coeffs;
};
I have two questions:
1) It seems more elegant to store the terms and coefficients as a pair - however I'm unsure how to get that working using the STL list.
2) Regarding the add member function, I'm unsure how to implement it such that I can define a Polynomial and then add terms to it like this:
Polynomial test(pair<3.14,0>);
Polynomial test_2(pair<2,1>);
test.add(test_2);
The main thing I'm having trouble understanding is how to access the terms stored in the other object and link them to the first Polynomial.
Any help greatly appreciated.
EDIT: Code for the add() function - currently not working
void Polynomial::add(const Polynomial& rhs)
{
//Perform some sort of sort here to make sure both lists are correctly sorted
//Traverse the list of terms to see if there's an existing nth order
//term in the list on the left-hand-side polynomial.
list<int>::iterator itr;
list<int>::iterator itl;
for(itr=rhs->terms.begin(); itr!=rhs->terms.end(); itr++)
{
bool match=0;
//See if there's an existing terms, if so add to it
for(itl=terms.begin(); itl!=terms.end(); itl++)
{
if(*itl->second)==*itr->second)
{
*itl->first+=*itr->first;
match = 1;
}
}
//If not, this is the first nth order term so just push it onto the list
if(!match){ terms.push_back(*itr); //Perform the sort again }
}
To use a pair in a list you can do:
list<pair<double, int> > - note the space between the two > characters, which compilers older than C++11 require. It's also nice to do something like
typedef pair<double, int> TermCoeff;
list<TermCoeff> equation;
To sort a list:
list<TermCoeff> equation;
// insert items
equation.sort(coeff_compare);
There are pre-defined comparator functions for a pair in the <utility> header. They compare the first elements and then the second ones if first is equal.
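A small sketch of both options, assuming each term is stored as (exponent, coefficient) so that the default pair ordering sorts by exponent first. The typedef Term and the function names are hypothetical.

```cpp
#include <list>
#include <utility>

typedef std::pair<int, double> Term;  // (exponent, coefficient): an assumed layout

// Custom comparator: highest exponent first.
inline bool higherExponentFirst(const Term& a, const Term& b) {
    return a.first > b.first;
}

// Default ordering: list::sort() with no argument uses pair's operator<,
// which compares `first`, then `second` on ties.
inline std::list<Term> sortedAscending(std::list<Term> terms) {
    terms.sort();
    return terms;
}

// Same list, sorted with the custom comparator instead.
inline std::list<Term> sortedDescending(std::list<Term> terms) {
    terms.sort(higherExponentFirst);
    return terms;
}
```

std::list has its own sort member because std::sort needs random-access iterators, which a linked list does not provide.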
For your second question you should remember that an object of a class can access the member variables of an object of the same class, even if they are private. If you don't leave any gaps in your coefficients (in the constructor fill in missing ones with the second value of the pair set to 0) this means your add method can look like:
Polynomial& Polynomial::add(const Polynomial& rhs) {
// constructor should sort all terms and enforce that all terms are present
// lhs = current object (left hand side of operator)
// rhs = other object (right hand side of operator)
// example: lhs.add(rhs)
list<TermCoeff>::const_iterator rhs_iter = rhs.terms.begin();
list<TermCoeff>::iterator lhs_iter = terms.begin();
while(rhs_iter != rhs.terms.end()) {
if (lhs_iter != terms.end()) {
// add because we aren't at the end of current object's terms
lhs_iter->second += rhs_iter->second;
++lhs_iter;
} else {
// insert because we are at the end of current object's terms
terms.push_back(*rhs_iter);
lhs_iter = terms.end(); // keep the iterator at the end
}
++rhs_iter;
}
return *this;
}
int main (int argc, const char * argv[])
{
list<TermCoeff> first, second;
first.push_back(TermCoeff(0, 0.0)); // empty
first.push_back(TermCoeff(1, 5.0));
first.push_back(TermCoeff(2, 5.0));
second.push_back(TermCoeff(0, 6.0));
second.push_back(TermCoeff(1, 0.0)); // empty
second.push_back(TermCoeff(2, 8.0));
second.push_back(TermCoeff(3, 9.0));
Polynomial first_eq(first);
Polynomial second_eq(second);
first_eq.add(second_eq);
first_eq.print();
return 0;
}
Note that I returned a reference to the current object. This is a nice thing to do in an addition method because then you can chain additions:
first.add(second).add(third);
or
first.add(second.add(third));
Others have explained list<pair<double, int> > (and I like shelleybutterfly's suggestion to derive Polynomial from the list, except that I'd make it protected, not public, so that outside code is not too free to mess with the contents of the list).
But the add function is a little tricky, because adding two polynomials doesn't generally mean concatenating them or adding their terms together. The operation is actually more like merging-- and you'll soon see that the lists must be sorted. (In fact, it's more natural to represent polynomials as vectors, but I guess that's not the assignment.)
I suggest you implement Polynomial::add(pair<double, int>), first, then implement the other one (add(Polynomial &)) in terms of that.
I don't want to spell it out too much, since this looks like homework. Is this enough to point you in the right direction?
EDIT:
Your new code looks correct (albeit inefficient) if you fix a couple of bugs:
void Polynomial::add(const Polynomial& rhs)
{
// Don't forget to implement the sorting routine.
// The iterators must be of the correct type. And itr must be const,
// since you have declared rhs to be a const reference. The compiler
// will not allow you to have an iterator with the power to alter
// a const thing.
list<pair<double,int> >::const_iterator itr;
list<pair<double,int> >::iterator itl;
// rhs is a reference, not a pointer, so use . rather than ->
for(itr=rhs.terms.begin(); itr!=rhs.terms.end(); itr++)
{
bool match=false;
for(itl=terms.begin(); itl!=terms.end(); itl++)
{
// You have an extra parenthesis here, and too much dereferencing.
if(itl->second == itr->second)
{
itl->first+=itr->first;
match = true;
}
}
if(!match)
{ terms.push_back(*itr); //Perform the sort again
} // Be careful not to catch the closing brace in a comment
}
}
Once it is working, you can think about ways to make it cleaner and more efficient. For example, if you insert the new term in the right place, the list will always be in the right order and there will be no need for a sort routine.
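One way the "insert in the right place" idea might look, as a sketch rather than a finished solution. It assumes the terms live in a single list<pair<double,int>> holding (coefficient, exponent) as in the question; coefficient() and size() are hypothetical helpers added only so the result can be inspected.

```cpp
#include <list>
#include <utility>

typedef std::pair<double, int> Term;  // (coefficient, exponent), as in the question

class Polynomial {
public:
    // Adds one term, keeping the list sorted by ascending exponent.
    void add(const Term& term) {
        std::list<Term>::iterator it = terms.begin();
        while (it != terms.end() && it->second < term.second)
            ++it;
        if (it != terms.end() && it->second == term.second)
            it->first += term.first;  // same exponent: combine coefficients
        else
            terms.insert(it, term);   // new exponent: insert before the first larger one
    }

    // Adds a whole polynomial one term at a time.
    void add(const Polynomial& rhs) {
        for (std::list<Term>::const_iterator it = rhs.terms.begin();
             it != rhs.terms.end(); ++it)
            add(*it);
    }

    // Coefficient of x^exponent, or 0 if no such term is stored.
    double coefficient(int exponent) const {
        for (std::list<Term>::const_iterator it = terms.begin(); it != terms.end(); ++it)
            if (it->second == exponent)
                return it->first;
        return 0.0;
    }

    std::list<Term>::size_type size() const { return terms.size(); }

private:
    std::list<Term> terms;
};
```

Because every insertion keeps the order, no separate sort step is needed. Each add(Term) is still a linear scan, so this is about simplicity rather than speed.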
As for using a pair, why not use a list<pair<double, int>> (list< pair<double, int> > for older compilers)? Or you could even define a separate class to hold your pair like so:
// example is implemented inline, you could always pull it out to
// your source file; although it's even possible that you could
// do everything inline if you want to allow just including a
// header but not having to link a separate file.
class CoeffAndTerm : public pair<double,int>
{
public:
// if you need it you should put extra functions here to
// provide abstractions:
double getTotalValue()
{
return first * second;
}
};
and then use
list<CoeffAndTerm> Polynomial;
as your variable, or even
// same stuff goes for this class RE: the inline function definitions
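class Polynomial : public list<CoeffAndTerm>
{
public:
    // same goes here for the abstraction stuff maybe things
    // like in your current Polynomial class; assuming some
    // functions here ...
    Polynomial Multiply(const Polynomial& other) const
    {
        Polynomial Result; // a plain object; C++ needs no `new` here
        // multiply every term of this polynomial with every term of other
        for (const_iterator a = begin(); a != end(); ++a)
        {
            for (const_iterator b = other.begin(); b != other.end(); ++b)
            {
                CoeffAndTerm product;
                product.first = a->first * b->first;    // coefficients multiply
                product.second = a->second + b->second; // exponents add
                Result.push_back(product);              // like terms are not merged here
            }
        }
        return Result;
    }
};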
so that you've got Polynomial being a derivation of the list itself. Not sure the exact usage of the Polynomial, so it's hard for me to speak to which makes more sense, but I like this way better as a general rule for a type such as this; seems to be that the polynomial "is a" list of coefficient and terms, it doesn't just "have" one. :) I'm sure that's debatable, and again it depends on the actual usage of your code.
For the operations, you could return references, as in one of the other examples, but I have implemented Multiply without modifying the existing value, which you could also do for Add, Subtract, etc. So, assuming First, Second, Third, etc. are other polynomials:
Polynomial Result = First.Multiply(Second).Add(Third).Subtract(Fourth);
you could also implement copy constructor, operator =, operator +, operator *, operator / and then do things that look like normal math:
Polynomial Result = First * Second + Third - Fourth;
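A compact sketch of what such operators could look like. To keep it short it stores the terms in a std::map from exponent to coefficient instead of the list the assignment asks for, so treat it as an illustration of the operator signatures only; the class name Poly and its members are hypothetical.

```cpp
#include <map>

class Poly {
public:
    Poly() {}
    // A single term: coeff * x^exp.
    Poly(double coeff, int exp) { terms_[exp] = coeff; }

    Poly& operator+=(const Poly& rhs) {
        for (std::map<int, double>::const_iterator it = rhs.terms_.begin();
             it != rhs.terms_.end(); ++it)
            terms_[it->first] += it->second;  // missing entries start at 0.0
        return *this;
    }

    Poly& operator-=(const Poly& rhs) {
        for (std::map<int, double>::const_iterator it = rhs.terms_.begin();
             it != rhs.terms_.end(); ++it)
            terms_[it->first] -= it->second;
        return *this;
    }

    // Every term of lhs times every term of rhs: exponents add, coefficients multiply.
    friend Poly operator*(const Poly& lhs, const Poly& rhs) {
        Poly result;
        for (std::map<int, double>::const_iterator a = lhs.terms_.begin();
             a != lhs.terms_.end(); ++a)
            for (std::map<int, double>::const_iterator b = rhs.terms_.begin();
                 b != rhs.terms_.end(); ++b)
                result.terms_[a->first + b->first] += a->second * b->second;
        return result;
    }

    // Coefficient of x^exp, or 0 if no such term is stored.
    double coefficient(int exp) const {
        std::map<int, double>::const_iterator it = terms_.find(exp);
        return it == terms_.end() ? 0.0 : it->second;
    }

private:
    std::map<int, double> terms_;  // exponent -> coefficient
};

// Binary + and - are built from the compound forms; lhs is taken by value,
// so the operands themselves are never modified.
inline Poly operator+(Poly lhs, const Poly& rhs) { lhs += rhs; return lhs; }
inline Poly operator-(Poly lhs, const Poly& rhs) { lhs -= rhs; return lhs; }
```

The compiler-generated copy constructor and operator= are sufficient here, since the only member is a standard container.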
While it's possible to use std::pair to group the term order and coefficient, I would recommend against it: it's not very readable (it's not clear what 'first' and 'second' mean, and C++ will implicitly convert between numeric types), and you get no benefit from the added functionality of pair (ordering).
Instead, create a class like:
class Term {
double coeff_;
int exp_;
public:
Term(double coeff, int exp): coeff_(coeff), exp_(exp) {}
double coefficient() const { return coeff_; }
int exponent() const { return exp_; }
[...]
};
class Polynomial {
std::list<Term> terms;
[...]
Making fields public (e.g. by using struct or publicly deriving from pair) for performance reasons is not a good idea: inline constructor, getters and setters are just as fast as reading or writing the variable directly, and they have the advantage of encapsulating the implementation.
For that matter, you may want to create separate types to wrap polynomial coefficients and exponents themselves, in order to avoid mixing up numeric types, and performing nonsensical operations e.g.:
class Coefficient {
double val;
public:
explicit Coefficient(double value): val(value) {}
double getValue() { return val; }
double operator*(double rhs) { return val*rhs; }
Coefficient operator+(const Coefficient& rhs) {
return Coefficient(val+rhs.val);
}
[...]
};
etc.
Another possibility: instead of using a class, you could make a struct to represent the term and coefficient; you can still define methods on it just like a class, but the members are public by default, which may make sense for efficiency reasons, especially if you're doing a lot of processing with these things. So, maybe:
struct CoeffAndTerm
{
    int Term;
    double Coeff;
    // factory function: returns the object by value, no `new` needed
    static CoeffAndTerm Make(int parTerm, double parCoeff)
    {
        return CoeffAndTerm(parTerm, parCoeff);
    }
    // etc. and otherwise you can just do things as given in the example
    // with the classes deriving from list<pair<int, double>>, e.g.,
    // silly example again
    double getTotalValue() const
    {
        return Term * Coeff;
    }
private:
    // C++ uses access sections rather than per-member keywords
    CoeffAndTerm(int parTerm, double parCoeff)
        : Term(parTerm), Coeff(parCoeff)
    {
    }
};
and same applies otherwise as in the first example, again giving more direct access than that example had, but still allowing for the abstraction methods to be placed directly on the object
struct Polynomial : public list<CoeffAndTerm>
{
    // the inherited list holds the terms, so no separate member is needed
    Polynomial Multiply(const Polynomial& other) const
    {
        Polynomial Result;
        for (const_iterator a = begin(); a != end(); ++a)
        {
            for (const_iterator b = other.begin(); b != other.end(); ++b)
            {
                // exponents add, coefficients multiply
                Result.push_back(CoeffAndTerm::Make(a->Term + b->Term,
                                                    a->Coeff * b->Coeff));
            }
        }
        return Result;
    }
    // etc.
};
I have this:
typedef string domain_name;
And then, I try to overload the operator< in this way:
bool operator<(const domain_name & left, const domain_name & right){
int pos_label_left = left.find_last_of('.');
int pos_label_right = right.find_last_of('.');
string label_left = left.substr(pos_label_left);
string label_right = right.substr(pos_label_right);
int last_pos_label_left=0, last_pos_label_right=0;
while(pos_label_left!=string::npos && pos_label_right!=string::npos){
if(label_left<label_right) return true;
else if(label_left>label_right) return false;
else{
last_pos_label_left = pos_label_left;
last_pos_label_right = pos_label_right;
pos_label_left = left.find_last_of('.', last_pos_label_left);
pos_label_right = right.find_last_of('.', last_pos_label_left);
label_left = left.substr(pos_label_left, last_pos_label_left);
label_right = right.substr(pos_label_right, last_pos_label_right);
}
}
}
I know it's a strange way to overload the operator <, but I have to do it this way. It should do what I want. That's not the point.
The problem is that it enters an infinite loop right at this line:
if(label_left<label_right) return true;
It seems like it's trying to use this overloaded function itself to do the comparison, but label_left is a string, not a domain name!
Any suggestion?
typedef just gives another name for a type. It does not create a distinct type. So in effect, you're overloading operator < for string.
If you want to create a distinct type, then you can try
struct domain_name {
string data;
// ...
};
and work with that.
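A sketch of how that distinct type might be used. The comparison here orders names label by label starting from the rightmost one, which is my reading of what the question's loop is trying to do; labelsFromRight is a hypothetical helper.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct domain_name {
    std::string data;
};

// Splits "www.example.com" into {"com", "example", "www"}.
inline std::vector<std::string> labelsFromRight(const std::string& name) {
    std::vector<std::string> labels;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type dot = name.find('.', start);
        if (dot == std::string::npos) {
            labels.push_back(name.substr(start));
            break;
        }
        labels.push_back(name.substr(start, dot - start));
        start = dot + 1;
    }
    std::reverse(labels.begin(), labels.end());
    return labels;
}

// Only applies to domain_name. The vector comparison below is lexicographic
// and compares the labels with std::string's own operator<, so nothing in
// here calls back into this overload.
inline bool operator<(const domain_name& left, const domain_name& right) {
    return labelsFromRight(left.data) < labelsFromRight(right.data);
}
```

Because domain_name is now its own type, comparing two plain strings inside the function can no longer select this overload, and the recursion disappears.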
Typedef doesn't work like this. Typedef simply defines an alias for the type - it is still a string. In order to do this, you would need a new type instead. You should do this anyway. Your operator is overloading the comparison operator for all strings.
Your typedef doesn't create a new type. It just creates a new name to refer to the same type as before. Thus, when you use < inside your operator function on two strings, the compiler just uses the same operator it's compiling because the argument types match.
What you may wish to do instead is define an entirely new function:
bool domain_less(domain_name const& left, domain_name const& right);
Then use that function in places that call for a comparison function, such as std::sort. Most of the standard algorithms will use < by default, but allow you to provide your own predicate function instead. A plain function pointer is accepted by the algorithms directly. In older code you may see std::ptr_fun used to wrap the function where an adaptable functor is required, or functor objects that descend from std::binary_function; both were deprecated in C++11 and removed in C++17. You can still write your own functor object without that base class. (Check out the <functional> header.)
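For illustration, here is a sketch of the named-predicate approach with the typedef left as it is. The ordering rule (top-level label first, then the full name) is a simplified stand-in for the question's comparison, and topLevelLabel and sortDomains are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

typedef std::string domain_name;  // still just an alias for std::string

// Text after the last '.', or the whole name if there is no dot.
inline std::string topLevelLabel(const domain_name& name) {
    std::string::size_type dot = name.find_last_of('.');
    return dot == std::string::npos ? name : name.substr(dot + 1);
}

// A named predicate: operator< for std::string is left untouched,
// so the comparisons below are ordinary string comparisons.
inline bool domain_less(const domain_name& left, const domain_name& right) {
    const std::string l = topLevelLabel(left);
    const std::string r = topLevelLabel(right);
    if (l != r) return l < r;
    return left < right;
}

inline void sortDomains(std::vector<domain_name>& names) {
    // A plain function pointer is a valid comparison argument for std::sort.
    std::sort(names.begin(), names.end(), domain_less);
}
```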