C++ Design: Base class constructors and is-a or has-a relationship for custom string class

I'm learning C++ and want to implement a custom string class, MyTextProc::Word, to add some features to std::string, such as string reversal, case conversion, translations etc.
It seems that this is best done using an is-a relationship:
#include <string>
namespace MyTextProc {
class Word : public std::string {
/* my custom methods.... */
};
}
I do not specify any constructors for my class, but the above definition of the Word class only exposes the default and copy constructors. Can't Word just inherit all the public string constructors as well?
It would be good to have Word work just like a string. I am adding no properties to string; must I really implement every single constructor of string, the base class, in order to implement my new subclass?
Is this best done using a has-a relationship? Should I just implement a const string& constructor and require clients to pass a string object for construction? Should I override all of the string constructors?

Welcome to C++ hell.
You've just touched one of the most controversial aspects of C++: std::string is not polymorphic, and constructors are not inherited.
The only "clean" way (one that will not draw any sort of criticism) is to embed std::string as a member and delegate ALL OF ITS METHODS. Good work!
Other approaches are possible, but you always have to keep some limitations in mind.
std::string has no virtual methods, so if you derive from it, you will not get a polymorphic type.
That means that if you pass a Word to a function taking a string, and that function calls a string method, your redefined method will not be called, and
any Word allocated via new must not be handed over as a string*: deleting through such a pointer results in undefined behavior.
All the inherited methods that take strings and return strings will work, but they'll return string, not Word.
As for constructors, they are NOT INHERITED. The apparent inheritance of default construction is an illusion: the compiler synthesizes default implementations of the default constructor, the copy constructor and copy assignment, and those implicitly call the base versions.
In C++11, a workaround can be:
#include <string>
#include <utility>
class Word: public std::string
{
public:
template<class... Args>
Word(Args&&... args) :std::string(std::forward<Args>(args)...)
{}
//whatever else
};
This passes any kind of arguments given in a call on to a suitable std::string constructor (if one exists; otherwise a compile error occurs).
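A compilable sketch of the forwarding constructor above, assuming C++11 or later. The reversed() member is a hypothetical example of an "added feature". For comparison, Word2 uses C++11 inheriting constructors (using std::string::string;), which bring in std::string's constructors with a single declaration. Neither variant changes the fact that the base is not polymorphic.

```cpp
#include <string>
#include <utility>

namespace MyTextProc {

// Forwarding approach: every argument list is passed straight to a
// matching std::string constructor (compile error if none matches).
class Word : public std::string
{
public:
    template<class... Args>
    Word(Args&&... args) : std::string(std::forward<Args>(args)...)
    {}

    // Hypothetical added feature: a reversed copy of the word.
    Word reversed() const
    {
        return Word(rbegin(), rend());
    }
};

// C++11 inheriting constructors: one using-declaration pulls in
// std::string's constructors (its copy/move constructors are not used
// to construct a Word2 from a std::string).
class Word2 : public std::string
{
public:
    using std::string::string;
};

} // namespace MyTextProc
```

Usage: MyTextProc::Word w("hello"); yields "hello", and w.reversed() yields "olleh".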
Now, decide for yourself what the design should be. Maybe you will end up with a plain std::string and an independent set of free functions.
Another (imperfect) way is to make Word not inherit from std::string but embed it, constructed as above, and make Word implicitly convertible to std::string (plus an explicit str() method). This lets you use a Word as a string and create a Word from a string, but not use a Word "in place of" a string.
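A minimal sketch of this embedding alternative, assuming C++11. Word keeps the std::string private, offers an implicit conversion to const std::string& and an explicit str() accessor. count_chars() is a hypothetical pre-existing function that only knows about std::string. Marking the forwarding constructor explicit is a choice made for this sketch, to avoid surprising conversions into Word.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace MyTextProc {

class Word
{
public:
    // Forwards to any std::string constructor; explicit so that a
    // std::string is never silently turned into a Word.
    template<class... Args>
    explicit Word(Args&&... args) : s_(std::forward<Args>(args)...) {}

    // Implicit conversion: a Word can be passed wherever a
    // const std::string& is accepted.
    operator const std::string&() const { return s_; }

    // Explicit accessor.
    const std::string& str() const { return s_; }

private:
    std::string s_;
};

} // namespace MyTextProc

// Hypothetical existing function that takes a plain string.
inline std::size_t count_chars(const std::string& s) { return s.size(); }
```

With this design, count_chars(w) compiles for a Word w, but a function declared to take a Word cannot receive a std::string without an explicit Word(...) construction.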
Another thing to unlearn (maybe from Java...): don't tie yourself to the OOP rule "is-a = inheritance and has-a = embedding". C++ standard library objects are not Objects in the OOP sense, so all the OOP-school methodologies have fallacies in that context.
You have to decide, in your case, what the trade-off is between simple coding (and good application of the "don't repeat yourself" principle, much easier with inheritance) and simple maintenance (embedding makes your code less prone to being misused by others).
This is in answer to the comment below:
"the lack of polymorphism of standard C++ classes. Why is this so? It seems to a novice like me that not implementing std C++ libs using virtual functions is defeating the point of the language they are designed to enrich!!!"
Well... yes and no!
Since you cite Perl, consider that
- Perl is a scripting language, where types are dynamic.
- Java is a language where types are static and objects dynamic.
- C++ is a language where types are static and objects are static (and dynamic allocation of objects is explicit).
Now, in Java objects are always dynamically allocated, and local variables are "references" to those objects.
In C++, local variables are the objects themselves and have value semantics. And the C++ standard library is designed not as a set of bases to extend, but as a set of value types for which code is generated by means of templates.
Think of std::string as something that works just like int: do you expect to derive from int to get "more methods" or to change the behavior of some of them?
The controversial aspect here is that, to be coherent with this design, std::string should just manage its internal memory and have no methods. Instead, string functions should have been implemented as templates, so that they could be used as "algorithms" with any other class exhibiting the same external behavior as std::string. The designers didn't do that.
They put many methods on it but didn't make it polymorphic, so as to retain value semantics. This makes for an ambiguous design, and leaves inheritance as the only way to "reuse" those methods without re-declaring them. That is possible, but with the limitations I described.
If you want to effectively create new functions, to have "polymorphism on values", use templates: instead of
std::string capitalize(const std::string& s) { .... }
do something like
template<class String>
String capitalize(const String& s) { .... }
so that your code can work with any class that has the same string interface with respect to characters, for any character type.
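One possible body for the capitalize() template above, as a sketch. It assumes "capitalize" means upper-casing the first character. Because it uses std::toupper, it is only meaningful for char-based strings, so the "any character type" goal would need a locale-aware replacement.

```cpp
#include <cctype>
#include <string>

// String can be std::string or any class with the same interface
// (copy construction, empty(), begin(), value_type).
template<class String>
String capitalize(const String& s)
{
    String result(s);                  // copy, keeping the caller's type
    if (!result.empty())
    {
        auto& first = *result.begin(); // reference to the first character
        first = static_cast<typename String::value_type>(
            std::toupper(static_cast<unsigned char>(first)));
    }
    return result;
}
```

Because the return type is String, calling it with a custom string class returns that class, not a std::string.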

As honest advice, I'd implement the methods you want as functions that take a string and return a string. They'll be easier to test, decoupled, and easy to use. When in C++, don't always reach for a class when a function would do. In fact, when you get into templates, you could declare a templated function without a definition and provide a specialization for the basic string class. That way, you will always know whether the string type you're touching has a custom-defined method (and yes, if you interact with Microsoft, you'll discover there are 50 million string implementations).
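A sketch of that "template without a definition plus specialization" idea. The function name reversed() is hypothetical. Calling it with a string type that has no specialization compiles but fails at link time, which flags the missing custom implementation.

```cpp
#include <string>

// Primary template: declared but deliberately never defined.
template<class String>
String reversed(const String& s);

// Full specialization for the basic std::string class.
template<>
inline std::string reversed<std::string>(const std::string& s)
{
    return std::string(s.rbegin(), s.rend());
}
```

Supporting another string type (for example a vendor-specific one) means adding one more specialization next to this one.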

Related

C++ Class References

Coming from Delphi, I'm used to using class references (metaclasses) like this:
type
TClass = class of TForm;
var
x: TClass;
f: TForm;
begin
x := TForm;
f := x.Create();
f.ShowModal();
f.Free;
end;
Actually, every class X derived from TObject has a method called ClassType that returns a TClass that can be used to create instances of X.
Is there anything like that in C++?
Metaclasses do not exist in C++. Part of why is because metaclasses require virtual constructors and most-derived-to-base creation order, which are two things C++ does not have, but Delphi does.
However, in C++Builder specifically, there is limited support for Delphi metaclasses. The C++ compiler has a __classid() and __typeinfo() extension for retrieving a Delphi-compatible TMetaClass* pointer for any class derived from TObject. That pointer can be passed as-is to Delphi code (you can use Delphi .pas files in a C++Builder project).
The TApplication::CreateForm() method is implemented in Delphi and has a TMetaClass* parameter in C++ (despite its name, it can actually instantiate any class that derives from TComponent, if you do not mind the TApplication object being assigned as the Owner), for example:
TForm *f;
Application->CreateForm(__classid(TForm), &f);
f->ShowModal();
delete f;
Or you can write your own custom Delphi code if you need more control over the constructor call:
unit CreateAFormUnit;
interface
uses
Classes, Forms;
function CreateAForm(AClass: TFormClass; AOwner: TComponent): TForm;
implementation
function CreateAForm(AClass: TFormClass; AOwner: TComponent): TForm;
begin
Result := AClass.Create(AOwner);
end;
end.
#include "CreateAFormUnit.hpp"
TForm *f = CreateAForm(__classid(TForm), SomeOwner);
f->ShowModal();
delete f;
Apparently modern Delphi supports metaclasses in much the same way as original Smalltalk.
There is nothing like that in C++.
One main problem with emulating that feature in C++ (assigning values that represent types dynamically at run time, and creating instances from such values) is that in C++ the constructors of a type must be known statically in order to instantiate it.
Probably you can achieve much of the same high-level goal by using C++ static polymorphism, which includes function overloading and the template mechanism, instead of extreme runtime polymorphism with metaclasses.
However, one way to emulate the effect in C++ is to use cloneable exemplar objects or, almost the same idea, polymorphic object factory objects. The former is quite unusual; the latter can be encountered now and then (mostly the difference is where the parameterization occurs: with the exemplar object it's that object's state, while with the object factory it's the arguments to the creation function). Personally I would stay away from that, because C++ is designed for static typing, and this idea is about cajoling C++ into emulating a language with very different characteristics and programming style etc.
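A sketch of the polymorphic object factory variant. It assumes all creatable classes share a base with a virtual destructor. Form, DialogForm, MainForm, FormClass, class_of() and form_registry() are hypothetical names chosen to echo the Delphi example. Here the "class reference" is simply a std::function that creates an instance.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Form
{
    virtual ~Form() = default;                 // safe deletion through Form*
    virtual std::string name() const = 0;
};

struct DialogForm : Form
{
    std::string name() const override { return "DialogForm"; }
};

struct MainForm : Form
{
    std::string name() const override { return "MainForm"; }
};

// Hypothetical stand-in for Delphi's "class of TForm".
using FormClass = std::function<std::unique_ptr<Form>()>;

// Produces the creation function for a concrete type T.
template<class T>
FormClass class_of()
{
    return [] { return std::unique_ptr<Form>(new T()); };
}

// Run-time lookup by name, similar in spirit to FindClass("...").
inline std::map<std::string, FormClass>& form_registry()
{
    static std::map<std::string, FormClass> registry{
        {"DialogForm", class_of<DialogForm>()},
        {"MainForm",   class_of<MainForm>()},
    };
    return registry;
}
```

Usage: FormClass x = class_of<DialogForm>(); auto f = x(); mirrors x := TForm; f := x.Create(). The constructors are still fixed at compile time inside each class_of<T> instantiation, which is the limitation described above.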
Type information of that kind does not exist at run time in C++. (RTTI, when enabled, provides typeid and dynamic_cast, but that is still different from what you need.)
A common idiom is to create a virtual clone() method that obviously clones the object which is usually in some prototypical state. It is similar to a constructor, but the concrete type is resolved at runtime.
class Object
{
public:
virtual ~Object() = default; // allows deleting a clone through Object*
virtual Object* clone() const = 0;
};
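A sketch of how the clone() idiom stands in for a class reference. Circle and make_from() are hypothetical. The prototype's dynamic type decides what gets created, and the covariant return type lets Circle::clone() return a Circle*.

```cpp
#include <memory>

class Object
{
public:
    virtual ~Object() = default;
    virtual Object* clone() const = 0;
};

class Circle : public Object
{
public:
    explicit Circle(double r = 1.0) : radius(r) {}
    Circle* clone() const override { return new Circle(*this); } // covariant return
    double radius;
};

// Creates a new object with the same dynamic type (and state) as the prototype.
inline std::unique_ptr<Object> make_from(const Object& prototype)
{
    return std::unique_ptr<Object>(prototype.clone());
}
```

Usage: keeping a Circle prototype and calling make_from(prototype) produces new Circle objects without the calling code naming Circle.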
If you don't mind spending some time examining foreign sources, you can take a look at how one project does it: https://github.com/rheit/zdoom/blob/master/src/dobjtype.h (note: this is quite a big and evolving source port of Doom, so be advised that even just reading it will take quite some time). Look at PClass and related types. I don't know exactly what is done there, but from my limited knowledge they construct a structure with the necessary metatable for each class and use some preprocessor magic in the form of defines for readability (or something else). Their approach lets you seamlessly create usual C++ classes, but adds support for PClass::FindClass("SomeClass") to get the class reference and use it as needed, for example to create an instance of the class. It can also check inheritance, create new classes on the fly and replace classes by others, i.e. you can replace CDoesntWorksUnderWinXP by CWorksEverywhere (as an example; they use it differently, of course). I did some quick research back then: their approach isn't exceptional and was explained on some sites, but since I had only so much interest, I don't remember the details.

C++ typedef versus unelaborated inheritance

I have a data structure made of nested STL containers:
typedef std::map<Solver::EnumValue, double> SmValueProb;
typedef std::map<Solver::VariableReference, Solver::EnumValue> SmGuard;
typedef std::map<SmGuard, SmValueProb> SmTransitions;
typedef std::map<Solver::EnumValue, SmTransitions> SmMachine;
This form of the data is only used briefly in my program, and there's not much behavior that makes sense to attach to these types besides simply storing their data. However, the compiler (VC++2010) complains that the resulting names are too long.
Redefining the types as subclasses of the STL containers with no further elaboration seems to work:
typedef std::map<Solver::EnumValue, double> SmValueProb;
class SmGuard : public std::map<Solver::VariableReference, Solver::EnumValue> { };
class SmTransitions : public std::map<SmGuard, SmValueProb> { };
class SmMachine : public std::map<Solver::EnumValue, SmTransitions> { };
Recognizing that the STL containers aren't intended to be used as a base class, is there actually any hazard in this scenario?
There is one hazard: if you call delete on a pointer to a base class with no virtual destructor, you have Undefined Behavior. Otherwise, you are fine.
At least that's the theory. In practice, with the MSVC ABI or the Itanium ABI (gcc, Clang, icc, ...), delete through a pointer to a base class with no virtual destructor (gcc and Clang warn about this with -Wdelete-non-virtual-dtor, provided the class has virtual methods) only causes a problem if your derived class adds non-static data members with non-trivial destructors (e.g. a std::string).
In your specific case, this seems fine... but...
... you might still want to encapsulate (using Composition) and expose meaningful (business-oriented) methods. Not only will it be less hazardous, it will also be easier to understand than it->second.find('x')->begin()...
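A sketch of the composition alternative for the nested maps. The Solver types are not shown in the question, so EnumValue and VariableReference are replaced here by hypothetical stand-ins (int and std::string). StateMachine, add_transition() and probability() are illustrative names.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for Solver::EnumValue and Solver::VariableReference.
using EnumValue = int;
using VariableReference = std::string;

class StateMachine
{
public:
    using Guard = std::map<VariableReference, EnumValue>;

    // Record that, in `state`, under `guard`, `next` follows with probability `p`.
    void add_transition(EnumValue state, const Guard& guard, EnumValue next, double p)
    {
        machine_[state][guard][next] = p;
    }

    // Probability of `next`; 0.0 if no such transition is stored.
    double probability(EnumValue state, const Guard& guard, EnumValue next) const
    {
        auto s = machine_.find(state);
        if (s == machine_.end()) return 0.0;
        auto g = s->second.find(guard);
        if (g == s->second.end()) return 0.0;
        auto n = g->second.find(next);
        return n == g->second.end() ? 0.0 : n->second;
    }

private:
    using ValueProb   = std::map<EnumValue, double>;
    using Transitions = std::map<Guard, ValueProb>;
    std::map<EnumValue, Transitions> machine_;
};
```

Client code then reads m.probability(state, guard, next) instead of chained find()->second lookups. Because the maps are private, no one can delete the object through a std::map pointer.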
Yes there is:
std::map<Solver::VariableReference, Solver::EnumValue>* x = new SmGuard;
delete x;
results in undefined behavior.
This is one of the controversial points of C++ vs. "inheritance-based classical OOP".
There are two aspects that must be taken into consideration:
a typedef introduces another name for the same type: std::map<Solver::EnumValue, double> and SmValueProb are, to all effects, exactly the same thing and can be used interchangeably.
a class introduces a new type that is (in principle) unrelated to anything else.
Class relations are defined by the way the class is "made up", and by what implicit operations and conversions it allows with other types.
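A compile-time illustration of the difference, with int and double standing in for the Solver types. ValueProbAlias and ValueProbClass are hypothetical names. The typedef is the same type as the map, while the empty derived class is a new type related to the map only through the derived-to-base conversion.

```cpp
#include <map>
#include <type_traits>

typedef std::map<int, double> ValueProbAlias;          // another name, same type
class ValueProbClass : public std::map<int, double> {}; // a new type

static_assert(std::is_same<ValueProbAlias, std::map<int, double>>::value,
              "a typedef names the same type");
static_assert(!std::is_same<ValueProbClass, std::map<int, double>>::value,
              "a derived class is a distinct type");
static_assert(std::is_base_of<std::map<int, double>, ValueProbClass>::value,
              "related to the map only through inheritance");
```

The conversion goes one way only: a ValueProbClass* converts implicitly to a std::map<int, double>*, but not the reverse. That one-way conversion is what makes deleting through the base pointer possible, and undefined.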
Outside of specific programming paradigms (like OOP, which associates them with the concept of "inheritance" and the "is-a" relation), inheritance, implicit constructors, implicit casts and so on all do the same thing: they let a type be used across the interface of another type, thus defining a network of possible operations across different types. This is (generally speaking) "polymorphism".
Various programming paradigms say how such a network should be structured, each attempting to optimize a specific aspect of programming: the representation of runtime-replaceable objects (classical OOP), the representation of compile-time replaceable objects (CRTP), the use of generic algorithmic functions for different types (generic programming), or the use of "pure functions" to express algorithm composition (functional programming and lambda "captures").
All of them dictate some "rules" about how language "features" must be used, since C++ is multi-paradigm and none of its features alone satisfies the requirements of any paradigm, which leaves some dirtiness open.
As Luchian said, inheriting a std::map will not produce a pure OOP replaceable type, since a delete over a base-pointer will not know how to destroy the derived part, being the destructor not virtual by design.
But, in fact, this is just a particular case: pbase->find will also not call any find method you may have redefined in your class, since std::map::find is not virtual. (But this is not undefined: it is very well defined, just most likely not what you intend.)
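A small demonstration of that well-defined but surprising behavior. CountingMap and its size() are hypothetical. Redefining a non-virtual member in a class derived from std::map only hides the base version, so a call through a base reference still reaches std::map::size.

```cpp
#include <cstddef>
#include <map>
#include <string>

class CountingMap : public std::map<std::string, int>
{
public:
    // Hides (does not override) std::map::size: returns the sum of the values.
    size_type size() const
    {
        size_type total = 0;
        for (const auto& kv : *this) total += static_cast<size_type>(kv.second);
        return total;
    }
};

inline std::size_t size_via_base(const std::map<std::string, int>& m)
{
    return m.size();  // statically bound: always std::map::size
}
```

With entries {"a": 3, "b": 4}, calling size() directly on the CountingMap gives 7, while size_via_base() on the same object gives 2.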
The real question is another: is "classic OOP substitution principle" important in your design or not?
In other words, are you going to use your classes AND their bases interchangeably, with functions that just take a std::map* or std::map& parameter, expecting those functions' calls to std::map methods to end up calling your methods?
If yes, inheritance is NOT THE WAY TO GO. There are no virtual methods in std::map, hence runtime polymorphism will not work.
If not, that is, if you're just writing your own class reusing both std::map's behavior and interface, with no intention of interchanging their usage (in particular, you are not allocating your own classes with new and deleting them with delete applied to a std::map pointer), and you provide just a set of functions taking yourclass& or yourclass* as parameters, then that's perfectly fine. It may even be better than a typedef, since your functions can no longer be used with a plain std::map, thus separating the functionalities.
The alternative can be "encapsulation": make the map an explicit member of your class, either public, or private with an accessor function, or rewrite the map interface yourself in your class. You finally get an unrelated type with the same interface and its own behavior, at the cost of rewriting the entire interface of something that may have hundreds of methods.
NOTE:
To anyone worried about the danger of the missing virtual destructor, note that encapsulating with public visibility won't solve the problem:
class myclass: public std::map<something...>
{};
std::map<something...>* p = new myclass;
delete p;
is UB exactly like
class myclass
{
public:
std::map<something...> mp;
};
std::map<something...>* p = &((new myclass)->mp);
delete p;
The second sample has the same mistake as the first, it is just less common: they both pretend to use a pointer to a partial object to operate on the entire one, with nothing in the partial object letting you able to know what the "containing one" is.

Why should one not derive from c++ std string class?

I wanted to ask about a specific point made in Effective C++.
It says:
A destructor should be made virtual if a class needs to act like a polymorphic class. It further adds that since std::string does not have a virtual destructor, one should never derive from it. Also std::string is not even designed to be a base class, forget polymorphic base class.
I do not understand what specifically is required in a class to be eligible for being a base class (not a polymorphic one)?
Is the only reason that I should not derive from std::string class is it does not have a virtual destructor? For reusability purpose a base class can be defined and multiple derived class can inherit from it. So what makes std::string not even eligible as a base class?
Also, if there is a base class purely defined for reusability purpose and there are many derived types, is there any way to prevent client from doing Base* p = new Derived() because the classes are not meant to be used polymorphically?
I think this statement reflects the confusion here (emphasis mine):
I do not understand what specifically is required in a class to be eligible for being a base class (not a polymorphic one)?
In idiomatic C++, there are two uses for deriving from a class:
private inheritance, used for mixins and aspect oriented programming using templates.
public inheritance, used for polymorphic situations only. EDIT: Okay, I guess this could be used in a few mixin scenarios too -- such as boost::iterator_facade -- which show up when the CRTP is in use.
There is absolutely no reason to publicly derive a class in C++ if you're not trying to do something polymorphic. The language comes with free functions as a standard feature of the language, and free functions are what you should be using here.
Think of it this way -- do you really want to force clients of your code to convert to using some proprietary string class simply because you want to tack on a few methods? Because unlike in Java or C# (or most similar object oriented languages), when you derive a class in C++ most users of the base class need to know about that kind of a change. In Java/C#, classes are usually accessed through references, which are similar to C++'s pointers. Therefore, there's a level of indirection involved which decouples the clients of your class, allowing you to substitute a derived class without other clients knowing.
However, in C++, classes are value types -- unlike in most other OO languages. The easiest way to see this is what's known as the slicing problem. Basically, consider:
#include <sstream>
#include <stdexcept>
#include <string>
int StringToNumber(std::string copyMeByValue)
{
std::istringstream converter(copyMeByValue);
int result;
if (converter >> result)
{
return result;
}
throw std::logic_error("That is not a number.");
}
If you pass your own string to this method, the copy constructor for std::string will be called to make a copy, not the copy constructor for your derived object -- no matter what child class of std::string is passed. This can lead to inconsistency between your methods and anything attached to the string. The function StringToNumber cannot simply take whatever your derived object is and copy that, simply because your derived object probably has a different size than a std::string -- but this function was compiled to reserve only the space for a std::string in automatic storage. In Java and C# this is not a problem because the only thing like automatic storage involved are reference types, and the references are always the same size. Not so in C++.
Long story short -- don't use inheritance to tack on methods in C++. That's not idiomatic and results in problems with the language. Use non-friend, non-member functions where possible, followed by composition. Don't use inheritance unless you're template metaprogramming or want polymorphic behavior. For more information, see Scott Meyers' Effective C++ Item 23: Prefer non-member non-friend functions to member functions.
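A sketch of the non-member, non-friend style recommended here. The helper names to_upper() and starts_with() are hypothetical. They take and return plain std::string, so callers never have to switch to a proprietary string type.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

namespace MyTextProc {

// Returns an upper-cased copy; the parameter is taken by value and modified.
inline std::string to_upper(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// True if `s` begins with `prefix`.
inline bool starts_with(const std::string& s, const std::string& prefix)
{
    return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}

} // namespace MyTextProc
```

Because these are free functions over std::string, they also work unchanged on any object that converts to std::string, and they can be tested in isolation.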
EDIT: Here's a more complete example showing the slicing problem. You can see its output on codepad.org
#include <iostream>
struct Base
{
int aMemberForASize;
Base() { std::cout << "Constructing a base." << std::endl; }
Base(const Base&) { std::cout << "Copying a base." << std::endl; }
~Base() { std::cout << "Destroying a base." << std::endl; }
};
struct Derived : public Base
{
int aMemberThatMakesMeBiggerThanBase;
Derived() { std::cout << "Constructing a derived." << std::endl; }
Derived(const Derived&) : Base() { std::cout << "Copying a derived." << std::endl; }
~Derived() { std::cout << "Destroying a derived." << std::endl; }
};
int SomeThirdPartyMethod(Base /* SomeBase */)
{
return 42;
}
int main()
{
Derived derivedObject;
{
//Scope to show the copy behavior of copying a derived.
Derived aCopy(derivedObject);
}
SomeThirdPartyMethod(derivedObject);
}
To offer the counter side to the general advice (which is sound when there are no particular verbosity/productivity issues evident)...
Scenario for reasonable use
There is at least one scenario where public derivation from bases without virtual destructors can be a good decision:
you want some of the type-safety and code-readability benefits provided by dedicated user-defined types (classes)
an existing base is ideal for storing the data, and allows low-level operations that client code would also want to use
you want the convenience of reusing functions supporting that base class
you understand that any additional invariants your data logically needs can only be enforced in code explicitly accessing the data as the derived type, and, depending on the extent to which that will "naturally" happen in your design and how much you can trust client code to understand and cooperate with the logically-ideal invariants, you may want member functions of the derived class to reverify expectations (and throw or whatever)
the derived class adds some highly type-specific convenience functions operating over the data, such as custom searches, data filtering / modifications, streaming, statistical analysis, (alternative) iterators
coupling of client code to the base is more appropriate than coupling to the derived class (as the base is either stable or changes to it reflect improvements to functionality also core to the derived class)
put another way: you want the derived class to continue to expose the same API as the base class, even if that means the client code is forced to change, rather than insulating it in some way that allows the base and derived APIs to grow out of sync
you're not going to be mixing pointers to base and derived objects in parts of the code responsible for deleting them
This may sound quite restrictive, but there are plenty of cases in real world programs matching this scenario.
Background discussion: relative merits
Programming is about compromises. Before you write a more conceptually "correct" program:
consider whether it requires added complexity and code that obfuscates the real program logic, and is therefore more error prone overall despite handling one specific issue more robustly,
weigh the practical costs against the probability and consequences of issues, and
consider "return on investment" and what else you could be doing with your time.
If the potential problems involve usage of the objects that you just can't imagine anyone attempting given your insights into their accessibility, scope and nature of usage in the program, or you can generate compile-time errors for dangerous use (e.g. an assertion that derived class size matches the base's, which would prevent adding new data members), then anything else may be premature over-engineering. Take the easy win in clean, intuitive, concise design and code.
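A sketch of the compile-time guard mentioned above, assuming C++11. IssueText and has_hyphen() are hypothetical. The static_assert lets the derived class add member functions but breaks the build as soon as someone adds a data member. This keeps the slicing and non-virtual-deletion risks down to the theoretical level discussed elsewhere.

```cpp
#include <string>

class IssueText : public std::string
{
public:
    using std::string::string;  // C++11 inheriting constructors

    // Added behaviour only, no added state.
    bool has_hyphen() const { return find('-') != npos; }
};

// Fails to compile if a data member is ever added to IssueText.
static_assert(sizeof(IssueText) == sizeof(std::string),
              "IssueText must not add data members");
```

The check does not prevent deletion through a std::string*, which is still formally undefined. It only guarantees there is no extra state that would go undestroyed.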
Reasons to consider derivation sans virtual destructor
Say you have a class D publicly derived from B. With no effort, the operations on B are possible on D (with the exception of construction, but even if there are a lot of constructors you can often provide effective forwarding by having one template for each distinct number of constructor arguments: e.g. template <typename T1, typename T2> D(const T1& t1, const T2& t2) : B(t1, t2) { }. A better generalised solution is C++0x (C++11) variadic templates.)
Further, if B changes then by default D exposes those changes - staying in sync - but someone may need to review extended functionality introduced in D to see if it remains valid, and the client usage.
Rephrasing this: there is reduced explicit coupling between base and derived class, but increased coupling between base and client.
This is often NOT what you want, but sometimes it is ideal, and other times a non issue (see next paragraph). Changes to the base force more client code changes in places distributed throughout the code base, and sometimes the people changing the base may not even have access to the client code to review or update it correspondingly. Sometimes it is better though: if you as the derived class provider - the "man in the middle" - want base class changes to feed through to clients, and you generally want clients to be able - sometimes forced - to update their code when the base class changes without you needing to be constantly involved, then public derivation may be ideal. This is common when your class is not so much an independent entity in its own right, but a thin value-add to the base.
Other times the base class interface is so stable that the coupling may be deemed a non issue. This is especially true of classes like Standard containers.
Summarily, public derivation is a quick way to get or approximate the ideal, familiar base class interface for the derived class - in a way that's concise and self-evidently correct to both the maintainer and client coder - with additional functionality available as member functions (which, IMHO, and obviously in disagreement with Sutter, Alexandrescu etc., can aid usability and readability, and assist productivity-enhancing tools including IDEs)
C++ Coding Standards - Sutter & Alexandrescu - cons examined
Item 35 of C++ Coding Standards lists issues with the scenario of deriving from std::string. As scenarios go, it's good that it illustrates the burden of exposing a large but useful API, but both good and bad as the base API is remarkably stable - being part of the Standard Library. A stable base is a common situation, but no more common than a volatile one and a good analysis should relate to both cases. While considering the book's list of issues, I'll specifically contrast the issues' applicability to the cases of say:
a) class Issue_Id : public std::string { ...handy stuff... }; <-- public derivation, our controversial usage
b) class Issue_Id : public string_with_virtual_destructor { ...handy stuff... }; <- safer OO derivation
c) class Issue_Id { public: ...handy stuff... private: std::string id_; }; <-- a compositional approach
d) using std::string everywhere, with freestanding support functions
(Hopefully we can agree the composition is acceptable practice, as it provides encapsulation, type safety as well as a potentially enriched API over and above that of std::string.)
So, say you're writing some new code and start thinking about the conceptual entities in an OO sense. Maybe in a bug tracking system (I'm thinking of JIRA), one of them is say an Issue_Id. Data content is textual - consisting of an alphabetic project id, a hyphen, and an incrementing issue number: e.g. "MYAPP-1234". Issue ids can be stored in a std::string, and there will be lots of fiddly little text searches and manipulation operations needed on issue ids - a large subset of those already provided on std::string and a few more for good measure (e.g. getting the project id component, providing the next possible issue id (MYAPP-1235)).
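A sketch of approach a) for this Issue_Id. The behaviour of get_project_id() and next_issue_id() is an assumption based on the "MYAPP-1234" format described above. The single explicit constructor is a simplification for the example. Everything else (find, substr, size, comparisons) comes straight from std::string.

```cpp
#include <string>

class Issue_Id : public std::string
{
public:
    explicit Issue_Id(const std::string& s) : std::string(s) {}

    // "MYAPP-1234" -> "MYAPP"
    std::string get_project_id() const
    {
        return substr(0, find('-'));
    }

    // "MYAPP-1234" -> "MYAPP-1235"; assumes a well-formed id
    // (std::stol throws std::invalid_argument if no number follows).
    Issue_Id next_issue_id() const
    {
        const long number = std::stol(substr(find('-') + 1));
        return Issue_Id(get_project_id() + "-" + std::to_string(number + 1));
    }
};
```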
On to Sutter and Alexandrescu's list of issues...
Nonmember functions work well within existing code that already manipulates strings. If instead you supply a super_string, you force changes through your code base to change types and function signatures to super_string.
The fundamental mistake with this claim (and most of the ones below) is that it promotes the convenience of using only a few types, ignoring the benefits of type safety. It's expressing a preference for d) above, rather than insight into c) or b) as alternatives to a). The art of programming involves balancing the pros and cons of distinct types to achieve reasonable reuse, performance, convenience and safety. The paragraphs below elaborate on this.
Using public derivation, the existing code can implicitly access the base class string as a string, and continue to behave as it always has. There's no specific reason to think that the existing code would want to use any additional functionality from super_string (in our case Issue_Id)... in fact it's often lower-level support code pre-existing the application for which you're creating the super_string, and therefore oblivious to the needs provided for by the extended functions. For example, say there's a non-member function to_upper(std::string&, std::string::size_type from, std::string::size_type to) - it could still be applied to an Issue_Id.
So, unless the non-member support function is being cleaned up or extended at the deliberate cost of tightly coupling it to the new code, then it needn't be touched. If it is being overhauled to support issue ids (for example, using the insight into the data content format to upper-case only leading alpha characters), then it's probably a good thing to ensure it really is being passed an Issue_Id by creating an overload ala to_upper(Issue_Id&) and sticking to either the derivation or compositional approaches allowing type safety. Whether super_string or composition is used makes no difference to effort or maintainability. A to_upper_leading_alpha_only(std::string&) reusable free-standing support function isn't likely to be of much use - I can't recall the last time I wanted such a function.
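A sketch of the two functions discussed here. The generic to_upper(std::string&, from, to) is the postulated pre-existing support function and accepts an Issue_Id through the derived-to-base conversion. The to_upper(Issue_Id&) overload adds the format-aware behaviour (upper-casing only the leading alphabetic project id). Both function bodies and the minimal Issue_Id are assumptions made for illustration.

```cpp
#include <cctype>
#include <string>

// Minimal stand-in for approach a).
class Issue_Id : public std::string
{
public:
    explicit Issue_Id(const std::string& s) : std::string(s) {}
};

// Pre-existing support function: upper-cases characters in [from, to).
inline void to_upper(std::string& s, std::string::size_type from, std::string::size_type to)
{
    for (std::string::size_type i = from; i < to && i < s.size(); ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
}

// Type-safe overload that only accepts an Issue_Id and reuses the generic
// function through the base class.
inline void to_upper(Issue_Id& id)
{
    std::string::size_type end = 0;
    while (end < id.size() && std::isalpha(static_cast<unsigned char>(id[end])))
        ++end;
    to_upper(id, 0, end);
}
```

The old function needed no change to work on Issue_Id, and the new overload cannot accidentally be called with an arbitrary std::string.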
The impulse to use std::string everywhere isn't qualitatively different to accepting all your arguments as containers of variants or void*s so you don't have to change your interfaces to accept arbitrary data, but it makes for error prone implementation and less self-documenting and compiler-verifiable code.
Interface functions that take a string now need to: a) stay away from super_string's added functionality (unuseful); b) copy their argument to a super_string (wasteful); or c) cast the string reference to a super_string reference (awkward and potentially illegal).
This seems to be revisiting the first point - old code that needs to be refactored to use the new functionality, albeit this time client code rather than support code. If the function wants to start treating its argument as an entity for which the new operations are relevant, then it should start taking its arguments as that type and the clients should generate them and accept them using that type. The exact same issues exist for composition. Otherwise, c) can be practical and safe if the guidelines I list below are followed, though it is ugly.
super_string's member functions don't have any more access to string's internals than nonmember functions because string probably doesn't have protected members (remember, it wasn't meant to be derived from in the first place)
True, but sometimes that's a good thing. A lot of base classes have no protected data. The public string interface is all that's needed to manipulate the contents, and useful functionality (e.g. get_project_id() postulated above) can be elegantly expressed in terms of those operations. Conceptually, many of the times I've derived from Standard containers, I haven't wanted to extend or customise their functionality along the existing lines - they're already "perfect" containers. Rather, I've wanted to add another dimension of behaviour that's specific to my application and requires no private access. It's because they're already good containers that they're good to reuse.
If super_string hides some of string's functions (and redefining a nonvirtual function in a derived class is not overriding, it's just hiding), that could cause widespread confusion in code that manipulates strings that started their life converted automatically from super_strings.
True for composition too, and more likely to happen there, as the code doesn't default to passing things through and hence staying in sync. It is also true in some situations with run-time polymorphic hierarchies. Same-named functions that behave differently in classes that initially appear interchangeable are just nasty. This is effectively the usual caution for correct OO programming, and again not a sufficient reason to abandon the benefits in type safety etc.
What if super_string wants to inherit from string to add more state [explanation of slicing]
Agreed - not a good situation, and somewhere I personally tend to draw the line as it often moves the problems of deletion through a pointer to base from the realm of theory to the very practical - destructors aren't invoked for additional members. Still, slicing can often do what's wanted - given the approach of deriving super_string not to change its inherited functionality, but to add another "dimension" of application-specific functionality....
Admittedly, it's tedious to have to write passthrough functions for the member functions you want to keep, but such an implementation is vastly better and safer than using public or nonpublic inheritance.
Well, certainly agree about the tedium....
Guidelines for successful derivation sans virtual destructor
ideally, avoid adding data members in derived class: variants of slicing can accidentally remove data members, corrupt them, fail to initialise them...
even more so - avoid non-POD data members: deletion via base-class pointer is technically undefined behaviour anyway, but with non-POD types failing to run their destructors is more likely to have non-theoretical problems with resource leaks, bad reference counts etc.
honour the Liskov Substitution Principle / you can't robustly maintain new invariants
for example, in deriving from std::string you can't intercept a few functions and expect your objects to remain uppercase: any code that accesses them via a std::string& or std::string* can use std::string's original function implementations to change the value
derive to model a higher level entity in your application, to extend the inherited functionality with some functionality that uses but doesn't conflict with the base; do not expect or try to change the basic operations - and access to those operations - granted by the base type
be aware of the coupling: base class can't be removed without affecting client code even if the base class evolves to have inappropriate functionality, i.e. your derived class's usability depends on the ongoing appropriateness of the base
sometimes even if you use composition you'll need to expose the data member due to performance, thread safety issues or lack of value semantics - so the loss of encapsulation from public derivation isn't tangibly worse
the more likely people using the potentially-derived class will be unaware of its implementation compromises, the less you can afford to make them dangerous
therefore, low-level widely deployed libraries with many ad-hoc casual users should be more wary of dangerous derivation than localised use by programmers routinely using the functionality at application level and/or in "private" implementation / libraries
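A minimal sketch of the invariant problem from the list above. It assumes a hypothetical UpperWord class that wants its contents to stay upper-case. The class can upper-case its contents in its own constructor. However, any function that receives the object as a std::string& changes it through std::string's own members, and UpperWord never sees the change:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical class that *tries* to keep its contents upper-case.
class UpperWord : public std::string {
public:
    explicit UpperWord(const std::string& s) : std::string(s) {
        // Upper-case once, on construction (byte-wise, via std::toupper).
        std::transform(begin(), end(), begin(), [](unsigned char c) {
            return static_cast<char>(std::toupper(c));
        });
    }
};

// Ordinary code that only knows about std::string.
inline void append_suffix(std::string& s) {
    // Calls std::string::operator+= directly; UpperWord cannot intercept it.
    s += "-suffix";
}
```

After UpperWord w("abc"); append_suffix(w); the object holds "ABC-suffix", so the "always upper-case" guarantee is gone without UpperWord ever being consulted.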
Summary
Such derivation is not without issues so don't consider it unless the end result justifies the means. That said, I flatly reject any claim that this can't be used safely and appropriately in particular cases - it's just a matter of where to draw the line.
Personal experience
I do sometimes derive from std::map<>, std::vector<>, std::string etc. I've never been burnt by the slicing or delete-via-base-class-pointer issues, and I've saved a lot of time and energy for more important things. I don't store such objects in heterogeneous polymorphic containers. But you need to consider whether all the programmers using the object are aware of the issues and likely to program accordingly. I personally like to write my code to use the heap and run-time polymorphism only when needed. Some people (because of Java backgrounds, their preferred approach to managing recompilation dependencies or switching between runtime behaviours, testing facilities etc.) use them habitually and therefore need to be more concerned about safe operations via base class pointers.
If you really want to derive from it (I'm not discussing why you want to do it), I think you can prevent direct heap instantiation of the derived class by making its operator new private:
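#include <cstddef> // std::size_t
#include <string>
class StringDerived : public std::string {
//...
private:
// Declared but never defined: new StringDerived fails to compile outside the class.
static void* operator new(std::size_t size);
static void operator delete(void* ptr);
// The array forms must be hidden too, or new StringDerived[n] still works.
static void* operator new[](std::size_t size);
static void operator delete[](void* ptr);
};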
But this way you prevent yourself from creating any StringDerived objects dynamically with new.
Not only is the destructor not virtual, std::string contains no virtual functions at all, and no protected members. That makes it very hard for the derived class to modify its functionality.
Then why would you derive from it?
Another problem with being non-polymorphic is that if you pass your derived class to a function expecting a string parameter, your extra functionality will just be sliced off and the object will be seen as a plain string again.
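A minimal sketch of that slicing. It assumes a hypothetical TaggedWord that adds an int tag. A function that takes std::string by value receives a copy of only the std::string part:

```cpp
#include <cassert>
#include <string>

// Hypothetical derived class that adds state.
class TaggedWord : public std::string {
public:
    TaggedWord(const std::string& text, int tag) : std::string(text), tag_(tag) {}
    int tag() const { return tag_; }

private:
    int tag_;
};

// A function written against plain std::string, taking its argument by value.
inline std::string describe(std::string s) {
    // Only the std::string base subobject was copied into s:
    // tag_ and TaggedWord's member functions are simply not there.
    return "plain string of length " + std::to_string(s.size());
}
```

describe(TaggedWord("hello", 7)) compiles without complaint and sees only "hello". The same happens with a plain copy such as std::string copy = w;.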
Why should one not derive from the C++ std::string class?
Because it is not necessary. If you want to use DerivedString to extend functionality, I don't see any problem in deriving from std::string. The only thing is, you should not mix the two classes (i.e. don't use a string as a receiver for a DerivedString).
Is there any way to prevent client from doing Base* p = new Derived()
Yes. Make sure that you provide inline wrappers around Base methods inside the Derived class, e.g.
class Derived : protected Base { // 'protected' to avoid Base* p = new Derived
public: // without this, the wrappers would be private too
const char* c_str () const { return Base::c_str(); }
//...
};
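A self-contained sketch of the same idea, assuming C++11. Derived re-exposes members in two ways: with a hand-written wrapper and with a using-declaration. A static_assert checks that Derived* no longer converts to std::string* outside the class. is_blank() is a made-up example of added functionality:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Non-public inheritance: the std::string base is an implementation detail.
class Derived : protected std::string {
public:
    explicit Derived(const std::string& s) : std::string(s) {}

    // Option 1: hand-written inline wrapper.
    const char* c_str() const { return std::string::c_str(); }

    // Option 2: a using-declaration re-exposes a base member with less typing.
    using std::string::size;

    // Added functionality (hypothetical example).
    bool is_blank() const { return find_first_not_of(" \t") == npos; }
};

// Outside the class, the derived-to-base conversion is inaccessible,
// so std::string* p = new Derived(...) does not compile.
static_assert(!std::is_convertible<Derived*, std::string*>::value,
              "Derived* must not convert to std::string*");
```

The trade-off is the usual one for this approach: every std::string member you want callers to have must be listed explicitly.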
There are two simple reasons for not deriving from a non-polymorphic class:
Technical: it introduces slicing bugs (because in C++ we pass by value unless otherwise specified)
Functional: if it is non-polymorphic, you can achieve the same effect with composition and some function forwarding
If you wish to add new functionalities to std::string, then first consider using free functions (possibly templates), like the Boost String Algorithm library does.
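This sketch shows how free functions might cover the features asked about in the original question (reversal and case conversion). The function names are hypothetical, and the namespace MyTextProc comes from the question. to_upper here works byte by byte and is not Unicode-aware:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

namespace MyTextProc {

// Returns a reversed copy; works on any std::string, no new type needed.
inline std::string reversed(std::string s) {
    std::reverse(s.begin(), s.end());
    return s;
}

// Returns an upper-cased copy (byte-wise, via std::toupper).
inline std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}

}  // namespace MyTextProc
```

Callers keep plain std::string everywhere and write, for example, MyTextProc::to_upper(name).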
If you wish to add new data members, then properly wrap the class access by embedding it (Composition) inside a class of your own design.
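A minimal composition sketch of the question's MyTextProc::Word, assuming C++11. Which members to forward is a design choice, and reversed() is only an example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

namespace MyTextProc {

// Composition: Word *has a* std::string and forwards only what it chooses to.
class Word {
public:
    Word() = default;
    explicit Word(std::string text) : text_(std::move(text)) {}

    // Forwarding functions: tedious, but each one is a deliberate decision.
    std::size_t size() const { return text_.size(); }
    bool empty() const { return text_.empty(); }
    const std::string& str() const { return text_; }  // read-only access for std APIs

    // Added behaviour (hypothetical example).
    Word reversed() const { return Word(std::string(text_.rbegin(), text_.rend())); }

private:
    std::string text_;  // extra data members can be added here safely
};

}  // namespace MyTextProc
```

Because Word is not a std::string, there is no slicing and no delete-through-base-class-pointer issue. The cost is the forwarding boilerplate.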
EDIT:
@Tony rightly noticed that the Functional reason I cited was probably meaningless to most people. There is a simple rule of thumb in good design: when you can pick among several solutions, you should consider the one with the weakest coupling. Composition has weaker coupling than Inheritance, and thus should be preferred, when possible.
Also, composition gives you the opportunity to nicely wrap the original class's methods. This is not possible if you pick (public) inheritance and the methods are not virtual (which is the case here).
The C++ standard states that if a base class destructor is not virtual and you delete a derived-class object through a pointer to the base class, the behavior is undefined.
C++ standard section 5.3.5/3:
if the static type of the operand is different from its dynamic type, the static type shall be a base class of the operand’s dynamic type and the static type shall have a virtual destructor or the behavior is undefined.
To be clear on non-polymorphic classes and the need for a virtual destructor:
The purpose of making a destructor virtual is to facilitate polymorphic deletion of objects through a delete-expression. If there is no polymorphic deletion of objects, then you don't need virtual destructors.
Why not derive from the string class?
One should generally avoid deriving from any standard container class for the very reason that they don't have virtual destructors, which makes it impossible to delete objects polymorphically.
As for the string class, it doesn't have any virtual functions, so there is nothing that you can possibly override. The best you can do is hide something.
If you want string-like functionality, you should write a class of your own rather than inherit from std::string.
As soon as you add any member variable to your class derived from std::string, will you systematically screw up the stack if you use the standard library goodies with an instance of your derived class? After all, the std::string functions/members have their stack pointers (indexes) fixed and adjusted to the size/boundary of the base std::string instance.
Right?
Please correct me if I am wrong.

I need some C++ guru's opinions on extending std::string

I've always wanted a bit more functionality in STL's string. Since subclassing STL types is a no-no, the recommended method of extending these classes that I've mostly seen is to write functions (not member functions) that take the type as the first argument.
I've never been thrilled with this solution. For one, it's not necessarily obvious where all such methods are in the code, for another, I just don't like the syntax. I want to use . when I call methods!
A while ago I came up with the following:
class StringBox
{
public:
StringBox( std::string& storage ) :
_storage( storage )
{
}
// Methods I wish std::string had...
void Format( const char* fmt, ... ); // printf-style, matching the use below
void Split();
double ToDouble();
void Join(); // etc...
private:
StringBox();
std::string& _storage;
};
Note that StringBox requires a reference to a std::string for construction. This puts some interesting limits on its use (and, I hope, means it doesn't contribute to the string class proliferation problem). In my own code, I'm almost always just declaring it on the stack in a method, just to modify a std::string.
A use example might look like this:
string OperateOnString( float num, string a, string b )
{
string nameS;
StringBox name( nameS );
name.Format( "%f-%s-%s", num, a.c_str(), b.c_str() );
return nameS;
}
My question is: What do the C++ gurus of the StackOverflow community think of this method of STL extension?
I've never been thrilled with this solution. For one, it's not necessarily obvious where all such methods are in the code, for another, I just don't like the syntax. I want to use . when I call methods!
And I want to use $!---& when I call methods! Deal with it. If you're going to write C++ code, stick to C++ conventions. And a very important C++ convention is to prefer non-member functions when possible.
There is a reason C++ gurus recommend this:
It improves encapsulation, extensibility and reuse. (std::sort can work with all iterator pairs because it isn't a member of any single iterator or container class. No matter how you extend std::string, you cannot break it, as long as you stick to non-member functions. And even if you don't have access to, or aren't allowed to modify, the source code for a class, you can still extend it by defining non-member functions.)
Personally, I can't see the point in your code. Isn't this a lot simpler, more readable and shorter?
string OperateOnString( float num, string a, string b )
{
string nameS;
Format(nameS, "%f-%s-%s", num, a.c_str(), b.c_str() );
return nameS;
}
// or even better, if `Format` is made to return the string it creates, instead of taking it as a parameter
string OperateOnString( float num, string a, string b )
{
return Format("%f-%s-%s", num, a.c_str(), b.c_str() );
}
When in Rome, do as the Romans, as the saying goes. Especially when the Romans have good reasons to do as they do. And especially when your own way of doing it doesn't actually have a single advantage. It is more error-prone, confusing to people reading your code, non-idiomatic and it is just more lines of code to do the same thing.
As for your problem that it's hard to find the non-member functions that extend string, place them in a namespace if that's a concern. That's what they're for. Create a namespace StringUtil or something, and put them there.
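Here is a sketch of how the Format that returns its result, suggested above, might look. StringUtil::Format is a hypothetical helper built on std::vsnprintf (C++11), and it sits in a namespace as this answer recommends. Like printf, it is not type-safe. Boost.Format, mentioned below, is the safer option:

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

namespace StringUtil {

// printf-style formatting into a std::string (hypothetical helper, not a standard API).
inline std::string Format(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list args_copy;
    va_copy(args_copy, args);  // the sizing call below consumes args
    const int needed = std::vsnprintf(nullptr, 0, fmt, args);  // length only
    va_end(args);
    if (needed < 0) {  // encoding error
        va_end(args_copy);
        return std::string();
    }
    std::vector<char> buf(static_cast<std::size_t>(needed) + 1);  // +1 for '\0'
    std::vsnprintf(buf.data(), buf.size(), fmt, args_copy);
    va_end(args_copy);
    return std::string(buf.data(), static_cast<std::size_t>(needed));
}

}  // namespace StringUtil
```

With it, the question's function becomes return StringUtil::Format("%f-%s-%s", num, a.c_str(), b.c_str());.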
As most of us "gurus" seem to favour the use of free functions, probably contained in a namespace, I think it safe to say that your solution will not be popular. I'm afraid I can't see one single advantage it has, and the fact that the class contains a reference is an invitation to that becoming a dangling reference.
I'll add a little something that hasn't already been posted. The Boost String Algorithms library has taken the free template function approach, and the string algorithms it provides are spectacularly reusable for anything that looks like a string: std::string, char*, std::vector, iterator pairs... you name it! And they are all neatly placed in the boost::algorithm namespace (I often use namespace algo = boost::algorithm; to make string manipulation code more terse).
So consider using free template functions for your string extensions, and look at Boost String Algorithms on how to make them "universal".
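A small sketch of that "universal" style without Boost, assuming C++11. to_upper_in_place and the StringExt namespace are hypothetical names, and boost::algorithm::to_upper is the more complete real thing. The template accepts anything with begin()/end() over chars:

```cpp
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>
#include <vector>

namespace StringExt {

// Works for std::string, std::vector<char>, char arrays... anything iterable over chars.
template <typename Range>
void to_upper_in_place(Range& r) {
    for (auto it = std::begin(r); it != std::end(r); ++it) {
        // Convert via unsigned char: passing a negative char to std::toupper is undefined.
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    }
}

}  // namespace StringExt
```

Because the function is a template over the range, one definition serves every string-like type without any of those types having to know about it.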
For safe printf-style formatting, check out Boost.Format. It can output to strings and streams.
I too wanted everything to be a member function, but I'm now starting to see the light. UML and doxygen are always pressuring me to put functions inside of classes, because I was brainwashed by the idea that C++ API == class hierarchy.
If the string doesn't live as long as the StringBox, you can get segfaults, because the reference dangles:
StringBox foo() {
string s("abc");
return StringBox(s);
}
At least prevent object copying by declaring the assignment operator and copy ctor private:
class StringBox {
//...
private:
void operator=(const StringBox&);
StringBox(const StringBox&);
};
EDIT: regarding the API, in order to prevent surprises I would make the StringBox own its copy of the string. I can think of two ways to do this:
Copy the string to a member (not a reference), and get the result later - also as a copy
Access your string through a reference-counting smart pointer like std::tr1::shared_ptr or boost::shared_ptr, to prevent extra copying
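A minimal sketch of the first option, assuming C++11. This hypothetical StringBox stores a std::string by value, so returning it from a function like foo() above can no longer dangle. Append() and Result() are illustrative names:

```cpp
#include <cassert>
#include <string>
#include <utility>

class StringBox {
public:
    explicit StringBox(std::string initial = std::string())
        : storage_(std::move(initial)) {}

    // Example operation working on the owned copy.
    StringBox& Append(const std::string& s) {
        storage_ += s;
        return *this;
    }

    // Get the result later, also as a copy.
    std::string Result() const { return storage_; }

private:
    std::string storage_;  // a value, not a reference
};

// Safe now: the string travels inside the returned box.
inline StringBox make_box() {
    std::string s("abc");
    return StringBox(s);  // copies s; s may be destroyed afterwards
}
```

The price is a copy in and a copy out, which is what the shared_ptr option tries to avoid.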
The problem with loose functions is that they're loose functions.
I would bet money that most of you have created a function that was already provided by the STL because you simply didn't know the STL function existed, or that it could do what you were trying to accomplish.
It's a fairly punishing design, especially for new users. (The STL gets new additions too, further adding to the problem.)
Google: C++ to string
How many results mention: std::to_string
I'm just as likely to find some ancient C method, or some homemade version, as I am to find the STL version of any given function.
I much prefer member methods because you don't have to struggle to find them, and you don't need to worry about finding old deprecated versions, etc. (i.e., string.SomeMethod is pretty much guaranteed to be the method you should be using, and it gives you something concrete to Google for.)
C# style extension methods would be a good solution.
They're loose functions.
They show up as member functions via intellisense.
This should allow everyone to do exactly what they want.
It seems like it could be accomplished in the IDE itself, rather than requiring any language changes.
Basically, if the interpreter hits some call to a member that doesn't exist, it can check headers for matching loose functions, and dynamically fix it up before passing it on to the compiler.
Something similar could be done when it's loading up the intellisense data.
I have no idea how this could be made to work for existing functions, and no massive change like this should be taken lightly, but for new functions using a new syntax, it shouldn't be a problem.
namespace StringExt
{
std::string MyFunc(this std::string source);
}
That can be used by itself, or as a member of std::string, and the IDE can handle all the grunt work.
Of course, this still leaves the problem of methods being spread out over various headers, which could be solved in various ways.
Some sort of extension header: string_ext which could include common methods.
Hmm....
That's a tougher issue to solve without causing issues...
If you want to extend the methods available to act on string, I would extend it by creating a class that has static methods that take the standard string as a parameter.
That way, people are free to use your utilities, but don't need to change the signatures of their functions to take a new class.
This breaks the object-oriented model a little, but makes the code much more robust - i.e. if you change your string class, then it doesn't have as much impact on other code.
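A minimal sketch of such a class, assuming C++11. StringUtils, Reverse and StartsWith are hypothetical names. Callers keep passing plain std::string:

```cpp
#include <cassert>
#include <string>

// Static members only; the class is never instantiated.
class StringUtils {
public:
    StringUtils() = delete;

    static std::string Reverse(const std::string& s) {
        return std::string(s.rbegin(), s.rend());
    }

    static bool StartsWith(const std::string& s, const std::string& prefix) {
        return s.size() >= prefix.size() &&
               s.compare(0, prefix.size(), prefix) == 0;
    }
};
```

A namespace of free functions gives the same call-site robustness. The class form mainly adds the StringUtils:: qualification and prevents using-directives.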
Follow the recommended guidelines, they are there for a reason :)
The best way is to use templated free functions. The next best is private inheritance, struct extended_str : private string, which by the way gets easier in C++0x, as you can inherit constructors with a using-declaration. Private inheritance is too much trouble and too risky just to add some algorithms. What you are doing is too risky for anything.
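A sketch of the private-inheritance variant, assuming C++11 or later. using std::string::string; inherits all of std::string's constructors. This also answers the original question of whether Word can inherit them, and it works the same way with public inheritance. extended_str's count() is an illustrative addition:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Private inheritance: no implicit conversion to std::string& or std::string*.
struct extended_str : private std::string {
    using std::string::string;      // C++11: inherit all std::string constructors
    using std::string::size;        // re-expose selected members explicitly
    using std::string::operator[];

    // Controlled read-only access when a real std::string is needed.
    const std::string& str() const { return *this; }

    // Added algorithm (hypothetical example).
    std::size_t count(char c) const {
        std::size_t n = 0;
        for (char ch : str()) n += (ch == c);
        return n;
    }
};
```

Inherited constructors do not initialise any members extended_str might add, which is one more reason to keep such a class free of extra data.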
You've just introduced a nontrivial data structure to accomplish a change in code punctuation. You have to manually create and destroy a Box for each string, and you still need to distinguish your methods from the native ones. You will quickly get tired of this convention.

C++: Copy constructor: Use getters or access member vars directly?

I have a simple container class with a copy constructor.
Do you recommend using getters and setters, or accessing the member variables directly?
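#include <string>
using std::string;
class Container
{
public:
Container() {}
Container(const Container& cont) //option 1: go through the accessors
{
SetMyString(cont.GetMyString());
}
//OR (a class can declare only one of these two copy constructors)
//Container(const Container& cont) //option 2: copy the member directly
//{
//m_str1 = cont.m_str1;
//}
string GetMyString() const { return m_str1;} // const, so it can be called on cont
void SetMyString(string str) { m_str1 = str;}
private:
string m_str1;
};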
In the example, all code is inline, but in our real code there is no inline code.
Update (29 Sept 09):
Some of these answers are well written, but they seem to be missing the point of this question:
this is a simple, contrived example to discuss using getters/setters vs variables
initializer lists or private validator functions are not really part of this question. I'm wondering if either design will make the code easier to maintain and expand.
Some people are focusing on the string in this example, but it is just an example; imagine it is a different object instead.
I'm not concerned about performance. We're not programming on the PDP-11
EDIT: Answering the edited question :)
this is a simple, contrived example to discuss using getters/setters vs variables
If you have a simple collection of variables, that don't need any kind of validation, nor additional processing then you might consider using a POD instead. From Stroustrup's FAQ:
A well-designed class presents a clean and simple interface to its users, hiding its representation and saving its users from having to know about that representation. If the representation shouldn't be hidden - say, because users should be able to change any data member any way they like - you can think of that class as "just a plain old data structure"
In short, this is not Java. You shouldn't write plain getters/setters, because they are as bad as exposing the variables themselves.
initializer lists or private validator functions are not really part of this question. I'm wondering if either design will make the code easier to maintain and expand.
If you are copying another object's variables, then the source object should be in a valid state. How did the ill-formed source object get constructed in the first place?! Shouldn't constructors do the job of validation? Aren't the modifying member functions responsible for maintaining the class invariant by validating input? Why would you validate a "valid" object in a copy constructor?
I'm not concerned about performance. we're not programming on the PDP-11
This is about the most elegant style, though in C++ the most elegant code usually has the best performance characteristics too.
You should use an initializer list. In your code, m_str1 is default constructed then assigned a new value. Your code could be something like this:
class Container
{
public:
Container() {}
Container(const Container& cont) : m_str1(cont.m_str1)
{ }
string GetMyString() { return m_str1;}
void SetMyString(string str) { m_str1 = str;}
private:
string m_str1;
};
@cbrulak You shouldn't, IMO, validate cont.m_str1 in the copy constructor. What I do is validate things in constructors. Validation in the copy constructor means you are copying an ill-formed object in the first place. For example:
Container(const string& str) : m_str1(str)
{
if(!valid(m_str1)) // valid() is a function to check your input
{
// throw an exception!
}
}
You should use an initializer list, and then the question becomes meaningless, as in:
Container(const Container& rhs)
: m_str1(rhs.m_str1)
{}
There's a great section in Matthew Wilson's Imperfect C++ that explains all about Member Initializer Lists, and about how you can use them in combination with const and/or references to make your code safer.
Edit: an example showing validation and const:
class Container
{
public:
Container(const string& str)
: m_str1(validate_string(str))
{}
private:
static const string& validate_string(const string& str)
{
if(str.empty())
{
throw runtime_error("invalid argument");
}
return str;
}
private:
const string m_str1;
};
As it's written right now (with no qualification of the input or output) your getter and setter (accessor and mutator, if you prefer) are accomplishing absolutely nothing, so you might as well just make the string public and be done with it.
If the real code really does qualify the string, then chances are pretty good that what you're dealing with isn't properly a string at all -- instead, it's just something that looks a lot like a string. What you're really doing in this case is abusing the type system, sort of exposing a string, when the real type is only something a bit like a string. You're then providing the setter to try to enforce whatever restrictions the real type has compared to a real string.
When you look at it from that direction, the answer becomes fairly obvious: rather than a string with a setter to make the string act like some other (more restricted) type, you should define an actual class for the type you really want. Having defined that class correctly, you make an instance of it public. If (as seems to be the case here) it's reasonable to assign it a value that starts out as a string, then that class should contain an assignment operator that takes a string as an argument. If (as also seems to be the case here) it's reasonable to convert that type to a string under some circumstances, it can also include a conversion operator that produces a string as the result.
This gives a real improvement over using a setter and getter in a surrounding class. First and foremost, when you put those in a surrounding class, it's easy for code inside that class to bypass the getter/setter, losing enforcement of whatever the setter was supposed to enforce. Second, it maintains a normal-looking notation. Using a getter and a setter forces you to write code that's just plain ugly and hard to read.
One of the major strengths of a string class in C++ is using operator overloading so you can replace something like:
strcat(filename, ".ext");
with:
filename += ".ext";
to improve readability. But look what happens if that string is part of a class that forces us to go through a getter and setter:
some_object.setfilename(some_object.getfilename()+".ext");
If anything, the C code is actually more readable than this mess. On the other hand, consider what happens if we do the job right, with a public object of a class that defines an operator string and operator=:
some_object.filename += ".ext";
Nice, simple and readable, just as it should be. Better still, if we need to enforce something about the string, we can inspect only that small class: we really only have to look at one or two specific, well-known places (operator=, possibly a ctor or two for that class) to know that it's always enforced. That is a totally different story from using a setter to try to do the job.
Do you anticipate changing how the string is returned, e.g. whitespace trimmed, null-checked, etc.? The same goes for SetMyString(). If the answer is yes, you are better off with accessor methods, since you then only have to modify those getter and setter methods instead of changing code in a zillion places.
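A sketch of the "define an actual class" approach. It assumes a hypothetical FileName type whose invariant, made up for illustration, is that it never contains a '/'. The check lives in one place, and some_object.filename += ".ext" works on a public member:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

class FileName {
public:
    FileName(const std::string& s = std::string()) { assign(s); }

    FileName& operator=(const std::string& s) { assign(s); return *this; }
    FileName& operator+=(const std::string& s) { assign(value_ + s); return *this; }

    // Conversion for code that needs a plain std::string.
    operator std::string() const { return value_; }

private:
    // The single place where the (hypothetical) invariant is enforced.
    // Validates before modifying, so a failed assignment leaves the old value.
    void assign(const std::string& s) {
        if (s.find('/') != std::string::npos)
            throw std::invalid_argument("file name must not contain '/'");
        value_ = s;
    }
    std::string value_;
};

struct SomeObject {
    FileName filename;  // public: no getter/setter pair needed
};
```

Code inside SomeObject cannot bypass the check either, because the check belongs to FileName rather than to a setter of the surrounding class.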
Ask yourself what the costs and benefits are.
Cost: higher runtime overhead. Calling virtual functions in ctors is a bad idea, but setters and getters are unlikely to be virtual.
Benefits: if the setter/getter does something complicated, you're not repeating code; if it does something unintuitive, you're not forgetting to do that.
The cost/benefit ratio will differ for different classes. Once you've ascertained that ratio, use your judgment. For immutable classes, of course, you don't have setters, and you don't need getters (as const members and references can be public, since no one can change or reseat them).
There's no silver bullet as how to write the copy constructor.
If your class only has members which provide a copy constructor that creates instances which do not share state (or at least do not appear to do so), using an initializer list is a good way.
Otherwise you'll have to actually think.
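struct beta { /* ... */ };
struct alpha {
beta* m_beta;
alpha() : m_beta(new beta()) {}
~alpha() { delete m_beta; }
alpha(const alpha& a) {
// need to copy? or do you have a shared state? copy on write?
m_beta = new beta(*a.m_beta); // deep copy: each alpha owns its own beta
// wrong: both objects would then delete the same beta
// m_beta = a.m_beta;
}
// (a matching copy assignment operator is needed as well)
};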
Note that you can get around the potential segfault by using a smart pointer - but you can have a lot of fun debugging the resulting bugs.
Of course it can get even funnier.
Members which are created on demand.
new beta(*a.m_beta) is wrong if you somehow introduce polymorphism.
... never mind the rest - please always think when writing a copy constructor.
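For the polymorphism case, the usual remedy is a virtual clone() function. This sketch assumes C++14 (std::make_unique) and replaces the raw pointer with std::unique_ptr. special_beta is a hypothetical subclass:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Polymorphic member: copying through new beta(*p) would slice a special_beta.
struct beta {
    virtual ~beta() = default;
    virtual std::unique_ptr<beta> clone() const { return std::make_unique<beta>(*this); }
    virtual std::string name() const { return "beta"; }
};

struct special_beta : beta {
    std::unique_ptr<beta> clone() const override { return std::make_unique<special_beta>(*this); }
    std::string name() const override { return "special_beta"; }
};

struct alpha {
    std::unique_ptr<beta> m_beta;

    explicit alpha(std::unique_ptr<beta> b) : m_beta(std::move(b)) {}

    // Deep copy that preserves the dynamic type ("virtual copy constructor").
    alpha(const alpha& a) : m_beta(a.m_beta ? a.m_beta->clone() : nullptr) {}

    // Clone first, then replace: safe even for self-assignment.
    alpha& operator=(const alpha& a) {
        m_beta = a.m_beta ? a.m_beta->clone() : nullptr;
        return *this;
    }
};
```

unique_ptr also removes the hand-written destructor, so the double-delete case from the original snippet cannot arise.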
Why do you need getters and setters at all?
Simple :) - They preserve invariants - i.e. guarantees your class makes, such as "MyString always has an even number of characters".
If implemented as intended, your object is always in a valid state - so a memberwise copy can very well copy the members directly without fear of breaking any guarantee. There is no advantage of passing already validated state through another round of state validation.
As AraK said, the best would be using an initializer list.
Not so simple (1):
Another reason to use getters/setters is to avoid relying on implementation details. That's a strange idea for a copy CTor: when changing such implementation details, you almost always need to adjust the copy constructor, destructor and assignment operator (CDA) anyway.
Not so simple (2):
To prove me wrong, you can construct invariants that depend on the instance itself, or on another external factor. One (very contrived) example: "if the number of instances is even, the string length is even, otherwise it's odd." In that case, the copy CTor would have to throw, or adjust the string. In such a case it might help to use setters/getters - but that's not the general case. You shouldn't derive general rules from oddities.
I prefer using an interface for outer classes to access the data, in case you want to change the way it's retrieved. However, when you're within the scope of the class and want to replicate the internal state of the copied value, I'd go with data members directly.
Not to mention that you'll probably save a few function calls if the getters are not inlined.
If your getters are (inline and) not virtual, there are no pluses or minuses in using them compared with direct member access. It just looks goofy to me in terms of style, but it's no big deal either way.
If your getters are virtual, then there is overhead... but nevertheless that's exactly when you DO want to call them, just in case they're overridden in a subclass!-)
There is a simple test that works for many design questions, this one included: add side-effects and see what breaks.
Suppose the setter not only assigns a value but also writes an audit record, logs a message or raises an event. Do you want this to happen for every property when copying an object? Probably not - so calling setters in a constructor is logically wrong (even if the setters are in fact just assignments).
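The side-effect test can be sketched as follows. It assumes a hypothetical static audit_count that stands in for the audit record or log message. The setter increments it. The copy constructor copies the member directly, so copying is not reported as a "set":

```cpp
#include <cassert>
#include <string>
#include <utility>

class Container {
public:
    Container() = default;

    // Copying state is not a "set" event, so the setter is bypassed on purpose.
    Container(const Container& other) : m_str1(other.m_str1) {}

    const std::string& GetMyString() const { return m_str1; }

    void SetMyString(std::string str) {
        m_str1 = std::move(str);
        ++audit_count;  // side effect: pretend this writes an audit record
    }

    static int audit_count;

private:
    std::string m_str1;
};

int Container::audit_count = 0;
```

If the copy constructor called SetMyString() instead, every copy would add a spurious audit entry.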
Although I agree with other posters that there are many entry-level C++ "no-no's" in your sample, putting that to the side and answering your question directly:
In practice, I tend to make many but not all of my member fields* public to start with, and then move them to get/set when needed.
Now, I will be the first to say that this is not necessarily a recommended practice, and many practitioners will abhor this and say that every field should have setters/getters.
Maybe. But I find that in practice this isn't always necessary. Granted, it causes pain later when I change a field from public to a getter, and sometimes when I know what usage a class will have, I give it set/get and make the field protected or private from the start.
YMMV
RF
* You call fields "variables"; I encourage you to use that term only for local variables within a function/method.