Related
I have a problem which I cannot understand:
Let's Say I have a class System with several member fields, and one of them is of type unordered_map, so when I declare the class in the header file, I write at the beginning of the header #include <unordered_map>.
Now, I have two ways of declaring this field:
1.std::unordered_map<std::string,int> umap;
2.std::unordered_map<std::string,int>* p_umap;
Now in the constructor of the class, if I choose the first option, there is no need to initialize that field in the initializer list since the constructor of class System will call the default constructor for the field umap as part of constructing an instance of type class System.
If I choose the second option, I should initialize the field p_umap in the constructor (in the initialize list) with the operator new and in the destructor, to delete this dynamic allocation.
What is the difference between these two options? If you have a class that one of it's fields is of type unordered_map, how do you declare this field? As a pointer or as a variable of type unordered_map?
In a situation like the one you are describing, it seems like the first option is preferable. Most likely, in fact, the unordered map is intended to be owned by the class it is a data member of. In other words, its lifetime should not be extended beyond the lifetime of the encapsulating class, and the encapsulating class has the responsibility of creating and destroying the unordered map.
While with option 1 all this work is done automatically, with option 2 you would have to take care of it manually (and take care of correct copy-construction, copy-assignment, exception-safety, lack of memory leaks, and so on). Surely you could use smart pointers (e.g. std::unique_ptr<>) to encapsulate this responsibility into a wrapper that would take care of deleting the wrapped object when the smart pointer itself goes out of scope (this idiom is called RAII, which is an acronym for Resource Acquisition Is Initialization).
However, it seems to me like you do not really need a pointer at all here. You have an object whose lifetime is completely bounded by the lifetime of the class that contains it. In these situations, you should just not use pointers and prefer declaring the variable as:
std::unordered_map<std::string, int> umap;
Make it not a pointer until you need to make it a pointer.
Pointers are rife with user error.
For example, you forgot to mention that your class System would also need to implement
System( const Sysytem& )
and
System& operator= ( const System& )
or Bad Behavior will arise when you try to copy your object.
The difference is in how you want to be able to access umap. Pointers can allow for a bit more flexibility, but they obviously add complexity in terms of allocation (stack vs heap, destructors and such). If you use a pointer to umap, you can do some pretty convoluted stuff such as making two System's with the same umap. In the end though, go with KISS unless there's a compelling reason not to.
There is no need to define it as pointer. If you do it, you must also make sure to implement copy constructor and assignment operator, or disable them completely.
If there is no specific reason to make it a pointer (and you don't show any) just make it a normal member variable.
According to Wikipedia:
An object is first-class when it:
can be stored in variables and data structures
can be passed as a parameter to a subroutine
can be returned as the result of a subroutine
can be constructed at runtime
has intrinsic identity (independent of any given name)
Somebody had once told me that raw pointers are not first class objects while smart pointers like std::auto_ptr are. But to me, a raw pointer (to an object or to a function) in C++ does seem to me to satisfy the conditions stated above to qualify as a first class object. Am I missing something?
The definition in Wikipedia of "first-class" is incorrect. In a programming language, an entity is considered to be "first-class" if it does not require any special treatment by the compiler and can be interpreted, understood, used, etc. just like any other object in the language. The reason pointers in C++ are not first class objects is that *p for a pointer p is not interpreted as simply invoking an overloaded operator* like it would be for any other object; instead, the compiler has to treat p specially based on the fact that it is a pointer type. When passing a pointer or a reference you cannot simply pass any value object (and some of those value objects happen to be pointers), but you are actually saying that it is a different type (a pointer type or a reference type, in place of a value type), and its interpretation / usage is subject to its type.
A good example of a language where all objects are first-class is Python. In Python, everything has some sort of type associated with it, it can be treated as an object, it can have members, functions, etc. As an extreme example, even functions in Python are objects that simply contain code and happen to be callable; in fact, if you try to call the function using the __call__ member function (which is akin to overloading operator() in C++, and which other non-function objects can provide), an ordinary function will return a function wrapper as a result (so that you don't have to treat functions as a special case). You can even add members to functions like you would to other objects. Here is an example of such a thing:
>>> def f(x):
... return x;
...
>>> func = f;
>>> print f.__class__
>>> print f(5)
5
>>> print f.__call__(5)
5
>>> f.myOwnMember = "Hello world!"
>>> print f.myOwnMember
Hello world!
I apologize for the Python in a C++ post, but it is hard to explain the concept without the contrast.
The Wikipedia article is shit (for once I agree with "citation needed"), but you are not missing anything: according to the Wikipedia definition, and also according to any sane definition, raw C++ pointers are first-class objects.
If anyone would like to volunteer to help me with some searches through Google Scholar, I would really like to see that Wikipedia article fixed. I am a subject-matter expert, but I am also extremely busy—I would love to have a partner to work on this with.
Actually both - "Pointers are FCO" and "Pointers are not FCO" - are correct. We need to take the statements in the respective context. (FCO - First Class Object)
When we talk of a 'pointer' in C++, we are talking of a data type that stores the address of some other data. Now whether this is FCO or not, depends really on how we envisage to use it. Unfortunately, this use semantics is not built-in for Pointers in C++.
If we are using pointers merely to 'point' to data, it will satisfy the requirements of being an FCO. However, if we use a pointer to 'hold' a data, then it can no longer be considered an FCO as its copy and assignment semantics do not work. Such 'resource handling' pointers (or more directly referred as 'raw pointers') are our interest in the context of Smart Pointer study. These are not FCO while the corresponding Smart Pointers are. In contrast, mere tracking pointers would continue to meet the requirements for an FCO.
The following paragraph from "Modern C++ Design" book nicely elucidates the point.
I quote from the Chapter on Smart Pointers:
An object with value semantics is an
object that you can copy and assign
to. Type int is the perfect example of
a first-class object. You can create,
copy, and change integer values
freely. A pointer that you use to
iterate in a buffer also has value
semantics—you initialize it to point
to the beginning of the buffer, and
you bump it until you reach the end.
Along the way, you can copy its value
to other variables to hold temporary
results.
With pointers that hold values
allocated with new, however, the story
is very different. Once you have
written
Widget* p = new Widget;
the variable p not only points to, but also owns, the memory
allocated for the Widget object. This
is because later you must issue delete
p to ensure that the Widget object is
destroyed and its memory is released.
If in the line after the line just
shown you write
p = 0; // assign something else to p
you lose ownership of the object previously pointed to by p, and you
have no chance at all to get a grip on
it again. You have a resource leak,
and resource leaks never help.
I hope this clarifies.
As Mike commented there seems to be a bit of confusion with the word Object here. Your friend seems to be confusing between objects as in "first class objects" and Objects as "instances of C++ classes", two very different things where the word object is used for two things completely different.
In Object Oriented programming, the word Object is used to speak of some data structure gathering datafields and methods to manipulate them. With C++ you define them in classes, get inheritance, etc.
On the other hand in the expression "first class object" the word object has a much broader meaning, not limited to Objects of object oriented languages. An integer qualify as a first class object, a pointer also do. I believe "Objects" with the other meaning does not really qualify as first class object in C++ (they do in other OO programming languages) because of restrictions on them on parameter passing (but you can pass around pointers or references to them, the difference is not so big).
Then the things are really the opposite of what you friend said: pointers are first class objects but are not instances of C++ classes, smart pointers are instances of C++ classes but are not first class objects.
From wiki:
In computing, a first-class object
(also value, entity, and citizen), in
the context of a particular
programming language, is an entity
which can be passed as a parameter,
returned from a subroutine, or
assigned into a variable.1 In
computer science the term reification
is used when referring to the process
(technique, mechanism) of making
something a first-class
object.[citation needed]
I included the entire entry in wiki-pedia for context. According to this definition - an entity which can be passed as a parameter - the C++ pointer would be a first class object.
Additional links supporting the argument:
catalysoft
wapedia
reification: referring to the process (technique, mechanism) of making something a first-class object
If I understand correctly we have at least two different ways of implementing composition. (The case of implementation with smart pointers is excluded for simplicity. I almost don't use STL and have no desire to learn it.)
Let's have a look at Wikipedia example:
class Car
{
private:
Carburetor* itsCarb;
public:
Car() {itsCarb=new Carburetor();}
virtual ~Car() {delete itsCarb;}
};
So, it's one way - we have a pointer to object as private member.
One can rewrite it to look like this:
class Car
{
private:
Carburetor itsCarb;
};
In that case we have an object itself as private member. (By the way, am I right to call this entity an object from the terminology point of view?)
In the second case it is not obligatory to implicitly call default constructor (if one need to call non-default constructor it's possible to do it in initializer list) and destructor. But it's not a big problem...
And of course in some aspects these two cases differ more appreciably. For example it's forbidden to call non-const methods of Carburetor instance from const methods of Car class in the second case...
Are there any "rules" to decide which one to use? Am I missing something?
In that case we have an object itself as private member. (By the way, calling this entity as object am I write from the terminology point of view?)
Yes you can say "an object" or "an instance" of the class.
You can also talk about including the data member "by value" instead of "by pointer" (because "by pointer" and "by value" is the normal way to talk about passing parameters, therefore I expect people would understand those terms being applied to data members).
Is there any "rules" to decide which one to use? Am I missed something?
If the instance is shared by more than one container, then each container should include it by pointer instead of value; for example if an Employee has a Boss instance, include the Boss by pointer if several Employee instances share the same Boss.
If the lifetime of the data member isn't the same as the lifetime of the container, then include it by pointer: for example if the data member is instantiated after the container, or destroyed before the container, or destroyed-and-recreated during the lifetime of the container, or if it ever makes sense for the data member to be null.
Another time when you must including by pointer (or by reference) instead of by value is when the type of the data member is an abstract base class.
Another reason for including by pointer is that that might allow you to change the implementation of the data member without recompiling the container. For example, if Car and Carburetor were defined in two different DLLs, you might want to include Carburetor by pointer: because then you might be able to change the implementation of the Carburetor by installing a different Carburetor.dll, without rebuilding the Car.dll.
I tend to prefer the first case because the second one requires you to #include Carburettor.h in Car.h.
Since Carburettor is a private member you should not have to include its definition somewhere else than in the actual Car implementation code. The use of the Carburettor class is clearly an implementation detail and external objects that use your Car object should not have to worry about including other non mandatory dependencies. By using a pointer you just need to use a forward declaration of Carburettor in Car.h.
Composition: prefer member when possible. Use a pointer when polymorphism is needed or when a forward declaration is used. Of course, without smart pointer, manual memory management is needed when using pointers.
If Carb has the same lifetime as Car, then the non-pointer form is better, in my opinion. If you have to replace the Carb in Car, then I'd opt for the pointer version.
Generally, the non-pointer version is easier to use and maintain.
But in some cases, you can't use it. For example if the car has multiple carburetors and you wish to put them in an array, and the Carburetor constructor requires an argument: you need to create them via new and thus store them as pointers.
This is a simplified example to illustrate the question:
class A {};
class B
{
B(A& a) : a(a) {}
A& a;
};
class C
{
C() : b(a) {}
A a;
B b;
};
So B is responsible for updating a part of C. I ran the code through lint and it whinged about the reference member: lint#1725.
This talks about taking care over default copy and assignments which is fair enough, but default copy and assignment is also bad with pointers, so there's little advantage there.
I always try to use references where I can since naked pointers introduce uncertaintly about who is responsible for deleting that pointer. I prefer to embed objects by value but if I need a pointer, I use auto_ptr in the member data of the class that owns the pointer, and pass the object around as a reference.
I would generally only use a pointer in member data when the pointer could be null or could change. Are there any other reasons to prefer pointers over references for data members?
Is it true to say that an object containing a reference should not be assignable, since a reference should not be changed once initialised?
My own rule of thumb :
Use a reference member when you want the life of your object to be dependent on the life of other objects : it's an explicit way to say that you don't allow the object to be alive without a valid instance of another class - because of no assignment and the obligation to get the references initialization via the constructor. It's a good way to design your class without assuming anything about it's instance being member or not of another class. You only assume that their lives are directly linked to other instances. It allows you to change later how you use your class instance (with new, as a local instance, as a class member, generated by a memory pool in a manager, etc.)
Use pointer in other cases : When you want the member to be changed later, use a pointer or a const pointer to be sure to only read the pointed instance. If that type is supposed to be copyable, you cannot use references anyway. Sometimes you also need to initialize the member after a special function call ( init() for example) and then you simply have no choice but to use a pointer. BUT : use asserts in all your member function to quickly detect wrong pointer state!
In cases where you want the object lifetime to be dependent on an external object's lifetime, and you also need that type to be copyable, then use pointer members but reference argument in constructor That way you are indicating on construction that the lifetime of this object depends on the argument's lifetime BUT the implementation use pointers to still be copyable. As long as these members are only changed by copy, and your type don't have a default constructor, the type should fullfil both goals.
Avoid reference members, because they restrict what the implementation of a class can do (including, as you mention, preventing the implementation of an assignment operator) and provide no benefits to what the class can provide.
Example problems:
you are forced to initialise the reference in each constructor's initialiser list: there's no way to factor out this initialisation into another function (until C++0x, anyway edit: C++ now has delegating constructors)
the reference cannot be rebound or be null. This can be an advantage, but if the code ever needs changing to allow rebinding or for the member to be null, all uses of the member need to change
unlike pointer members, references can't easily be replaced by smart pointers or iterators as refactoring might require
Whenever a reference is used it looks like value type (. operator etc), but behaves like a pointer (can dangle) - so e.g. Google Style Guide discourages it
Objects rarely should allow assign and other stuff like comparison. If you consider some business model with objects like 'Department', 'Employee', 'Director', it is hard to imagine a case when one employee will be assigned to other.
So for business objects it is very good to describe one-to-one and one-to-many relationships as references and not pointers.
And probably it is OK to describe one-or-zero relationship as a pointer.
So no 'we can't assign' then factor.
A lot of programmers just get used with pointers and that's why they will find any argument to avoid use of reference.
Having a pointer as a member will force you or member of your team to check the pointer again and again before use, with "just in case" comment. If a pointer can be zero then pointer probably is used as kind of flag, which is bad, as every object have to play its own role.
Use references when you can, and pointers when you have to.
In a few important cases, assignability is simply not needed. These are often lightweight algorithm wrappers that facilitate calculation without leaving the scope. Such objects are prime candidates for reference members since you can be sure that they always hold a valid reference and never need to be copied.
In such cases, make sure to make the assignment operator (and often also the copy constructor) non-usable (by inheriting from boost::noncopyable or declaring them private).
However, as user pts already commented, the same is not true for most other objects. Here, using reference members can be a huge problem and should generally be avoided.
As everyone seems to be handing out general rules, I'll offer two:
Never, ever use use references as class members. I have never done so in my own code (except to prove to myself that I was right in this rule) and cannot imagine a case where I would do so. The semantics are too confusing, and it's really not what references were designed for.
Always, always, use references when passing parameters to functions, except for the basic types, or when the algorithm requires a copy.
These rules are simple, and have stood me in good stead. I leave making rules on using smart pointers (but please, not auto_ptr) as class members to others.
Yes to: Is it true to say that an object containing a reference should not be assignable, since a reference should not be changed once initialised?
My rules of thumb for data members:
never use a reference, because it prevents assignment
if your class is responsible for deleting, use boost's scoped_ptr (which is safer than an auto_ptr)
otherwise, use a pointer or const pointer
I would generally only use a pointer in member data when the pointer could be null or could change. Are there any other reasons to prefer pointers over references for data members?
Yes. Readability of your code. A pointer makes it more obvious that the member is a reference (ironically :)), and not a contained object, because when you use it you have to de-reference it. I know some people think that is old fashioned, but I still think that it simply prevent confusion and mistakes.
I advise against reference data members becasue you never know who is going to derive from your class and what they might want to do. They might not want to make use of the referenced object, but being a reference you have forced them to provide a valid object.
I've done this to myself enough to stop using reference data members.
class C {
public
T x;
};
Is there an elegant way for the constructor of x to know implicitly in what instance of C it is constructing?
I've implemented such behavior with some dirty inelegant machinery. I need this for my sqlite3 wrapper. I don't like all wrappers I've seen, their API IMO ugly and inconvenient. I want something like this:
class TestRecordset: public Recordset {
public:
// The order of fields declarations specifies column index of the field.
// There is TestRecordset* pointer inside Field class,
// but it goes here indirectly so I don't have to
// re-type all the fields in the constructor initializer list.
Field<__int64> field1;
Field<wstring> field2;
Field<double> field3;
// have TestRecordset* pointer too so only name of parameter is specified
// in TestRecordset constructor
Param<wstring> param;
virtual string get_sql() {
return "SELECT 1, '1', NULL FROM test_table WHERE param=:PARAM";
}
// try & unlock are there because of my dirty tricks.
// I want to get rid of them.
TestRecordset(wstring param_value)
try : Recordset(open_database(L"test.db")), param("PARAM") {
param = param_value;
// I LOVE RAII but i cant use it here.
// Lock is set in Recordset constructor,
// not in TestRecordset constructor.
unlock(this);
fetch();
} catch(...) {
unlock(this);
throw;
}
};
I want to clarify the fact - it is a part of the working code. You can do this in C++. I just want to do it in a more nice way.
I've found a way to get rid of unlock and try block. I've remembered there is such a thing as thread local storage. Now I can write constructor as simple as that:
TestRecordset(wstring param_value):
Recordset(open_database(L"test.db")), param("PARAM") {
param = param_value;
fetch();
}
to dribeas:
My objective is to avoid redundant and tedious typing. Without some tricks behind the scene I will have to type for each Field and Param:
TestRecordset(wstring param_value): Recordset(open_database(L"test.db")), param(this, "PARAM"),
field1(this, 0), field2(this, 1), field3(this, 2) { ... }
It is redundant, ugly and inconvenient. For example, if I'll have to add new field in the
middle of SELECT I'll have to rewrite all the column numbers.
Some notes on your post:
Fields and Params are initialized by their default constructors.
Order of initializers in constructor is irrelevant. Fields are always initialized in order of their declaration. I've used this fact to track down column index for fields
Base classes are constructed first. So when Fields are constructed internal field list in Recordset are ready to use by Filed default constructor.
I CAN'T use RAII here. I need to acquire lock in Recorset constructor and release it obligatory in TestRecordset constructor after all Fields are constructed.
No. Objects aren't supposed to need to know where they're being used from in order to work. As far as x is concerned, it's an instance of T. That's it. It doesn't behave differently according to whether it's a member of class C, a member of class D, an automatic, a temporary, etc.
Furthermore, even if the T constructor did know about the instance of C, that instance of C would be incomplete since of course it has not finished construction yet, because its members haven't been constructed. C++ offers you plenty of chances to shoot yourself in the foot, but offering you a reference to an incomplete object in another class's constructor isn't one of them.
The only thing I can think of to approximate your code example is to do something like
#define INIT_FIELDS field1(this), field2(this), field3(this)
immediately after the list of fields, then use INIT_FIELDS in the initializer list and #undef it. It's still duplication, but at least it's all in one place. This will probably surprise your colleagues, however.
The other way to make sure you don't forget a field is to remove the zero-arg constructor from Field. Again, you still have to do the typing, but at least if you forget something the compiler will catch it. The non-DRY nature of initializer lists is, I think, something C++ just has to live with.
Adding on to One by One's answer, the actual question you should be asking is: "what is wrong with my solution design that it requires objects to know where they are instanciated?"
I don't think so.
Out of pure curiosity, why should it matter ? do you have a context in which this can be useful?
M.
I experiment with things like this in C# all the time - I use reflection to do it.
Consider getting a reflection or code generation library for C++ to help you do what you want to.
Now, I can't tell you how to find a good reflection or code generation library for C++, but that's a different question!
I am interested in your code. You comment that all fields plus the param attribute have pointers back into the TestRecordSet but that they don't need to be initialized? Or is it the object of the question, how to avoid having to pass the this pointers during construction?
If what you want is avoid adding all fields in the initialization list of your constructor, then it is a flawed objective. You should always initialize all your members in the initialization list and do so in the same order that they are declared in the class (this is not language enforced, but more of a globally learnt experience).
Your use of the try constructor block is just about he only recommended usage for that functionality (Anyone interested read GOTW#66) if it is indeed required. If the RecordSet member has been constructed (and thus the lock acquired) and something goes wrong afterwards in the constructor then [see quote below] the RecordSet will be destroyed and if it uses RAII internally it will free the lock, so I believe that the try/catch may not really be required.
C++03, 15.2 Exception Handling / Constructors and destructors
An object that is partially
constructed or partially destroyed
will have destructors executed for all
of its fully constructed subobjects,
that is, for subobjects for which the
constructor has completed execution
and the destructor has not yet begun
execution.