I have two lines of code I want explained a bit please. As much as you can tell me. Mainly the benefits of each and what is happening behind the scenes with memory and such.
Here are two structs as an example:
struct Employee
{
std::string firstname, lastname;
char middleInitial;
Date hiringDate; // another struct, not important for example
short department;
};
struct Manager
{
Employee emp; // manager employee record
list<Employee*>group; // people managed
};
Which is better to use out of these two in the above struct and why?
list<Employee*>group;
list<Employee>group;
First of all, std::list is a doubly-linked list. So both those statements are creating a linked list of employees.
list<Employee*> group;
This creates a list of pointers to Employee objects. In this case there needs to be some other code to allocate each employee before you can add it to the list. Similarly, each employee must be deleted separately, std::list will not do this for you. If the list of employees is to be shared with some other entity this would make sense. It'd probably be better to place the employee in a smart pointer class to prevent memory leaks. Something like
typedef std::list<std::shared_ptr<Employee>> EmployeeList;
EmployeeList group;
This line
list<Employee>group;
creates a list of Employee objects by value. Here you can construct Employee objects on the stack, add them to the list and not have to worry about memory allocation. This makes sense if the employee list is not shared with anything else.
One is a list of pointers and the other is a list of objects. If you've already allocated the objects, the first makes sense.
You probably want to use the first one (the list of pointers) if the "people managed" are also stored in another location. To elaborate: if you also have a global list of companyEmployees, you probably want pointers, because you want to share the object representing an employee between the locations (so that, for example, if you update the name, the change is "seen" from both locations).
If instead you only want to know "why a list of structs instead of a list of pointers", the answer is: better memory locality and no need to deallocate the individual Employee objects. Be careful, though: every assignment to or from a list node (for example, through an iterator and its * operator) copies the whole struct and not just a pointer.
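To make the sharing-versus-copying difference concrete, here is a minimal sketch. The trimmed-down Employee and the function names are assumptions made for illustration, not code from the question.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>

// Trimmed-down stand-in for the question's Employee (assumption).
struct Employee {
    std::string lastname;
};

// Stores a copy in the list, then renames the original.
// Returns the name as seen through the list.
std::string nameSeenThroughValueList() {
    Employee e{"Smith"};
    std::list<Employee> group;
    group.push_back(e);      // the list holds its own copy
    e.lastname = "Jones";    // only the original changes
    return group.front().lastname;
}

// Stores a shared_ptr in the list, then renames through another handle.
std::string nameSeenThroughPointerList() {
    auto e = std::make_shared<Employee>(Employee{"Smith"});
    std::list<std::shared_ptr<Employee>> group;
    group.push_back(e);      // the list shares the same object
    e->lastname = "Jones";   // visible through the list as well
    return group.front()->lastname;
}

void runExample() {
    assert(nameSeenThroughValueList() == "Smith");
    assert(nameSeenThroughPointerList() == "Jones");
}
```

The value list keeps seeing the old name because it owns an independent copy, while the pointer list observes the update.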
The first one stores the objects by pointer. In this case you need to carefully document who owns the allocated memory and who's responsible for cleaning it up when done. The second one stores the objects by value and has full control of their lifespan.
Which one to use depends on context you haven't given in your question although I favor the second slightly as a default because it doesn't leave open the possibility of mismanaging your memory.
But after all that, carefully consider if list is actually the right container choice for you. Typically it's a low-priority container that satisfies very specific needs. I almost always favor vector and deque first for random access containers, or set and map for ordered containers.
If you do need to store pointers in the container, boost provides ptr-container classes that manage the memory for you, or I suggest storing some sort of smart pointer so that the memory is cleaned up automatically when the object isn't needed anymore.
A lot depends on what you are doing. For starters, do you really want
Manager to contain an Employee, rather than to be one: the classical
example of a manager (one of the classic OO examples) would be:
struct Manager : public Employee
{
list<Employee*> group;
};
Otherwise, you have the problem that you cannot put managers into the
group of another manager; you're limited to one level in the management
hierarchy.
The second point is that in order to make an intelligent decision, you
have to understand the role of Employee in the program. If Employee
is just a value: some hard data, typically immutable (except by
assignment of a complete Employee), then list<Employee> group is
definitely to be preferred: don't use pointers unless you have to. If
Employee is an "entity", which models some external entity (say an
employee of the firm), you would generally make it uncopyable and
unassignable, and use list<Employee*> (with some sort of mechanism to
inform the Manager when the employee is fired, and the pointed to
object is deleted). If managers are employees, and you don't want to
lose this fact when they are added to a group, then you have to use the
pointer version: polymorphism requires pointers or references to work
(and you can't have a container of references).
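The slicing problem mentioned above can be sketched as follows. The virtual role() function is a hypothetical addition used only to make the effect observable; the question's structs have no virtual functions.

```cpp
#include <cassert>
#include <list>
#include <string>

struct Employee {
    virtual ~Employee() = default;
    virtual std::string role() const { return "employee"; }  // hypothetical
};

struct Manager : Employee {
    std::string role() const override { return "manager"; }
    std::list<Employee*> group;  // non-owning; may itself contain Managers
};

// Role reported by an element stored by value.
std::string roleViaValueList(const Manager& m) {
    std::list<Employee> byValue;
    byValue.push_back(m);        // slices: only the Employee part is copied
    return byValue.front().role();
}

// Role reported by an element stored by pointer.
std::string roleViaPointerList(Manager& m) {
    std::list<Employee*> byPointer;
    byPointer.push_back(&m);     // still refers to the full Manager
    return byPointer.front()->role();
}

void runExample() {
    Manager boss;
    assert(roleViaValueList(boss) == "employee");
    assert(roleViaPointerList(boss) == "manager");
}
```

Only the pointer list preserves the fact that the element is a Manager.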
The two lists are good, but they will require a completely different handling.
list<Employee*>group;
is a list of pointers to objects of type Employee and you will store there pointers to objects allocated dynamically, and you will need to be particularly clear as to who will delete those objects.
list<Employee>group;
is a list of objects of type Employee; you get the benefit (and associated cost in terms of performance) of dealing with concrete instances that you do not need to memory manage yourself.
Specifically, one of the advantages of using std::list compared to a plain array, is that you can have a list of objects and avoid the cost and risks of dealing with dynamic memory allocation and pointers.
With a list of objects, you can do, e. g.
std::list<Employee> group;
Employee a; // object allocated on the stack
group.push_back(a); // the list stores its own copy
Employee* b = new Employee; // object allocated on the heap
group.push_back(*b); // the pointed-to object is copied
delete b; // safe: the list holds its own copy
With a list of pointers you are, in practice, forced to always use dynamic allocation, or to refer to objects whose lifetime is longer than the list's (if you can guarantee it).
By using a std::list of pointers, you are more or less in the same situation as when using a plain array of pointers as far as memory management is concerned. The only advantage you get is that the list can grow dynamically without effort on your part.
I personally don't see much sense in using a list of pointers; basically, because I think that pointers should be used (always, when possible) through smart pointers. So, if you really need pointers, you will be better off, IMO, using a list of smart pointers provided by boost.
Use the first one if you're allocating or accessing the structures separately.
Use the second one if you'll only be allocating/accessing them through the list.
First one defines a list of pointers to objects, the second a list of objects.
The first version (with pointers) is preferred by most programmers.
The main reason is that STL containers copy elements by value, so storing pointers makes sorting and internal reallocation cheaper: only pointers are copied, not whole objects.
You probably want to use unique_ptr<> or shared_ptr<> rather than plain old * pointers (auto_ptr<> is only an option in pre-C++11 code; it was deprecated in C++11 and removed in C++17). This goes some, if not all, of the way toward getting the expected behaviour without most of the memory issues...
Related
I have a container, let's say a std::list<int>, which I would like to share between objects. One of the objects is known to live longer than the others, so it will hold the container. In order to be able to access the list, the other objects may have a pointer to the list.
Since the holder object might get moved, I'll need to wrap the list with a unique_ptr:
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::list<int>& list; };
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects:
class LongLiveHolder { std::unique_ptr<NonExistentListNode<int>> back; };
class ShortLiveObject { NonExistentListNode<int>& back; };
, which would save me a redundant dereference when accessing the list, except that I would no longer have the full std::list interface to use with the shorter-lived object- just the node pointers.
Can I somehow get rid of this extra layer of indirection, while still having the std::list interface in the shorter-lived object?
Preface
You may be overthinking the cost of the extra indirection from the std::unique_ptr (unless you have a lot of these lists and you know that usages of them will be frequent and intermixed with other procedures). In general, I'd first trust my compiler to do smart things. If you want to know the cost, do performance profiling.
The main purpose of the std::unique_ptr in your use-case is just to have shared data with a stable address when other data that reference it gets moved. If you use the list member of the long-lived object multiple times in a single procedure, you can possibly help your compiler to help you (and also get some nicer-to-read code) when you use the list through the long-lived object by making a variable in the scope of the procedure that stores a reference to the std::list pointed to by the std::unique_ptr like:
void fn(LongLiveHolder& holder) {
auto& list {*holder.list}; // dereference the unique_ptr; get() would return a pointer, not a reference
list.<some_operation_1>(...);
list.<some_operation_2>(...);
list.<some_operation_3>(...);
}
But again, you should inspect the generated machine code and do performance profiling if you really want to know what kind of difference it makes.
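As a sketch of why the std::unique_ptr gives the shared list a stable address, the following moves the holder while a view exists. The struct layouts mirror the question, but the default member initializer and the function name are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>

struct LongLiveHolder {
    // Heap-allocated so that moving the holder does not move the list.
    std::unique_ptr<std::list<int>> list = std::make_unique<std::list<int>>();
};

struct ShortLiveObject {
    std::list<int>& list;  // non-owning view of the holder's list
};

// Moves the holder after a view was created, then writes through the view.
// Returns the size seen through the moved-to holder.
std::size_t sizeAfterMovingHolder() {
    LongLiveHolder first;
    ShortLiveObject view{*first.list};         // bind to the heap-allocated list
    LongLiveHolder second = std::move(first);  // ownership moves, the list stays put
    view.list.push_back(42);                   // still refers to the same list
    return second.list->size();
}

void runExample() {
    assert(sizeAfterMovingHolder() == 1);
}
```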
If Context Permits, Write your own List
You said:
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects: [...]
Considering Changes in what is the First Node
What if the first node of the list is allowed to be deleted? What if a new node is allowed to be inserted at the beginning of the list? You'd need a very specific context for those to not be requirements. What you want in your short-lived object is a view abstraction that supports the same interface as the actual list but just doesn't manage the lifetime of the list contents. If you implement the view abstraction as a pointer to the list's first node, then how will the view object know about changes to what the "real"/lifetime-managing list considers to be the first node? It can't, unless the lifetime-managing list keeps an internal list of all views of itself which are alive and also updates those (which itself is a performance and space overhead), and even then, what about the reverse? If the view abstraction was used to change what's considered the first node, how would the lifetime-managing list know about that change? The simplest, sane solution is to have an extra level of indirection: make the view point to the list instead of to what was the list's first node when the view was created.
Considering Requirements on Time Complexity of getting the list size
I'm pretty sure a std::list can't just hold pointers to front and back nodes. For one thing, since C++11 requires that std::list::size() is O(1), std::list probably has to keep track of its size at all times in a counter member: either storing it in itself, or doing some kind of size-tracking in each node struct, or some other implementation-defined behaviour. I'm pretty sure the simplest and most performant way to have multiple moveable references (non-const pointers) to something that needs to do this kind of bookkeeping is to just add another level of indirection.
You could try to "skip" the indirection layer required by the bookkeeping for specific cases that don't require that information, which is the iterators/node-pointers approach, which I'll comment on later. I can't think of a better place or way to store that bookkeeping other than with the collection itself. I.e., if the list interface has requirements that require such bookkeeping, an extra layer of indirection for each user of the list implementation has a very strong design rationale.
If Context Permits
If you don't care about having O(1) to get the size of your list, and you know that what is considered the first node will not change for the lifetime of the short-lived object, then you can write your own list class and list-view class and make your own context-specific optimizations. That's one of the big selling-points of languages like C++: You get a nice standard library that does commonly useful things, and when you have a specific scenario where some features of those tools aren't required and are resulting in unnecessary overhead, you can build your own tool/abstraction (or possibly use someone else's library).
Commentary on std::unique_ptr + reference
Your first snippet works, but you can probably get a more convenient ShortLiveObject by using std::reference_wrapper: with a plain reference member, the implicitly-declared default constructor and copy-assignment operator are defined as deleted, whereas a std::reference_wrapper member keeps the class copy-assignable (it is still not default-constructible).
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::reference_wrapper<std::list<int>> list; };
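The effect of the reference member on the implicitly-declared special members can be checked with type traits. This is a small sketch; WithReference and WithWrapper are hypothetical stand-ins for the two ShortLiveObject variants.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <type_traits>

struct WithReference {
    std::list<int>& list;
};

struct WithWrapper {
    std::reference_wrapper<std::list<int>> list;
};

// A reference member deletes the implicit copy-assignment operator.
static_assert(!std::is_copy_assignable<WithReference>::value, "not assignable");
// A reference_wrapper member keeps the class assignable...
static_assert(std::is_copy_assignable<WithWrapper>::value, "assignable");
// ...but reference_wrapper itself has no default constructor.
static_assert(!std::is_default_constructible<WithWrapper>::value, "no default ctor");

// Rebinds a wrapper-based view from one list to another and
// returns the size of the list it refers to afterwards.
std::size_t sizeAfterRebinding() {
    std::list<int> a{1, 2, 3};
    std::list<int> b{4};
    WithWrapper view{a};
    view = WithWrapper{b};  // assignment rebinds the wrapper to b
    return view.list.get().size();
}

void runExample() {
    assert(sizeAfterRebinding() == 1);
}
```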
Commentary on std::shared_ptr + std::weak_ptr
Like @Adrian Maire suggested, a std::shared_ptr in the longer-lived object (which might move while the shorter-lived object exists) and a std::weak_ptr in the shorter-lived object is a working approach, but it probably has more overhead (at least coming from the ref-count) than using std::unique_ptr + a reference, and I can't think of any generalized pros, so I wouldn't suggest it unless you already had some other reason to use a std::shared_ptr. In the scenario you gave, I'm pretty sure you do not.
Commentary on Storing iterators/node-pointers in the short-lived object
@Daniel Langr already commented about this, but I'll try to expand.
Specifically for std::list, there is a possible standard-compliant solution (with several caveats) that doesn't have the extra indirection of the smart pointer. Caveats:
You must be okay with only having an iterator interface for the shorter-lived object (which you indicated that you are not).
The front and back iterators must be stable for the lifetime of the shorter-lived object (the elements they refer to must not be erased from the list, and the shorter-lived object won't see new list entries that are pushed to the front or back by someone using the longer-lived object).
From cppreference.com's page for std::list's constructors:
After container move construction (overload (8)), references, pointers, and iterators (other than the end iterator) to other remain valid, but refer to elements that are now in *this. The current standard makes this guarantee via the blanket statement in [container.requirements.general]/12, and a more direct guarantee is under consideration via LWG 2321.
From cppreference.com's page for std::list:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
But I am not a language lawyer. I could be missing something important.
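The guarantee quoted above can be exercised with a short sketch; the function name is an assumption for illustration.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <utility>

// Returns true if an iterator taken before a move construction still
// designates the same element, which now belongs to the moved-to list.
bool iteratorSurvivesMove() {
    std::list<int> original{1, 2, 3};
    auto second = std::next(original.begin());   // refers to the element 2
    std::list<int> moved = std::move(original);  // move construction
    return *second == 2 && second == std::next(moved.begin());
}

void runExample() {
    assert(iteratorSurvivesMove());
}
```

Note that this covers iterators to elements only; the end iterator is explicitly excluded by the quoted text.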
Also, you replied to Daniel saying:
Some iterators get invalid when moving the container (e.g. insert_iterator) @DanielLangr
Yes, so if you want to be able to make std::insert_iterators, use the std::unique_ptr + reference approach and construct short-lived std::insert_iterators when needed instead of trying to store long-lived ones.
If the list owner will be moved, then you need some memory address to share somehow.
You already indicated the unique_ptr. It's a decent solution if the non-owners don't need to save it internally.
The std::shared_ptr is an obvious alternative.
Finally, you can have a std::shared_ptr in the owner object, and pass std::weak_ptr to non-owners.
I would like to ask about my approach to using raw pointers without allocating any memory through them. I am working on an application that simulates a classic cash desk. I have a class CashDesk, which contains a vector of Items and a vector of Orders; Item and Order are classes that represent items and orders. Furthermore, I want the Order class to contain a vector of pointers to Item, because I don't want to store the same object multiple times in different orders; that makes no sense to me. Through the pointers in Order, I only want to access properties of the class Item; no memory is allocated through these pointers.
Simplified code:
class CashDesk {
vector<Item> items;
vector<Order> orders;
};
class Order {
vector<Item*> ItemsInOrder; // non-owning pointers to elements of CashDesk::items
};
Class Item contains only structured data: information about the item.
I create all objects at the level of the CashDesk class: I create an instance of Item when needed and push it to the items vector.
I have been told that I should avoid using raw pointers unless there is no other option. The important thing is that I don't allocate any memory through the pointers; I really only use a pointer to point at an object and access its properties. Should I use something like unique_ptr instead, or a completely different approach?
Thanks for any response.
I have been told that I should avoid using raw pointers unless there is no other option.
You have been told something subtly wrong. You should avoid owning raw pointers, but non-owning raw pointers are perfectly fine.
You will have to ensure that the elements of Order::ItemsInOrder aren't invalidated by operations on CashDesk::items, but that co-ordination should be within the private parts of CashDesk.
You could be more explicit about the lack of ownership semantic, by using std::vector<Item>::iterator in place of Item *, but that doesn't change any behaviour (a conforming implementation may implement std::vector<Item>::iterator as an alias of Item *)
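One possible way to keep that co-ordination inside CashDesk is sketched below. Note the swapped-in technique: it uses std::deque instead of the question's std::vector, because push_back on a deque does not invalidate pointers to existing elements. The members name, priceCents, addItem and totalCents are assumptions for illustration.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

struct Item {
    std::string name;
    int priceCents;
};

struct Order {
    std::vector<const Item*> itemsInOrder;  // non-owning pointers

    int totalCents() const {
        int total = 0;
        for (const Item* item : itemsInOrder) total += item->priceCents;
        return total;
    }
};

class CashDesk {
public:
    // Appending to a std::deque keeps pointers to existing elements valid,
    // so pointers handed out earlier stay usable as the desk grows.
    const Item* addItem(std::string name, int priceCents) {
        items_.push_back(Item{std::move(name), priceCents});
        return &items_.back();
    }

private:
    std::deque<Item> items_;
};

int totalAfterGrowth() {
    CashDesk desk;
    Order order;
    order.itemsInOrder.push_back(desk.addItem("coffee", 250));
    for (int i = 0; i < 1000; ++i) desk.addItem("filler", 1);  // grow the storage
    order.itemsInOrder.push_back(desk.addItem("bagel", 175));
    return order.totalCents();
}

void runExample() {
    assert(totalAfterGrowth() == 425);
}
```

With a std::vector the same code would only be safe if enough capacity were reserved up front, since growth beyond the capacity moves every element.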
I'm trying to understand when I need to allocate an array in which each element is a pointer to some object, for example an array of pointers to Student:
Student** db = new Student*[size];
When do I need to use it? I know this is a general question, but I'm trying to solve an exam problem that involves inheritance, and in one of the classes a data member is declared as shown above.
In my solution I wrote:
Student * db = new Student[size];
Thanks.
TL;DR version:
Use std::vector<std::unique_ptr<Student>> db.
Explanation
Student** db = new Student*[size];
could be used to represent an array of classes derived from Student.
e.g.:
Student** db = new Student*[size];
db[0] = new Grad_Student();
db[1] = new Coop_Student();
db[2] = new Elementary_Student();
If you elect the second option
Student * db = new Student[size];
db[0] = Grad_Student();
db[1] = Coop_Student();
db[2] = Elementary_Student();
you save a lot of pesky manual memory management by directly holding Students rather than pointers to Students, but Object Slicing will turn the derived Students into plain old Students. A box sized and shaped to fit a Student can only store a Student, so all of the additional features of, for example, the Grad_Student assigned to db[0] will be lost. Only by storing a reference to the Grad_Student can the Grad_Student's extensions be preserved. You just have to remember that the Grad_Student is actually stored somewhere else.
Sounds good, right? It is, until you look at all of the dynamic allocations you have to make sure are cleaned up. Memory management is one of the hardest things to get right in C++, and one of the best ways to handle it is through Resource Acquisition Is Initialization, or RAII. std::vector and std::unique_ptr are fabulous examples of RAII in action.
vector is a dynamic array all nicely wrapped up inside a class that handles virtually every aspect of list management right down to adding, removing, resizing, and making sure everything gets cleaned up. unique_ptr is a Smart Pointer that ensures exactly one owner of a resource, and this owner will clean up the resource when it is destroyed. The result, std::vector<std::unique_ptr<Student>> will allow you to add, remove, access, and move any Students without any direct intervention. This allows you to write simpler code. Simpler code is less likely to have bugs. Fewer bugs means more leisure time and happier clients. Everybody wins.
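A minimal sketch of the std::vector<std::unique_ptr<Student>> approach follows. The virtual kind() function is a hypothetical member added only so the preserved dynamic type can be observed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Student {
    virtual ~Student() = default;  // required for deleting through a base pointer
    virtual std::string kind() const { return "student"; }  // hypothetical
};

struct Grad_Student : Student {
    std::string kind() const override { return "grad"; }
};

struct Coop_Student : Student {
    std::string kind() const override { return "coop"; }
};

std::vector<std::string> kindsInDatabase() {
    std::vector<std::unique_ptr<Student>> db;
    db.push_back(std::make_unique<Grad_Student>());
    db.push_back(std::make_unique<Coop_Student>());
    db.push_back(std::make_unique<Student>());

    std::vector<std::string> kinds;
    for (const auto& student : db) kinds.push_back(student->kind());
    return kinds;  // db and every Student it owns are destroyed here, no delete needed
}

void runExample() {
    const std::vector<std::string> expected{"grad", "coop", "student"};
    assert(kindsInDatabase() == expected);
}
```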
Suppose you already have a collection, for example a linked list of Students which is in order by Student ID. You want to sort them by Student last name. Instead of changing your linked list, or messing up its order, you just allocate an array of pointers and sort that. Your original list remains intact but you can do fast binary searches by last name using your array.
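The idea of sorting a separate array of pointers can be sketched like this. The Student fields and both function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

struct Student {
    int id;
    std::string lastName;
};

// Builds an index of non-owning pointers sorted by last name.
// The original list is left untouched.
std::vector<const Student*> indexByLastName(const std::list<Student>& students) {
    std::vector<const Student*> index;
    index.reserve(students.size());
    for (const Student& s : students) index.push_back(&s);
    std::sort(index.begin(), index.end(),
              [](const Student* a, const Student* b) { return a->lastName < b->lastName; });
    return index;
}

// Binary search on the sorted index; returns nullptr when the name is absent.
const Student* findByLastName(const std::vector<const Student*>& index,
                              const std::string& lastName) {
    auto it = std::lower_bound(
        index.begin(), index.end(), lastName,
        [](const Student* s, const std::string& name) { return s->lastName < name; });
    return (it != index.end() && (*it)->lastName == lastName) ? *it : nullptr;
}

void runExample() {
    std::list<Student> students{{1, "Young"}, {2, "Adams"}, {3, "Miller"}};
    auto index = indexByLastName(students);
    assert(index.front()->lastName == "Adams");
    assert(findByLastName(index, "Miller")->id == 3);
    assert(students.front().id == 1);  // original order is intact
}
```

The index is only valid while the list's elements are alive, so it should not outlive the list.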
I have a vector of journeys and a vector of locations. A journey is between two places.
struct Data {
std::vector<Journey> m_journeys;
std::vector<Locations> m_locations;
};
struct Journey {
?? m_startLocation;
?? m_endLocation;
};
How can I create the relationship between each journey and two locations?
I thought I could just store references/pointers to the start and end locations, however if more locations are added to the vector, then it will reallocate storage and move all the locations elsewhere in memory, and then the pointers to the locations will point to junk.
I could store the place names and then search the list in Data, but that would require keeping a reference to Data (breaking encapsulation/SRP), and then a not so efficient search.
I think if all the objects were created on the heap, then shared_ptr could be used, (so Data would contain std::vector<std::shared_ptr<Journey>>), then this would work? (it would require massive rewrite so avoiding this would be preferable)
Is there some C++/STL feature that is like a pointer but abstracts away/is independent of memory location (or order in the vector)?
No, there isn't any "C++/STL feature that is like a pointer but abstracts away/is independent of memory location".
That answers that.
This is simply not the right set of containers for such a relationship between classes. You have to pick the appropriate container for your objects first, instead of selecting some arbitrary container first, and then trying to figure out how to make it work with your relationship.
Using a vector of std::shared_ptrs would be one option, just need to watch out for circular references. Another option would be to use std::list instead of std::vector, since std::list does not reallocate when it grows.
If each Locations instance has a unique identifier of some kind, another option is to store the locations in a std::map, refer to each location by its identifier, and look it up in the map when needed. Although a std::map also doesn't reallocate upon growth, the layer of indirection offers some value as well.
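A sketch of the identifier-based variant, under the assumption that an integer id exists for every location. LocationId, the name member and findLocation are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using LocationId = int;  // hypothetical identifier type

struct Location {
    std::string name;
};

struct Journey {
    LocationId m_startLocation;  // an id, not a pointer
    LocationId m_endLocation;
};

struct Data {
    std::vector<Journey> m_journeys;
    std::map<LocationId, Location> m_locations;

    // Returns nullptr if no location with this id exists.
    const Location* findLocation(LocationId id) const {
        auto it = m_locations.find(id);
        return it == m_locations.end() ? nullptr : &it->second;
    }
};

void runExample() {
    Data data;
    data.m_locations[1] = Location{"Paris"};
    data.m_locations[2] = Location{"Rome"};
    data.m_journeys.push_back(Journey{1, 2});
    assert(data.findLocation(data.m_journeys[0].m_endLocation)->name == "Rome");
    assert(data.findLocation(99) == nullptr);
}
```

The lookup does require access to Data, which is the trade-off the question already pointed out.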
I'd say make a vector<shared_ptr<Location>> for your index of locations, and Journey would contain two weak_ptr<Location>.
struct Journey {
std::weak_ptr<Location> m_startLocation;
std::weak_ptr<Location> m_endLocation;
};
struct Data {
std::vector<Journey> m_journeys;
std::vector<std::shared_ptr<Location>> m_locations;
};
A std::weak_ptr is allowed to outlive the object it points to, and that's exactly what you want. :)
The concern is that one could access a Journey containing a deleted Location. A weak pointer provides an expired() method that can tell you if the data of the parent shared pointer (that would be in your m_locations vector) still exists.
Accessing data from a weak pointer is safe, and will require the use of the lock() method.
Here is a great example of how one usually uses a weak pointer:
http://en.cppreference.com/w/cpp/memory/weak_ptr/lock
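In the spirit of that page, a minimal sketch of safe access through a weak pointer; the name member and nameOrUnknown are assumptions for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Location {
    std::string name;
};

// Returns the location's name, or a placeholder if it no longer exists.
std::string nameOrUnknown(const std::weak_ptr<Location>& location) {
    if (auto locked = location.lock()) {  // the shared_ptr keeps it alive in this scope
        return locked->name;
    }
    return "<unknown>";                   // the owner has already destroyed it
}

void runExample() {
    auto paris = std::make_shared<Location>(Location{"Paris"});
    std::weak_ptr<Location> weak = paris;
    assert(nameOrUnknown(weak) == "Paris");

    paris.reset();                        // last owner gone, Location destroyed
    assert(weak.expired());
    assert(nameOrUnknown(weak) == "<unknown>");
}
```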
See this example: a University class has a Director and many students, so my class will look like this:
a)
class University {
Director d;
Student list[1000];
};
or
b)
class University {
Director* d;
Student* list[1000];
};
My problem is how to decide whether class attributes should be pointer or value.
Most of the other answers focus on the detail of heap vs. direct containment (or provide no information at all, like "use pointers when you want pointers"). Rather than focusing on the details, consider the overall design of the application.
The first question would be about ownership. In your program, are those students and director owned by the class? Or do they exist outside of the class scope. In most simple applications, the objects might only exist inside the class, but in other more complex designs, the students might belong to the school, and only be referenced in the class (or the director might also teach some courses to other classes). If the class owns the objects, the composition will be the best approach: hold the director directly as a member, and the students inside a container that is directly held by the class (I would recommend a vector, which is the safe choice for most cases).
If the objects don't belong to the class, then you will rather use aggregation. Whoever owns the object will have to manage the lifetimes and decide how to store the real objects, and the class would only hold references (in the general sense) to those objects. Things get more complicated as there are more choices. If ownership can be transferred, then you would dynamically allocate the objects and hold pointers, where "pointers" should be read as smart pointers, so that memory will be managed for you.
If ownership does not change and the lifetime of the students/director are guaranteed to extend beyond the lifetime of the class, you could use references. In particular for the director. In the case of the students, it will be more complex as you cannot have containers of plain references, so the solution might still be pointers there, a vector of pointers. Another issue with references is that they cannot be reseated, which means that if you hold a reference to the director, the director of the class will be fixed for the whole lifetime of the class and you won't be able to replace her.
Design is somewhat complicated and you will learn with experience, but hopefully this will provide a quick start on your problem.
The issue here is: Where is the storage for these member variables? Sometimes it makes sense that a piece of data was allocated somewhere else and used other places. In that case a pointer may make sense (rather than using a copy constructor). However, usually that isn't the case (especially with encapsulation). Then you want to store the member data in the class. In such a case, and your example looks like it is, you don't want to use a pointer.
how to decide whether class attributes should be pointer or value
I would mostly go for value (i.e. object). In some special cases, I will choose a pointer (maybe a smart one!). For your case, the code below would suffice:
class University {
Director d;
std::vector<Student> list;
public:
University () { list.reserve(1000); }
};
The advantage of having an object is that you don't have to do your own garbage collection as the resource management will be automatic.
Pointers can be used, when you want to change the ownership of the resource (similar to shallow copy), at the same time avoiding expensive copies created during copy c-tor or assignment. In all other cases, use objects (i.e. value) for composition.
Well, it depends. Pointers should be used when you want to put things on the heap. This gives you a bit more freedom in when and how you allocate memory, but you have to add more code to avoid memory leaks, i.e. destructors and delete calls. It also allows you to easily modify the values from other functions/classes without having to pass a reference: just pass the pointer.
One obvious situation where pointers are definitely needed is a binary tree node: since a node must refer to objects of the same type as itself, it must use pointers to those objects, i.e.:
struct Node{
Node* left;
Node* right;
//Other stuff
};
In many situations, however, it's up to your own discretion. Just be responsible for your pointers if you use them.
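One way to be responsible for those pointers is to let the node own its children through std::unique_ptr. This is a sketch of that variant of the Node struct; the value member and countNodes are assumptions for illustration.

```cpp
#include <cassert>
#include <memory>

struct Node {
    int value = 0;
    std::unique_ptr<Node> left;   // owning: destroyed together with this node
    std::unique_ptr<Node> right;
};

// Counts the nodes in the subtree rooted at node (0 for an empty subtree).
int countNodes(const Node* node) {
    return node ? 1 + countNodes(node->left.get()) + countNodes(node->right.get()) : 0;
}

void runExample() {
    Node root;
    root.left = std::make_unique<Node>();
    root.left->right = std::make_unique<Node>();
    assert(countNodes(&root) == 3);
}   // the whole tree is released here without any explicit delete
```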
Actually there are three options
1. Object
2. Reference
3. Pointer
Which one to use for which object is part of the design/architecture.
Mostly, the deciding criterion will be the lifecycles of the objects and the containers.
In both cases the class attributes are being stored by value, it just happens that in the second case those values are pointers.
Use pointers when you want pointers, use non-pointers when you don't want pointers. This entirely depends on the desired semantics of the class that you are writing.
This is what i would go for:
class University {
Director d;
Student **list;
};
Even though it's very much a personal matter, I think using a pointer to pointer is better in this case, if you know what you are playing with!
I don't think a pointer array is a good choice. If you don't want pointers, then use values.