I need to use lists for my program and needed to decide if I use std::vector or std::list.
The problem with vector is that there is no remove method and with list that there is no operator []. So I decided to write my own class extending std::list and overloading the [] operator.
My code looks like this:
#include <list>
template <class T >
class myList : public std::list<T>
{
public:
T operator[](int index);
T operator[](int & index);
myList(void);
~myList(void);
};
#include "myList.h"
template<class T>
myList<T>::myList(void): std::list<T>() {}
template<class T>
myList<T>::~myList(void)
{
std::list<T>::~list();
}
template<class T>
T myList<T>::operator[](int index) {
int count = 0;
std::list<T>::iterator itr = this->begin();
while(count != index)itr++;
return *itr;
}
template<class T>
T myList<T>::operator[](int & index) {
int count = 0;
std::list<T>::iterator itr = this->begin();
while(count != index)itr++;
return *itr;
}
I can compile it but I get a linker error if I try to use it. Any ideas?
Depending on your needs, you should use std::vector (if you often need to append or remove at the end and want random access) or std::deque (if you often need to append or remove at the end or at the beginning, your dataset is huge, and you still want random access). Here is a good picture showing you how to make the decision:
(source: adrinael.net)
Given your original problem statement,
I need to use lists for my program and needed to decide if I use std::vector or std::list. The problem with vector is that there is no remove method and with list that there is no operator [].
there is no need to create your own list class (this is not a wise design choice anyway, because std::list does not have a virtual destructor, which is a strong indication that it is not intended to be used as a base class).
You can still achieve what you want using std::vector and the std::remove function. If v is a std::vector<T>, then to remove every element equal to value, you can simply write:
#include <vector>
#include <algorithm>
T value = ...; // whatever
v.erase(std::remove(v.begin(), v.end(), value), v.end());
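As a self-contained illustration of the erase-remove idiom, here is a minimal sketch. The helper name remove_all is hypothetical (it is not a standard function), and the demo values are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: erase every element equal to `value` from `v`.
template <class T>
void remove_all(std::vector<T>& v, const T& value)
{
    // std::remove shifts the kept elements to the front and returns the new
    // logical end; erase then drops the leftover tail.
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}

// Usage sketch.
inline void remove_all_demo()
{
    std::vector<int> v = {1, 2, 3, 2, 4};
    remove_all(v, 2);
    assert((v == std::vector<int>{1, 3, 4}));
}
```

The relative order of the remaining elements is preserved, and the whole operation is a single linear pass.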
All template code should be put in a header file. This will fix the linking problems (that's the simplest way).
The reason is that the compiler compiles every source (.cc) file separately from the others. To generate code for a template, it needs to know exactly what to create (i.e. what T is substituted with), and it has no way to know that unless the programmer tells it explicitly or all the template code is visible at the point where the template is instantiated. When mylist.cc is compiled, the compiler knows nothing about the users of mylist or what code needs to be created. When listuser.cc is compiled and all the mylist code is present, the compiler creates the mylist code it needs. You can read more about it here or in Stroustrup.
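To make the header-only layout concrete, here is a minimal sketch. The class Box and the file name box.h are hypothetical; the point is only that the member definitions sit next to the declaration, where every including source file can see them.

```cpp
#include <cassert>

// box.h (hypothetical header): declaration AND definitions live together,
// so every .cc file that includes it can instantiate Box<T> for its own T.
template <class T>
class Box
{
public:
    explicit Box(const T& value);
    const T& get() const;
private:
    T value_;
};

template <class T>
Box<T>::Box(const T& value) : value_(value) {}

template <class T>
const T& Box<T>::get() const { return value_; }

// Alternative when the definitions must stay in a .cc file: explicitly
// instantiate each T you need in that file, for example
//     template class Box<int>;

inline void box_demo()
{
    Box<int> b(42);
    assert(b.get() == 42);
}
```

Explicit instantiation keeps the definitions out of the header, but only the listed types will link, so it suits a closed set of element types.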
Your code has other problems: what if the user requests a negative index, or one that is too large (greater than or equal to the number of elements in the list)? And I didn't look much further.
Besides, I don't know how you plan to use it, but your operator[] takes O(N) time, which will probably easily lead to O(N*N) loops...
Vectors have the erase method that can remove elements. Is that not sufficient?
In addition to other excellent comments, the best way to extend a standard container is not by derivation, but writing free functions. For instance, see how Boost String Algorithms can be used to extend std::string and other string classes.
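As a sketch of the free-function approach applied to the original question, indexed access on a std::list could look like the following. The name element_at is hypothetical, and the bounds check is one possible answer to the out-of-range concern raised earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <stdexcept>

// Hypothetical free function: bounds-checked, O(n) access to the n-th element.
template <class T>
T& element_at(std::list<T>& items, std::size_t index)
{
    if (index >= items.size())
        throw std::out_of_range("element_at: index out of range");
    typename std::list<T>::iterator it = items.begin();
    // std::advance walks the list node by node, so this is linear time.
    std::advance(it, static_cast<typename std::list<T>::difference_type>(index));
    return *it;
}

inline void element_at_demo()
{
    std::list<int> values = {10, 20, 30};
    assert(element_at(values, 1) == 20);
    element_at(values, 1) = 25;          // returns a reference, so it is assignable
    assert(element_at(values, 1) == 25);
}
```

Because each call is O(n), using it inside a loop over all indexes is quadratic; iterating with iterators is the better fit for std::list.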
You have to move all your template code into header.
The obvious stuff has already been described in detail.
But what about the methods you chose to implement?
Destructor: not required; the compiler will generate one for you.
The two different versions of operator[] are pointless.
You should also be using std::list<T>::size_type as the index type, unless you intend to support negative indexes.
There are no const versions of operator[].
If you are going to implement operator[], you should also provide at().
You missed out all the different ways of constructing a list.
Containers should define several types internally; see http://www.sgi.com/tech/stl/Container.html
There is no need to call the destructor of std::list yourself: because myList derives from std::list, the std::list destructor is called automatically when a myList object is destroyed.
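The automatic call can be observed with a small sketch. CountingBase and CountingDerived are hypothetical types invented for this illustration; they only count how often the base destructor runs.

```cpp
#include <cassert>

// Hypothetical base type that counts its destructor calls.
struct CountingBase
{
    static int destroyed;
    ~CountingBase() { ++destroyed; }
};
int CountingBase::destroyed = 0;

struct CountingDerived : CountingBase
{
    ~CountingDerived() { /* no explicit call to ~CountingBase() here */ }
};

inline void destructor_demo()
{
    CountingBase::destroyed = 0;
    {
        CountingDerived d;
        (void)d;
    }   // ~CountingDerived runs, then ~CountingBase runs automatically
    assert(CountingBase::destroyed == 1);
}
```

Calling the base destructor explicitly as well would destroy the base subobject twice.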
Can we overload the push_back() method in std::vector to allow non-duplicate elements? I know std::set and std::unordered_set are supposed to avoid duplicate elements, but std::set sorts the elements and std::unordered_set stores the elements in no particular order. I need to retrieve the elements in the order they are inserted, while ensuring duplicate elements are not inserted.
Edit: There's a possible duplicate for this question here. The best solution to this duplicate proposes to have an auxiliary data structure and another custom method "add". This doesn't look good to me because, even if I document it separately, users inserting data into a std::vector rarely read the documentation for custom functions. If there's no efficient alternative, though, this can be a last resort.
Many people advise against it, but it seems there's some kind of urban legend going around that doing so will cause the universe to undergo vacuum decay and reality as we know it will dissolve.
You can publicly inherit from std::vector. But you have to think about what you can do with that.
If you inherit from vector, it is highly recommended that you don't add any data members to it. This can cause object slicing (google "c++ object slicing".) You also need to keep in mind that vector is not using virtual functions. That means you cannot override member functions. You can only shadow them, so it's not guaranteed that it will always be your push_back() function that gets called. The original will get called if you pass an object of your class to something that takes a reference to a vector, for example.
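The shadowing pitfall can be shown in a few lines. UniqueInts and append_twice are hypothetical names used only for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical class, used only to show the pitfall.
struct UniqueInts : std::vector<int>
{
    // Shadows (does not override) std::vector<int>::push_back.
    void push_back(int value)
    {
        if (std::find(begin(), end(), value) == end())
            std::vector<int>::push_back(value);
    }
};

// Any code written against std::vector<int> calls the original push_back.
inline void append_twice(std::vector<int>& v, int value)
{
    v.push_back(value);
    v.push_back(value);
}

inline void shadowing_demo()
{
    UniqueInts u;
    u.push_back(7);
    u.push_back(7);          // rejected by UniqueInts::push_back
    assert(u.size() == 1);
    append_twice(u, 7);      // bypasses the uniqueness check entirely
    assert(u.size() == 3);
}
```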
So in the end, you'd need to add a push_back_unique() function instead. But that in turn means the job can be done by a simple free function, so inheriting from vector isn't needed. This of course means there's never a guarantee that the elements in the vector will be unique. Other code might use push_back() instead somewhere.
Inheriting vector makes sense if you want to add completely new convenience functions that don't impose or lift any restrictions that vector has. If you want something that looks like a vector but really isn't (because it has different behavior and/or restrictions), you should implement your own type that delegates the container functionality to vector by either inheriting privately from it, or by having it as a private data member, and then replicate the vector API through public wrapper functions.
But this is very tedious to implement. Usually, you don't really need all the API from vector. So I'd say just write a smaller class around vector that only provides the functionality you need. And that functionality sounds like it's going to be pretty much read-only, since allowing write access to the elements allows for setting an element to the same value as another, breaking the container's uniqueness. So you could do something like:
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

template<typename T>
class UniqueVector
{
public:
    void push_back(T&& elem)
    {
        if (std::find(vec_.begin(), vec_.end(), elem) == vec_.end()) {
            // T is fixed by the class, so T&& is a plain rvalue reference: use std::move
            vec_.push_back(std::move(elem));
        }
    }
    const T& operator[](size_t index) const
    {
        return vec_[index];
    }
    auto begin() const
    {
        return vec_.cbegin();
    }
    auto end() const
    {
        return vec_.cend();
    }
private:
    std::vector<T> vec_;
};
If you still want to allow write access to individual elements, then you can provide non-const functions that check if the value that is passed is already in the vector. Like:
void assign_if_unique(size_t index, T&& value)
{
    if (std::find(vec_.begin(), vec_.end(), value) == vec_.end()) {
        vec_[index] = std::move(value);
    }
}
This is a minimal example. You should obviously add the functions you actually want. Like size(), empty(), and whatever else you need.
You should first define a free function [1] to implement your feature:
template<class T>
std::vector<T>&
push_back_unique(std::vector<T>& dest, T const& src)
{ /* ... */ }
If you use this a lot, and if it makes sense for your program, you might want to define an operator to do so:
template<class T>
std::vector<T>& operator<<(std::vector<T>& dest, T const& src)
{ return push_back_unique(dest, src); }
This allows:
std::vector<int> data;
data << 5 << 8 << 13 << 5 << 21;
for (auto n : data) std::cout << n << " "; // prints 5 8 13 21
1) This is because inheriting from standard containers is often bad practice and brings pitfalls.
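The body of push_back_unique is elided in the answer. One way it could be filled in, assuming a linear search per insertion is acceptable, is sketched below; this is an illustration, not necessarily what the author had in mind.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One possible body: a linear search followed by a conditional append.
template <class T>
std::vector<T>& push_back_unique(std::vector<T>& dest, T const& src)
{
    if (std::find(dest.begin(), dest.end(), src) == dest.end())
        dest.push_back(src);
    return dest;   // returning dest is what makes chaining possible
}

inline void push_back_unique_demo()
{
    std::vector<int> data;
    push_back_unique(data, 5);
    push_back_unique(data, 8);
    push_back_unique(data, 5);   // duplicate, ignored
    assert((data == std::vector<int>{5, 8}));
}
```

Each insertion costs O(n), so filling a large vector this way is quadratic; an auxiliary hash set would trade memory for faster lookups.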
I have run into the following question.
I need to design an interface class that looks like this:
struct IIDs
{
    ....
    virtual const std::set<int>& getAllIDs() = 0; //!< I want the collection of int to be sorted.
};
void foo()
{
const std::set<int>& ids = pIIDs->getAllIDs();
for(std::set<int>::const_iterator it = ids.begin();....;..) {
// do something
}
}
I think that returning a standard container is somewhat inappropriate, because it forces the implementation to use a std::set to store the IDs. But if I write it as follows:
struct IIDs
{
    ....
    virtual int count() const = 0;
    virtual int at(int index) = 0; //!< the items should be sorted
};
void foo()
{
for (int i = 0; i < pIIDs->count(); ++i) {
int val = pIIDs->at(i);
...
}
}
I found that none of the standard containers meets all of these requirements:
index lookup must be O(log n) or better;
insertion must be O(log n) or better;
the items must be kept sorted.
So it seems I have to go with the first example. Is that acceptable?
STL containers and template code in general should never be used across a DLL boundary.
The thing you have to keep in mind when returning complex types like STL containers is that if your call ever crosses the boundary between two different DLLs (or a DLL and an application) running different memory managers your application will most likely crash spectacularly.
The templates that make up the STL code will be executed within the implementation DLL, creating all the memory used by the container there. Later when it leaves scope in your calling code, your own memory manager will attempt to deallocate memory it doesn't own, resulting in a crash.
If you know your code won't cross DLL boundaries, and will only ever be called in the context of a single memory manager, then you're fine as far as memory management is concerned.
However, even in cases where you're only returning references, such as your example above, where the lifetime of the container would be entirely managed by the interface implementation code, unless you know that the exact same version of the STL and the exact same compiler and linker settings were used for compiling the implementation as the caller, you're asking for trouble.
The problem I see is that you are returning the collection by const reference. That means you either have a member of that collection type and return a reference to it, or you are returning a reference to a local variable of the function, which leads to invalid memory access.
If it's a member variable, it is better to provide access to begin and end iterators. If it is a local variable, you could return it by value (C++11 move semantics and copy elision should avoid the copy). If it crosses a DLL boundary, try by all means to avoid C++ types and use only C types.
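For the DLL case, a C-only interface could look roughly like the sketch below. The function name get_all_ids and the global g_ids are hypothetical; the idea is that the caller owns the buffer, so no memory allocated on one side is ever freed on the other.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

namespace {
std::set<int> g_ids = {3, 1, 2};   // hypothetical storage inside the DLL
}

// Hypothetical C-style export: copies up to `capacity` IDs (in sorted order)
// into the caller's buffer and returns the total number of IDs available.
extern "C" std::size_t get_all_ids(int* buffer, std::size_t capacity)
{
    std::size_t written = 0;
    for (std::set<int>::const_iterator it = g_ids.begin();
         it != g_ids.end() && written < capacity; ++it)
        buffer[written++] = *it;
    return g_ids.size();
}

inline void c_interface_demo()
{
    int ids[8] = {};
    std::size_t total = get_all_ids(ids, 8);
    assert(total == 3);
    assert(ids[0] == 1 && ids[1] == 2 && ids[2] == 3);
}
```

A caller can pass a capacity of 0 first to learn the required size, then call again with a large enough buffer.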
In terms of design, and for good generic code, prefer the STL way: return iterators, leaving the container type an implementation detail of IIDs, and hide your types with typedefs:
struct IIDs
{
typedef std::set<int> Container;
typedef Container::iterator IDIterator;
// We only expose iterators to the data
IDIterator begin(); //!< I want the collection of int to be sorted.
IDIterator end();
// ...
};
There are various approaches:
if you want to minimise the coupling of client code on the IIDs implementation and ensure iteration is completed while the IIDs object still exists, then use a visitor pattern: the calling code just has to supply some function to be called for each of the member elements in turn and is not responsible for the iteration itself
Visitor example:
struct IIDs
{
template <typename T>
void visit(T& t)
{
for (int i : ids_) t(i);
}
...
private:
std::set<int> ids_;
};
if you want to give the caller more freedom to mix other code in with the container traversal, and have multiple concurrent independent traversals, then provide iterators, but be aware that the client code could keep an iterator hanging around longer than the IIDs object itself - you may or may not want to handle that scenario gracefully
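To show how the visitor variant might be called, here is a minimal sketch. IdStore is a hypothetical concrete class standing in for an IIDs implementation, and the lambda requires C++11.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical concrete class following the visitor idea.
class IdStore
{
public:
    void add(int id) { ids_.insert(id); }

    // Calls visitor(id) for every stored ID in ascending order.
    template <typename Visitor>
    void visit(Visitor&& visitor) const
    {
        for (int id : ids_) visitor(id);
    }
private:
    std::set<int> ids_;   // implementation detail, never exposed
};

inline void visitor_demo()
{
    IdStore store;
    store.add(30);
    store.add(10);
    store.add(20);
    std::vector<int> seen;
    store.visit([&seen](int id) { seen.push_back(id); });
    assert((seen == std::vector<int>{10, 20, 30}));
}
```

A member function template cannot be virtual, so an abstract interface would need a fixed callback type such as std::function<void(int)> instead of the template parameter.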
I was surprised to find out that vector::erase moves elements when it is called. I thought it would swap the last element with the "to-be-deleted" element and reduce the size by one. My first reaction was: "let's extend std::vector and override erase()". But I found in many threads, like "Is there any real risk to deriving from the C++ STL containers?", that it can cause memory leaks. But I am not adding any new data members to vector, so there is no additional memory to be freed. Is there still a risk?
Some suggest that we should prefer composition over inheritance. I can't make sense of this advice in this context. Why should I waste my time in the "mechanical" task of wrapping every function of the otherwise wonderful std::vector class.? Inheritance indeed makes the most sense for this task - or am I missing something?
Why not just write a standalone function that does what you want:
template<typename T>
void fast_erase(std::vector<T>& v, size_t i)
{
v[i] = std::move(v.back());
v.pop_back();
}
All credit to Seth Carnegie though. I originally used "std::swap".
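The effect on element order is easiest to see in a small run. The sketch below uses a hypothetical variant named swap_remove, which additionally skips the self-move when the erased element is already the last one.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical variant of fast_erase that skips the self-move when the
// element to erase is already the last one.
template <typename T>
void swap_remove(std::vector<T>& v, std::size_t i)
{
    assert(i < v.size());
    if (i + 1 != v.size())
        v[i] = std::move(v.back());
    v.pop_back();
}

inline void swap_remove_demo()
{
    std::vector<int> v = {1, 2, 3, 4};
    swap_remove(v, 1);                       // 4 moves into slot 1
    assert((v == std::vector<int>{1, 4, 3}));
    swap_remove(v, 2);                       // erasing the last element
    assert((v == std::vector<int>{1, 4}));
}
```

The removal is O(1), but indexes and iterators referring to the former last element no longer point at it.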
Delicate issue. The first guideline you're breaking is: "Inheritance is not for code reuse". The second is: "Don't inherit from standard library containers".
But: If you can guarantee, that nobody will ever use your unordered_vector<T> as a vector<T> you're good. However, if somebody does, the results may be undefined and/or horrible, regardless of how many members you have (it may seem to work perfectly but nevertheless be undefined behaviour!).
You could use private inheritance, but that would not free you from writing wrappers or pulling member functions in with lots of using statements, which would almost be as much code as composition (a bit less, though).
Edit: What I mean with using statements is this:
class Base {
public:
void dosmth();
};
class Derived : private Base {
public:
using Base::dosmth;
};
class Composed {
private:
Base base;
public:
void dosmth() {return base.dosmth(); }
};
You could do this with all member functions of std::vector. As you can see Derived is significantly less code than Composed.
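Applied to std::vector, the private-inheritance variant might look like the following sketch. The class name unordered_vector comes from the discussion above; the selection of re-exported members and the erase_at function are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: reuses std::vector privately and re-exports only chosen members.
template <typename T>
class unordered_vector : private std::vector<T>
{
    typedef std::vector<T> base;
public:
    using base::push_back;
    using base::size;
    using base::empty;
    using base::begin;
    using base::end;
    using base::operator[];

    // O(1) erase that does not preserve element order.
    void erase_at(std::size_t i)
    {
        if (i + 1 != size())
            (*this)[i] = std::move(base::back());
        base::pop_back();
    }
};

inline void unordered_vector_demo()
{
    unordered_vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.erase_at(0);               // 3 moves into slot 0
    assert(v.size() == 2);
    assert(v[0] == 3 && v[1] == 2);
}
```

Because the base is private, code outside the class cannot convert an unordered_vector<T> to std::vector<T>&, which rules out deleting through a base pointer and calling the order-preserving erase by accident.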
The risk of inheritance is in the following example:
std::vector<something> *v = new better_vector<something>();
delete v;
That is undefined behaviour because you deleted the object through a pointer to a base class with no virtual destructor.
However if you always delete a pointer to your class like:
better_vector<something> *v = new better_vector<something>();
delete v;
Or if you don't allocate it on the heap, there is no danger. You do not need to call the parent destructor in your destructor; it runs automatically.
I thought it would swap the last element with the "to-be-deleted"
element and reduce the size by one.
vector::erase maintains the order of the elements, whereas moving the last element into the erased slot and reducing the size by one does not. Since vector is implemented as an array, there is no O(1) way to erase an element and maintain the order of the remaining elements (unless you remove the last element).
If maintaining the order of the elements is not important, then your solution is fine; otherwise, you had better use another container (for example list, which implements a doubly-linked list).
So I have some legacy code which I would love to use more modern techniques. But I fear that given the way that things are designed, it is a non-option. The core issue is that often a node is in more than one list at a time. Something like this:
struct T {
T *next_1;
T *prev_1;
T *next_2;
T *prev_2;
int value;
};
This allows the core to have a single object of type T allocated and inserted into 2 doubly linked lists, nice and efficient.
Obviously I could just have 2 std::list<T*>'s and just insert the object into both...but there is one thing which would be way less efficient...removal.
Often the code needs to "destroy" an object of type T and this includes removing the element from all lists. This is nice because given a T* the code can remove that object from all lists it exists in. With something like a std::list I would need to search for the object to get an iterator, then remove that (I can't just pass around an iterator because it is in several lists).
Is there a nice c++-ish solution to this, or is the manually rolled way the best way? I have a feeling the manually rolled way is the answer, but I figured I'd ask.
As another possible solution, look at Boost Intrusive, which has an alternative list class with a lot of properties that may make it useful for your problem.
In this case, I think it'd look something like this:
#include <boost/intrusive/list.hpp>

using namespace boost::intrusive;

struct tag1; struct tag2;
typedef list_base_hook< tag<tag1> > base1;
typedef list_base_hook< tag<tag2> > base2;

class T: public base1, public base2
{
    int value;
};

list<T, base_hook<base1> > list1;
list<T, base_hook<base2> > list2;

// constant time to get the iterator of a T item:
auto where_in_list1 = list1.iterator_to(item);
auto where_in_list2 = list2.iterator_to(item);

// once you have iterators, you can remove in constant time, etc.
Instead of managing your own next/previous pointers, you could indeed use a std::list. To solve the removal performance problem, you could store an iterator in the object itself (one member for each std::list the element can be stored in).
You can extend this to store a vector or array of iterators in the class (in case you don't know the number of lists the element is stored in).
I think the proper answer depends on how performance-critical this application is. Is it in an inner loop that could potentially cost the program a user-perceivable runtime difference?
There is a way to create this sort of functionality by creating your own classes derived from some of the STL containers, but it might not even be worth it to you. At the risk of sounding tiresome, I think this might be an example of premature optimization.
The question to answer is why this C struct exists in the first place. You can't re-implement the functionality in C++ until you know what that functionality is. Some questions to help you answer that are,
Why lists? Does the data need to be in sequence, i.e., in order? Does the order mean something? Does the application require ordered traversal?
Why two containers? Does membership in a container indicate some kind of property of the element?
Why a double-linked list specifically? Is O(1) insertion and deletion important? Is reverse-iteration important?
The answer to some or all of these may be, "no real reason, that's just how they implemented it". If so, you can replace that intrusive C-pointer mess with a non-intrusive C++ container solution, possibly containing shared_ptrs rather than ptrs.
What I'm getting at is, you may not need to re-implement anything. You may be able to discard the entire business, and store the values in proper C++ containers.
How's this?
#include <list>

struct T {
    std::list<T*>::iterator entry1, entry2;
    int value;
};
std::list<T*> list1, list2;
// init a T* item:
T* item = new T;
item->entry1 = list1.end();
item->entry2 = list2.end();
// add a T* item to list 1:
item->entry1 = list1.insert(<where>, item);
// remove a T* item from list1
if (item->entry1 != list1.end()) {
    list1.erase(item->entry1); // erase by iterator is O(1); list::remove would search by value
    item->entry1 = list1.end();
}
// code for list2 management is similar
You could make T a class and use constructors and member functions to do most of this for you. If you have variable numbers of lists, you can use a vector of iterators, std::vector<std::list<T*>::iterator>, to track the item's position in each list.
Note that if you use push_back or push_front to add to the list, you need to do item->entry1 = list1.end(); item->entry1--; or item->entry1 = list1.begin(); respectively to get the iterator pointed in the right place.
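A compact, compilable version of the stored-iterator idea is sketched below. Node, link and unlink are hypothetical names; the sketch relies on the fact that std::list iterators stay valid while other elements are inserted or erased.

```cpp
#include <cassert>
#include <list>

struct Node;                               // hypothetical element type
typedef std::list<Node*> NodeList;

struct Node
{
    int value;
    NodeList::iterator entry1, entry2;     // position in each list
};

// Append `n` to both lists and remember where it landed.
inline void link(Node& n, NodeList& list1, NodeList& list2)
{
    n.entry1 = list1.insert(list1.end(), &n);
    n.entry2 = list2.insert(list2.end(), &n);
}

// Remove `n` from both lists in O(1), with no searching.
inline void unlink(Node& n, NodeList& list1, NodeList& list2)
{
    list1.erase(n.entry1);
    list2.erase(n.entry2);
}

inline void linked_node_demo()
{
    NodeList list1, list2;
    Node a, b;
    a.value = 1;
    b.value = 2;
    link(a, list1, list2);
    link(b, list1, list2);
    unlink(a, list1, list2);
    assert(list1.size() == 1 && list1.front() == &b);
    assert(list2.size() == 1 && list2.front() == &b);
}
```

Using insert (which returns the new element's iterator) instead of push_back avoids the end()-then-decrement step described above.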
It sounds like you're talking about something that could be addressed by applying graph theory. As such the Boost Graph Library might offer some solutions.
list::remove is what you're after. It'll remove any and all objects in the list with the same value as what you passed into it.
So:
list<T> listOne, listTwo;
// Things get added to the lists.
T thingToRemove;
listOne.remove(thingToRemove);
listTwo.remove(thingToRemove);
I'd also suggest converting your list node into a class; that way C++ will take care of memory for you.
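class MyClass {
public:
    int value;
    // Any other values associated with T
    // list::remove compares elements with operator==
    bool operator==(const MyClass& other) const { return value == other.value; }
};
list<MyClass> listOne, listTwo; // can add and remove MyClass objects w/o worrying about destroying anything.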
You might even encapsulate the two lists into their own class, with add/remove methods for them. Then you only have to call one method when you want to remove an object.
class TwoLists {
private:
list<MyClass> listOne, listTwo;
// ...
public:
void remove(const MyClass& thing) {
listOne.remove(thing);
listTwo.remove(thing);
}
};
It's been a while since I programmed in C++, and after coming from python, I feel soooo in a straight jacket, ok I'm not gonna rant.
I have a couple of functions that act as "pipes", accepting a list as input and returning another list as output (based on the input).
That is the concept; in practice, I'm using std::vector to represent the list. Is that acceptable?
Furthermore, I'm not using any pointers: I declare the variable as std::vector<SomeType> the_list(some_size); and return it directly, i.e. return the_list;
P.S. So far it's all ok, the project size is small and this doesn't seem to affect performance, but I still want to get some input/advice on this, because I feel like I'm writing python in C++.
The only thing I can see is that you're forcing a copy of the list you return. It would be more efficient to do something like:
void DoSomething(const std::vector<SomeType>& in, std::vector<SomeType>& out)
{
...
// no need to return anything, just modify out
}
Because you pass in the list you want to return, you avoid the extra copy.
Edit: This is an old reply. If you can use a modern C++ compiler with move semantics, you don't need to worry about this. Of course, this answer still applies if the object you are returning DOES NOT have move semantics.
If you really need a new list, I would simply return it. Return value optimization will eliminate the needless copies in most cases, and your code stays very clear.
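A sketch of what such by-value "pipe" stages might look like follows. The function names keep_even and squared are hypothetical stand-ins for the asker's real stages.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "pipe" stages: each takes a list and returns a new one.
inline std::vector<int> keep_even(const std::vector<int>& in)
{
    std::vector<int> out;
    for (std::size_t i = 0; i < in.size(); ++i)
        if (in[i] % 2 == 0) out.push_back(in[i]);
    return out;   // eligible for NRVO; otherwise moved in C++11 and later
}

inline std::vector<int> squared(const std::vector<int>& in)
{
    std::vector<int> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out.push_back(in[i] * in[i]);
    return out;
}

inline void pipe_demo()
{
    std::vector<int> input = {1, 2, 3, 4};
    std::vector<int> result = squared(keep_even(input));
    assert((result == std::vector<int>{4, 16}));
}
```

The temporaries returned by each stage bind directly to the next stage's const reference parameter, so chaining needs no named intermediates.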
That being said, taking lists and returning other lists is indeed python programming in C++.
A paradigm more suitable for C++ would be to create functions that take a range of iterators and alter the underlying collection.
e.g.
void DoSomething(iterator const & from, iterator const & to);
(with iterator possibly being a template, depending on your needs)
Chaining operations is then a matter of calling consecutive methods on begin(), end().
If you don't want to alter the input, you'd make a copy yourself first.
std::vector<SomeType> theOutput(inputVector);
This all comes from the C++ "don't pay for what you don't need" philosophy, you'd only create copies where you actually want to keep the originals.
I'd use the generic approach:
template <typename InIt, typename OutIt>
void DoMagic(InIt first, InIt last, OutIt out)
{
for(; first != last; ++first) {
if(IsCorrectIngredient(*first)) {
*out = DoMoreMagic(*first);
++out;
}
}
}
Now you can call it
std::vector<MagicIngredients> ingredients;
std::vector<MagicResults> results;
DoMagic(ingredients.begin(), ingredients.end(), std::back_inserter(results));
You can easily change the containers used without changing the algorithm, and it is also efficient: there is no overhead from returning containers.
If you want to be really hardcore, you could use boost::tuple.
tuple<int, int, double> add_multiply_divide(int a, int b) {
return make_tuple(a+b, a*b, double(a)/double(b));
}
But since it seems all your objects are of a single, non-polymorphic type, then the std::vector is all well and fine.
If your types were polymorphic (inherited classes of a base class) then you'd need a vector of pointers, and you'd need to remember to delete all the allocated objects before throwing away your vector.
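As an alternative to raw pointers and manual delete, the sketch below stores std::unique_ptr (C++11) in the vector, so the elements are destroyed together with it. This swaps in a different technique from the one described above; Shape, Triangle and Square are hypothetical types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Shape                       // hypothetical polymorphic base
{
    virtual ~Shape() {}
    virtual int sides() const = 0;
};
struct Triangle : Shape { int sides() const override { return 3; } };
struct Square   : Shape { int sides() const override { return 4; } };

// Returns owning pointers; the elements are destroyed with the vector.
inline std::vector<std::unique_ptr<Shape>> make_shapes()
{
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Triangle));
    shapes.push_back(std::unique_ptr<Shape>(new Square));
    return shapes;                 // moved (or elided), never copied
}

inline void shapes_demo()
{
    std::vector<std::unique_ptr<Shape>> shapes = make_shapes();
    int total = 0;
    for (std::size_t i = 0; i < shapes.size(); ++i)
        total += shapes[i]->sides();
    assert(total == 7);
}   // no delete needed here
```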
Using a std::vector is the preferred way in many situations. It is guaranteed to use contiguous memory and is therefore friendly to the L1 cache.
You should be aware of what happens when your return type is std::vector. Conceptually the std::vector is copied element by element, so if SomeType's copy constructor is expensive and the compiler cannot elide the copy (or move the vector, in C++11 and later), the return statement may be a lengthy and time-consuming operation.
If you are searching and inserting a lot in your list, you could look at std::set to get logarithmic time complexity instead of linear. (std::vector's push_back is amortized constant time; insertion elsewhere is linear.)
You are saying that you have many "pipe functions"... sounds like an excellent scenario for std::transform.
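A pipe stage built on std::transform might look like this sketch; doubled and double_all are hypothetical names standing in for whatever per-element step a real stage performs.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

inline int doubled(int x) { return 2 * x; }   // hypothetical per-element step

// Applies `doubled` to every input element and collects the results.
inline std::vector<int> double_all(const std::vector<int>& in)
{
    std::vector<int> out;
    out.reserve(in.size());
    std::transform(in.begin(), in.end(), std::back_inserter(out), doubled);
    return out;
}

inline void transform_demo()
{
    std::vector<int> input = {1, 2, 3};
    assert((double_all(input) == std::vector<int>{2, 4, 6}));
}
```

std::transform maps each input to exactly one output, so stages that drop elements need a filtering algorithm such as std::remove_copy_if instead.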
Another problem with returning a list of objects (as opposed to working on one or two lists in place, as BigSandwich pointed out) is that if your objects have complex copy constructors, those will be called for each element in the container.
If you have 1000 objects each referencing a hunk of memory, and they copy that memory on Object a, b; a=b; then that's 1000 memcopys just for returning them in a container. If you still want to return a container directly, think about pointers in this case.
It works very simply.
list<int> foo(void)
{
list<int> l;
// do something
return l;
}
Now receiving data:
list<int> lst=foo();
This is fully optimal because the compiler knows how to optimize the construction of lst, so it does not cause copies.
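Another method, more portable:
list<int> lst;
// do anything you want with list
foo().swap(lst); // call swap on the temporary: lst.swap(foo()) cannot bind a temporary to swap's non-const reference parameter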
What happens: foo is already optimized, so returning the value is not a problem. When you call swap, lst takes over the new contents without copying them. The old contents of lst end up in the temporary and are destroyed with it.
This is an efficient way to do the job.
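Both ways of receiving the result can be put side by side in a small sketch. make_list is a hypothetical producer; note that swap is called on the temporary, because a temporary cannot bind to the non-const reference parameter of swap.

```cpp
#include <cassert>
#include <list>

inline std::list<int> make_list()          // hypothetical producer
{
    std::list<int> l;
    l.push_back(1);
    l.push_back(2);
    return l;
}

inline void receive_demo()
{
    std::list<int> lst;
    lst.push_back(99);                     // old contents

    // C++98 idiom: swap with the temporary; the old contents are destroyed
    // together with the temporary at the end of the statement.
    make_list().swap(lst);
    assert(lst.size() == 2 && lst.front() == 1);

    // C++11 and later: plain assignment from a temporary moves, not copies.
    lst = make_list();
    assert(lst.size() == 2 && lst.back() == 2);
}
```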