Overloading push_back() in vector to allow non-duplicate elements

Overloading push_back() in vector to allow non-duplicate elements - c++

Can we overload the push_back() method in std::vector to allow non-duplicate elements? I know std::set and std::unordered_set are supposed to avoid duplicate elements, but std::set sorts the elements and std::unordered_set stores the elements in no particular order. I need to retrieve the elements in the order they are inserted, while ensuring duplicate elements are not inserted.
Edit: There's a possible duplicate for this question here. The best solution to this duplicate proposes to have an auxiliary data structure and another custom method "add". This doesn't look good for me since(I'll put it in a separate documentation) the users inserting data in std::vector rarely refer to the documentation for any custom functions. If there's no efficient way though, this can be a last resort.

Many people advise against it, but it seems there's some kind of urban legend going around that doing so will cause the universe to undergo vacuum decay and reality as we know it will dissolve.
You can publicly inherit from std::vector. But you have to think about what you can do with that.
If you inherit from vector, it is highly recommended that you don't add any data members to it. This can cause object slicing (google "c++ object slicing".) You also need to keep in mind that vector is not using virtual functions. That means you cannot override member functions. You can only shadow them, so it's not guaranteed that it will always be your push_back() function that gets called. The original will get called if you pass an object of your class to something that takes a reference to a vector, for example.
So in the end, you'd need to add a push_back_unique() function instead. But that in turns means that can be served by a simple free function instead. So inheriting vector isn't needed. This of course means there's never a guarantee that the elements in the vector will be unique. Other code might use push_back() instead somewhere.
Inheriting vector makes sense if you want to add completely new convenience functions that don't impose or lift any restrictions that vector has. If you want something that looks like a vector but really isn't (because it has different behavior and/or restrictions), you should implement your own type that delegates the container functionality to vector by either inheriting privately from it, or by having it as a private data member, and then replicate the vector API through public wrapper functions.
But this is very tedious to implement. Usually, you don't really need all the API from vector. So I'd say just write a smaller class around vector that only provides the functionality you need. And that functionality sounds like it's going to be pretty much read-only, since allowing write access to the elements allows for setting an element to the same value as another, breaking the container's uniqueness. So you could do something like:
template<typename T>
class UniqueVector
{
public:
void push_back(T&& elem)
{
if (std::find(vec_.begin(), vec_.end(), elem) == vec_.end()) {
vec_.push_back(std::forward(elem));
}
}
const T& operator[](size_t index) const
{
return vec_[index];
}
auto begin() const
{
return vec_.cbegin();
}
auto end() const
{
return vec_.cend();
}
private:
std::vector<T> vec_;
};
If you still want to allow write access to individual elements, then you can provide non-const functions that check if the value that is passed is already in the vector. Like:
void assign_if_unique(size_t index, T&& value)
{
if (std::find(vec_.begin(), vec_.end(), value) == vec_.end()) {
vec_[index] = std::forward(value);
}
}
This is a minimal example. You should obviously add the functions you actually want. Like size(), empty(), and whatever else you need.

You should first define a free function1 to implement your feature:
template<class T>
std::vector<T>&
push_back_unique(std::vector<T>& dest, T const& src)
{ /* ... */ }
If you use this a lot, and if make sense regarding your program, you might want to define an operator to do so:
template<class T>
std::vector<T>& operator<<(std::vector<T>& dest, T const& src)
{ return push_back_unique(dest, src); }
This allows:
std::vector<int> data;
data << 5 << 8 << 13 << 5 << 21;
for (auto n : data) std::cout << n << " "; // prints 5 8 13 21
1) This is because inheriting from standard containers is often bad practice and brings pitfalls.

Related

Why in particular should I rather pass a std::span than a std::vector& to a function?

I know this might overlap with the question What is a “span” and when should I use one?, but I think the answer to this specific part of the question is pretty confusing. On one hand, there are quotes like this:
Don't use it if you have a standard library container (or a Boost container etc.) which you know is the right fit for your code. It's not intended to supplant any of them.
But in the same answer, this statement occurs:
is the reasonable alternative to passing const vector& to functions when you expect your data to be contiguous in memory. No more getting scolded by high-and-mighty C++ gurus!
So what part am I not getting here? When would I do this:
void foo(const std::vector<int>& vec) {}
And when this?
void foo(std::span<int> sp) {}
Also, would this
void foo(const std::span<int> sp) {}
make any sense? I figured that it shouldn't, because a std::span is just a struct, containing a pointer and the length. But if it doesn't prevent you from changing the values of the std::vector you passed as an argument, how can it replace a const std::vector<T>&?

The equivalent of passing a std::vector<int> const& is not std::span<int> const, but rather std::span<int const>. The span itself being const or not won't really change anything, but more const is certainly good practice.
So when should you use it?
I would say that it entirely depends on the body of the function, which you omitted from your examples.
For example, I would still pass a vector around for this kind of functions:
std::vector<int> stored_vec;
void store(std::vector<int> vec) {
stored_vec = std::move(vec);
}
This function does store the vector, so it needs a vector. Here's another example:
void needs_vector(std::vector<int> const&);
void foo(std::vector<int> const& vec) {
needs_vector(vec);
}
As you can see, we need a vector. With a span you would have to create a new vector and therefore allocate.
For this kind of functions, I would pass a span:
auto array_sum(std::span<int const> const values) -> int {
auto total = int{0};
for (auto const v : values) {
total += v;
}
return total;
}
As you can see, this function don't need a vector.
Even if you need to mutate the values in the range, you can still use span:
void increment(std::span<int> const values) {
for (auto& v : values) {
++v;
}
}
For things like getter, I will tend to use a span too, in order to not expose direct references to members from the class:
struct Bar {
auto get_vec() const -> std::span<int const> {
return vec;
}
private:
std::vector<int> vec;
};

Regarding the difference between passing a &std::vector and passing a std::span, I can think of two important things:
std::span allows you to pass only the data you want the function to see or modify, as opposed to the whole vector (and you don't have to pass a start index and an end index). I've found this was much needed to keep code clean. After all, why would you give a function access to any more data than it needs?
std::span can take data from multiple types of containers (e.g. std::array, std::vector, C-style arrays).
This can of course be also done by passing C-style arrays - std::span is just a wrapper around C-style arrays with some added safety and convenience.

Another differentiator between the two: In order to modify size of the owning vector inside your function (via std::vector::assign or std::vector::clear, for instance), you would rather pass a std::vector& than a std::span, since span doesn't provide those features.
You can modify the contents of a std::span, but you can't change its size.

C++ class with container of pointers to internal data members: copying/assignment

Suppose I have a class Widget with a container data member d_members, and another container data member d_special_members containing pointers to distinguished elements of d_members. The special members are determined in the constructor:
#include <vector>
struct Widget
{
std::vector<int> d_members;
std::vector<int*> d_special_members;
Widget(std::vector<int> members) : d_members(members)
{
for (auto& member : d_members)
if (member % 2 == 0)
d_special_members.push_back(&member);
}
};
What is the best way to implement the copy constructor and operator=() for such a class?
The d_special_members in the copy should point to the copy of d_members.
Is it necessary to repeat the work that was done in the constructor? I hope this can be avoided.
I would probably like to use the copy-and-swap idiom.
I guess one could use indices instead of pointers, but in my actual use case d_members has a type like std::vector< std::pair<int, int> > (and d_special_members is still just std::vector<int*>, so it refers to elements of pairs), so this would not be very convenient.
Only the existing contents of d_members (as given at construction time) are modified by the class; there is never any reallocation (which would invalidate the pointers).
It should be possible to construct Widget objects with d_members of arbitrary size at runtime.
Note that the default assignment/copy just copies the pointers:
#include <iostream>
using namespace std;
int main()
{
Widget w1({ 1, 2, 3, 4, 5 });
cout << "First special member of w1: " << *w1.d_special_members[0] << "\n";
Widget w2 = w1;
*w2.d_special_members[0] = 3;
cout << "First special member of w1: " << *w1.d_special_members[0] << "\n";
}
yields
First special member of w1: 2
First special member of w1: 3

What you are asking for is an easy way to maintain associations as data is moved to new memory locations. Pointers are far from ideal for this, as you have discovered. What you should be looking for is something relative, like a pointer-to-member. That doesn't quite apply in this case, so I would go with the closest alternative I see: store indices into your sub-structures. So store an index into the vector and a flag indicating the first or second element of the pair (and so on, if your structure gets even more complex).
The other alternative I see is to traverse the data in the old object to figure out which element a given special pointer points to -- essentially computing the indices on the fly -- then find the corresponding element in the new object and take its address. (Maybe you could use a calculation to speed this up, but I'm not sure that would be portable.) If there is a lot of lookup and not much copying, this might be better for overall performance. However, I would rather maintain the code that stores indices.

The best way is to use indices. Honestly. It makes moves and copies just work; this is a very useful property because it's so easy to get silently wrong behavior with hand written copies when you add members. A private member function that converts an index into a reference/pointer does not seem very onerous.
That said, there may still be similar situations where indices aren't such a good option. If you, for example have a unordered_map instead of a vector, you could of course still store the keys rather than pointers to the values, but then you are going through an expensive hash.
If you really insist on using pointers rather that indices, I'd probably do this:
struct Widget
{
std::vector<int> d_members;
std::vector<int*> d_special_members;
Widget(std::vector<int> members) : d_members(members)
{
for (auto& member : d_members)
if (member % 2 == 0)
d_special_members.push_back(&member);
}
Widget(const Widget& other)
: d_members(other.d_members)
, d_special_members(new_special(other))
{}
Widget& operator=(const Widget& other) {
d_members = other.d_members;
d_special_members = new_special(other);
}
private:
vector<int*> new_special(const Widget& other) {
std::vector<int*> v;
v.reserve(other.d_special_members.size());
std::size_t special_index = 0;
for (std::size_t i = 0; i != d_members.size(); ++i) {
if (&other.d_members[i] == other.d_special_members[special_index]) {
v.push_back(&d_members[i});
++special_index;
}
}
return v;
}
};
My implementation runs in linear time and uses no extra space, but exploits the fact (based on your sample code) that there are no repeats in the pointers, and that the pointers are ordered the same as the original data.
I avoid copy and swap because it's not necessary to avoid code duplication and there just isn't any reason for it. It's a possible performance hit to get strong exception safety, that's all. However, writing a generic CAS that gives you strong exception safety with any correctly implemented class is trivial. Class writers should usually not use copy and swap for the assignment operator (there are, no doubt, exceptions).

This work for me for vector of pairs, though it's terribly ugly and I would never use it in real code:
std::vector<std::pair<int, int>> d_members;
std::vector<int*> d_special_members;
Widget(const Widget& other) : d_members(other.d_members) {
d_special_members.reserve(other.d_special_members.size());
for (const auto p : other.d_special_members) {
ptrdiff_t diff = (char*)p - (char*)(&other.d_members[0]);
d_special_members.push_back((int*)((char*)(&d_members[0]) + diff));
}
}
For sake of brevity I used only C-like type cast, reinterpret_cast would be better. I am not sure whether this solution does not result in undefined behavior, in fact I guess it does, but I dare to say that most compilers will generate a working program.

I think using indexes instead of pointers is just perfect. You don't need any custom copy code then.
For convenience you may want to define a member function converting the index to actual pointer you want. Then your members can be of arbitrary complexity.
private:
int* getSpecialMemberPointerFromIndex(int specialIndex)
{
return &d_member[specialIndex];
}

Is it a bad practice to return a std container from a interface class?

I have meet such a question.
I need to design a interface class, which looks like to be
struct IIDs
{
....
const std::set<int>& getAllIDs() = 0; //!< I want the collection of int to be sorted.
}
void foo()
{
const std::set<int>& ids = pIIDs->getAllIDs();
for(std::set<int>::const_iterator it = ids.begin();....;..) {
// do something
}
}
I think that return a std's container is a bit of inappropriate, for that it will force the implement to use a std::set to store the value of IDs, But If I write it as follow :
struct IIDs
{
....
int count() const = 0;
int at(int index) = 0; //!< the itmes should be sorted
}
void foo()
{
for (int i = 0; i < pIIDs->count(); ++i) {
int val = pIIDs->at(u);
...
}
}
I found that none of the std's containers could provide those requests:
the complexity of index lookup needed to less or equal than O(log n).
the complexity of insertion need to less or equal than O(log n).
the items must be sorted.
So I just have to use the example.1, Is those can be acceptable?

STL containers and template code in general should never be used across a DLL boundary.
The thing you have to keep in mind when returning complex types like STL containers is that if your call ever crosses the boundary between two different DLLs (or a DLL and an application) running different memory managers your application will most likely crash spectacularly.
The templates that make up the STL code will be executed within the implementation DLL, creating all the memory used by the container there. Later when it leaves scope in your calling code, your own memory manager will attempt to deallocate memory it doesn't own, resulting in a crash.
If you know your code won't cross DLL boundaries, and will only ever be called in the context of a single memory manager, then you're fine as far as memory management is concerned.
However, even in cases where you're only returning references, such as your example above, where the lifetime of the container would be entirely managed by the interface implementation code, unless you know that the exact same version of the STL and the exact same compiler and linker settings were used for compiling the implementation as the caller, you're asking for trouble.

The problem i see is you are returning the collection by const references, that mean that you have a member of that collection type and are returning a reference to it, if you are returning a local variable to the function (invalid memory access problems).
If it's a member variable is better provide access to begin and end iterator. If is local variable you could returned by value (C++11 should optimize and no copy anything). If it's DLL boundary try for all mean not use any C++ types, only C types.

In terms of design, and for good generic code, prefer the STL way: return iterators, leaving the container type an implementation detail of IIDs, and hide your types with typdefs
struct IIDs
{
typedef std::set<int> Container;
typedef Container::iterator IDIterator;
// We only expose iterators to the data
IDIterator begin(); //!< I want the collection of int to be sorted.
IDIterator end();
// ...
};

There are various approaches:
if you want to minimise the coupling of client code on the IIDs implementation and ensure iteration is completed while the IIDs object still exists, then use a visitor pattern: the calling code just has to supply some function to be called for each of the member elements in turn and is not responsible for the iteration itself
Visitor example:
struct IIDs
{
template <typename T>
void visit(T& t)
{
for (int i : ids_) t(i);
}
...
private:
std::set<int> ids_;
};
if you want to give the caller more freedom to mix other code in with the container traversal, and have multiple concurrent independent traversals, then provide iterators, but be aware that the client code could keep an iterator hanging around longer than the IIDs object itself - you may or may not want to handle that scenario gracefully

Store pointers to objects in multiple containers

For the sake of presenting my question, let's assume I have a set of pointers (same type)
{p1, p2, ..., pn}
I would like to store them in multiple containers as I need different access strategy to access them. Suppose I want to store them in two containers, linked list and a hash table. For linked list, I have the order and for hash table I have the fast access. Now, the problem is that if I remove a pointer from one container, I'll need to remember to remove from other container. This makes the code hard to maintain. So the question is that are there other patterns or data structures to manage situation like this? Would smart pointer help here?

If I understand correctly, you want to link your containers so that removing from one removes from all. I don't think this is directly possible. Possible solutions:
re-design whole object architecture, so pointer is not in many containers.
Use Boost Multi-index Containers Library to achieve all features you want in one container.
Use a map key instead of direct pointer to track objects, and keep the pointer itself in one map.
use std::weak_ptr so you can check if item has been deleted somewhere else, and turn it to std::shared_ptr while it is used (you need one container to have "master" std::shared_ptr to keep object around when not used)
create function/method/class to delete objects, which knows all containers, so you don't forget accidentally, when all deletion is in one place.

Why don't you create your own class which contains the both std::list and std::unordred_map and provide accessing functions and provide removal functions in a way that you can access them linearly with the list and randomly with the unordred_map, and the deletion will be deleting from both containers and insertion will insert to both. ( kind of a wrapper class :P )
Also you can consider about using std::map, and providing it a comparison function which will always keep your data structure ordered in the desired way and also you can randomly access the elements with log N access time.

As usually, try to isolate this logic to make things easier to support. Some small class with safe public interface (sorry, I didn't compile this, it is just a pseudocode).
template<class Id, Ptr>
class Store
{
public:
void add(Id id, Ptr ptr)
{
m_ptrs.insert(ptr);
m_ptrById.insert(std::make_pair(id, ptr));
}
void remove(Ptr ptr)
{
// remove in sync as well
}
private:
std::list<Ptr> m_ptrs;
std::map<Id, Ptr> m_ptrById;
};
Then use Store for keeping your pointers in sync.

If I understand your problem correctly, you are less concern with memory management (new/delete issue) and more concern with the actual "book keeping" of which element is valid or not.
So, I was thinking of wrapping each point with a "reference counter"
template< class Point >
class BookKeeping {
public:
enum { LIST_REF = 0x01,
HASH_REF = 0x02 };
BookKeeping( const Point& p ): m_p(p), m_refCout( 0x3 ) {} // assume object created in both containers
bool isValid() const { return m_refCount == 0x3; } // not "freed" from any container
void remove( unsigned int from ) { m_refCount = m_refCount & ! from ; }
private:
Point m_p;
unsigned int m_refCount;
};

See the answer (the only one, by now) to this similar question. In that case a deque is proposed instead of a list, since the OP only wanted to insert/remove at the ends of the sequence.
Anyway, you might prefer to use the Boost Multi-index Containers Library.

Immutable container with mutable content

The story begins with something I thought pretty simple :
I need to design a class that will use some STL containers. I need to give users of the class access to an immutable version of those containers. I do not want users to be able to change the container (they can not push_back() on a list for instance), but I want users to be able to change the contained objects (get an element with back() and modify it) :
class Foo
{
public:
// [...]
ImmutableListWithMutableElementsType getImmutableListWithMutableElements();
// [...]
};
// [...]
myList = foo.getImmutableListWithMutableElements();
myElement = myList.back();
myElement.change(42); // OK
// [...]
// myList.push_back(myOtherElement); // Not possible
At first glance, it seems that a const container will do. But of course, you can only use a const iterator on a const container and you can not change the content.
At second glance, things like specialized container or iterator come to mind. I will probably end up with that.
Then, my thought is "Someone must have done that already !" or "An elegant, generic solution must exist !" and I'm here asking my first question on SO :
How do you design / transform a standard container into an immutable container with mutable content ?
I'm working on it but I feel like someone will just say "Hey, I do that every time, it's easy, look !", so I ask...
Thank you for any hints, suggestions or wonderful generic ways to do that :)
EDIT:
After some experiments, I ended up with standard containers that handle some specifically decorated smart pointers. It is close to Nikolai answer.
The idea of an immutable container of mutable elements is not a killing concept, see the interesting notes in Oli answer.
The idea of a specific iterator is right of course, but it seems not practical as I need to adapt to any sort of container.
Thanks to you all for your help.

The simplest option would probably be a standard STL container of pointers, since const-ness is not propagated to the actual objects. One problem with this is that STL does not clean up any heap memory that you allocated. For that take a look at Boost Pointer Container Library or smart pointers.

Rather than providing the user with the entire container, could you just provide them non-const iterators to beginning and end? That's the STL way.

You need a custom data structure iterator, a wrapper around your private list.
template<typename T>
class inmutable_list_it {
public:
inmutable_list_it(std::list<T>* real_list) : real_list_(real_list) {}
T first() { return *(real_list_->begin()); }
// Reset Iteration
void reset() { it_ = real_list_->begin(); }
// Returns current item
T current() { return *it_; }
// Returns true if the iterator has a next element.
bool hasNext();
private:
std::list<T>* real_list_;
std::list<T>::iterator it_;
};

The painful solution:
/* YOU HAVE NOT SEEN THIS */
struct mutable_int {
mutable_int(int v = 0) : v(v) { }
operator int(void) const { return v; }
mutable_int const &operator=(int nv) const { v = nv; return *this; }
mutable int v;
};
Excuse me while I have to punish myself to atone for my sins.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Overloading push_back() in vector to allow non-duplicate elements - c++

Related

Why in particular should I rather pass a std::span than a std::vector& to a function?

C++ class with container of pointers to internal data members: copying/assignment

Is it a bad practice to return a std container from a interface class?

Store pointers to objects in multiple containers

Immutable container with mutable content

Categories

Resources