Storing pointers inside C++ standard library containers - c++

I am implementing a project in an environment where I am required to create hundreds of millions of std::string objects. I am storing these strings in multiple containers, so the number of copies is multiplied, and this copying is a huge bottleneck for my program.
I am trying to come up with a solution, and my online research has taken me this far. Basically my idea is, given that the strings I construct are constant and unnecessarily copied, I would like to instead allocate my own C-style strings (char arrays), and share these pointers across containers (I need these containers because of their different advantages for look-up, insertion, etc.).
The main containers that I use are std::vector, std::map, and std::unordered_set. For the last two, I am looking for ways to make these containers compatible with the char* type. With the help of Stack Overflow, I created a custom hash function for std::unordered_set and a char* comparison (less than, "<") to use as in std::map<_,_,less_than>.
To make my questions clear, I am going to list them.
Before going into technical issues, is this achievable, or standard usage, or worth striving for?
The comparison function works and insertion is successful. However, given that two pointers can point to equal strings but act like different keys inside a std::unordered_set or std::map, will I also need something like an equality operator overload to be able to use methods like contains or erase? For example, if const char* p1 = "beta" and const char* p2 = "beta", std::map::erase(p1) should be able to delete the entry p2 if it is present inside the std::map (suppose the two "beta" strings are at different memory positions).
If I made myself clear, is there a better way?
Thanks for your time.
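As a rough illustration of the second point: std::map decides whether two keys are the same purely from the comparator (neither key is less than the other), so a content-based less-than is enough for find and erase there. std::unordered_set needs both a hash and an equality predicate that look at the characters rather than the pointer value. The sketch below assumes null-terminated strings that outlive the containers; the names CStrLess, CStrHash, CStrEqual and erase_via_other_pointer are made up for this example, and the hash is a plain FNV-1a chosen only for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <unordered_set>

// Orders C strings by content, not by address (for std::map / std::set).
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Hashes the characters, not the pointer value (64-bit FNV-1a).
struct CStrHash {
    std::size_t operator()(const char* s) const {
        unsigned long long h = 14695981039346656037ULL;
        for (; *s; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }
};

// Equality by content, so two different pointers to "beta" are the same key.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

using CStrMap = std::map<const char*, int, CStrLess>;
using CStrSet = std::unordered_set<const char*, CStrHash, CStrEqual>;

// Inserts through one pointer and erases through another with equal content.
// Returns true if both containers end up empty.
bool erase_via_other_pointer() {
    static const char first[] = "beta";   // two distinct arrays, same content
    static const char second[] = "beta";
    const char* p1 = first;
    const char* p2 = second;
    assert(p1 != p2);                      // different addresses

    CStrMap m;
    CStrSet s;
    m[p2] = 1;
    s.insert(p2);

    m.erase(p1);   // matches via CStrLess equivalence
    s.erase(p1);   // matches via CStrHash + CStrEqual
    return m.empty() && s.empty();
}
```

The containers only store the pointers, so whoever allocates the character arrays still has to keep them alive and free them once no container refers to them.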

Related

How to use C++ std::sets as building blocks of a class?

I need a data structure that satisfies the following:
stores an arbitrary number of elements, where each element is described by 10 numeric metrics
allows fast (log n) search of elements by any of the metrics
allows fast (log n) insertion of new elements
allows fast (log n) removal of elements
And let's assume that the elements are expensive to construct.
I came up with the following plan
store all elements in a vector called DATA.
use 10 std::sets, one for each of 10 metrics. Each std::set is light-weight, it contains only integers, which are indexes into the vector DATA. The comparison operators 'look up' the appropriate element in DATA and then select the appropriate metric
template <int C>
struct Cmp
{
    bool operator() (int const a, int const b)
    {
        return ( DATA[a].coords[C] != DATA[b].coords[C] )
            ? ( DATA[a].coords[C] < DATA[b].coords[C] )
            : ( a < b );
    }
};
Elements are never modified or removed from a vector. A new element is pushed back to DATA and then its index (DATA.size()-1) is inserted into the sets (set<int, Cmp<..> >). To remove an element, I set a flag in the element saying that it is deleted (without actually removing it from the DATA vector) and then remove the element index from all ten std::sets.
This works fine as long as DATA is a global variable. (It also somewhat abuses the type system by making the templated struct Cmp dependent on a global variable.)
However, I was not able to enclose the DATA vector and std::sets (set<int, Cmp<...> >) inside a class and then 'index' DATA with those std::sets. For starters, the comparison operator Cmp defined inside an outer class has no access to the outer class' fields (so it cannot access DATA). I also cannot pass the vector to the Cmp constructor because Cmp is being constructed by std::set and std::set expects a comparison operator with a constructor that has no arguments.
I have a feeling I'm working against C++ type system and trying to achieve something that the type system is purposely preventing me from doing. (I'm trying to make std::set depend on a variable that is going to be constructed only at runtime.) And while I understand why the type system might not like what I do, I think this is a legitimate use case.
Is there a way to implement the data structure/class I described above without providing a re-implementation of std::set/red-black tree? I hope there may be a trick I have not thought of yet. (And yes, I know that boost has something, but I'd like to stick to the standard library.)
When I read something like "look up foo by a value bar", my initial reaction is to use a map<> or something similar. There are some implications to this though:
Keys in an std::map (or values in an std::set) are unique, so no two elements can share the same key and accordingly no two data objects would be able to have the same metric. If multiple data objects can have the same metric (this isn't clear from your question), using an std::multimap (or std::multiset) would work though.
If the keys are constant and stored within the elements themselves, using a set<data*,cmp> is a common approach. The comparator then just retrieves the according field from the objects and compares them. Lookup then requires creating a temporary object and using find() with it. Some implementations also have an extension that allows searching with a different type, which would make this much easier but also make porting require actual work.
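On the point about searching with a different type: since C++14 the standard associative containers support this when the comparator declares a nested is_transparent type, so a temporary object is not needed for the lookup. A minimal sketch, assuming a hypothetical Data struct with an int key (Data, ByKey, find_by_key and demo_lookup are names invented here):

```cpp
#include <cassert>
#include <set>

struct Data {          // hypothetical element type
    int key;
    double payload;
};

// Compares Data pointers by key and also accepts a bare int on either side.
struct ByKey {
    using is_transparent = void;  // enables heterogeneous find() (C++14)
    bool operator()(const Data* a, const Data* b) const { return a->key < b->key; }
    bool operator()(const Data* a, int k) const { return a->key < k; }
    bool operator()(int k, const Data* b) const { return k < b->key; }
};

// Finds the element with the given key without building a temporary Data.
const Data* find_by_key(const std::set<const Data*, ByKey>& index, int key) {
    auto it = index.find(key);
    return it == index.end() ? nullptr : *it;
}

// Small self-check: builds an index over two elements and looks one up by int.
bool demo_lookup() {
    Data a{1, 0.5}, b{2, 1.5};
    std::set<const Data*, ByKey> index{&a, &b};
    const Data* hit = find_by_key(index, 2);
    assert(hit != nullptr);
    return hit == &b && find_by_key(index, 3) == nullptr;
}
```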
It is important that the fields used as keys remain constant though, because if you modify them, you implicitly change the order of the set<>. This is the reason that a set<>'s elements are effectively constant, i.e. even a plain iterator has a constant as value type. If you store pointers though, you can easily get around that, because a constant pointer is something different than a pointer to a constant. Don't shoot yourself in the foot with that!
If the metrics are not so much a property of the objects themselves (or you don't mind redundantly storing them), using an std::map would be a natural choice. Storing the same object under multiple keys, depending on the metric, can be done in separate containers (map<int,data*> c[10];). However, you can do that in a single map using e.g. a pair<metric,value> as key (map<pair<int,int>,data*> c;).
Using a vector<> to store the actual elements and only referencing them as either pointers or indices in a map surely works. I'd take the pointers though, as this is what allows the above approaches using a set or map to work. Without that, the comparator would have to store a reference to the container, where at the moment it just uses the global DATA container. Getting this to work with a vector is tricky though, since it reallocates its elements when growing, as you correctly pointed out. I'd consider a different container type, like std::list or std::deque. The former would allow erasing elements, too, but it has a higher per-element overhead. The latter has a relatively low per-element overhead, only slightly above std::vector. You could then even go so far as to store iterators instead of pointers, which helps debugging provided you use a "checked STL" for that. Still, you will have to do some manual bookkeeping which object is still referenced somewhere and which one isn't.
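For the variant where the comparator stores a reference to the container: std::set has a constructor that takes a comparator object and copies it, so the comparator does not have to be default-constructible. The following is only a sketch under simplifying assumptions (two metrics instead of ten, no removal); Element, CmpByMetric and Index are hypothetical names. Because the sets hold indices rather than pointers, a growing std::vector is not a problem here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct Element {                 // hypothetical element with two metrics
    double coords[2];
};

// Comparator that carries a pointer to the storage it indexes into.
struct CmpByMetric {
    const std::vector<Element>* data;
    int metric;
    bool operator()(std::size_t a, std::size_t b) const {
        const double x = (*data)[a].coords[metric];
        const double y = (*data)[b].coords[metric];
        return x != y ? x < y : a < b;   // tie-break on index, as in the question
    }
};

class Index {
public:
    Index()
        : by0_(CmpByMetric{&data_, 0}),   // std::set copies the comparator passed in
          by1_(CmpByMetric{&data_, 1}) {}
    Index(const Index&) = delete;         // comparators point at data_, so forbid copies
    Index& operator=(const Index&) = delete;

    // Appends an element and registers its index in both sets.
    std::size_t add(const Element& e) {
        data_.push_back(e);
        const std::size_t i = data_.size() - 1;
        by0_.insert(i);
        by1_.insert(i);
        return i;
    }

    // Index of the element with the smallest value of the given metric (0 or 1).
    std::size_t smallest(int metric) const {
        assert(!data_.empty());
        return metric == 0 ? *by0_.begin() : *by1_.begin();
    }

private:
    std::vector<Element> data_;
    std::set<std::size_t, CmpByMetric> by0_, by1_;
};
```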
Instead of using a separate container, you could also allocate the elements dynamically, although that itself has some overhead. If the overhead per element is not an issue, you could then use reference-counted smart pointers. If the application is a one-shot process, you could also use raw pointers and let the OS reclaim the memory on exit.
Note that I assume that storing multiple copies of the data objects is not an option. If that was the case, you could just as well have a map<int,data> m[10];, where each map stores its own copy of the data objects. All the bookkeeping issues would then be resolved, but at the price of a 10x overhead.

STL heap containing pointers to objects

I have an std::list<MyObject*> objectList container that I need to sort and maintain in the following scenario:
Each object has a certain field that supplies a cost (a float value for example). That cost value is used to compare two objects as if they were floating point numbers
The collection must be ordered (ascending) and must quickly find the correct position for a newly inserted element.
It is possible to delete the lowest element (in terms of cost) and it is also possible to update the cost of several arbitrarily positioned elements. The list must be then reordered as fast as possible, taking advantage of its already sorted nature.
Could I use any other stl container/mechanism to allow for the three behavioral properties? It pretty much resembles a heap and I thought using make_heap could be a good way to sort the list. I need to have a container of pointers, since there are several other data structures that rely on these pointers.
How then can I choose a better container that's also pointer friendly and allows sorting by looking at the comparison operators of the pointed types?
CLARIFICATION: I need an stl container that best fits the scenario and can successfully wrap pointers or references for that matter. (For example, I read briefly that the std::set container could be a good candidate, but I have no experience with it).
A current implementation, based on the below answers:
struct SHafleEdgeComparatorFunctor
{
    bool operator()(SHEEdge* lhs, SHEEdge* rhs)
    {
        return (*lhs) < rhs;
    }
};
std::multiset<SHEEdge*, SHafleEdgeComparatorFunctor> m_edges;
Of course, the SHEEdge data structure has an overloaded operator:
bool operator<(SHEEdge* rhs)
{
    return this->GetCollapseError() < rhs->GetCollapseError();
}
I would indeed use std::set. The tricky bit in your requirements is to update existing elements.
A std::set is always sorted. You will have to either wrap your pointers in a class with a useful compare operator or you have to pass a comparison predicate to the set.
Then you get the sorted property automatically and you get constant time removal of the lowest element.
You also get updating of the cost value in logarithmic time: simply remove the object from the set, change its cost, and re-add it. This will be as fast as it can be for a sorted container.
Inserting and deleting are fast in a set.
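The remove-and-re-add update could look roughly like the sketch below. It uses std::multiset rather than std::set because several objects may share the same cost; MyObject with a public float cost field, CostLess, update_cost and pop_lowest are assumptions made for the example, not the asker's actual types.

```cpp
#include <cassert>
#include <set>

struct MyObject {     // stand-in for the question's type; `cost` is an assumed field
    float cost;
};

struct CostLess {
    bool operator()(const MyObject* a, const MyObject* b) const {
        return a->cost < b->cost;
    }
};

using CostQueue = std::multiset<MyObject*, CostLess>;

// Changes obj->cost while keeping the container ordered.
// The object is taken out before its key changes and re-inserted afterwards
// (if it was not in the container, it is simply inserted).
void update_cost(CostQueue& q, MyObject* obj, float new_cost) {
    // equal_range yields all entries with the same cost; pick the exact pointer.
    auto range = q.equal_range(obj);
    for (auto it = range.first; it != range.second; ++it) {
        if (*it == obj) {
            q.erase(it);
            break;
        }
    }
    obj->cost = new_cost;
    q.insert(obj);
}

// Removes and returns the cheapest object (the first element).
MyObject* pop_lowest(CostQueue& q) {
    assert(!q.empty());
    MyObject* lowest = *q.begin();
    q.erase(q.begin());
    return lowest;
}
```

The multiset here only observes the objects; ownership stays with whatever other data structure holds the pointers.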
I'd start using a smart pointer like shared_ptr instead of a raw pointer (raw pointers are good e.g. if they are observing pointers, like pointers passed as function parameters, but when you have ownership semantics, like in this case, it's better to use a smart pointer).
Then, I'd start with std::vector as a container.
So, try making it vector<shared_ptr<MyObject>>.
You can measure performance of it compared to list<shared_ptr<MyObject>>.
(Note also that std::list has kind of more overhead than std::vector, since it's a node-based container, and each node has some overhead; instead std::vector allocates a contiguous chunk of memory to store its data, in this case the shared_ptrs; so std::vector is also more "cache-friendly", etc.)
In general, std::vector offers very good performance, and it's a good option as a "first choice" container. In any case, your mileage may vary, and the best thing is to measure performance (speed) to get a better understanding in your particular case.
If I understand correctly what you are asking, you are looking for the correct container to use.
Indeed std::set seems to be the correct container for the kind of thing you want to do, but it will depend on all the use cases:
Do you need to have O(1) access to the elements?
What is the operation used that will have the most important cost?
std::set uses a key to sort the elements and doesn't allow duplicates (if you want duplicates, have a look at std::multiset). When you add an element, it will automatically be inserted in the correct position. Generally you don't want to use raw pointers as the key, as pointers can be null.
Another alternative could be to use a std::vector<std::shared_ptr<MyObject>>. As @MikePro said, it is good practice to keep the pointers inside smart pointers, so that you don't have to delete them manually (and to avoid memory leaks in case of an exception, for example). If you use a vector, you will have to use functions like std::sort and std::find from the <algorithm> header, or std::vector::insert.
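With the vector approach, keeping the elements ordered on insertion might be sketched as follows: a binary search with std::upper_bound finds the position in logarithmic time, while the insert itself still has to shift the elements behind it. MyObject with a float cost field and the function name insert_sorted are assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct MyObject {   // stand-in type; `cost` is an assumed field
    float cost;
};

using ObjectPtr = std::shared_ptr<MyObject>;

// Inserts obj so that the vector stays sorted by ascending cost.
// The vector is assumed to be sorted already.
void insert_sorted(std::vector<ObjectPtr>& v, ObjectPtr obj) {
    assert(obj != nullptr);
    auto pos = std::upper_bound(
        v.begin(), v.end(), obj,
        [](const ObjectPtr& a, const ObjectPtr& b) { return a->cost < b->cost; });
    v.insert(pos, std::move(obj));
}
```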
Generally this image helps finding your container. It's not perfect (as you have to know a bit more than what is displayed) but it usually does its job well:

Sharing an array with STL vectors

I would like to share the contents of an array of doubles a of size k with one or more STL vectors v1, v2...vn.
The effect that I want from this shared storage is that if the underlying array gets modified the change can be observed from all the vectors that share its contents with the array.
I can do that by defining the vectors v1...vn as vectors of pointers
vector<double*> v1;
and copy the pointers a to a + k into this vector. However, I do not like that solution. I want the vectors to be a vector of doubles.
Given that you can extract the underlying pointer from a vector I am assuming one could initialize a vector with an array in such a way that the contents are shared. Would appreciate help about how to do this.
"Given that you can extract the underlying pointer from a vector I am assuming one could initialize a vector with an array in such a way that the contents are shared."
No, you can't do this. The Standard Library containers always manage their own memory.
Your best option is to create the std::vector<double> and then use it as an array where you need to do so (via &v[0], assuming the vector is non-empty).
If you just want to have the container interface, consider using std::array (or boost::array or std::tr1::array) or writing your own container interface to encapsulate the array.
This sounds to me like you want to alias the array with a vector. So logically you want a vector of references (which doesn't work for syntactical reasons). If you really really need this feature, you can write your own ref wrapper class that behaves exactly like an actual C++ reference, so the users of your vn vectors won't be able to distinguish between vector<T> and vector<ref<T> > (e.g. with T = double). But internally, you could link the items in the vectors to the items in your "master" array.
But you should have darned good reasons to do this overhead circus :)
OK, Standard Library containers are both holders of information, and enumerators for those elements. That is, roughly any container can be used in almost any algorithm, and at least, you can go through them using begin() and end().
When you separate both (element holding and element enumeration), as in your case, you may consider boost.range. boost.range gives you a pair of iterators that delimit the extent to which algorithms will be applied, and you have the actual memory store in your array. This works mostly to read-access them, because normally, modifying the structure of the vector will invalidate the iterators. You can recreate them, though.
To answer your question, as far as I know std::vector can not be given an already constructed array to use. I can not even think how that could be done since there are also the size/capacity related variables. You can possibly try to hack a way to do it using a custom allocator but I feel it will be ugly, error prone and not intuitive for future maintenance.
That said, if I may rephrase your words a bit, you are asking for multiple references to the same std::vector. I would either do just that or maybe consider using a shared_ptr to a vector.
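A small sketch of the shared_ptr idea, combined with handing the vector's storage to array-style code through data(): every holder refers to the same std::vector<double>, so a change made through any of them (or through the raw pointer) is visible to all. SharedDoubles, View and scale_in_place are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using SharedDoubles = std::shared_ptr<std::vector<double>>;

// Hypothetical C-style routine that expects a raw array of k doubles.
void scale_in_place(double* a, std::size_t k, double factor) {
    for (std::size_t i = 0; i < k; ++i) a[i] *= factor;
}

// Hypothetical holder; every View built from the same SharedDoubles
// sees the same underlying storage.
struct View {
    SharedDoubles values;
    double at(std::size_t i) const {
        assert(i < values->size());
        return (*values)[i];
    }
};
```

The raw pointer from data() is only valid until the vector reallocates, so it should be fetched again after any operation that can grow the vector.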

What container to choose

I thought about storing some objects ... and now I don't know what to choose.
So, now I have such code:
std::map<std::string, Object*> mObjects;
But, as I was told here before, it's slow because a std::string is allocated on each lookup, so the key should be an integer.
Why did I choose std::string as the key? Because it's very easy to access objects by their name, for example:
mObjects["SomeObj"];
So my first idea is:
std::map<int, Object*> mObjects;
and the key is a CRC of the object name:
mObjects[CRC32("SomeObject")];
But it's a bit unstable. And I know there are special hash maps for this.
And lastly, I have to sort my objects in the map using some Compare function.
Any ideas about container I can use?
So again, the main points:
Accessing objects by string, but the key should be an integer, not a string
Sorting objects in map by some function
p.s. boost usage is permissible.
I can't say for sure, but are you always accessing items in the map by a literal string? If so, then you should just use consecutive enumerated values with symbolic names, and an appropriately sized vector.
Assuming that you won't know the names until runtime, 1000 items in the map seems really small for searching to possibly be a bottleneck. Are you sure that the lookup is the performance problem? Have you profiled to make sure that is the case? In general, using the most intuitive container is going to result in better code (because you can grasp the algorithm more easily).
Does your comment about constructing strings imply that you are passing C-strings into the find function over and over? Try to avoid that by using std::string consistently in your application.
If you insist on using the two-part approach: I suggest storing all your items in a vector. Then you have one unordered_map from string to index and another vector that has all the indexes into the main container. Then you sort this second container of indexes to get the ordering you need. Finally, when you delete items from the master container you'll need to clean up both of the other two referencing containers.
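The two-part approach might be sketched like this. Object with a name and an int priority, and the Registry class itself, are assumptions for the example; names are assumed to be unique, and deletion (which would have to clean up both referencing containers) is left out to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Object {          // hypothetical payload
    std::string name;
    int priority;
};

class Registry {
public:
    // Adds an object and returns its index; names are assumed to be unique.
    std::size_t add(Object obj) {
        const std::size_t i = items_.size();
        by_name_[obj.name] = i;
        items_.push_back(std::move(obj));
        order_.push_back(i);
        return i;
    }

    // Hash lookup by name; returns nullptr if the name is unknown.
    const Object* find(const std::string& name) const {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : &items_[it->second];
    }

    // Indices of all objects, sorted by ascending priority.
    std::vector<std::size_t> sorted_by_priority() {
        assert(order_.size() == items_.size());
        std::sort(order_.begin(), order_.end(),
                  [this](std::size_t a, std::size_t b) {
                      return items_[a].priority < items_[b].priority;
                  });
        return order_;
    }

private:
    std::vector<Object> items_;                              // master container
    std::unordered_map<std::string, std::size_t> by_name_;   // name -> index
    std::vector<std::size_t> order_;                         // indices to be sorted
};
```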

What is the C++ equivalent of C# Collection<T> and how do you use it?

I have the need to store a list/collection/array of dynamically created objects of a certain base type in C++ (and I'm new to C++). In C# I'd use a generic collection, what do I use in C++?
I know I can use an array:
SomeBase* _anArrayOfBase = new SomeBase[max];
But I don't get anything 'for free' with this - in other words, I can't iterate over it, it doesn't expand automatically and so on.
So what other options are there?
Thanks
There is std::vector, which is a wrapper around an array, but it can expand and will do so automatically. However, expanding is an expensive operation, so if you are going to do a lot of insertion or removal operations, don't use a vector. (You can use the reserve function to reserve a certain amount of space up front.)
std::list is a linked list, which has far faster insertion and removal times, but iteration is slower as the values are not stored in contiguous memory, which means that address calculation is far more complex and you can't take advantage of the processor's cache when iterating over the list.
The major upside compared to the vector or deque is that elements can be added or removed from anywhere in the list fairly cheaply.
As a compromise, there is std::deque, which externally works in a similar way to a vector, but internally they are very different. The deque's storage doesn't have to be contiguous, so it can be divided up into blocks, meaning that when the deque grows, it doesn't have to reallocate the storage space for its entire contents. Access is slightly slower and you can't do pointer arithmetic to get an element.
You should use a vector.
#include <vector>
int main()
{
std::vector<SomeBase*> baseVector;
baseVector.push_back(new SomeBase());
}
C++ contains a collection of data containers within the STL.
Check it out here.
You should use one of the containers
std::vector<SomeBase>
std::list<SomeBase>
and if you really need dynamically allocated objects
std::vector<boost::shared_ptr<SomeBase>>
std::list<boost::shared_ptr<SomeBase>>
Everyone has mentioned the common SC++L containers, but there is another important caveat when doing this in C++ (that Chaoz has included in his example).
In C++, your collection will need to be templated on SomeBase*, not on SomeBase. If you try to assign an instance of the derived type to an instance of the base type, you will end up causing what is called object slicing. This is almost definitely not what you are trying to do.
Since you are coming from C#, just remember that "SomeBase MyInstance" means something very different in the two languages. The C++ equivalent of the C# declaration is usually "SomeBase* MyPointer" or "SomeBase& MyReference".
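To make the slicing caveat concrete, here is a small sketch. SomeBase is given a virtual name() function purely for illustration, and Derived is a hypothetical subclass. Storing by value keeps only the base part of the object, while storing pointers (here std::unique_ptr, so the objects are freed automatically) keeps the dynamic type.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct SomeBase {
    virtual ~SomeBase() = default;
    virtual std::string name() const { return "base"; }
};

struct Derived : SomeBase {     // hypothetical derived type
    std::string name() const override { return "derived"; }
};

// Storing by value copies only the SomeBase part of a Derived (slicing).
std::string name_when_stored_by_value() {
    std::vector<SomeBase> v;
    Derived d;
    v.push_back(d);             // sliced: the element is a plain SomeBase
    return v.front().name();
}

// Storing owning pointers keeps the dynamic type of each element.
std::string name_when_stored_by_pointer() {
    std::vector<std::unique_ptr<SomeBase>> v;
    v.push_back(std::unique_ptr<SomeBase>(new Derived()));
    assert(!v.empty());
    return v.front()->name();
}
```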
Use a vector. Have a look here.
I am a huge fan of std::deque. If you want things for free, the deque gives them to you. Fast access from the head and tail of the list. iterators, reverse_iterators, fast insertion at the head and the tail. It's not super specialized, but you wanted free stuff. ;-)
Also, I will link a great STL reference. The STL is where you get all the standard "free" stuff in C++. Standard Template Library. Enjoy!
Use STL. std::vector and std::set for instance. Plenty of examples out there.