Order a container by member with STL - c++

Suppose I have some data stored in a container of unique_ptrs:
struct MyData {
int id; // a unique id for this particular instance
data some_data; // arbitrary additional data
};
// ...
std::vector<std::unique_ptr<MyData>> my_data_vec;
The ordering of my_data_vec is important. Suppose now I have another vector of IDs of MyDatas:
std::vector<int> my_data_ids;
I now want to rearrange my_data_vec such that the elements are in the sequence specified by my_data_ids. (Don't forget moving a unique_ptr requires move-semantics with std::move().)
What's the most algorithmically efficient way to achieve this, and do any of the STL algorithms lend themselves well to achieving this? I can't see that std::sort would be any help.
Edit: I can use O(n) memory space (not too worried about memory), but the IDs are arbitrary (in my specific case they are actually randomly generated).

Create a map that maps ids to their index in my_data_ids.
Create a function object that compares std::unique_ptr<MyData> based on their ID's index in that map.
Use std::sort to sort the my_data_vec using that function object.
Here's a sketch of this:
// Beware, brain-compiled code ahead!
typedef std::vector<int> my_data_ids_type;
typedef std::map<int,my_data_ids_type::size_type> my_data_ids_map_type;
// No std::binary_function base: it takes the result type last, not first,
// and it was deprecated in C++11 and removed in C++17 anyway.
class my_id_comparator {
public:
my_id_comparator(const my_data_ids_map_type& my_data_ids_map)
: my_data_ids_map_(my_data_ids_map) {}
bool operator()( const std::unique_ptr<MyData>& lhs
, const std::unique_ptr<MyData>& rhs ) const
{
my_data_ids_map_type::const_iterator it_lhs = my_data_ids_map_.find(lhs->id);
my_data_ids_map_type::const_iterator it_rhs = my_data_ids_map_.find(rhs->id);
if( it_lhs == my_data_ids_map_.end() || it_rhs == my_data_ids_map_.end() )
throw "dammit!"; // whatever
return it_lhs->second < it_rhs->second;
}
private:
const my_data_ids_map_type& my_data_ids_map_;
};
//...
my_data_ids_map_type my_data_ids_map;
// ...
// populate my_data_ids_map with the IDs and their indexes from my_data_ids
// ...
std::sort( my_data_vec.begin(), my_data_vec.end(), my_id_comparator(my_data_ids_map) );
If memory is scarce, but time doesn't matter, you could do away with the map and search the IDs in the my_data_ids vector for each comparison. However, you would have to be really desperate for memory to do that, since two linearly complex operations per comparison are going to be quite expensive.
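The three steps above can be put together as a compilable sketch. It swaps in std::unordered_map for the std::map of the original (average constant-time lookups) and a lambda for the hand-written comparator class; the function name order_by_ids and the trimmed-down MyData are assumptions made for illustration, and the id check is an assert rather than a throw.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

struct MyData {
    int id;  // unique id; the other members are omitted for brevity
};

// Reorders `data` so that its ids appear in the same sequence as in `ids`.
// Precondition (assumed): every id found in `data` also occurs in `ids`.
void order_by_ids(std::vector<std::unique_ptr<MyData>>& data,
                  const std::vector<int>& ids)
{
    // Step 1: map each id to its index in `ids`.
    std::unordered_map<int, std::size_t> position;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        position[ids[i]] = i;
    }

    // Check the precondition up front instead of inside the comparator.
    for (const auto& item : data) {
        assert(position.count(item->id) == 1);
    }

    // Steps 2 and 3: sort by looked-up position. std::sort moves the
    // unique_ptrs internally, so no explicit std::move is needed here.
    std::sort(data.begin(), data.end(),
              [&position](const std::unique_ptr<MyData>& lhs,
                          const std::unique_ptr<MyData>& rhs) {
                  return position.at(lhs->id) < position.at(rhs->id);
              });
}
```

Building the lookup table is O(n) on average and the sort is O(n log n) comparisons, each of which costs two hash lookups.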

Why don't you try moving the data into an STL set? You only need to implement the comparison function, and you end up with a perfectly ordered set of data very quickly.

Why don't you just use a map<int, unique_ptr<MyData>> (or multimap)?

Related

Iterating over unordered_map using indexed for loop

I am trying to access values stored in an unordered_map using a for loop, but I am stuck trying to access values using the current index of my loop. Any suggestions, or a link to look at? Thanks. [Hint: I don't want to use an iterator].
my sample code:
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;
int main()
{
unordered_map<int,string>hash_table;
//filling my hash table
hash_table.insert(make_pair(1,"one"));
hash_table.insert(make_pair(2,"two"));
hash_table.insert(make_pair(3,"three"));
hash_table.insert(make_pair(4,"four"));
//now, i want to access values of my hash_table with for loop, `i` as index.
//
for (int i=0;i<hash_table.size();i++ )
{
cout<<"Value at index "<<i<<" is "<<hash_table[i].second;//I want to do something like this. I don't want to use iterator!
}
return 0;
}
There are two ways to access an element from an std::unordered_map.
An iterator.
Subscript operator, using the key.
I am stuck trying to access values using the current index of my loop
As you can see, accessing an element using the index is not listed in the possible ways to access an element.
I'm sure you realize that since the map is unordered the phrase element at index i is quite meaningless in terms of ordering. It is possible to access the ith element using the begin iterator and std::advance but...
[Hint: I don't want to use an iterator].
Hint: You just ran out of options. What you want to do is not possible. Solution: Start wanting to use tools that are appropriate to achieving your objective.
If you want to iterate a std::unordered_map, then you use iterators because that's what they're for. If you don't want to use iterators, then you cannot iterate an std::unordered_map. You can hide the use of iterators with a range based for loop, but they're still used behind the scenes.
If you want to iterate something using a position - index, then what you need is an array such as a std::vector.
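As a rough illustration of that last point, the map's contents can be copied into a std::vector once and then addressed by position. The function name snapshot is made up for this sketch, and the key and value types are taken from the question's hash_table.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Copies the (key, value) pairs of an unordered_map into a vector so that
// they can be addressed by position. The resulting order is whatever the
// map's iteration order happens to be, not insertion order or key order.
std::vector<std::pair<int, std::string>>
snapshot(const std::unordered_map<int, std::string>& table)
{
    std::vector<std::pair<int, std::string>> items(table.begin(), table.end());
    assert(items.size() == table.size());
    return items;
}
```

After auto items = snapshot(hash_table); the expression items[i].second is valid for every i below items.size(). The copy is a snapshot: later changes to the map are not reflected in it, and iterators are still used inside the vector's constructor.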
First, why would you want to use an index versus an iterator?
Suppose you have a list of widgets you want your UI to draw. Each widget can have its own list of child widgets, stored in a map. Your options are:
Make each widget draw itself. Not ideal since widgets are now coupled to the UI kit you are using.
Return the map and use an iterator in the drawing code. Not ideal because now the drawing code knows your storage mechanism.
An API that can avoid both of these might look like this.
const Widget* Widget::EnumerateChildren(size_t* io_index) const;
You can make this work with maps but it isn't efficient. You also can't guarantee the stability of the map between calls. So this isn't recommended but it is possible.
const Widget* Widget::EnumerateChildren(size_t* io_index) const
{
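if (*io_index >= m_children.size())
return nullptr; // no more children to enumerate
auto it = m_children.begin(); // by value: begin() returns a temporary
std::advance(it, *io_index); // linear in *io_index for a map
*io_index += 1;
return it->second; // assumes the map's mapped type is a Widget pointer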
}
You don't have to use std::advance and could use a for loop to advance the iterator yourself. Not efficient or very safe.
A better solution to the scenario I described would be to copy out the values into a vector.
void Widget::GetChildren(std::vector<Widget*>* o_children) const;
You can't do it without an iterator. An unordered map could store the contents in any order and move them around as it likes. The concept of "3rd element" for example means nothing.
If you had a list of the keys from the map then you could index into that list of keys and get what you want. However unless you already have it you would need to iterate over the map to generate the list of keys so you still need an iterator.
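A small sketch of the key-list idea follows. The function name sorted_keys is hypothetical, and sorting the keys is an extra step not mentioned above; it is added here only so that "the i-th element" has a reproducible meaning.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Builds a sorted list of the map's keys. Generating the list still walks
// the map once (the range-based for loop uses iterators internally).
std::vector<int> sorted_keys(const std::unordered_map<int, std::string>& table)
{
    std::vector<int> keys;
    keys.reserve(table.size());
    for (const auto& entry : table) {
        keys.push_back(entry.first);
    }
    std::sort(keys.begin(), keys.end());
    assert(keys.size() == table.size());
    return keys;
}
```

With std::vector<int> keys = sorted_keys(hash_table); an indexed loop can then read hash_table.at(keys[i]). The key list goes stale as soon as elements are inserted into or erased from the map.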
An old question.
OK, I'm taking the risk: here may be a workaround (not perfect though: it is just a workaround).
This post is a bit long because I explain why this may be needed. In fact one might want to use the very same code to load and save data from and to a file. This is very useful to synchronize the loading and saving of data, to keep the same data, the same order, and the same types.
For example, if op is load_op or save_op:
load_save_data( var1, op );
load_save_data( var2, op );
load_save_data( var3, op );
...
load_save_data hides the things performed inside. Maintenance is thus much more easy.
The problem is when it comes to containers. For example (back to the question) it may do this for sets (source A) to save data:
int thesize = theset.size();
load_save_data( thesize, save_op ); // template (member) function with 1st arg being a typename
for( Elem elem : theset ) {
load_save_data( elem, save_op );
}
However, to read (source B):
int thesize;
load_save_data( thesize, load_op );
for( int i=0; i<thesize; i++ ) {
Elem elem;
load_save_data( elem, load_op);
theset.insert(elem);
}
So, the whole source code would be something like this, with two loops:
if(op == save_op) { A } else { B }
The problem is there are two different kinds of loop, and it would be nice to merge them as one only. Ideally, it would be nice to be able to do:
int thesize = theset.size(); // used when saving; overwritten when loading
load_save_data( thesize, op );
for( int i=0; i<thesize; i++ ) {
Elem elem;
if( op == save_op ) {
elem=theset[i]; // not possible
}
load_save_data( elem, op);
if( op == load_op ) {
theset.insert(elem);
}
}
(As this code is used in different contexts, care can be taken to give the compiler enough information to strip the unnecessary code, i.e. the "if" that does not apply. This is not obvious, but it is possible.)
This way, each call to load_save_data is in the same order, the same type. You forget a field for both or none, but everything is kept synchronized between save and load. You may add a variable, change a type, change the order etc in one place only. The code maintenance is thus easier.
A solution to the impossible "theset[i]" is indeed to use a vector or a map instead of a set but you're losing the properties of a set (avoid two identical items).
So a workaround (but it has a heavy price: efficiency and simplicity) is something like:
void ...::load_save( op )
{
...
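int thesize = theset.size(); // used when saving; overwritten when loading
set<...> tmp;
load_save_data( thesize, op );
for( int i=0; i<thesize; i++ ) {
Elem elem;
if( op == save_op ) {
// workaround: take the first element out of the set and park it in tmp
elem=*(theset.begin());
theset.erase(elem);
tmp.insert(elem);
}
load_save_data( elem, op);
if( op == load_op ) {
theset.insert(elem);
}
}
if(op == save_op) {
// workaround: put the saved elements back into the set
theset.insert(tmp.begin(), tmp.end());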
}
...
}
Not very beautiful, but it does the trick, and (IMHO) it is the closest answer to the question.

Graph based on unordered_map performance (short version)

Hello :) I am implementing a graph whose vertices are strings. I do many things with them, so working with the strings directly would be highly inefficient. That is why I am using indexes, simple ints. But although the rest of the class works pretty fast, I have trouble with the part I copied below. I've read somewhere that unordered_map needs some hash function; should I add one? If yes, how? The code below contains EVERYTHING that I am doing with the unordered_map.
Thank you in advance for help :)
class Graph
{
private:
unordered_map <string, int> indexes_of_vertices;
int number_of_vertices;
int index_counter;
int get_index(string vertex)
{
if (indexes_of_vertices.count(vertex) == 0) // the key is not in the map yet
{
indexes_of_vertices[vertex] = index_counter;
return index_counter++;
}
else
return indexes_of_vertices[vertex];
}
public:
Graph(int number_of_vertices)
{
this->number_of_vertices = number_of_vertices;
index_counter = 0;
}
};
Here's a quick optimization for what seems to be the important function:
int get_index(const string& vertex)
{
typedef unordered_map <string, int> map_t;
pair<map_t::iterator, bool> inserted =
indexes_of_vertices.insert(map_t::value_type(vertex, index_counter));
if (inserted.second) // the key was missing until now
return index_counter++;
else // inserted.second is false, means vertex was already there
return inserted.first->second; // this is the value
}
The optimizations are:
Take argument by const-ref.
Do a single map lookup instead of two: we speculatively insert() then see if it worked or not, which saves a redundant lookup in either case.
Please let us know how much difference that makes. Another idea, if your keys are usually small, is to use a self-contained string type like GCC's vstring which avoids ex-situ memory allocation for strings under one or two dozen characters. And then to consider whether your data are really large enough to benefit from a hash table, or if another data structure would be more efficient.

fastest way to convert a std::vector to another std::vector

What is the fastest way (if there is any other) to convert a std::vector from one datatype to another (with the idea to save space)? For example:
std::vector<unsigned short> ----> std::vector<bool>
we obviously assume that the first vector only contains 0s and 1s. Copying element by element is highly inefficient in case of a really large vector.
Conditional question:
If you think there is no way to do it faster, is there a complex datatype which actually allows fast conversion from one datatype to another?
std::vector<bool>
Stop.
A std::vector<bool> is... not. std::vector has a specialization for the use of the type bool, which causes certain changes in the vector. Namely, it stops acting like a std::vector.
There are certain things that the standard guarantees you can do with a std::vector. And vector<bool> violates those guarantees. So you should be very careful about using them.
Anyway, I'm going to pretend you said vector<int> instead of vector<bool>, as the latter really complicates things.
Copying element by element is highly inefficient in case of a really large vector.
Only if you do it wrong.
Vector casting of the type you want needs to be done carefully to be efficient.
If the source type is convertible to the destination type, then this works just fine:
vector<Tnew> vec_new(vec_old.begin(), vec_old.end());
Decent implementations should recognize when they've been given random-access iterators and optimize the memory allocation and loop appropriately.
For types that are not implicitly convertible, the biggest pitfall with simple types is starting out like this:
std::vector<int> newVec(oldVec.size());
That's bad. That will allocate a buffer of the proper size, but it will also fill it with data. Namely, default-constructed ints (int()).
Instead, you should do this:
std::vector<int> newVec;
newVec.reserve(oldVec.size());
This reserves capacity equal to the size of the original vector, but it also ensures that no default construction takes place. You can now push_back to your heart's content, knowing that you will never cause reallocation in your new vector.
From there, you can just loop over each entry in the old vector, doing the conversion as needed.
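A minimal sketch of the reserve-then-push_back pattern just described. The int to std::string conversion and the name to_strings are placeholders chosen for illustration; any conversion that is not implicit would slot in the same way.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Converts each int to its decimal string. This stands in for any
// element-wise conversion that cannot be done implicitly.
std::vector<std::string> to_strings(const std::vector<int>& old_vec)
{
    std::vector<std::string> new_vec;
    new_vec.reserve(old_vec.size());  // capacity only, no elements constructed
    for (int value : old_vec) {
        new_vec.push_back(std::to_string(value));  // stays within reserved capacity
    }
    assert(new_vec.size() == old_vec.size());
    return new_vec;
}
```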
There's no way to avoid the copy, since a std::vector<T> is a distinct
type from std::vector<U>, and there's no way for them to share the
memory. Other than that, it depends on how the data is mapped. If the
mapping corresponds to an implicit conversion (e.g. unsigned short to
bool), then simply creating a new vector using the begin and end
iterators from the old will do the trick:
std::vector<bool> newV( oldV.begin(), oldV.end() );
If the mapping isn't just an implicit conversion (and this includes
cases where you want to verify things; e.g. that the unsigned short
does contain only 0 or 1), then it gets more complicated. The
obvious solution would be to use std::transform:
std::vector<TargetType> newV;
newV.reserve( oldV.size() ); // avoids unnecessary reallocations
std::transform( oldV.begin(), oldV.end(),
std::back_inserter( newV ),
TranformationObject() );
, where TranformationObject is a functional object which does the
transformation, e.g.:
struct ToBool : public std::unary_function<unsigned short, bool>
{
bool operator()( unsigned short original ) const
{
if ( original != 0 && original != 1 )
throw Something();
return original != 0;
}
};
(Note that I'm just using this transformation function as an example.
If the only thing which distinguishes the transformation function from
an implicit conversion is the verification, it might be faster to verify
all of the values in oldV first, using std::for_each, and then use
the two iterator constructor above.)
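The verify-first variant from the note above might look like the following sketch. It uses std::all_of (C++11) rather than std::for_each for the check, since it stops at the first bad value; the function name to_flags and the choice of std::invalid_argument are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Checks that every value is 0 or 1, then relies on the implicit
// unsigned short -> bool conversion in the two-iterator constructor.
std::vector<bool> to_flags(const std::vector<unsigned short>& old_v)
{
    const bool all_binary =
        std::all_of(old_v.begin(), old_v.end(),
                    [](unsigned short x) { return x == 0 || x == 1; });
    if (!all_binary) {
        throw std::invalid_argument("expected only 0 or 1");
    }
    std::vector<bool> new_v(old_v.begin(), old_v.end());
    assert(new_v.size() == old_v.size());
    return new_v;
}
```

This reads the source vector twice, so whether it beats the single-pass std::transform version would have to be measured.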
Depending on the cost of default constructing the target type, it may be
faster to create the new vector with the correct size, then overwrite
it:
std::vector<TargetType> newV( oldV.size() );
std::transform( oldV.begin(), oldV.end(),
newV.begin(),
TranformationObject() );
Finally, another possibility would be to use a
boost::transform_iterator. Something like:
std::vector<TargetType> newV(
boost::make_transform_iterator( oldV.begin(), TranformationObject() ),
boost::make_transform_iterator( oldV.end(), TranformationObject() ) );
In many ways, this is the solution I prefer; depending on how
boost::transform_iterator has been implemented, it could also be the
fastest.
You should be able to use assign like this:
vector<unsigned short> v;
//...
vector<bool> u;
//...
u.assign(v.begin(), v.end());
class A{... };
class B{....};
B convert_A_to_B(const A& a){.......}
void convertVector_A_to_B(const vector<A>& va, vector<B>& vb)
{
vb.clear();
vb.reserve(va.size());
std::transform(va.begin(), va.end(), std::back_inserter(vb), convert_A_to_B);
}
The fastest way to do it is to not do it. For example, if you know in advance that your items only need a byte for storage, use a byte-size vector to begin with. You'll find it difficult to find a faster way than that :-)
If that's not possible, then just absorb the cost of the conversion. Even if it's a little slow (and that's by no means certain, see Nicol's excellent answer for details), it's still necessary. If it wasn't, you would just leave it in the larger-type vector.
First, a warning: Don't do what I'm about to suggest. It's dangerous and must never be done. That said, if you just have to squeeze out a tiny bit more performance No Matter What...
First, there are some caveats. If you don't meet these, you can't do this:
The vector must contain plain-old-data. If your type has pointers, or uses a destructor, or needs an operator = to copy correctly ... do not do this.
The sizeof() of both vectors' contained types must be the same. That is, vector< A > can copy from vector< B > only if sizeof(A) == sizeof(B).
Here is a fairly stable method:
vector< A > a;
vector< B > b;
a.resize( b.size() );
assert( sizeof(vector< A >::value_type) == sizeof(vector< B >::value_type) );
if( b.size() == 0 )
a.clear();
else
memcpy( &(*a.begin()), &(*b.begin()), b.size() * sizeof(B) );
This does a very fast, block copy of the memory contained in vector b, directly smashing whatever data you have in vector a. It doesn't call constructors, it doesn't do any safety checking, and it's much faster than any of the other methods given here. An optimizing compiler should be able to match the speed of this in theory, but unless you're using an unusually good one, it won't (I checked with Visual C++ a few years ago, and it wasn't even close).
Also, given these constraints, you could forcibly (via void *) cast one vector type to the other and swap them -- I had a code sample for that, but it started oozing ectoplasm on my screen, so I deleted it.
Copying element by element is not highly inefficient. std::vector provides constant access time to any of its elements, hence the operation will be O(n) overall. You will not notice it.
#ifdef VECTOR_H_TYPE1
#ifdef VECTOR_H_TYPE2
#ifdef VECTOR_H_CLASS
/* Other methods can be added as needed, provided they likewise carry out the same operations on both */
#include <vector>
using namespace std;
class VECTOR_H_CLASS {
public:
vector<VECTOR_H_TYPE1> *firstVec;
vector<VECTOR_H_TYPE2> *secondVec;
VECTOR_H_CLASS(vector<VECTOR_H_TYPE1> &v1, vector<VECTOR_H_TYPE2> &v2) { firstVec = &v1; secondVec = &v2; }
~VECTOR_H_CLASS() {}
void init() { // Use this to copy a full vector into an empty (or garbage) vector to equalize them
secondVec->clear();
for(vector<VECTOR_H_TYPE1>::iterator it = firstVec->begin(); it != firstVec->end(); it++) secondVec->push_back((VECTOR_H_TYPE2)*it);
}
void push_back(void *value) {
firstVec->push_back((VECTOR_H_TYPE1)value);
secondVec->push_back((VECTOR_H_TYPE2)value);
}
void pop_back() {
firstVec->pop_back();
secondVec->pop_back();
}
void clear() {
firstVec->clear();
secondVec->clear();
}
};
#undef VECTOR_H_CLASS
#endif
#undef VECTOR_H_TYPE2
#endif
#undef VECTOR_H_TYPE1
#endif

Container with two indexes (or a compound index)

I have a class like this
class MyClass
{
int Identifier;
int Context;
int Data;
};
and I plan to store it in a STL container like
vector<MyClass> myVector;
but I will need to access it either by the external index (using myVector[index]) or by the combination of Identifier and Context, in which case I would perform a search with something like
vector<MyClass>::iterator myIt;
for( myIt = myVector.begin(); myIt != myVector.end(); myIt++ )
{
if( ( myIt->Identifier == target_id ) &&
( myIt->Context == target_context ) )
return *myIt; //or do something else...
}
Is there a better way to store or index the data?
Boost.MultiIndex has this exact functionality if you can afford the Boost dependency (header only). You would use a random_access index for the array-like index, and either hashed_unique, hashed_non_unique, ordered_unique, or ordered_non_unique (depending on your desired traits) with a key built from Identifier and Context together.
We need to know your usage. Why do you need to be able to get them by index, and how often do you need to search the container for a specific element.
If you store it in a std::set, your search time will be O(log n), but you cannot reference elements by index.
If you use a std::vector, you can index them, but you have to use std::find_if to get a specific element, which will be O(n).
But if you need an index only to pass it around to other things, you could use a pointer instead. That is, use a set for faster look-up, and pass pointers (not indexes) to specific elements.
Yes, but if you want speed, you'll need to sacrifice space. Store it in a collection (like an STL set) with the identifier/context as key, and simultaneously store it in a vector. Of course, you don't want two copies of the data itself, so store it in the set using a smart pointer (such as std::shared_ptr or std::unique_ptr, but not auto_ptr, which must not be stored in standard containers) and store it in the vector using a dumb pointer.
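One way to sketch the "two containers side by side" idea is shown below. It is a variation rather than a transcription of the answers above: instead of pointers, the secondary index stores positions into the vector, which stay valid when the vector reallocates. The class name IndexedStore and its member functions are invented for this example, the members of MyClass are made public so the store can read them, and removal of elements is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct MyClass {
    int Identifier;
    int Context;
    int Data;
};

class IndexedStore {
public:
    // Appends an item and records its position under (Identifier, Context).
    // Returns false and stores nothing if that key is already present.
    bool add(const MyClass& item)
    {
        const std::pair<int, int> key(item.Identifier, item.Context);
        if (!by_key_.emplace(key, items_.size()).second) {
            return false;
        }
        items_.push_back(item);
        assert(by_key_.size() == items_.size());
        return true;
    }

    // Access by external index, as with myVector[index]; throws if out of range.
    const MyClass& at(std::size_t index) const { return items_.at(index); }

    // Lookup by (Identifier, Context) in O(log n); returns nullptr when absent.
    const MyClass* find(int identifier, int context) const
    {
        const auto it = by_key_.find(std::make_pair(identifier, context));
        return it == by_key_.end() ? nullptr : &items_[it->second];
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<MyClass> items_;                         // positional access
    std::map<std::pair<int, int>, std::size_t> by_key_;  // key -> position
};
```

Pointers returned by find are only safe to keep until the next add, because push_back may reallocate the vector.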

Safe To Modify std::pair<U, V>::first in vector of pairs?

I'm currently working on a DNA database class. I associate each row in the database with both a match score (based on edit distance) and the actual DNA sequence itself. Is it safe to modify first this way within an iteration loop?
typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT> DnaDatabaseT;
// ....
for(DnaDatabaseT::iterator it = database.begin();
it != database.end(); it++)
{
int score = it->second.query(query);
it->first = score;
}
The reason I am doing this is so that I can sort them by score later. I have tried maps and received a compilation error about modifying first, but is there perhaps a better way than this to store all the information for sorting later?
To answer your first question, yes. It is perfectly safe to modify the members of your pair, since the actual data in the pair does not affect the vector itself.
edit: I have a feeling that you were getting an error when using a map because you tried to modify the first value of the map's internal pair. That would not be allowed because that value is part of the map's inner workings.
As stated by dribeas:
In maps you cannot change first as it would break the invariant of the map being a sorted balanced tree
edit: To answer your second question, I see nothing at all wrong with the way you are structuring the data, but I would have the database hold pointers to DnaPairT objects, instead of the objects themselves. This would dramatically reduce the amount of memory that gets copied around during the sort procedure.
#include <vector>
#include <utility>
#include <algorithm>
typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT *> DnaDatabaseT;
// ...
// your scoring code, modified to use pointers
void calculateScoresForQuery(DnaDatabaseT& database, queryT& query)
{
for(DnaDatabaseT::iterator it = database.begin(); it != database.end(); it++)
{
int score = (*it)->second.query(query);
(*it)->first = score;
}
}
// custom sorting function to handle DnaPairT pointers
bool sortByScore(DnaPairT * A, DnaPairT * B) { return (A->first < B->first); }
// function to sort the database
void sortDatabaseByScore(DnaDatabaseT& database)
{
std::sort(database.begin(), database.end(), sortByScore);
}
// main
int main()
{
DnaDatabaseT database;
// code to load the database with DnaPairT pointers ...
queryT query; // the query to score against (its setup is elided here)
calculateScoresForQuery(database, query);
sortDatabaseByScore(database);
// code that uses the sorted database ...
}
The only reason you might need to look into more efficient methods is if your database is so enormous that the sorting loop takes too long to complete. If that is the case, though, I would imagine that your query function would be the one taking up most of the processing time.
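For comparison, here is a by-value sketch of the question's original layout (a vector of pairs, no pointers). DnaDatabaseRow is not shown in the question, so the struct below is a stand-in, and its query member returns a placeholder score (a length difference), not a real edit distance. The names by_score and score_and_sort are likewise assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the question's DnaDatabaseRow, whose definition is not shown.
struct DnaDatabaseRow {
    std::string sequence;

    // Placeholder score: absolute length difference, NOT an edit distance.
    int query(const std::string& q) const
    {
        const std::size_t a = sequence.size();
        const std::size_t b = q.size();
        return static_cast<int>(a > b ? a - b : b - a);
    }
};

typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT> DnaDatabaseT;

// Orders pairs by their score only.
bool by_score(const DnaPairT& a, const DnaPairT& b) { return a.first < b.first; }

void score_and_sort(DnaDatabaseT& database, const std::string& query)
{
    for (DnaPairT& entry : database) {
        // In a vector of pairs, first is an ordinary int and may be assigned.
        entry.first = entry.second.query(query);
    }
    std::sort(database.begin(), database.end(), by_score);
    assert(std::is_sorted(database.begin(), database.end(), by_score));
}
```

With a C++11 compiler, std::sort moves elements rather than copying them, so sorting rows that hold a std::string by value is often cheap enough; the pointer-based layout above remains an option if measurements say otherwise.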
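You can't modify first when the pair is the value_type of a map, because there it is declared as std::pair<const Key, T>. In a plain std::pair<int, DnaDatabaseRow> stored in a vector, first is not const and can be modified.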