saving information into hash table or vectors - c++

I want to know which is faster, a hash table or a vector.
I want to loop over all the stored information and compare it to my current data.
If the data is already stored, I want to break out of my loop.
Example:
I have [{1,2},{1,2,3}], and inside the loop my new data is {1,2}. It is already in my vector or hash table, so I break out of the loop. If I have {2,1}, I break as well.
If all the elements match regardless of order, I break; otherwise I continue the loop. And if a hash table is much faster, can I have a hint on how to implement it? I'm new to C++.

A hash table will work better, as you can store key-value pairs. The only condition is that no two entries share the same key: you cannot have both 3,1 and 3,2 in the table, because keys are unique.
If you have duplicates on the left-hand side, it is best to use a vector.

I'd use a nested set, that is, a std::set<std::set<int> >.
#include <set>
#include <cassert>
typedef std::set<int> Entry;
typedef std::set<Entry> Table;
int main () {
int e1[] = {1,2};
int e2[] = {1,2,3};
int e3[] = {2,1};
int e4[] = {3,2};
Table t;
t.insert(Entry(e1, e1+2));
t.insert(Entry(e2, e2+3));
Table::iterator it;
Table::iterator end = t.end();
// Search for 1,2
it = t.find(Entry(e1, e1+2));
// Should find it
assert(it != end);
// Search for 2,1
it = t.find(Entry(e3, e3+2));
// Should find it
assert(it != end);
// Search for 3,2
it = t.find(Entry(e4, e4+2));
// Should NOT find it
assert(it == end);
}
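Since the question also asks for a hint on the hash-table route, here is a minimal sketch of one way to do it, assuming C++11: sort each entry so that {2,1} and {1,2} become the same key, then store the keys in a std::unordered_set with a custom hash. The names Key, KeyHash, make_key, SeenTable and contains are made up for this illustration, and the hash mixing is just one common choice, not the only one. Note that, unlike the std::set version above, a sorted vector keeps repeated elements, so {1,1,2} and {1,2} are different keys here.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// All names below are hypothetical helpers for this sketch.
using Key = std::vector<int>;

// Combines the element hashes into one value. The mixing step follows the
// widely used "hash combine" pattern; any reasonable mixer would do.
struct KeyHash {
    std::size_t operator()(const Key& k) const {
        std::size_t seed = k.size();
        for (int v : k)
            seed ^= std::hash<int>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};

// Sorting a copy makes the key independent of the original element order.
Key make_key(Key values) {
    std::sort(values.begin(), values.end());
    return values;
}

using SeenTable = std::unordered_set<Key, KeyHash>;

// Returns true if the same elements (in any order) were stored before.
bool contains(const SeenTable& table, const Key& values) {
    return table.count(make_key(values)) != 0;
}
```

Insert with t.insert(make_key({1, 2})) and test with contains(t, {2, 1}). Lookups are O(1) on average plus the cost of sorting the small entry, whereas a plain vector has to be scanned from the start each time.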

Related

How to merge sorted vectors into a single vector in C++

I have 10,000 vector<pair<unsigned,unsigned>> and I want to merge them into a single vector that is lexicographically sorted and contains no duplicates. To do so I wrote the following code. However, to my surprise, it takes a lot of time. Can someone please suggest how I can reduce the running time of my code?
using obj = pair<unsigned, unsigned>;
vector< vector<obj> > vecOfVec; // 10,000 vector<obj>, each sorted with size()=10M
vector<obj> result;
for(auto it=vecOfVec.begin(), l=vecOfVec.end(); it!=l; ++it)
{
// append vectors
result.insert(result.end(),it->begin(),it->end());
// sort result
std::sort(result.begin(), result.end());
// remove duplicates from result
result.erase(std::unique(result.begin(), result.end()), result.end());
}
I think you should use the fact that the vectors in vecOfVec are sorted.
So find the minimum among the front values of the individual vectors, push_back() it onto result, and remove every front value equal to that minimum from the vectors (which avoids duplicates in result).
If you can destroy the vecOfVec variable, something like this (caution: code not tested, just to give an idea):
while ( vecOfVec.size() )
{
// detect the minimal front value
auto itc = vecOfVec.cbegin();
auto lc = vecOfVec.cend();
auto valMin = itc->front();
while ( ++itc != lc )
valMin = std::min(valMin, itc->front());
// push_back() the minimal front value in result
result.push_back(valMin);
for ( auto it = vecOfVec.begin() ; it != vecOfVec.end() ; )
{
// remove all the front values equal to valMin (this removes the
// duplicates from result)
while ( (false == it->empty()) && (valMin == it->front()) )
it->erase(it->begin());
// a vector that has become empty is removed from vecOfVec
it = ( it->empty() ? vecOfVec.erase(it) : ++it );
}
}
If you can, I suggest switching vecOfVec from a vector< vector<obj> > to something that permits efficient removal from the front of the individual containers (stacks?) and efficient removal of individual containers (a list?).
If there are a lot of duplicates, you should use a set rather than a vector for your result, as a set is the most natural way to store something without duplicates:
set< pair<unsigned,unsigned> > resultSet;
for (auto it=vecOfVec.begin(); it!=vecOfVec.end(); ++it)
resultSet.insert(it->begin(), it->end());
If you need to turn it into a vector, you can write
vector< pair<unsigned,unsigned> > resultVec(resultSet.begin(), resultSet.end());
Note that since your code runs over 800 billion elements, it would still take a lot of time, no matter what. At least hours, if not days.
Other ideas are:
recursively merge vectors (10000 -> 5000 -> 2500 -> ... -> 1)
to merge 10000 vectors, store 10000 iterators in a heap structure
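The first idea (10000 -> 5000 -> 2500 -> ... -> 1) could be sketched roughly as follows. This is only an illustration, not tested on data of the size in the question: merge_all is a made-up name, it takes the outer vector by value so it can move from it, and it assumes every input vector is already sorted. std::set_union keeps an element that occurs in both inputs only once, and a final std::unique pass removes duplicates that were already inside a single input.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using obj = std::pair<unsigned, unsigned>;

// Hypothetical helper: merges sorted vectors pairwise, round by round.
std::vector<obj> merge_all(std::vector<std::vector<obj>> parts) {
    if (parts.empty()) return {};
    // Each round merges neighbours, roughly halving the number of vectors.
    while (parts.size() > 1) {
        std::vector<std::vector<obj>> next;
        next.reserve((parts.size() + 1) / 2);
        for (std::size_t i = 0; i + 1 < parts.size(); i += 2) {
            std::vector<obj> merged;
            merged.reserve(parts[i].size() + parts[i + 1].size());
            std::set_union(parts[i].begin(), parts[i].end(),
                           parts[i + 1].begin(), parts[i + 1].end(),
                           std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        // An odd vector out is carried over to the next round unchanged.
        if (parts.size() % 2 == 1)
            next.push_back(std::move(parts.back()));
        parts = std::move(next);
    }
    std::vector<obj> result = std::move(parts.front());
    // Drop duplicates that were already present inside one input vector.
    result.erase(std::unique(result.begin(), result.end()), result.end());
    return result;
}
```

With k input vectors there are about log2(k) rounds and every element is touched once per round, instead of being re-sorted after every append.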
One problem with your code is the excessive use of std::sort. Unfortunately, the quicksort algorithm (which is usually the workhorse behind std::sort) is not particularly fast on an already sorted array.
Moreover, you're not exploiting the fact that your initial vectors are already sorted. You can exploit it by keeping a heap of their next values, so that you never need to call sort again. This may be coded as follows (code tested using obj=int), though perhaps it can be made more concise.
// represents the next unused entry in one vector<obj>
template<typename obj>
struct feed
{
typename std::vector<obj>::const_iterator current, end;
feed(std::vector<obj> const&v)
: current(v.begin()), end(v.end()) {}
friend bool operator> (feed const&l, feed const&r)
{ return *(l.current) > *(r.current); }
};
// - returns the smallest element
// - set corresponding feeder to next and re-establish the heap
template<typename obj>
obj get_next(std::vector<feed<obj>>&heap)
{
auto&f = heap[0];
auto x = *(f.current++);
if(f.current == f.end) {
std::pop_heap(heap.begin(),heap.end(),std::greater<feed<obj>>{});
heap.pop_back();
} else
std::make_heap(heap.begin(),heap.end(),std::greater<feed<obj>>{});
return x;
}
template<typename obj>
std::vector<obj> merge(std::vector<std::vector<obj>>const&vecOfvec)
{
// create min heap of feed<obj> and count total number of objects
std::vector<feed<obj>> input;
input.reserve(vecOfvec.size());
size_t num_total = 0;
for(auto const&v:vecOfvec)
if(v.size()) {
num_total += v.size();
input.emplace_back(v);
}
std::make_heap(input.begin(),input.end(),std::greater<feed<obj>>{});
// append values in ascending order, avoiding duplicates
std::vector<obj> result;
result.reserve(num_total);
while(!input.empty()) {
auto x = get_next(input);
result.push_back(x);
while(!input.empty() &&
!(*(input[0].current) > x)) // remove duplicates
get_next(input);
}
return result;
}

Counting number of occurrences in a range within an unordered_map

I have my unordered_map set up as:
unordered_map<int, deque<my_struct>> table;
When I read values to my program, I usually do:
table[int].push_back(obj);
What I want to be able to do is if I'm given 2 integer variables, I want to be able to find the number of keys that occur between the two.
So if in my table I have code like
table[49].push_back(obj);
table[59].push_back(obj);
table[60].push_back(obj);
If I execute my search function (which I'm currently trying to write) to look between the key values of 45 and 65, I should get 3 results.
I'm not exactly sure how to go about it in an efficient manner. Any ideas would be helpful. Thank you.
If you are using a std::unordered_map I don't think you have a choice but to loop over all integers 45 to 65 and use find to check if the key exists in the unordered_map:
using my_table = std::unordered_map<int, std::deque<my_struct>>;
int count(const my_table& table, int begin, int end) {
int sum = 0;
for (int i = begin; i != end; ++i) {
auto find_result = table.find(i);
if (find_result != table.end())
sum++;
}
return sum;
}
But this may not be very efficient. If you use a std::map instead the elements are ordered so this can be achieved more efficiently:
using my_table = std::map<int, std::deque<my_struct>>;
int count(const my_table& table, int begin, int end) {
auto begin_itr = table.lower_bound(begin);
if (begin_itr == table.end())
return 0;
auto end_itr = table.lower_bound(end);
return std::distance(begin_itr, end_itr);
}
I've used the std::map::lower_bound function.
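If both ends of the range should be included (45 and 65 themselves count), a variant with std::map::upper_bound for the upper end might look like the sketch below. count_keys_between is a hypothetical name, and the mapped type is a template parameter only so that the example does not depend on my_struct.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Hypothetical helper: counts the keys k with low <= k <= high.
template <typename Value>
std::size_t count_keys_between(const std::map<int, Value>& table,
                               int low, int high) {
    if (low > high) return 0;
    auto first = table.lower_bound(low);   // first key >= low
    auto last = table.upper_bound(high);   // first key > high
    return static_cast<std::size_t>(std::distance(first, last));
}
```

With keys 49, 59 and 60 in the map, count_keys_between(table, 45, 65) gives 3. The two bound lookups are logarithmic, but std::distance still walks the matching keys one by one, because map iterators are bidirectional rather than random access.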
Depending on how sparse your map is you might even consider using something like std::vector<std::deque<my_struct>> as a flat map.

Select random element in an unordered_map

I define an unordered_map like this:
std::unordered_map<std::string, Edge> edges;
Is there an efficient way to choose a random Edge from the unordered_map edges?
Pre-C++11 solution:
std::tr1::unordered_map<std::string, Edge> edges;
std::tr1::unordered_map<std::string, Edge>::iterator random_it = edges.begin();
std::advance(random_it, rand_between(0, edges.size()));
C++11 onward solution:
std::unordered_map<std::string, Edge> edges;
auto random_it = std::next(std::begin(edges), rand_between(0, edges.size()));
The function that selects the random number is up to you, but be sure it returns a number in the range [0 ; edges.size() - 1] when edges is not empty.
The std::next function simply wraps std::advance in a way that permits direct assignment.
Is there an efficient way to choose a random Edge from the unordered_map edges?
If by efficient you mean O(1), then no, it is not possible.
Since the iterators returned by unordered_map::begin / end are ForwardIterators, the approaches that simply use std::advance are O(n) in the number of elements.
If your specific use allows it, you can trade some randomness for efficiency:
You can select a random bucket (that can be accessed in O(1)), and then a random element inside that bucket.
int bucket, bucket_size;
do
{
bucket = rnd(edges.bucket_count());
}
while ( (bucket_size = edges.bucket_size(bucket)) == 0 );
auto element = std::next(edges.begin(bucket), rnd(bucket_size));
Where rnd(n) returns a random number in the [0,n) range.
In practice if you have a decent hash most of the buckets will contain exactly one element, otherwise this function will slightly privilege the elements that are alone in their buckets.
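A self-contained version of the bucket idea, using <random> in place of the unspecified rnd(n), might look like this. random_bucket_element is a made-up name, and returning a pointer (nullptr for an empty map) is just one possible convention. The retry loop is cheap only while a good share of the buckets is occupied; after many erasures it can take noticeably longer.

```cpp
#include <cstddef>
#include <iterator>
#include <random>
#include <string>
#include <unordered_map>

// Hypothetical helper: picks a random non-empty bucket, then a random
// element inside it. Not uniform over elements unless all non-empty
// buckets hold the same number of elements.
template <typename Map, typename Rng>
const typename Map::value_type* random_bucket_element(const Map& m, Rng& rng) {
    if (m.empty()) return nullptr;
    std::uniform_int_distribution<std::size_t> pick_bucket(0, m.bucket_count() - 1);
    std::size_t bucket = pick_bucket(rng);
    while (m.bucket_size(bucket) == 0)      // retry until the bucket has elements
        bucket = pick_bucket(rng);
    std::uniform_int_distribution<std::size_t> pick_slot(0, m.bucket_size(bucket) - 1);
    // begin(bucket) yields a local iterator that only walks this bucket.
    auto it = std::next(m.begin(bucket),
                        static_cast<std::ptrdiff_t>(pick_slot(rng)));
    return &*it;
}
```

Typical use would be std::mt19937 rng(std::random_device{}()); followed by const auto* e = random_bucket_element(edges, rng);, after which e->first is the key and e->second the Edge.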
Strict O(1) solution without buckets:
Keep a vector of keys, when you need to get a random element from your map, select a random key from the vector and return corresponding value from the map - takes constant time
If you insert a key-value pair into your map, check if such key is already present, and if it's not the case, add that key to your key vector - takes constant time
If you want to remove an element from the map after it was selected, swap the key you selected with the back() element of your key vector and call pop_back(), after that erase the element from the map and return the value - takes constant time
However, there is a limitation: if you want to delete elements from the map other than through random picking, you need to fix up your key vector, which takes O(n) with a naive approach. There is still a way to get O(1) performance: keep a map that tells you where each key is in the key vector, and update it on every swap.
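The steps above, including the position bookkeeping, could be put together roughly like this. It is a sketch under assumptions, not a drop-in container: RandomAccessMap, Slot and the member names are invented for the example, keys are fixed to std::string as in the question, and random_key must not be called on an empty container.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical wrapper: an unordered_map plus a vector of its keys.
template <typename Value>
class RandomAccessMap {
public:
    // Inserts only if the key is new; average constant time.
    bool insert(const std::string& key, Value value) {
        if (slots_.count(key) != 0) return false;
        slots_.emplace(key, Slot{std::move(value), keys_.size()});
        keys_.push_back(key);
        return true;
    }

    // Erases by moving the last key into the freed position.
    bool erase(const std::string& key) {
        auto it = slots_.find(key);
        if (it == slots_.end()) return false;
        const std::size_t pos = it->second.index;
        if (pos != keys_.size() - 1) {
            keys_[pos] = std::move(keys_.back());
            slots_.find(keys_[pos])->second.index = pos;  // record new position
        }
        keys_.pop_back();
        slots_.erase(it);
        return true;
    }

    // Precondition: size() > 0.
    template <typename Rng>
    const std::string& random_key(Rng& rng) const {
        std::uniform_int_distribution<std::size_t> pick(0, keys_.size() - 1);
        return keys_[pick(rng)];
    }

    const Value& at(const std::string& key) const { return slots_.at(key).value; }
    std::size_t size() const { return keys_.size(); }

private:
    struct Slot {
        Value value;
        std::size_t index;  // position of this key inside keys_
    };
    std::unordered_map<std::string, Slot> slots_;
    std::vector<std::string> keys_;
};
```

The price is that every key is stored twice and each erase does one extra hash lookup; in exchange, picking a random key is a single vector index.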
This is how you can get random element from a map:
std::unordered_map<std::string, Edge> edges;
auto item = edges.begin();
int random_index = rand() % edges.size();
std::advance(item, random_index);
Or take a look at this answer, which provides the following solution:
std::unordered_map<std::string, Edge> edges;
auto item = edges.begin();
std::advance( item, random_0_to_n(edges.size()) );
The solution of
std::unordered_map<std::string, Edge> edges;
auto random_it = std::next(std::begin(edges), rand_between(0, edges.size()));
is extremely slow, because std::next has to step through the container one element at a time.
A much faster solution is:
when filling edges, also emplace each key into a std::vector<std::string> vec
pick a random int index in the range 0 to vec.size() - 1
then get edges[vec[index]]
See this problem:
LeetCode problem 380. Insert Delete GetRandom O(1)
You can keep a vector alongside the map and use its random-access indexing to get random values more efficiently. Like this:
class RandomizedSet {
public:
unordered_map<int, int> m;
vector<int> data;
RandomizedSet() {
}
bool insert(int val) {
if(m.count(val)){
return false;
} else{
int index = data.size();
data.push_back(val);
m[val] = index;
return true;
}
}
bool remove(int val) {
if(m.count(val)){
int curr_index = m[val];
int max_index = data.size()-1;
m[data[max_index]] = curr_index;
swap(data[curr_index], data[max_index]);
data.pop_back();
m.erase(val);
return true;
} else{
return false;
}
}
int getRandom() {
return data[rand() % data.size()];
}
};
/**
* Your RandomizedSet object will be instantiated and called as such:
* RandomizedSet* obj = new RandomizedSet();
* bool param_1 = obj->insert(val);
* bool param_2 = obj->remove(val);
* int param_3 = obj->getRandom();
*/

the value of iterator

I created a map.
I want to print the index of a key to a file, using the iterator into the map.
This is what I mean:
map <string,int> VendorList;
VendorList[abc] = 0;
VendorList[mazda] = 111;
VendorList[ford] = 222;
VendorList[zoo] = 444;
map <string,int>::iterator itr=VendorList.find("ford");
fstream textfile;
textfile << itr;
If I put abc in the find line, I want the program to output 1.
If I put mazda in the find line, I want the program to output 2.
If I put ford in the find line, I want the program to output 3.
If I put zoo in the find line, I want the program to output 4.
How do I do that?
The compiler complains about the line:
textfile << itr;
It gives this error:
error C2679: binary '<<' : no operator found which takes a right-hand operand of type 'std::_Tree<_Traits>::iterator' (or there is no acceptable conversion)
Your program has many bugs. Frankly speaking I am not sure about your requirement.
But anyways try this :
map <string,int> VendorList;
VendorList["abc"] = 1;
VendorList["mazda"] = 2;
VendorList["ford"] = 3;
VendorList["zoo"] = 4;
map <string,int>::iterator itr=VendorList.find("ford");
cout<<(itr->second);// this will print 3
EDIT :
Also, somebody has suggested using a vector of pairs, and I think he is right. Try something like this.
#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include <algorithm>
using namespace std;
int main()
{
typedef vector<pair<string,int> > Vm;
Vm V;
V.push_back(make_pair("abc",0));
V.push_back(make_pair("mazda",111));
V.push_back(make_pair("ford",222));
V.push_back(make_pair("zoo",444));
for(size_t i=0;i!=V.size();++i)
if(V[i].first=="ford")
cout<<(i+1);
}
Modify the above program as per requirement.
Hope that helps.
In a map, the elements aren't stored in the order of insertion, so you have to hold the "order" data yourself.
I would suggest using a vector of pairs instead of a map. A vector does store the elements in the order of insertion, and its iterator is random-access, so you can compute the position with operator-.
vector <pair<string, int> >::iterator itr;
// itr = the needed element
cout << itr - VendorList.begin();
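Spelled out, the search plus the subtraction could look like the sketch below. position_of and the VendorVec typedef are hypothetical names for this example, and returning 0 for "not found" is an arbitrary convention chosen here so that valid positions start at 1, as in the question.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, int> > VendorVec;

// Hypothetical helper: returns the 1-based insertion position of name,
// or 0 if name is not present.
std::size_t position_of(const VendorVec& vendors, const std::string& name) {
    for (VendorVec::const_iterator itr = vendors.begin();
         itr != vendors.end(); ++itr) {
        if (itr->first == name)
            // Iterator subtraction gives the 0-based index.
            return static_cast<std::size_t>(itr - vendors.begin()) + 1;
    }
    return 0;
}
```

After pushing abc, mazda, ford and zoo in that order, position_of(VendorList, "ford") is 3. A std::map would instead sort the keys, so ford would come before mazda there.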
As such, the concept of 'index' doesn't really fit with Maps.
Maps are just key-value pairs where you store a value (say, '111') and access it using a key (say 'mazda'). In this way you don't really need an index in order to access '111', you can just use the key 'mazda'.
If you do want your application to be index based however, consider using a different data structure like a Vector or a Linked List.

Union with map?

I am trying to find the union of two sets with map. I have two sets and would like to combine them into a third one. I get an error for this code in the push_back. Is there a way to do this?
map<char, vector<char> > numbers;
map<char, vector<char> >::iterator it;
numbers['E'].push_back('a');//set1
numbers['E'].push_back('b');
numbers['E'].push_back('c');
numbers['G'].push_back('d');//set2
numbers['G'].push_back('e');
void Create::Union(char set1, char set2, char set3)
{
for (it = numbers.begin(); it != numbers.end(); ++it)
{
numbers[set3].push_back(it->second);
}
}
numbers is a load of vectors, keyed by character. So it->second is a vector. You can't push_back a vector into a vector of char.
You should be iterating over numbers[set1] and numbers[set2], not iterating over numbers. Or as bdonlan says, you could insert a range, although he's taking a union of everything in numbers, not just set1 and set2.
Also: where's item defined? Do you mean it?
Also, note that push_back doesn't check whether the value is in the vector already. So once you get the details of this general approach sorted out, your example case will work and the union of 'E' and 'G' will be a vector containing 'a','b','c','d','e'. But if you took the union of 'a','b','c' with 'c','d','e' you'd get 'a','b','c','c','d','e', which probably isn't what you want from a union.
Assuming your vectors are always going to be sorted, you could instead use the standard algorithm set_union:
#include <algorithm>
#include <iterator>
...
numbers[set3].clear();
std::set_union(numbers[set1].begin(), numbers[set1].end(),
numbers[set2].begin(), numbers[set2].end(),
std::back_inserter(numbers[set3]));
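If you cannot guarantee that the vectors are sorted, one option is to sort copies first. The sketch below does that; union_of is a made-up helper name, and it takes its arguments by value on purpose so that the vectors stored in numbers are left untouched. It also removes repeats inside each input, since set_union would otherwise keep them.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: union of two vectors that need not be sorted.
std::vector<char> union_of(std::vector<char> a, std::vector<char> b) {
    // set_union requires sorted input; unique drops repeats within one input.
    std::sort(a.begin(), a.end());
    a.erase(std::unique(a.begin(), a.end()), a.end());
    std::sort(b.begin(), b.end());
    b.erase(std::unique(b.begin(), b.end()), b.end());

    std::vector<char> result;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(result));
    return result;
}
```

Inside Union this would be used as numbers[set3] = union_of(numbers[set1], numbers[set2]);, and the result comes out sorted.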
If you want to take the union of everything in numbers, I would probably go with either:
vector<char> sofar;
map<char, vector<char> >::iterator it;
for (it = numbers.begin(); it != numbers.end(); ++it) {
// new, empty vector
vector<char> target;
// merge everything so far with the next item from the map,
// putting the results in target
set_union(sofar.begin(), sofar.end(),
it->second.begin(), it->second.end(),
back_inserter(target));
// the result is the new "everything so far"
// note that this operation is very fast. It doesn't have to
// copy any of the contents of the vector, just exchange some pointers.
swap(target, sofar);
}
// replace numbers[set3] with the final result
swap(numbers[set3], sofar);
Or:
set<char> sofar;
map<char, vector<char> >::iterator it;
for (it = numbers.begin(); it != numbers.end(); ++it) {
// let std::set remove the duplicates for us
sofar.insert(it->second.begin(), it->second.end());
}
// replace numbers[set3] with the final result
numbers[set3].clear();
numbers[set3].insert(numbers[set3].end(), sofar.begin(), sofar.end());
This is less code and might be faster, or might thrash the memory allocator too much. Not sure which is better, and for small collections performance almost certainly doesn't matter at all.
The version with set also doesn't require the vectors to be sorted, although it's faster if they are.
I think you might want:
void Create::Union(char set1, char set2, char set3)
{
vector<char> &target = numbers[set3];
for (it = numbers.begin(); it != numbers.end(); ++it)
{
if (&it->second == &target)
continue; // Don't insert into ourselves
target.insert(target.end(), it->second.begin(), it->second.end());
}
}
push_back was trying to add the item->second vector itself to the target vector; this way explicitly copies the contents only.