C++ design issue with unordered_set and iterators

I have the following snippet:
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

const size_t INF(numeric_limits<size_t>::max());

class nodehasher;

class node {
public:
    int x, y;
    unordered_set<node, nodehasher>::iterator neighbs[6]; // Issue here
    node() {}
    node(int x_, int y_) : x(x_), y(y_) {}
    void set(int x_, int y_) { x = x_, y = y_; }
    bool operator==(const node& n) const {
        return x == n.x && y == n.y;
    }
};

class nodehasher {
public:
    std::size_t operator()(node const& n) const {
        std::size_t seed = 0;
        hash_combine(seed, n.x);
        hash_combine(seed, n.y);
        return seed;
    }
};
I seem to be having issues declaring the iterators pointing to node inside node itself.
This causes a huge number of very verbose errors.
Now I realize I could make my neighbs array an array of pointers to node,
but I want to avoid pointers for obvious reasons.
A typical, simplified way I use this would be:
unordered_set<node, nodehasher> nodes;

void typical_use(node dest) {
    auto src_node = node(0, 0);
    int neighb_count = 0;
    auto iter = nodes.insert(dest).first;
    src_node.neighbs[neighb_count] = iter;
}
I could obviously convert it into pointers and do:
src_node.neighbs[neighb_count] = &(*iter);
But is there no way to avoid pointers for what I want to do?
EDIT:
As many of the comments and answers have pointed out, iterators to the container elements get invalidated after a rehash (references and pointers are not),
so it is a bad idea to store them.
I was thinking the following might work instead: rather than an unordered_set of node, I will use an unordered_set of pointers to node, something like this
unordered_set<shared_ptr<node> > nodes;
Also, if I know that the number of nodes is always going to be less than 500, I could forgo this whole hash table idea and use an array, and each time I will have to search the array to check whether the node is already there.
Can you please point out which approach is better?

Standard containers require complete types for their values. node isn't a complete type at the point where you use it to instantiate unordered_set<node, nodehasher>.
You could use Boost.Container, because its containers allow incomplete types, but I don't see hashed containers there (so you'd have to use set).
You should be careful with storing iterators, though, because, at least for the unordered_ containers from the standard library, they may be invalidated upon rehashing. References (and pointers) are not invalidated.
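Since references and pointers survive rehashing, one workaround is to store raw node pointers instead of iterators; a pointer to an incomplete type is also fine, which sidesteps the complete-type problem. A minimal sketch, reusing the question's names with a simplified hasher:
#include <cstddef>
#include <functional>
#include <unordered_set>

struct node {
    int x, y;
    const node* neighbs[6] = {}; // raw pointers instead of iterators: OK with an incomplete type
    node(int x_, int y_) : x(x_), y(y_) {}
    bool operator==(const node& n) const { return x == n.x && y == n.y; }
};

struct nodehasher {
    std::size_t operator()(const node& n) const {
        // simple stand-in for the hash_combine version above
        return std::hash<int>()(n.x) ^ (std::hash<int>()(n.y) << 1);
    }
};

int main() {
    std::unordered_set<node, nodehasher> nodes;
    auto iter = nodes.insert(node(1, 2)).first;
    node src(0, 0);
    src.neighbs[0] = &*iter; // element address: stays valid across rehashes
}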

Related

Create a vector of pairs, where the second element of a pair points to the next pair in the vector

I need to create a vector or similar list of pairs, where the first element of a pair is of class T, and the second element is a pointer to the next pair.
Illustration
template<class T>
std::vector<std::pair<T, T*>> createPointingVector(std::vector<T> vec) {
    std::vector<std::pair<T, T*>> new_vec;
    for (int i = 0; i < vec.size(); i++) {
        new_vec.push_back(std::make_pair(vec[i], &(vec[i - 1])));
    }
    return new_vec;
}
I understand that std::vector<std::pair<T, T*>> is incorrect because the second element of the pair is not supposed to be of type T* but rather a recursive std::pair<T, std::pair<T, std::pair<T, ...>*>*>*.
Is it possible to fix my code or what are the other ways to achieve my goal, if any?
I strongly recommend rethinking the use of a bare vector.
My reason is that you need to guarantee that the memory of the vector is never reallocated. Note that you should in any case make sure that your vector allocates all required memory from the start, either by initializing with empty elements or by using std::vector::reserve.
Otherwise, if you have a pointer already set and then change the capacity of the vector, the pointer becomes invalid, a good setup if you want undefined behaviour.
Therefore I strongly advise you to use a wrapper class around your vector which makes sure no capacity change is ever called.
Now, if you do that, the thing is, why do you use actual pointers?
Consider using data of type std::vector<std::pair<T, size_t>>, with the second entry storing the position within the vector rather than an actual pointer:
template<class T>
class PointingVector
{
public:
    PointingVector(const std::vector<T>& vec);
    std::pair<T, size_t>& get(size_t index);
private:
    std::vector<std::pair<T, size_t>> data;
};

template<class T>
PointingVector<T>::PointingVector(const std::vector<T>& vec)
{
    // note: assumes vec is non-empty
    for (size_t i = 0; i + 1 < vec.size(); i++)
    {
        data.push_back(std::make_pair(vec[i], i + 1));
    }
    data.push_back(std::make_pair(vec.back(), 0)); // assuming that the last points to the first
}
After that, make sure that every additional method you add keeps the pointing consistent. For example, should you write something similar to erase, make sure that all pairs are updated accordingly.
And the analogue of dereferencing is trivial:
template<class T>
std::pair<T, size_t>& PointingVector<T>::get(size_t index)
{
    return data[index];
}
The important thing about my solution is that it excludes possible bugs involving dangling pointers. Those are really bad, especially since they might not cause an error in test executions, given the nature of undefined behaviour. The worst that can happen in my solution is that the indices are wrong after calling a method that has a bug.
And if you want to introduce anything that changes the capacity of the vector, no problem: no need to redo any pointers. Just make sure the indices are changed accordingly. If you did this with pointers, your first step would probably be to create a list of indices anyway, so why not work with one directly?
Plus, as this solution has no (visible) pointers at all, you don't need to do any memory management.
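For illustration, a short usage sketch of the class above (my own example, walking the cycle via the stored indices):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3, 4};
    PointingVector<int> pv(v);

    size_t i = 0;
    for (int steps = 0; steps < 4; ++steps) {
        auto& entry = pv.get(i);
        std::cout << entry.first << " -> ";
        i = entry.second; // index of the next element, instead of a pointer
    }
    std::cout << '\n'; // prints: 1 -> 2 -> 3 -> 4 ->
}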
Another solution: Ditch std::pair and define your own type:
template<class T>
struct Node
{
    T data;
    Node* next; // or a smart pointer type
    Node(const T& data, Node* next) : data(data), next(next) {}
};
Then build up your vector like this:
template<class T>
std::vector<Node<T>*> createPointingVector(const std::vector<T>& vec)
{
    // note: assumes vec is non-empty
    std::vector<Node<T>*> new_vec;
    for (size_t i = 0; i < vec.size(); i++)
    {
        new_vec.push_back(new Node<T>(vec[i], nullptr));
    }
    for (size_t i = 0; i + 1 < vec.size(); i++)
    {
        new_vec[i]->next = new_vec[i + 1];
    }
    new_vec[vec.size() - 1]->next = new_vec[0];
    return new_vec;
}
Note that without smart pointers, you need to do the memory management yourself. I'd consider making next a weak_ptr<Node> and having the vector hold shared_ptr<Node>. That way, the memory is automatically deallocated as soon as the vector gets deleted (assuming you have no other pointers active).
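A sketch of that smart-pointer variant (the names SNode and makePointingVector are mine; the weak_ptr links keep the circular chain from preventing deallocation):
#include <cstddef>
#include <memory>
#include <vector>

template <class T>
struct SNode {
    T data;
    std::weak_ptr<SNode> next; // weak: breaks the ownership cycle
    explicit SNode(const T& d) : data(d) {}
};

template <class T>
std::vector<std::shared_ptr<SNode<T>>> makePointingVector(const std::vector<T>& vec)
{
    std::vector<std::shared_ptr<SNode<T>>> out;
    out.reserve(vec.size());
    for (const T& v : vec)
        out.push_back(std::make_shared<SNode<T>>(v));
    for (std::size_t i = 0; i + 1 < out.size(); ++i)
        out[i]->next = out[i + 1];
    if (!out.empty())
        out.back()->next = out.front(); // circular, as in the raw-pointer version
    return out;
}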
What you ask is doable, but according to the illustration linked in your question, the pointers should point one-up circularly inside the input vector, not one-down as in your code example. What I mean is:
new_vec[0] = {vec[0], &vec[1]}
new_vec[1] = {vec[1], &vec[2]}
...
new_vec[N-1] = {vec[N-1], &vec[0]}
Above, N = vec.size().
I attach a minimum working example:
#include <iostream>
#include <vector>
#include <utility> // std::pair, std::make_pair

template<class T>
std::vector<std::pair<T, T*>> createPointingVector(std::vector<T>& vec) { // important: make the parameter a reference
    std::vector<std::pair<T, T*>> new_vec;
    int vec_size = vec.size();
    for (int i = 0; i < vec_size - 1; i++)
        new_vec.push_back(std::make_pair(vec[i], &(vec[i + 1]))); // pointers assigned according to the linked picture
    new_vec.push_back(std::make_pair(vec[vec_size - 1], &vec[0]));
    return new_vec;
}

int main()
{
    std::vector<int> input = {1, 2, 3, 4};
    std::vector<std::pair<int, int*>> sol = createPointingVector(input);
    for (auto i : sol)
        std::cout << i.first << " -> " << *(i.second) << std::endl;
    return 0;
}

How to fill a std::map by iterating to the maximum value?

I have a std::map, and I use the following method to fill it up to the maximum value of the supplied data type. In this case, if K is int then the maximum value is 2,147,483,647, and I want my map to have 2,147,483,647 keys with the same value.
The loop below is very inefficient. Is there any way to reduce the time consumption?
for (auto i = keyBegin; i != numeric_limits<K>::max(); i++) {
    m_map[i] = val;
}
The problem with the code above is that you're inserting 2 billion numbers, all at the end of the map, but operator[] has no idea that you'll be inserting a new item there!
std::map::insert(hint_before, value) is what you need. You've got a perfect hint: all values will be inserted directly before m_map.end().
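For example (a self-contained sketch with int keys and values; beware the memory a full-range map needs):
#include <limits>
#include <map>

int main() {
    std::map<int, int> m_map;
    const int val = 42; // stand-in for the question's val
    // The hint m_map.end() tells insert that the element goes at the back,
    // making each insertion amortized O(1) instead of O(log n).
    for (int i = 0; i != std::numeric_limits<int>::max(); ++i)
        m_map.insert(m_map.end(), {i, val});
}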
To supplement the existing answers, this is really not a good use of std::map.
Maps are designed for quick lookup in a collection of keys & values, where the keys are "sparse". They're generally implemented as trees, requiring lots of dynamic allocation, tree rebalancing, and the sacrifice of cache locality. This is worth it for the general map use case.
But your keys are far from sparse! You literally have a value for every possible number in the key type's range. This is what arrays are for.
Use an array and you will benefit from cache, you will benefit from constant-time lookups, and you will not need any dynamic allocation inside the container. You will of course need to dynamically allocate the container itself because it is huge, so, you're looking for std::vector.
And that's only if you really need to precalculate all these values. If you don't necessarily need them all multiple times, consider generating them on-demand instead. Because, regardless how much RAM a contemporary PC can provide you, this feels a bit like an abuse of the technology.
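A sketch of that flat-array idea (assuming int keys and int values; note that one slot per non-negative int is roughly 8 GiB, so this is purely illustrative):
#include <cstddef>
#include <limits>
#include <vector>

int main() {
    const int val = 42; // stand-in for the question's val
    // One contiguous slot per possible non-negative key; lookup is values[key],
    // constant time and cache-friendly, with no per-node allocation.
    std::vector<int> values(
        static_cast<std::size_t>(std::numeric_limits<int>::max()), val);
}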
To supplement MSalters's answer, which suggests the map::insert(hint, {key, value}) construct, I suggest using a non-default allocator. A specialized allocator can further speed up insertion twofold. Consider the following trivial allocator:
#include <array>
#include <cstddef>
#include <list>
#include <new>
#include <type_traits>

template <class T>
class chunk_allocator
{
private:
    // A free-list node that shares its storage with one element of type T.
    struct node
    {
        union
        {
            node *next;
            alignas(T) char element[sizeof(T)];
        };
    };

public:
    using value_type = T;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;
    using is_always_equal = std::false_type;
    using propagate_on_container_move_assignment = std::true_type;

    chunk_allocator() = default;
    chunk_allocator(const chunk_allocator&) {}    // copies start with their own pool
    template <class U>
    chunk_allocator(const chunk_allocator<U>&) {} // rebinding support

    T *allocate(std::size_t n)
    {
        if (n > 1)
            return reinterpret_cast<T*>(::operator new(sizeof(T) * n));
        if (!m_free_head) populate();
        node *head = m_free_head;
        m_free_head = head->next;
        return reinterpret_cast<T*>(&head->element);
    }

    void deallocate(T *p, std::size_t n)
    {
        if (!p)
            return;
        if (n > 1) {
            ::operator delete((void*)p);
            return;
        }
        // Push the slot back onto the free list.
        node *new_head = new ((void*)p) node;
        new_head->next = m_free_head;
        m_free_head = new_head;
    }

private:
    static constexpr unsigned NODES_IN_CHUNK = 1000;

    void populate()
    {
        if (m_free_head) return;
        m_chunks.emplace_back();
        for (node & entry : m_chunks.back()) {
            entry.next = m_free_head;
            m_free_head = &entry;
        }
    }

    std::list<std::array<node, NODES_IN_CHUNK>> m_chunks;
    node *m_free_head = nullptr;
};

template <class T1, class T2>
bool operator==(const chunk_allocator<T1>& a, const chunk_allocator<T2>& b)
{
    return (void*)&a == (void*)&b;
}

template <class T1, class T2>
bool operator!=(const chunk_allocator<T1>& a, const chunk_allocator<T2>& b)
{
    return !(a == b);
}
And its use:
std::map<int, int, std::less<int>,
         chunk_allocator<std::pair<const int, int>>> m_map;
Working with 100,000,000 elements takes (on Windows with Cygwin):
std::allocator: Insertion: 7.987, destruction: 7.238
chunk_allocator: Insertion: 2.745, destruction: 1.357
On Linux the differences are not as big, but 2x improvements are still possible.
Bonus points: the chunk_allocator takes less memory, since it does not use operator new for individual map nodes. Each call to operator new has to maintain memory-management bookkeeping, which chunk_allocator does not.

hash function for a vector of pair<int, int>

I'm trying to implement an unordered_map for a vector<pair<int, int>>. Since there's no such default hash function, I tried to come up with one of my own:
struct ObjectHasher
{
    std::size_t operator()(const Object& k) const
    {
        std::string h_string("");
        for (auto i = k.vec.begin(); i != k.vec.end(); ++i)
        {
            h_string.push_back(97 + i->first);
            h_string.push_back(45); // '-'
            h_string.push_back(97 + i->second);
            h_string.push_back(43); // '+'
        }
        return std::hash<std::string>()(h_string);
    }
};
The main idea is to change the list of integers, say ((97, 98), (105, 107)), into a formatted string like "a-b+i-k" and to compute its hash with hash<string>(). I chose the numbers 97, 45 and 43 only to allow the hash string to be easily displayed in a terminal during my tests.
I know this kind of function might be a very naive idea, since a good hash function should be fast and robust against collisions. Well, if the integers given to push_back() are greater than 255, I don't know what might happen... So, what do you think of the following questions:
(1) Is my function OK for big integers?
(2) Is my function OK for all environments/platforms?
(3) Is my function too slow to be a hash function?
(4) ... do you have anything better?
All you need is a function to "hash in" an integer. You can steal such a function from Boost:
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
Now your function is trivial:
struct ObjectHasher
{
    std::size_t operator()(const Object& k) const
    {
        std::size_t hash = 0;
        for (auto i = k.vec.begin(); i != k.vec.end(); ++i)
        {
            hash_combine(hash, i->first);
            hash_combine(hash, i->second);
        }
        return hash;
    }
};
This function is probably very slow compared to other hash functions since it uses dynamic memory allocation. Also, std::hash<std::string> is not a very good hash function since it is very general. It's probably better to XOR all the ints and use std::hash<int>.
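A sketch of that XOR suggestion (the shift is my own addition: a plain XOR of first and second would be symmetric, making (a, b) collide with (b, a)):
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Object {
    std::vector<std::pair<int, int>> vec;
};

struct ObjectHasherXor {
    std::size_t operator()(const Object& k) const {
        std::size_t h = 0;
        for (const auto& p : k.vec)
            // Shift one operand so the pair (a, b) hashes differently from (b, a).
            h ^= std::hash<int>()(p.first) ^ (std::hash<int>()(p.second) << 1);
        return h;
    }
};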
This is a perfectly valid approach. All a hash function needs is a sequence of bytes, and by concatenating your elements together as a string you are providing a byte representation of the vector.
Of course this could become unwieldy if your vector contains a large number of items.

Returning container from function: optimizing speed and modern style

Not entirely a question; rather, something I have been pondering on: how to write such code more elegantly, style-wise, while fully making use of the new C++ standard, etc. Here is the example.
Returning the Fibonacci sequence in a container, up to N values (for those not mathematically inclined, each value is the sum of the previous two, with the first two values equal to 1, i.e. 1, 1, 2, 3, 5, 8, 13, ...).
Example run from main:
std::vector<double> vec;
running_fibonacci_seq(vec,30000000);
1)
template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T& coll, const INT_TYPE& N)
{
    coll.resize(N);
    coll[0] = 1;
    if (N > 1) {
        coll[1] = 1;
        for (auto pos = coll.begin() + 2;
             pos != coll.end();
             ++pos)
        {
            *pos = *(pos - 1) + *(pos - 2);
        }
    }
}
2) the same but using an rvalue reference && instead of &, i.e.
void running_fibonacci_seq(T&& coll, const INT_TYPE& N)
EDIT: as noticed by the users who commented below, the rvalue and lvalue references play no role in the timing; the speeds were actually the same, for reasons discussed in the comments.
results for N = 30,000,000
Time taken for &: 919.053 ms
Time taken for &&: 800.046 ms
Firstly, I know this really isn't a question as such, but: which of these is the best modern C++ code? With the rvalue reference (&&) it appears that move semantics are in place and no unnecessary copies are being made, which makes a small improvement in time (important for me due to future real-time application development). Some specific "questions" are:
a) Passing a container (a vector in my example) to a function as a parameter is not an elegant way to use rvalue references. Is this true? If so, how would rvalue references really shine in the above example?
b) The coll.resize(N); call and the N=1 case: is there a way to avoid these calls so the user gets a simple interface without sizing the vector dynamically? Can template metaprogramming be of use here so the vector is allocated with a particular size at compile time (i.e. running_fibonacci_seq<30000000>)? Since the numbers can be large, is there any need for template metaprogramming; if so, can we use this (link) as well?
c) Is there an even more elegant method? I have a feeling the std::transform function could be used with a lambda, e.g.
void running_fibonacci_seq(T&& coll, const INT_TYPE& N)
{
    coll.resize(N);
    coll[0] = 1;
    coll[1] = 1;
    std::transform(coll.begin() + 2,
                   coll.end(),    // source
                   coll.begin(),  // destination
                   [????](????) { // lambda as function object
                       return ????????;
                   });
}
[1] http://cpptruths.blogspot.co.uk/2011/07/want-speed-use-constexpr-meta.html
Due to "reference collapsing" this code does NOT use an rvalue reference, or move anything:
template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T&& coll, const INT_TYPE& N);

running_fibonacci_seq(vec, 30000000);
All of your questions (and the existing comments) become quite meaningless when you recognize this.
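To spell out the deduction (a short annotated sketch against the declaration above):
// Given: template <typename T, typename INT_TYPE>
//        void running_fibonacci_seq(T&& coll, const INT_TYPE& N);

std::vector<double> vec;

// lvalue argument: T deduces to std::vector<double>&, and T&& collapses
// to std::vector<double>&, an ordinary lvalue reference, so nothing moves.
running_fibonacci_seq(vec, 30000000);

// rvalue argument: T deduces to std::vector<double> and T&& really is an
// rvalue reference, but the function still just fills the temporary.
running_fibonacci_seq(std::vector<double>{}, 5);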
Obvious answer:
std::vector<double> running_fibonacci_seq(uint32_t N);
Why?
Because of const-ness:
std::vector<double> const result = running_fibonacci_seq(....);
Because of easier invariants:
void running_fibonacci_seq(std::vector<double>& t, uint32_t N) {
    // Oh, forgot to clear "t"!
    t.push_back(1);
    ...
}
But what of speed?
There is an optimization called Return Value Optimization that allows the compiler to omit the copy (and build the result directly in the caller's variable) in a number of cases. It is specifically allowed by the C++ Standard even when the copy/move constructors have side effects.
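For example, a by-value body matching the signature above (a sketch, with double as the element type as in the question; RVO lets the compiler build coll directly in the caller):
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<double> running_fibonacci_seq(std::uint32_t N)
{
    std::vector<double> coll(N);
    if (N > 0) coll[0] = 1;
    if (N > 1) coll[1] = 1;
    for (std::size_t i = 2; i < N; ++i)
        coll[i] = coll[i - 1] + coll[i - 2];
    return coll; // no copy here: RVO (or at worst a cheap move) applies
}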
So, why pass "out" parameters at all?
you can only have one return value (sigh)
you may wish to reuse the allocated resources (here, the memory buffer of t)
Profile this:
#include <vector>
#include <cstddef>
#include <type_traits>

template <typename Container>
Container generate_fibbonacci_sequence(std::size_t N)
{
    Container coll;
    coll.resize(N);
    coll[0] = 1;
    if (N > 1) {
        coll[1] = 1;
        for (auto pos = coll.begin() + 2;
             pos != coll.end();
             ++pos)
        {
            *pos = *(pos - 1) + *(pos - 2);
        }
    }
    return coll;
}

struct fibbo_maker {
    std::size_t N;
    fibbo_maker(std::size_t n) : N(n) {}

    template<typename Container>
    operator Container() const {
        typedef typename std::remove_reference<Container>::type NRContainer;
        typedef typename std::decay<NRContainer>::type VContainer;
        return generate_fibbonacci_sequence<VContainer>(N);
    }
};

fibbo_maker make_fibbonacci_sequence(std::size_t N) {
    return fibbo_maker(N);
}

int main() {
    std::vector<double> tmp = make_fibbonacci_sequence(30000000);
}
The fibbo_maker stuff is just me being clever, but it lets me deduce the type of Fibonacci sequence you want without you having to repeat it.

C++: joining arrays together - is it possible with pointers WITHOUT copying?

As in the title: is it possible to join a number of arrays together without copying, using only pointers? I'm spending a significant amount of computation time copying smaller arrays into larger ones.
Note: I can't use vectors, since umfpack (a matrix-solving library) does not allow me to, or I don't know how.
As an example:
int n = 5;

// dynamically allocate array with use of pointer
int *a = new int[n];

// define the array pointed to by *a as [1 2 3 4 5]
for (int i = 0; i < n; i++) {
    a[i] = i + 1;
}

// pointer to array of pointers ??? --> this does not work
int *large_a = new int[4];
for (int i = 0; i < 4; i++) {
    large_a[i] = a;
}
Note: there is already a simple solution I know of, which is just to iteratively copy them into a new large array, but it would be nice to know whether the copying of repeated blocks that are stored throughout the duration of the program can be avoided. I'm on a learning curve atm.
Thanks for reading, everyone.
as in the title is it possible to join a number of arrays together without copying and only using pointers?
In short, no.
A pointer is simply an address into memory, like a street address. You can't move two houses next to each other just by copying their addresses around, nor can you move two houses together by changing their addresses. Changing an address doesn't move the house; it points to a different house.
note I can't used vectors since umfpack (some matrix solving library) does not allow me to or i don't know how.
In most cases, you can pass the address of the first element of a std::vector when an array is expected.
std::vector<int> a = {0, 1, 2}; // C++0x list initialization

void c_fn_call(int*);

c_fn_call(&a[0]);
This works because vector guarantees that the storage for its contents is always contiguous.
However, when you insert or erase an element of a vector, it invalidates pointers and iterators that came from it. Any pointers you might have obtained by taking an element's address no longer point into the vector if the storage it has allocated must change size.
No. The memory of two arrays is not necessarily contiguous, so there is no way to join them without copying. And array elements must be in contiguous memory, or pointer access would not be possible.
I'd probably use memcpy/memmove, which is still going to be copying the memory around, but at least it's been optimized and tested by your compiler vendor.
Of course, the "real" C++ way of doing it would be to use standard containers and iterators. If you've got memory scattered all over the place like this, it sounds like a better idea to me to use a linked list, unless you are going to do a lot of random access operations.
Also, keep in mind that if you use pointers and dynamically allocated arrays instead of standard containers, it's a lot easier to cause memory leaks and other problems. I know sometimes you don't have a choice, but just saying.
If you want to join arrays without copying the elements and at the same time access the elements using the subscript operator, i.e. [], then that isn't possible without writing a class which encapsulates all such functionality.
I wrote the following class with minimal consideration, but it demonstrates the basic idea, which you can further edit if you want functionality it's not currently offering. Some error handling is also missing, which I left out just to keep it shorter, but I believe you will understand the code and can handle error cases accordingly.
#include <cstddef>
#include <vector>

template<typename T>
class joinable_array
{
    std::vector<T*> m_data;
    std::vector<size_t> m_size;
    size_t m_allsize;

public:
    joinable_array() : m_allsize() { }
    joinable_array(T *a, size_t len) : m_allsize() { join(a, len); }

    void join(T *a, size_t len)
    {
        m_data.push_back(a);
        m_size.push_back(len);
        m_allsize += len;
    }

    T & operator[](size_t i)
    {
        index ix = get_index(i);
        return m_data[ix.v][ix.i];
    }

    const T & operator[](size_t i) const
    {
        index ix = get_index(i);
        return m_data[ix.v][ix.i];
    }

    size_t size() const { return m_allsize; }

private:
    struct index
    {
        size_t v; // which joined array
        size_t i; // offset within that array
    };

    index get_index(size_t i) const
    {
        index ix = { 0, i };
        for (auto it = m_size.begin(); it != m_size.end(); it++)
        {
            if (ix.i >= *it) { ix.i -= *it; ix.v++; }
            else break;
        }
        return ix;
    }
};
And here is some test code:
#include <iostream>

#define alen(a) (sizeof(a) / sizeof(*a))

int main() {
    int a[] = {1, 2, 3, 4, 5, 6};
    int b[] = {11, 12, 13, 14, 15, 16, 17, 18};

    joinable_array<int> arr(a, alen(a));
    arr.join(b, alen(b));
    arr.join(a, alen(a)); // join it again!

    for (size_t i = 0; i < arr.size(); i++)
        std::cout << arr[i] << " ";
}
Output:
1 2 3 4 5 6 11 12 13 14 15 16 17 18 1 2 3 4 5 6
Online demo : http://ideone.com/VRSJI
Here's how to do it properly:
template<class T, class K1, class K2>
class JoinArray {
public:
    JoinArray(K1 &k1, K2 &k2) : k1(k1), k2(k2) { }

    T operator[](int i) const {
        int s = k1.size();
        if (i < s) return k1[i];
        else return k2[i - s];
    }

    int size() const { return k1.size() + k2.size(); }

private:
    K1 &k1;
    K2 &k2;
};

template<class T, class K1, class K2>
JoinArray<T, K1, K2> join(K1 &k1, K2 &k2) { return JoinArray<T, K1, K2>(k1, k2); }

template<class T>
class NativeArray
{
public:
    NativeArray(T *ptr, int n) : ptr(ptr), n(n) { }
    T operator[](int i) const { return ptr[i]; }
    int size() const { return n; }
private:
    T *ptr;
    int n; // renamed from "size" to avoid clashing with the member function
};

int main() {
    int array[2] = { 0, 1 };
    int array2[2] = { 2, 3 };
    NativeArray<int> na(array, 2);
    NativeArray<int> na2(array2, 2);
    auto joinarray = join<int>(na, na2); // T cannot be deduced, so specify it
}
A variable that is a pointer to a pointer must be declared as such.
This is done by placing an additional asterisk in front of its name.
Hence: int **large_a = new int*[4];. Your large_a is supposed to hold pointers, but you've defined it as a pointer to int. It should be defined (declared) as a pointer to a pointer variable; just int **large_a; would be enough.
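A sketch of the question's loop under the corrected declaration (every slot ends up pointing at the same array a):
int **large_a = new int*[4]; // array of 4 pointers to int
for (int i = 0; i < 4; i++) {
    large_a[i] = a; // types now match: an int* assigned into an int* slot
}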