Access the element of an unordered_set with a hash


template<class T>
struct handle{
...
std::size_t index;
...
};
template<class T>
struct foo{
...
struct eq {
bool operator()(const std::shared_ptr<handle<T>> &a,
const std::shared_ptr<handle<T>> &b) const
{
return a->index == b->index;
}
};
struct hash {
std::size_t operator()(const std::shared_ptr<handle<T>> &a) const
{
return a->index;
}
};
std::unordered_set <std::shared_ptr<handle<T>>, hash, eq> handle_set;
...
};
The handle_set is a view into some std::vector<T>. I basically want to check if someone has a reference to an element in that vector like this
std::size_t index_from_vector = 5;
if(handle_set.count(index_from_vector)){
// handle exists
}
But this doesn't work because an unordered_set needs the key type, so I would have to do it like this
auto sp = std::make_shared<handle<T>>(..,index_from_vector,..);
if(handle_set.count(sp)){
// handle exists
}
That means I always would have to create a dummy shared_ptr if I want to check if there is a handle to a specific element in a vector.
Is there a way to access an unordered_set with only the hash?
I currently use an unordered_map for this:
std::unordered_map<std::size_t, std::shared_ptr<handle<T>>> handle_map;
But updating handle_map is a bit of a pain because I would need to update handle.index and the key. This gets a bit awkward and an unordered_set would greatly simplify this.
Maybe there is even another data structure that would fit better?
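One option that avoids the dummy shared_ptr is a sorted std::set with a transparent comparator (C++14), which accepts a bare index in find and count. This is a minimal sketch under assumptions: by_index and the handle_set alias are hypothetical names, handle is reduced to its index member, and lookups become O(log n) instead of hashed. C++20 adds the same kind of heterogeneous lookup to the unordered containers when both the hash and the equality functor declare is_transparent.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>

// Reduced stand-in for the handle from the question (only the index is kept).
template <class T>
struct handle {
    std::size_t index;
};

// Hypothetical transparent comparator: orders handles by index and also
// compares a handle directly against a plain std::size_t.
template <class T>
struct by_index {
    using is_transparent = void;  // enables find/count with non-key types
    using ptr = std::shared_ptr<handle<T>>;

    bool operator()(ptr const& a, ptr const& b) const { return a->index < b->index; }
    bool operator()(ptr const& a, std::size_t b) const { return a->index < b; }
    bool operator()(std::size_t a, ptr const& b) const { return a < b->index; }
};

// Hypothetical alias replacing the unordered_set member from the question.
template <class T>
using handle_set = std::set<std::shared_ptr<handle<T>>, by_index<T>>;
```

With this, handle_set<int> s; followed by s.count(std::size_t{5}) checks for a handle to element 5 without constructing a shared_ptr. As with any associative container, the index of a handle must not change while the handle is stored in the set.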

Related

using a map with a comparator as a std::map parameter

Say I define a map with a custom comparator such as
struct Obj
{
int id;
std::string data;
std::vector<std::string> moreData;
};
struct Comparator
{
using is_transparent = std::true_type;
bool operator()(Obj const& obj1, Obj const& obj2) const { return obj1.id < obj2.id; }
};
std::map<Obj,int,Comparator> compMap;
Is there a good way to ensure that downstream users don't have to implement the comparator to use the map as a map?
For instance, my compiler throws an error if I try to pass it to a function with a similar type.
template<class T>
inline void add(std::map<T, int>& theMap, T const & keyObj)
{
auto IT = theMap.find(keyObj);
if (IT != theMap.end())
IT->second++;
else
theMap[keyObj] = 1;
}
add(compMap,newObj); //type error here
EDIT:
I kinda over-sanitized this to make a generic case, and then overlooked the obvious:
template<class T, class Comp, class Alloc>
inline void add(std::map<T, int, Comp, Alloc>& theMap, T const & keyObj)
Still having issues with one use not being able to deduce T, but went from 80 errors to 1, so... progress.
Thanks, everyone.

You can typedef the specialised type and use that type in place of
std::map<...
typedef std::map<Obj,int,Comparator> compMap_t;
inline void add(compMap_t& theMap, Obj const & keyObj)
...

Downstream users either use the type declared by you
using my_important_map = std::map<Obj,int,Comparator>;
or better use functions which take a generic map type,
auto some_function(auto const& map_)
{
//do something with the map and don't care about the ordering
return map_.find(Obj(1));
}
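For compilers without C++20 abbreviated templates, the same idea can be sketched by deducing only the map type and taking the key through Map::key_type. Because the key parameter is then a non-deduced context, this also sidesteps one possible cause of a "cannot deduce T" error (the key argument having a slightly different type than the map's key). The Obj and Comparator definitions repeat the question; the single-parameter add is an assumption about what downstream code needs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

struct Obj {
    int id;
    std::string data;
    std::vector<std::string> moreData;
};

struct Comparator {
    using is_transparent = std::true_type;
    bool operator()(Obj const& a, Obj const& b) const { return a.id < b.id; }
};

// Works for any std::map-like type whose mapped_type is an integer counter.
// Only Map is deduced; the key type is read from the map itself.
template <class Map>
void add(Map& theMap, typename Map::key_type const& keyObj) {
    // operator[] value-initializes a missing counter to 0 before the increment.
    ++theMap[keyObj];
}
```

Calling add(compMap, newObj) then compiles for std::map<Obj, int, Comparator> as well as for a plain std::map<int, int>.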

Safe way to use string_view as key in unordered map

My type Val contains std::string thekey.
struct Val
{
std::string thekey;
float somedata;
};
I would like to put my type in an unordered map, with thekey as the key. To save memory and avoid conversions, I would like to use std::string_view as the key type. Is it possible to have the key point to val.thekey while using unique_ptr?
std::unique_ptr<Val> valptr = ...;
std::unordered_map<std::string_view,std::unique_ptr<Val>> themap;
themap[std::string_view(valptr->thekey)] = std::move(valptr); // is this ok and safe?

Safe way to use string_view as key in unordered map
In general there isn't one, because the storage underlying the view might change at any time, invalidating your map invariants.
Associative containers generally own a const key precisely to avoid this.
In your specific case it makes much more sense to use std::unordered_set<Val, ValKeyHash, ValKeyEqual> with suitable hash and equality functors.
Edit: these suitable functors are simply
struct ValKeyHash {
std::size_t operator() (Val const &v) const
{
return std::hash<std::string>{}(v.thekey);
}
};
struct ValKeyEqual {
bool operator() (Val const& a, Val const& b) const
{
return a.thekey == b.thekey;
}
};
Obviously this leaves us with the slightly unhappy requirement of using a temporary Val{key, dummy_data} for lookups, at least until we can use the C++20 transparent/projected version in the other answer.
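A minimal sketch of that temporary-Val lookup, assuming the Val type from the question. The ValSet alias and the find_by_key helper are hypothetical names added for illustration; the dummy somedata value is ignored because the hash and equality functors only look at thekey.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct Val {
    std::string thekey;
    float somedata;
};

struct ValKeyHash {
    std::size_t operator()(Val const& v) const {
        return std::hash<std::string>{}(v.thekey);
    }
};

struct ValKeyEqual {
    bool operator()(Val const& a, Val const& b) const {
        return a.thekey == b.thekey;
    }
};

using ValSet = std::unordered_set<Val, ValKeyHash, ValKeyEqual>;

// Hypothetical helper: builds a throwaway Val just to carry the key.
// Returns nullptr when no element with that key is stored.
inline Val const* find_by_key(ValSet const& set, std::string const& key) {
    auto it = set.find(Val{key, 0.0f});
    return it == set.end() ? nullptr : &*it;
}
```

The cost is one std::string copy per lookup, which is what transparent lookup removes.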

In C++20, you should do this:
namespace utils {
// adl hash function:
template<class T>
auto hash( T const& t )
->decltype( std::hash<T>{}(t) )
{ return std::hash<T>{}(t); }
// Projected hasher:
template<class Proj>
struct ProjHash {
template<class T>
constexpr std::size_t operator()(T const& t)const {
return hash(Proj{}(t));
}
using is_transparent=std::true_type;
};
// Projected equality:
template<class Proj>
struct ProjEquals {
template<class T, class U>
constexpr bool operator()(T const& t, U const& u)const {
return std::equal_to<>{}( Proj{}(t), Proj{}(u) );
}
using is_transparent=std::true_type;
};
}
// A projection from Val to a string view, or a string view
// to a string view:
struct KeyProj {
std::string_view operator()(Val const& val) const { return val.thekey; }
std::string_view operator()(std::string_view sv) const { return sv; }
};
std::unordered_set<Val, utils::ProjHash<KeyProj>, utils::ProjEquals<KeyProj>> theset;
now you can
theset.find("hello")
to find the element of the set whose key is "hello".
A map is fundamentally the wrong tool here, because the extra features a map offers over the set above do the wrong thing. For example, mymap["hello"] creates a Val if the key isn't found, which leaves a dangling string view in the container.
An intrusive map in std is a set with a projection, not a map with a reference into the value as a key.

Implementing a Type-erased list for fast write/read

I am trying to implement a type erased data structure for writing and
reading large arrays of any type in a list, with the following
requirements:
Fast insert of bulk data (receive a std::vector<T>, where T is a primitive type).
Fast read of all/latest values if types match
Read/convert if types mismatch. In most cases from primitive to primitive (e.g double->float, int->double)
The interface I was thinking of would look something like this:
class Trace {
template<typename T> std::vector<T> read();
template<typename T> std::vector<T> latest();
template<typename T> void append(std::vector<T> values);
template<typename T> void replace(std::vector<T> values);
void clear();
};
Which is then used in a TraceHandler class (Singleton structure), which allows access to traces per key:
class TraceHandler {
public:
template<typename T>
std::vector<T> read(std::string const& key);
template<typename T>
void write(std::string const& key, std::vector<T> const& val);
private:
// Store all Traces for different types
};
And usage would look something like this:
TraceHandler th;
std::vector<double> vals(1000,1.0), res;
th.write("values",vals);
std::vector<int> resInt;
res = th.read<double>("values");
resInt = th.read<int>("values");
Our current implementation creates a Trace for each datatype and the
user has to keep track of the correct type, which is not very
flexible (e.g write using writeDouble(), read using readDouble).
My first approach was to change the type of the internal storage
vector to an any type (we are using Poco libraries, so I was using
Poco::Any and Poco::DynamicAny), but this leads to a big
performance hit.
Data is written from devices at high frequencies (data is acquired
at up to 20 kHz, then written in blocks of around 4k to the Trace),
and the measured performance difference between a plain vector and one
of an Any type was a factor of 500-1000 (measured 800ms vs. 4ms for big
bulk insert/read in a loop). Most of the time is lost to
constructor calls versus a simple memcpy.
So my question is: Is there a way to implement this interface (or an
alternative) with good bulk insert/read performance?
Edit:
This is the current implementation I'm using:
class LWDynamicVector
{
private:
typedef std::vector<Poco::DynamicAny> DataList;
DataList m_data;
public:
LWDynamicVector() {}
template<typename T> std::vector<T> read() {
return std::vector<T>(m_data.begin(),m_data.end());
}
template<typename T> void writeAppend(std::vector<T> v) {
m_data.insert(m_data.end(),v.begin(),v.end());
}
template<typename T> void writeReplace(std::vector<T> v) {
m_data.assign(v.begin(),v.end());
}
};
And the Test I am using:
TEST(DynamicVector,Performance) {
typedef double Type;
size_t runs = 100; size_t N = 20480;
std::vector<Type> input;
input.reserve(N);
for(size_t i = 0; i < N; ++i) {
input.push_back(rand());
}
{
OldVector<Type> instance;
Poco::Timestamp ts;
for(size_t i = 0; i < runs; ++i) {
instance.writeAppend(input);
}
std::cout << "Old vector: time elapsed(ms) = " << ts.elapsed() / 1000.0 << std::endl;
std::vector<Type> output = instance.read();
EXPECT_EQ(output.back(),input.back());
}
{
LWDynamicVector dbv;
Poco::Timestamp ts;
for(size_t i = 0; i < runs; ++i) {
dbv.writeAppend(input);
}
std::cout << "New vector: time elapsed(ms) = " << ts.elapsed() / 1000.0 << std::endl;
std::vector<Type> output = dbv.read<Type>();
EXPECT_EQ(output.back(),input.back());
}
}
Which results in:
Old vector: time elapsed(ms) = 44.004
New vector: time elapsed(ms) = 4380.44
Regarding compiler options and optimizations: unfortunately I'm stuck with the current settings and cannot change them. In most scenarios the build runs in debug mode, but it still has to meet the timing requirements. In any case, the gap remains large in release mode:
Old vector: time elapsed(ms) = 20.002
New vector: time elapsed(ms) = 1013.1

I presume that the problem is in the gather data phase and not in the evaluation.
First point is that your OldVector didn't need to make any type conversions, therefore on POD data it would essentially use a memcpy on the data when it inserted.
DynamicAny is a very nice class if you really, really need dynamic variable content, but deep within the class we can see one of the performance problems:
VarHolder* _pHolder;
which means some memory allocation for each data inserted and some house keeping.
Now a concept implementation (untested, as I can't run it) of your Trace class:
template<class T>
class Trace {
std::vector<T> trace;
public:
template<class U> std::vector<U> read();
std::vector<T> latest();
void append(std::vector<T> const& values);
void replace(std::vector<T> const& values);
void clear();
};
That would work fine if you only used one T. Hide the types in TraceHandler
class TraceHandler {
public:
template<typename T, class U>
std::vector<U> read(std::string const& key);
template<typename T>
void write(std::string const& key, std::vector<T> const& val);
private:
// Store all Traces for different types
std::unordered_map<std::string, Poco::DynamicAny> values; // abstraction
};
This only works if each key only used one T and DynamicAny can take a vector.
template<class T>
void TraceHandler::write(std::string const& key, std::vector<T> const& val) {
if (values.find(key) == values.end()) { // doesn't exists
Trace<T> temp;
temp.append(val);
values[key] = temp;
} else
values[key].append(val); // only works if DynamicAny can return original type
}
Will it work with your use case?
TraceHandler th;
std::vector<double> vals(1000,1.0), res;
th.write("values",vals);
std::vector<int> resInt;
//res = th.read("values"); // could work if DynamicAny could return original.
th.read("values", res);
//resInt = th.read("values"); // wont work as read can't guess the type
th.read("values", resInt); // read can guess the return type
// handle conversion from stored type to wanted return type
template<class T, class U>
void TraceHandler::read(std::string const& key, std::vector<U>& res) {
// convert from T to U, maybe use Poco???
// ... work here! This can be slow, since it runs after the data is captured.
}
// specialization where T==U ... more work needed.
template<class T>
std::vector<T>& TraceHandler::read(std::string const& key, std::vector<T>& res) {
// no conversion needed
// convince DynamicAny to return the original data
res = values[key]; // will not work out of the box???
return res;
}
This should have better performance as there is only one use of Poco::DynamicAny per table per call. Some further optimizations could be made to lessen copying but that can be done later after it runs at all.

You know you are writing only primitive types. You know all these types in advance. Use a plain old union + type tag. Can't beat that. boost::variant should also work.
typedef enum { type_int, type_double } type_tag_t;
struct data_t {
type_tag_t tag;
union {
int int_elem;
double double_elem;
};
};
Alternatively, store entire std::vectorfuls of data in a
std::map<std::string,
boost::variant<std::vector<int>,
std::vector<double>,
...
>
> mymap;
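A minimal sketch of the "whole vectors in a variant" idea, with std::variant (C++17) swapped in for boost::variant. TraceStore, Storage and the fixed list of element types are assumptions for illustration, not the asker's classes. Appending data of the type already stored is a single range insert with no per-element allocation; conversion is paid only on a mismatched read.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Assumed closed set of element types; extend the list as needed.
using Storage = std::variant<std::vector<int>, std::vector<float>, std::vector<double>>;

class TraceStore {  // hypothetical stand-in for TraceHandler
public:
    // Appends when the key already holds the same element type,
    // otherwise stores (or replaces with) the new vector.
    template <class T>
    void write(std::string const& key, std::vector<T> const& values) {
        auto it = traces_.find(key);
        if (it == traces_.end()) {
            traces_.emplace(key, values);
            return;
        }
        if (auto* same = std::get_if<std::vector<T>>(&it->second)) {
            same->insert(same->end(), values.begin(), values.end());
        } else {
            it->second = values;
        }
    }

    // Returns a copy of the stored data converted element-wise to T.
    // An unknown key yields an empty vector.
    template <class T>
    std::vector<T> read(std::string const& key) const {
        auto it = traces_.find(key);
        if (it == traces_.end()) return {};
        return std::visit([](auto const& stored) {
            std::vector<T> out;
            out.reserve(stored.size());
            for (auto const& x : stored) out.push_back(static_cast<T>(x));
            return out;
        }, it->second);
    }

private:
    std::map<std::string, Storage> traces_;
};
```

Usage mirrors the question: th.write("values", vals); then th.read<double>("values") or th.read<int>("values").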

Use std::vector<boost::any>. Boost.Any is a library dedicated to a type that implements type-erasure techniques.

unordered map without hashing

I'd like to use a structure just like std::map but without ordering. I don't need ordering, and my key is pretty huge, so its "less than" comparison takes time.
So, I saw unordered_map, but it has a hash template argument. How can I use unordered_map without hashing? Do I really need to build my own container?
This question applies to std::set too.
EDIT
Some answers have suggested to create my own hash, but I can't do this, I should have specified it here. The key contains floating point data, so hashing it would be a real bad idea. I need to compare (std::equal_to) directly.

Create your own hash; it's easily done by composing the overloads of std::hash on the fields of your key.
The cppreference example for std::hash is quite good (even if you do not need the template stuff):
struct S
{
std::string first_name;
std::string last_name;
};
template <class T>
class MyHash;
template<>
class MyHash<S>
{
public:
std::size_t operator()(S const& s) const
{
std::size_t h1 = std::hash<std::string>()(s.first_name);
std::size_t h2 = std::hash<std::string>()(s.last_name);
return h1 ^ (h2 << 1);
}
};
After that you can use it in the std::unordered_map:
std::unordered_map<S, Value, MyHash<S>> the_map;
By the way, std::unordered_set also needs a hash.

You need to specialize the hash object for your key before declaring your unordered_map.
namespace std
{
template <>
class hash<Key>
{
public:
size_t operator()(const Key &) const
{
// ... your hash function for Key object ...
}
};
}
std::unordered_map<Key, Value> myMap;
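For example, to use a pair as the key (strictly, the standard only allows specializing std::hash for program-defined types, so passing a custom hasher as the third template argument is the safer route for std::pair):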
namespace std
{
template <>
class hash<pair<string, int>>
{
public:
size_t operator()(const pair<string, int> &s) const
{
size_t h1 = hash<string>()(s.first);
size_t h2 = hash<int>()(s.second);
return h1 ^ (h2 << 1);
}
};
}
unordered_map<pair<string, int>, string> myMap;
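Since std::pair is not a program-defined type, a separate hasher passed as the third template argument avoids touching namespace std altogether. A minimal sketch, with PairHash and PairMap as hypothetical names and the same simple XOR-shift combination used above:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hasher for the key type, kept outside namespace std.
struct PairHash {
    std::size_t operator()(std::pair<std::string, int> const& p) const {
        std::size_t h1 = std::hash<std::string>{}(p.first);
        std::size_t h2 = std::hash<int>{}(p.second);
        return h1 ^ (h2 << 1);  // simple combination, as in the examples above
    }
};

using PairMap = std::unordered_map<std::pair<std::string, int>, std::string, PairHash>;
```

Equality still comes from the key's own operator==, so only the hasher has to be supplied.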

Large POD as tuple for sorting

I have a POD with about 30 members of various types and I will be wanting to store thousands of the PODs in a container, and then sort that container by one of those members.
For example:
struct Person{
int idNumber;
// ... many other members
};
Thousands of Person objects which I want to sort by idNumber or by any other member I choose to sort by.
I've been researching this for a while today and it seems the most efficient, or at least, simplest, solution to this is not use struct at all, and rather use tuple for which I can pass an index number to a custom comparison functor for use in std::sort. (An example on this page shows one way to implement this type of sort easily, but does so on a single member of a struct which would make templating this not so easy since you must refer to the member by name, rather than by index which the tuple provides.)
My two-part question on this approach is 1) Is it acceptable for a tuple to be fairly large, with dozens of members? and 2) Is there an equally elegant solution for continuing to use struct instead of tuple for this?

You can make a comparator that stores a pointer to member internally so it knows which member to take for comparison:
struct POD {
int i;
char c;
float f;
long l;
double d;
short s;
};
template<typename C, typename T>
struct Comp {
explicit Comp(T C::* p) : ptr(p) {}
bool operator()(const C& p1, const C& p2) const
{
return p1.*ptr < p2.*ptr;
}
private:
T C::* ptr;
};
// helper function to make a comparator easily
template<typename C, typename T>
Comp<C,T> make_comp( T C::* p)
{
return Comp<C,T>(p);
}
int main()
{
std::vector<POD> v;
std::sort(v.begin(), v.end(), make_comp(&POD::i));
std::sort(v.begin(), v.end(), make_comp(&POD::d));
// etc...
}
To further generalize this, make make_comp take a custom comparator, so you can have greater-than and other comparisons.
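That generalization might look like the sketch below, where the comparison itself becomes a third template parameter defaulting to std::less. MemberComp and the two-member Person are hypothetical stand-ins for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Reduced stand-in for the large POD from the question.
struct Person {
    int idNumber;
    double height;
};

// Compares two objects of type C by one data member, using Cmp on that member.
template <class C, class T, class Cmp = std::less<T>>
struct MemberComp {
    T C::* ptr;
    Cmp cmp;
    bool operator()(C const& a, C const& b) const {
        return cmp(a.*ptr, b.*ptr);
    }
};

// Helper that deduces C and T from the member pointer; Cmp defaults to std::less<T>.
template <class C, class T, class Cmp = std::less<T>>
MemberComp<C, T, Cmp> make_comp(T C::* p, Cmp cmp = Cmp{}) {
    return MemberComp<C, T, Cmp>{p, cmp};
}
```

For example, std::sort(v.begin(), v.end(), make_comp(&Person::height, std::greater<double>{})) sorts by height in descending order, while make_comp(&Person::idNumber) keeps the ascending default.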

1) Is it acceptable for a tuple to be fairly large, with dozens of members?
Yes it is acceptable. However it won't be easy to maintain since all you'll have to work with is an index within the tuple, which is very akin to a magic number. The best you could get is reintroduce a name-to-index mapping using an enum which is hardly maintainable either.
2) Is there an equally elegant solution for continuing to use struct instead of tuple for this?
You can easily write a template function to access a specific struct member (to be fair, I didn't put much effort into it, it's more a proof of concept than anything else so that you get an idea how it can be done):
template<typename T, typename R, R T::* M>
R get_member(T& o) {
return o.*M;
}
struct Foo {
int i;
bool j;
float k;
};
int main() {
Foo f = { 3, true, 3.14 };
std::cout << get_member<Foo, float, &Foo::k>(f) << std::endl;
return 0;
}
From there, it's just as easy to write a generic comparator which you can use at your leisure (I'll leave it to you as an exercise). This way you can still refer to your members by name, yet you don't need to write a separate comparator for each member.

You could use a template to extract the sort key:
struct A
{
std::string name;
int a, b;
};
template<class Struct, typename T, T Struct::*Member>
struct compare_member
{
bool operator()(const Struct& lh, const Struct& rh) const
{
return lh.*Member < rh.*Member;
}
};
int main()
{
std::vector<A> values;
std::sort(begin(values), end(values), compare_member<A, int, &A::a>());
}
Maybe you want to have a look at boost::multi_index_container which is a very powerful container if you want to index (sort) object by different keys.

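Create a class that stores a pointer to a Person data member and uses it for comparison (writing Compare(&Person::idNumber) without template arguments relies on C++17 class template argument deduction):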
std::sort(container.begin(), container.end(), Compare(&Person::idNumber));
Where Compare is:
template<typename PointerToMemberData>
struct Compare {
Compare(PointerToMemberData pointerToMemberData) :
pointerToMemberData(pointerToMemberData) {
}
template<typename Type>
bool operator()(const Type& lhs, const Type& rhs) const {
return lhs.*pointerToMemberData < rhs.*pointerToMemberData;
}
PointerToMemberData pointerToMemberData;
};
