Generic constant time compare function c++

I'm writing a ProtectedPtr class that protects objects in memory using Windows Crypto API, and I've run into a problem creating a generic constant time compare function. My current code:
template <class T>
bool operator==(volatile const ProtectedPtr& other)
{
std::size_t thisDataSize = sizeof(*protectedData) / sizeof(T);
std::size_t otherDataSize = sizeof(*other) / sizeof(T);
volatile auto thisData = (byte*)getEncyptedData();
volatile auto otherData = (byte*)other.getEncyptedData();
if (thisDataSize != otherDataSize)
return false;
volatile int result = 0;
for (int i = 0; i < thisDataSize; i++)
result |= thisData[i] ^ otherData[i];
return result == 0;
}
getEncryptedData function:
std::unique_ptr<T> protectedData;
const T& getEncyptedData() const
{
ProtectMemory(true);
return *protectedData;
}
The problem is casting to byte*. When using this class with strings, my compiler complains that strings can't be cast to byte pointers. I was thinking of basing my function on Go's ConstantTimeByteEq function, but that still brings me back to my original problem of converting a template type to an int or something that I can perform binary manipulation on.
Go's ConstantTimeByteEq function:
func ConstantTimeByteEq(x, y uint8) int {
z := ^(x ^ y)
z &= z >> 4
z &= z >> 2
z &= z >> 1
return int(z)
}
How can I easily convert a template type into something that binary manipulation can easily be performed on?
UPDATE Working generic constant compare function based on suggestions from lockcmpxchg8b:
//only works on primitive types, and types that don't have
//internal pointers pointing to dynamically allocated data
byte* serialize()
{
const size_t size = sizeof(*protectedData);
byte* out = new byte[size];
ProtectMemory(false);
memcpy(out, &(*protectedData), size);
ProtectMemory(true);
return out;
}
bool operator==(ProtectedPtr& other)
{
if (sizeof(*protectedData) != sizeof(*other))
return false;
volatile auto thisData = serialize();
volatile auto otherData = other.serialize();
volatile int result = 0;
for (int i = 0; i < sizeof(*protectedData); i++)
result |= thisData[i] ^ otherData[i];
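//wipe the unencrypted copies of the data
//(sizeof(thisData) would only be the size of the pointer, not of the buffer)
SecureZeroMemory(thisData, sizeof(*protectedData));
SecureZeroMemory(otherData, sizeof(*protectedData));
//release the temporary buffers allocated by serialize()
delete[] thisData;
delete[] otherData;
return result == 0;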
}

Generally, what you're trying to accomplish in your current code is called Format Preserving Encryption, i.e., encrypting a std::string such that the resulting ciphertext is also a valid std::string. This is much harder than letting the encryption process convert from the original type to a flat array of bytes.
To do the conversion to a flat array, declare a second template argument for a "Serializer" object that knows how to serialize objects of type T into an array of unsigned char. You could default it to a generic sizeof/memcpy serializer that would work for all primitive types.
Here's an example for std::string.
template <class T>
class Serializer
{
public:
virtual size_t serializedSize(const T& obj) const = 0;
virtual size_t serialize(const T& obj, unsigned char *out, size_t max) const = 0;
virtual void deserialize(const unsigned char *in, size_t len, T& out) const = 0;
};
class StringSerializer : public Serializer<std::string>
{
public:
size_t serializedSize(const std::string& obj) const {
return obj.length();
};
size_t serialize(const std::string& obj, unsigned char *out, size_t max) const {
if(max >= obj.length()){
memcpy(out, obj.c_str(), obj.length());
return obj.length();
}
throw std::runtime_error("overflow");
}
void deserialize(const unsigned char *in, size_t len, std::string& out) const {
out = std::string((const char *)in, (const char *)(in+len));
}
};
Once you've reduced the objects down to a flat array of unsigned chars, then your given constant-time compare algorithm will work just fine.
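As a minimal sketch of that comparison on flat byte arrays (the name constantTimeEquals is hypothetical, and volatile is only a common mitigation against the optimizer, not a guarantee of constant-time behaviour on every compiler):

```cpp
#include <cassert>
#include <cstddef>

// Compares two byte buffers of equal length without an early exit, so the
// number of loop iterations depends only on len, not on where a mismatch is.
bool constantTimeEquals(const unsigned char* a, const unsigned char* b, std::size_t len)
{
    // volatile discourages the compiler from short-circuiting the accumulation.
    volatile unsigned char diff = 0;
    for (std::size_t i = 0; i < len; ++i) {
        diff = static_cast<unsigned char>(diff | (a[i] ^ b[i]));
    }
    return diff == 0;
}

// Usage sketch: identical buffers compare equal, one differing byte does not.
inline void constantTimeEqualsExample()
{
    const unsigned char x[] = {1, 2, 3, 4};
    const unsigned char y[] = {1, 2, 3, 5};
    assert(constantTimeEquals(x, x, sizeof x));
    assert(!constantTimeEquals(x, y, sizeof x));
}
```

The lengths are assumed to be public here; if the serialized sizes differ, the caller can return false before calling it, as the question's code already does.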
Here's a really dumbed-down version of your example code using the serializer above.
template <class T, class S>
class Test
{
std::unique_ptr<unsigned char[]> protectedData;
size_t serSize;
public:
Test(const T& obj) : protectedData() {
S serializer;
size_t size = serializer.serializedSize(obj);
protectedData.reset(new unsigned char[size]);
serSize = serializer.serialize(obj, protectedData.get(), size);
// "Encrypt"
for(size_t i=0; i< size; i++)
protectedData.get()[i] ^= 0xa5;
}
size_t getEncryptedLen() const {
return serSize;
}
const unsigned char *getEncryptedData() const {
return protectedData.get();
}
const T getPlaintextData() const {
S serializer;
T target;
//"Decrypt"
for(size_t i=0; i< serSize; i++)
protectedData.get()[i] ^= 0xa5;
serializer.deserialize(protectedData.get(), serSize, target);
return target;
}
};
int main(int argc, char *argv[])
{
std::string data = "test";
Test<std::string, StringSerializer> tester(data);
const unsigned char *ptr = tester.getEncryptedData();
std::cout << "\"Encrypted\" bytes: ";
for(size_t i=0; i<tester.getEncryptedLen(); i++)
std::cout << std::setw(2) << std::hex << std::setfill('0') << (unsigned int)ptr[i] << " ";
std::cout << std::endl;
std::string recov = tester.getPlaintextData();
std::cout << "Recovered: " << recov << std::endl;
}
Output:
$ ./a.out
"Encrypted" bytes: d1 c0 d6 d1
Recovered: test
Edit: answering the request for a generic serializer for primitive/flat types. Consider this pseudocode, because I'm typing it into a browser without testing. I'm not sure if that's the right template syntax.
template<class T>
class PrimitiveSerializer : public Serializer<T>
{
public:
size_t serializedSize(const T& obj) const {
return sizeof obj;
};
size_t serialize(const T& obj, unsigned char *out, size_t max) const {
if(max >= sizeof obj){
memcpy(out, &obj, sizeof obj);
return sizeof obj;
}
throw std::runtime_error("overflow");
}
void deserialize(const unsigned char *in, size_t len, T& out) const {
if(len < sizeof out) {
throw std::runtime_error("underflow");
}
memcpy(&out, in, sizeof out);
}
};
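A memcpy-based serializer is only meaningful for types whose bytes are the whole value. One way to enforce that at compile time, sketched here with hypothetical helpers toBytes and fromBytes (they are not part of the Serializer interface above), is a static_assert on std::is_trivially_copyable:

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of a trivially
// copyable T into a fixed-size byte array.
template <class T>
std::array<unsigned char, sizeof(T)> toBytes(const T& obj)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy serialization needs a trivially copyable type");
    std::array<unsigned char, sizeof(T)> out{};
    std::memcpy(out.data(), &obj, sizeof(T));
    return out;
}

// Hypothetical helper: rebuilds a T from bytes produced by toBytes<T>.
// T cannot be deduced from the argument, so it must be named explicitly.
template <class T>
T fromBytes(const std::array<unsigned char, sizeof(T)>& in)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy deserialization needs a trivially copyable type");
    T obj;
    std::memcpy(&obj, in.data(), sizeof(T));
    return obj;
}

// Usage sketch: an int survives the round trip unchanged.
inline void roundTripExample()
{
    const auto bytes = toBytes(42);
    assert(fromBytes<int>(bytes) == 42);
}
```

With this guard, toBytes(std::string("x")) fails to compile instead of silently copying internal pointers. Structs with padding still need care, because padding bytes can differ between objects whose members are equal.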

I'm curious about what error the compiler gives you.
That said, try casting to a const char* or const void*.
Another issue could be the casting from a 64-bit pointer to an 8-bit byte. Try casting to an int, long, or long long.
Edit: Based upon your feedback, another minor change:
volatile auto thisData = (byte*)&getEncyptedData();
volatile auto otherData = (byte*)&other.getEncyptedData();
(note the ampersands). That will allow the previous casts to work

Related

How do I get the exact length of an array without using subscripted values within a template function in C++?

I'm trying to find the length of any given array and the methods I've tried so far won't work. This is my sizeOf function (not to be confused with sizeof):
template <typename T>
size_t sizeOf(T val) {
size_t sz = (size_t)0;
if (typeid(val).name()[0] == 'c') {
sz = (int)sizeof(val);
} else if (typeid(val).name()[1] == 'c' || typeid(val).name()[2] == 'c'){
sz = strlen((const char*)val);
} else if (typeid(val).name()[0] == 'i') {
sz = sizeof(val) / 4;
} else {
sz = 0; //new code here.
}
return sz;
}
So far, it works for strings, integers, and characters. The problem is finding the length of any array that isn't of type char. I tried using a for loop and subscripted values (e.g. arr[i]), but this error occurs when the argument isn't an array: subscripted value is neither an array nor pointer. To work around this I tried a try {...} catch () {...} statement, but it still gives this error, so I can't use indices. I also tried another method with pointers, *(&arr + 1) - arr, which compiles without errors, but the values are inconsistent or completely wrong. Is there a way to acquire the true length of an array without the aforementioned methods and still have a flexible function?
Here's a solution that might work for you. It uses if constexpr to ensure that the first template only tries to compile valid branches in the code (you can add more if you need them) and a template overload to handle arrays of arbitrary type. Note: needs C++17 or later.
#include <iostream>
#include <cstring>
#include <type_traits>
template <typename T>
size_t sizeOf (T val)
{
size_t sz = 0;
if constexpr (std::is_integral_v <T>)
sz = sizeof (val);
else if constexpr (std::is_same_v <T, const char *>)
sz = strlen (val);
return sz;
}
template <typename T, size_t size>
size_t sizeOf (T (&)[size])
{
return size;
}
int main ()
{
int i = 0;
char c = 0;
std::cout << "sizeOf int = " << sizeOf (i) << "\n";
std::cout << "sizeOf char = " << sizeOf (c) << "\n";
const char *pc = "abcde";
std::cout << "sizeOf abcde = " << sizeOf (pc) << "\n";
int a [6];
std::cout << "sizeOf int [6] = " << sizeOf (a) << "\n";
}
Output:
sizeOf int = 4
sizeOf char = 1
sizeOf abcde = 5
sizeOf int [6] = 6
template <typename T>
size_t sizeOf(T const&) {
return 1;
}
template <typename T, std::size_t N>
size_t sizeOf(T const(&)[N]) {
return N;
}
template <typename T, std::size_t N>
size_t sizeOf(std::array<T,N> const&) {
return N;
}
template <typename T>
size_t sizeOf(T const* val) {
std::size_t N=0;
while (val&&val[0]){
++N;++val;
}
if(val)++N;// include null terminator, makes it compatible with `"array"` length
return N;
}
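For built-in arrays and standard containers, C++17 also provides std::size in <iterator>, which may remove the need for some of these overloads. A short sketch (the function name stdSizeExample is only for illustration):

```cpp
#include <array>
#include <cassert>
#include <iterator>
#include <vector>

// std::size returns the element count for C arrays and for any container
// with a size() member; it refuses to compile for plain pointers.
inline void stdSizeExample()
{
    int a[6] = {};
    std::array<double, 3> b{};
    std::vector<char> c(4);
    assert(std::size(a) == 6u);
    assert(std::size(b) == 3u);
    assert(std::size(c) == 4u);
}
```

It does not cover scalars or null-terminated strings, so those cases would still need their own overloads.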

Use of std::vector results in unknown output C++

I'm unable to figure out why the output I received isn't just "00110" but has other gibberish characters in it. I'm not sure what's wrong with my vector push_back; it makes sense to me. If I changed it to a std::string implementation, it would give the correct output, but in this case I need to use a vector for proper encapsulation of the object's state. I've been debugging for a few hours now and still can't find out why. I hope someone is able to help! Thanks! Note: main() can't be modified.
#include <iostream>
#include <vector>
template<size_t NumBits>
class bitsetts
{
private:
static const unsigned int NO_OF_BITS = CHAR_BIT * sizeof(int); //32 bits
static const unsigned NumBytes = (NumBits - 7) /8;
unsigned char array[NumBytes];
public:
bitsetts() { }
void set(size_t bit, bool val = true) {
if (val == true)
{
array[bit] |= (val << bit );
}
else
{
array[bit] &= (val << bit );
}
}
bool test(size_t bit) const {
return array[bit] & (1U << bit );
}
const std::string to_string()
{
std::vector<char> str;
for (unsigned int i=NumBits; i-- > 0;)
str.push_back('0' + test(i));
return str.data();
}
friend std::ostream& operator<<(std::ostream& os, const bitsetts& ob)
{
for (unsigned i = NumBits; i-- > 0;)
os << ob.test(i);
return os << '\n';
}
};
int main()
{
try
{
bitsetts<5> bitsetts;
bitsetts.set(1);
bitsetts.set(2);
const std::string st = bitsetts.to_string();
if (st != "00110")
{
std::cout << st << std::endl;
throw std::runtime_error{ "-" };
}
}
catch (const std::exception& exception)
{
std::cout << "Conversion failed\n";
}
}
You are filling the std::vector with char values and then constructing a std::string from the raw char data using the std::string constructor that takes a single const char* parameter. That constructor expects the char data to be null-terminated, but you are not pushing a null terminator into your vector, which is why you get extra garbage on the end of your std::string.
So, either push a null terminator into the vector, eg:
const std::string to_string()
{
std::vector<char> str;
for (unsigned int i=NumBits; i-- > 0;)
str.push_back('0' + test(i));
str.push_back('\0'); // <-- add this!
return str.data();
}
Or, use a different std::string constructor that can take the vector's size() as a parameter, eg:
const std::string to_string()
{
std::vector<char> str;
for (unsigned int i=NumBits; i-- > 0;)
str.push_back('0' + test(i));
return std::string(str.data(), str.size()); // <-- add size()!
}
On a side note: your to_string() method should be marked as const, eg:
const std::string to_string() const
Which would then allow you to use to_string() inside of your operator<<, eg:
friend std::ostream& operator<<(std::ostream& os, const bitsetts& b)
{
return os << b.to_string() << '\n';
}
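A third option, sketched below, is the std::string constructor that takes an iterator range, which also needs no null terminator. The free function toString is a hypothetical stand-in for the body of to_string():

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds a std::string from exactly the characters in the vector,
// using the iterator-range constructor.
inline std::string toString(const std::vector<char>& chars)
{
    return std::string(chars.begin(), chars.end());
}

// Usage sketch: five pushed characters give a five-character string.
inline void toStringExample()
{
    const std::vector<char> bits = {'0', '0', '1', '1', '0'};
    assert(toString(bits) == "00110");
}
```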

Different return and coordinate types in nanoflann radius search

I'm trying to use nanoflann in a project and am looking at the vector-of-vector and radius search examples.
I can't find a way to perform a radius search with a different data type than the coordinate type. For example, my coordinates are vectors of uint8_t; I am trying to input a radius of type uint32_t with little success.
I see in the source that the metric_L2 struct (which I am using for distance) uses the L2_Adaptor with two template parameters. L2_Adaptor itself takes three parameters, with the third defaulted to the first, which seems to be the problem if I am understanding the code correctly. However, trying to force use of the third always results in 0 matches in the radius search.
Is there a way to do this?
Edit: In the sample code below, everything works. However, if I change the search_radius (and ret_matches) to uint32_t, the radiusSearch method doesn't work.
#include <iostream>
#include <Eigen/Dense>
#include <nanoflann.hpp>
typedef Eigen::Matrix<uint8_t, Eigen::Dynamic, 1> coord_t;
using namespace nanoflann;
struct Point
{
coord_t address;
Point() {}
Point(uint8_t coordinates) : address(coord_t::Random(coordinates)) {}
};
struct Container
{
std::vector<Point> points;
Container(uint8_t coordinates, uint32_t l)
: points(l)
{
for(auto& each_location: points)
{
each_location = Point(coordinates);
}
}
};
struct ContainerAdaptor
{
typedef ContainerAdaptor self_t;
typedef nanoflann::metric_L2::traits<uint8_t, self_t>::distance_t metric_t;
typedef KDTreeSingleIndexAdaptor<metric_t, self_t, -1, size_t> index_t;
index_t *index;
const Container &container;
ContainerAdaptor(const int dimensions, const Container &container, const int leaf_max_size = 10)
: container(container)
{
assert(container.points.size() != 0 && container.points[0].address.rows() != 0);
const size_t dims = container.points[0].address.rows();
index = new index_t(dims, *this, nanoflann::KDTreeSingleIndexAdaptorParams(leaf_max_size));
index->buildIndex();
}
~ContainerAdaptor()
{
delete index;
}
inline void query(const uint8_t *query_point, const size_t num_closest, size_t *out_indices, uint32_t *out_distances_sq, const int ignoreThis = 10) const
{
nanoflann::KNNResultSet<uint32_t, size_t, size_t> resultSet(num_closest);
resultSet.init(out_indices, out_distances_sq);
index->findNeighbors(resultSet, query_point, nanoflann::SearchParams());
}
const self_t& derived() const
{
return *this;
}
self_t& derived()
{
return *this;
}
inline size_t kdtree_get_point_count() const
{
return container.points.size();
}
inline size_t kdtree_distance(const uint8_t *p1, const size_t idx_p2, size_t size) const
{
size_t s = 0;
for (size_t i = 0; i < size; i++)
{
const uint8_t d = p1[i] - container.points[idx_p2].address[i];
s += d * d;
}
return s;
}
inline coord_t::Scalar kdtree_get_pt(const size_t idx, int dim) const
{
return container.points[idx].address[dim];
}
template <class BBOX>
bool kdtree_get_bbox(BBOX & bb) const
{
for(size_t i = 0; i < bb.size(); i++)
{
bb[i].low = 0;
bb[i].high = UINT8_MAX;
}
return true;
}
};
void container_demo(const size_t points, const size_t coordinates)
{
Container s(coordinates, points);
coord_t query_pt(coord_t::Random(coordinates));
typedef ContainerAdaptor my_kd_tree_t;
my_kd_tree_t mat_index(coordinates, s, 25);
mat_index.index->buildIndex();
const uint8_t search_radius = static_cast<uint8_t>(100);
std::vector<std::pair<size_t, uint8_t>> ret_matches;
nanoflann::SearchParams params;
const size_t nMatches = mat_index.index->radiusSearch(query_pt.data(), search_radius, ret_matches, params);
for (size_t i = 0; i < nMatches; i++)
{
std::cout << "idx[" << i << "]=" << +ret_matches[i].first << " dist[" << i << "]=" << +ret_matches[i].second << std::endl;
}
std::cout << std::endl;
std::cout << "radiusSearch(): radius=" << +search_radius << " -> " << +nMatches << " matches" << std::endl;
}
int main()
{
container_demo(1e6, 32);
return 0;
}
More info: it seems that the distance type, which is the third parameter of the L2_Adaptor, must be a signed type. Changing the metric_t typedef to the following solves the problem if search_radius and ret_matches are also changed to int64_t.
typedef L2_Adaptor<uint8_t, self_t, int64_t> metric_t;
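One plausible illustration of why an unsigned type misbehaves here, using plain C++ only and making no claim about nanoflann's internals: a difference of two coordinates stored in an unsigned type wraps around when the first value is smaller, so the squared distance becomes huge. The helper names below are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Squared difference through an 8-bit unsigned temporary, in the style of
// `const uint8_t d = p1[i] - ...` from the question's kdtree_distance.
inline std::uint32_t squaredDiffUnsigned(std::uint8_t a, std::uint8_t b)
{
    const std::uint8_t d = static_cast<std::uint8_t>(a - b); // wraps modulo 256 when a < b
    return static_cast<std::uint32_t>(d) * d;
}

// Same computation with a signed, wider temporary.
inline std::int64_t squaredDiffSigned(std::uint8_t a, std::uint8_t b)
{
    const std::int64_t d = static_cast<std::int64_t>(a) - static_cast<std::int64_t>(b);
    return d * d;
}

// Usage sketch: the two versions agree when a >= b and diverge when a < b.
inline void squaredDiffExample()
{
    assert(squaredDiffUnsigned(5, 3) == 4u);
    assert(squaredDiffSigned(5, 3) == 4);
    assert(squaredDiffUnsigned(3, 5) == 64516u); // (256 - 2)^2
    assert(squaredDiffSigned(3, 5) == 4);
}
```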

Understanding on User Defined function

Create a UserArray of bit fields which can be declared as follows: The size occupied by our Array will be less than a normal array. Suppose we want an ARRAY of 20 FLAGs (TRUE/FALSE). A bool FLAG[20] will take 20 bytes of memory, while UserArray<bool,bool,0,20> will take 4 bytes of memory.
Use class Template to create user array.
Use Bit wise operators to pack the array.
Equality operation should also be implemented.
template<class T,int W,int L,int H>//i have used template<class T>
//but never used such way
class UserArray{
//....
};
typedef UserArray<bool,4,0,20> MyType;
where:
T = type of an array element
W = width of an array element, 0 < W < 8
L = low bound of array index (preferably zero)
H = high bound of array index
A main program:
int main() {
MyType Display; //typedef UserArray<T,W,L,H> MyType; defined above
Display[0] = FALSE; //need to understand that how can we write this?
Display[1] = TRUE; //need to understand that how can we write this?
//assert(Display[0]);//commented once, need to understand above code first
//assert(Display[1]);//commented once..
//cout << "Size of the Display" << sizeof(Display);//commented once..
}
My question is how the parameters T, W, L and H are used in class UserArray, and how an instance of UserArray can be written as Display[0] and Display[1]. What do those expressions represent?
Short & simple example of similar type will be easy for me to understand.
W, L and H are non-type template parameters. You can instantiate a template (at compile-time) with constant values, e.g.:
template <int N>
class MyArray
{
public:
float data[N];
void print() { std::cout << "MyArray of size " << N << std::endl; }
};
MyArray<7> foo;
MyArray<8> bar;
foo.print(); // "MyArray of size 7"
bar.print(); // "MyArray of size 8"
In the example above, everywhere that N appears in the template definition, it will be replaced at compile-time by the supplied constant.
Note that MyArray<7> and MyArray<8> are completely different types as far as the compiler is concerned.
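A small sketch that makes the "different types" point checkable at compile time. FixedArray is a hypothetical stand-in for MyArray, with a size() helper added for the example:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for MyArray: N is a non-type template parameter.
template <int N>
struct FixedArray
{
    float data[N];
    static constexpr int size() { return N; }
};

// Each distinct value of N produces a distinct, unrelated type.
static_assert(!std::is_same<FixedArray<7>, FixedArray<8>>::value,
              "different template arguments produce different types");

// Usage sketch: N is baked into each type at compile time.
inline void fixedArrayExample()
{
    FixedArray<7> foo{};
    FixedArray<8> bar{};
    assert(foo.size() == 7);
    assert(bar.size() == 8);
    // foo = bar;  // would not compile: no conversion between the two types
}
```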
I have no idea what the solution to your specific problem is. But your code won't compile, currently, as you have not provided values for the template parameters.
This is not simple, particularly as you can have variable bit widths.
<limits.h> has a constant CHAR_BIT, which is the number of bits in a byte. Usually this is 8, but it could be greater than 8 (not less though).
I suggest the number of elements per byte be CHAR_BIT / W. This might waste a few bits for example, if width is 3 and CHAR_BIT is 8, but this is complicated enough as is.
You'll then need to define operator[] to access the elements, and likely need to do some bit fiddling to do this. For the non-const version of operator[], you'll probably have to return some sort of proxy object when there is more than one element in a byte, and have its operator= overloaded so it writes back to the appropriate spot in the array.
It's a good exercise to figure this one out, though.
Here's some code that implements what you ask for, except the lower bound is fixed at 0. It also shows a rare use case for overloading the address-of operator (operator&). You could take this further and make this container compatible with STL algorithms if you liked.
#include <iostream>
#include <limits.h>
#include <stddef.h>
template<class T, size_t WIDTH, size_t SIZE>
class UserArray;
template<class T, size_t WIDTH, size_t SIZE>
class UserArrayProxy;
template<class T, size_t WIDTH, size_t SIZE>
class UserArrayAddressProxy
{
public:
typedef UserArray<T, WIDTH, SIZE> array_type;
typedef UserArrayProxy<T, WIDTH, SIZE> proxy_type;
typedef UserArrayAddressProxy<T, WIDTH, SIZE> this_type;
UserArrayAddressProxy(array_type& a_, size_t i_) : a(a_), i(i_) {}
UserArrayAddressProxy(const this_type& x) : a(x.a), i(x.i) {}
proxy_type operator*() { return proxy_type(a, i); }
this_type& operator+=(size_t n) { i += n; return *this; }
this_type& operator-=(size_t n) { i -= n; return *this; }
this_type& operator++() { ++i; return *this; }
this_type& operator--() { --i; return *this; }
this_type operator++(int) { this_type x = *this; ++i; return x; }
this_type operator--(int) { this_type x = *this; --i; return x; }
this_type operator+(size_t n) const { this_type x = *this; x += n; return x; }
this_type operator-(size_t n) const { this_type x = *this; x -= n; return x; }
bool operator==(const this_type& x) { return (&a == &x.a) && (i == x.i); }
bool operator!=(const this_type& x) { return !(*this == x); }
private:
array_type& a;
size_t i;
};
template<class T, size_t WIDTH, size_t SIZE>
class UserArrayProxy
{
public:
static const size_t BITS_IN_T = sizeof(T) * CHAR_BIT;
static const size_t ELEMENTS_PER_T = BITS_IN_T / WIDTH;
static const size_t NUMBER_OF_TS = (SIZE - 1) / ELEMENTS_PER_T + 1;
static const T MASK = (1 << WIDTH) - 1;
typedef UserArray<T, WIDTH, SIZE> array_type;
typedef UserArrayProxy<T, WIDTH, SIZE> this_type;
typedef UserArrayAddressProxy<T, WIDTH, SIZE> address_proxy_type;
UserArrayProxy(array_type& a_, int i_) : a(a_), i(i_) {}
this_type& operator=(T x)
{
a.write(i, x);
return *this;
}
address_proxy_type operator&() { return address_proxy_type(a, i); }
operator T()
{
return a.get(i);
}
private:
array_type& a;
size_t i;
};
template<class T, size_t WIDTH, size_t SIZE>
class UserArray
{
public:
typedef UserArrayAddressProxy<T, WIDTH, SIZE> ptr_t;
static const size_t BITS_IN_T = sizeof(T) * CHAR_BIT;
static const size_t ELEMENTS_PER_T = BITS_IN_T / WIDTH;
static const size_t NUMBER_OF_TS = (SIZE - 1) / ELEMENTS_PER_T + 1;
static const T MASK = (1 << WIDTH) - 1;
T operator[](size_t i) const
{
return get(i);
}
UserArrayProxy<T, WIDTH, SIZE> operator[](size_t i)
{
return UserArrayProxy<T, WIDTH, SIZE>(*this, i);
}
friend class UserArrayProxy<T, WIDTH, SIZE>;
private:
void write(size_t i, T x)
{
T& element = data[i / ELEMENTS_PER_T];
int offset = (i % ELEMENTS_PER_T) * WIDTH;
x &= MASK;
element &= ~(MASK << offset);
element |= x << offset;
}
T get(size_t i) const // const so that the const operator[] can call it
{
return (data[i / ELEMENTS_PER_T] >> ((i % ELEMENTS_PER_T) * WIDTH)) & MASK;
}
T data[NUMBER_OF_TS];
};
int main()
{
typedef UserArray<int, 6, 20> myarray_t;
myarray_t a;
std::cout << "Sizeof a in bytes: " << sizeof(a) << std::endl;
for (size_t i = 0; i != 20; ++i) { a[i] = i; }
for (size_t i = 0; i != 20; ++i) { std::cout << a[i] << std::endl; }
std::cout << "We can even use address_of operator: " << std::endl;
for (myarray_t::ptr_t e = &a[0]; e != &a[20]; ++e) { std::cout << *e << std::endl; }
}

how to use boost::unordered_map

For my application I need to use a hash map, so I have written a test program in which I store some instances of a base class in a boost::unordered_map. I want to reach the instances by calling special functions that return a derived class of the base, and I use those functions' parameters as the hash key of the unordered_map. If no instance is found for the given parameters, one is created and stored in the map. The purpose of the program may not be clear, but here is the code.
#include <boost/unordered_map.hpp>
#include <iostream>
using namespace std;
using namespace boost;
typedef unsigned char BYT;
typedef unsigned long long ULL;
class BaseClass
{
public:
int sign;
size_t HASHCODE;
BaseClass(){}
};
class ClassA : public BaseClass
{
public:
int AParam1;
int AParam2;
ClassA(int s1, int s2) : AParam1(s1), AParam2(s2)
{
sign = AParam1;
}
};
struct HashKey
{
ULL * hasharray;
size_t hashNum;
size_t HASHCODE;
HashKey(ULL * ULLarray, size_t Hashnum) : hasharray(ULLarray), hashNum(Hashnum), HASHCODE(0)
{ }
bool operator == (const HashKey & hk ) const
{
bool deg = (hashNum == hk.hashNum);
if (deg)
{
for (int i = 0; i< hashNum;i++)
if(hasharray[i] != hk.hasharray[i]) return false;
}
return deg;
}
};
struct ihash : std::unary_function<HashKey, std::size_t>
{
std::size_t operator()(HashKey const & x) const
{
std::size_t seed = 0;
if (x.hashNum == 1)
seed = x.hasharray[0];
else
{
int amount = x.hashNum * 8;
const std::size_t fnv_prime = 16777619u;
BYT * byt = (BYT*)x.hasharray;
for (int i = 0; i< amount;i++)
{
seed ^= byt[0];
seed *= fnv_prime;
}
}
return seed;
}
};
typedef std::pair<HashKey,BaseClass*> HashPair;
unordered_map<HashKey,BaseClass*,ihash> UMAP;
typedef unordered_map<HashKey,BaseClass*,ihash>::iterator iter;
BaseClass * & FindClass(ULL* byt, int Num, size_t & HCode)
{
HashKey hk(byt,Num);
HashPair hp(hk,0);
std::pair<iter,bool> xx = UMAP.insert(hp);
// if (xx.second) UMAP.rehash((UMAP.size() + 1) / UMAP.max_load_factor() + 1);
if (!xx.first->second) HCode = UMAP.hash_function()(hk);
return xx.first->second;
}
template <typename T, class A,class B>
T* GetClass(size_t& hashcode ,A a, B b)
{
ULL byt[3] = {a,b,hashcode};
BaseClass *& cls = FindClass(byt, 3, hashcode);
if(! cls){ cls = new T(a,b); cls->HASHCODE = hashcode;}
return static_cast<T*>(cls);
}
ClassA * findA(int Period1, int Period2)
{
size_t classID = 100;
return GetClass<ClassA>(classID,Period1,Period2);
}
int main(int argc, char* argv[])
{
int limit = 1000;
int modnum = 40;
int result = 0;
for(int i = 0 ; i < limit; i++ )
{
result += findA( rand() % modnum ,4)->sign ;
}
cout << UMAP.size() << "," << UMAP.bucket_count() << "," << result << endl;
int x = 0;
for(iter it = UMAP.begin(); it != UMAP.end(); it++)
{
cout << ++x << "," << it->second->HASHCODE << "," << it->second->sign << endl ;
delete it->second;
}
return 0;
}
The problem is, I expect that the size of UMAP is equal to modnum; however, it is always greater than modnum, which means there are more than one instance that has the same parameters and HASHCODE.
What is the solution to my problem? Please help.
Thanks
Here are a couple of design problems:
struct HashKey
{
ULL * hasharray;
...
Your key type stores a pointer to some array. But this pointer is initialized with the address of a local object:
BaseClass * & FindClass(ULL* byt, int Num, size_t & HCode)
{
HashKey hk(byt,Num); // <-- !!!
HashPair hp(hk,0);
std::pair<iter,bool> xx = UMAP.insert(hp);
if (!xx.first->second) HCode = UMAP.hash_function()(hk);
return xx.first->second;
}
template <typename T, class A,class B>
T* GetClass(size_t& hashcode ,A a, B b)
{
ULL byt[3] = {a,b,hashcode}; // <-- !!!
BaseClass *& cls = FindClass(byt, 3, hashcode);
if(! cls){ cls = new T(a,b); cls->HASHCODE = hashcode;}
return static_cast<T*>(cls);
}
This makes the map store a HashKey object with a dangling pointer: byt lives on GetClass's stack, so once GetClass returns, any later equality check or hash on the stored key reads memory that no longer holds that array, which is undefined behaviour. The reference returned by FindClass is a different matter: it is obtained through the local pair xx, but it refers to the mapped value stored inside the map, so it stays valid for as long as that element is not erased.
Consider renaming the map's key type. The hash code itself shouldn't be a key. And as your operator== for HashKey suggests, you don't want the actual key to be the hash code but the sequence of integers of variable length. Also, consider storing the sequence inside of the key type instead of a pointer, for example, as a vector.
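A minimal sketch of a key that owns its sequence, using std::unordered_map rather than the Boost container. SequenceKey, SequenceKeyHash, SequenceMap and findOrInsert are hypothetical names, and the hash mixing step is just one common choice:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical key type: it stores the values themselves, so nothing dangles
// after the function that built the key has returned.
struct SequenceKey
{
    std::vector<unsigned long long> values;
    bool operator==(const SequenceKey& other) const { return values == other.values; }
};

// Combines the hashes of all stored values; any reasonable mixing would do.
struct SequenceKeyHash
{
    std::size_t operator()(const SequenceKey& key) const
    {
        std::size_t seed = key.values.size();
        for (unsigned long long v : key.values) {
            seed ^= std::hash<unsigned long long>{}(v) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

using SequenceMap = std::unordered_map<SequenceKey, int, SequenceKeyHash>;

// Returns the value stored for (a, b), inserting fallback on first use.
// The returned reference points into the map, not at a local object.
inline int& findOrInsert(SequenceMap& map, unsigned long long a, unsigned long long b, int fallback)
{
    SequenceKey key;
    key.values = {a, b};
    return map.emplace(std::move(key), fallback).first->second;
}
```

Calling findOrInsert twice with the same (a, b) yields the same element, so the map size matches the number of distinct parameter combinations.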
Using unordered_map does not guarantee that you do not get hash collisions, which is what you describe here.
there are more than one instance that
has the same parameters and HASHCODE
You can tune your hashing algorithm to minimize this, but in the (inevitable) collision case, the hash container extends the list of objects in the bucket corresponding to that hashcode. Equality comparison is then used to resolve the collision to a specific matching object. This may be where your problem lies - perhaps your operator== does not properly disambiguate similar but not identical objects.
You cannot expect one object per bucket, or the container would grow unbounded in large collection size cases.
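To illustrate how equality comparison resolves collisions, here is a sketch with a deliberately bad hash that sends every key to the same bucket. ConstantHash and makeCollidingMap are hypothetical names, and std::unordered_map is used for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Deliberately terrible hash: every key collides with every other key.
struct ConstantHash
{
    std::size_t operator()(const std::string&) const { return 0; }
};

using CollidingMap = std::unordered_map<std::string, int, ConstantHash>;

// Inserts two distinct keys and overwrites one of them.
inline CollidingMap makeCollidingMap()
{
    CollidingMap m;
    m["a"] = 1;
    m["b"] = 2;
    m["a"] = 3; // same key as the first insert, found via operator==
    return m;
}
```

Despite the collisions the map holds exactly two elements, because keys are matched with operator== after the bucket has been selected; only lookup speed suffers.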
btw if you are using a newer compiler you may find it supports std::unordered_map, so you can use that (the official STL version) instead of the Boost version.