C++ MurmurHash3: how to hash an integer

I am confused about how I should call MurmurHash3_x86_128() with an integer key, or whether that is even possible. The MurmurHash3 code can be found at https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp. The function declaration is given below.
void MurmurHash3_x86_128 ( const void * key, const int len,
uint32_t seed, void * out )
I am hashing an integer value with len set to 1. Is that correct?
int main()
{
uint64_t seed = 100;
int p = 500; // key to hash
uint64_t hash_otpt[2]= {0};
const int *key = &p;
MurmurHash3_x64_128(key, 1, seed, hash_otpt); // 0xb6d99cf8
cout << *hash_otpt << endl;
}

You are passing key, which is a pointer to (const) int, so you should be passing sizeof(int) as the length.
Passing 1 would only work in case int is 1 byte wide on your platform, which is rarely the case.
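To see why the length matters without pulling in MurmurHash3 itself, here is a minimal sketch that uses 32-bit FNV-1a as a stand-in hash with the same (pointer, length in bytes) calling convention. The names fnv1a and hash_value are hypothetical helpers, not part of smhasher; with the real library the length argument would be sizeof(p) in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Stand-in hash (32-bit FNV-1a), NOT MurmurHash3. It only mirrors the
// (pointer, length-in-bytes) interface so the effect of `len` is visible.
inline std::uint32_t fnv1a(const void* key, std::size_t len) {
    assert(key != nullptr || len == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(key);
    std::uint32_t h = 2166136261u;          // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= bytes[i];
        h *= 16777619u;                     // FNV prime
    }
    return h;
}

// Hypothetical helper: hash every byte of a trivially copyable value,
// so the length can never disagree with the type.
template <class T>
std::uint32_t hash_value(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only types that can be hashed as raw bytes");
    return fnv1a(&value, sizeof(T));
}
```

With this helper, the keys 500 and 244 hash to different values. If only one byte were hashed on a little-endian machine, both keys would contribute just the byte 0xF4 and would collide.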

Related

Pass and receive raw byte data via unsigned char* function parameters in C++

I have a third-party x.dll with an exported function named "GenerateKeyEx-in C". I do not have any additional files such as a .lib or a header. I got the function parameters from the x.dll supplier, and they are already included in the code.
Here are the parameter definitions:
> [in] ipSeedArray: the seed queried by the ECU (as byte raw data)
> [in] iSeedArraySize: The size of the array
> [in] iSecurityLevel: the security level to be change to
> [in] ipVariant: the ECU variant’s qualifier
> [out] iopKeyArray: the calculated key on return (as byte raw data)
> [in] iMaxKeyArraySize: maximum number of key bytes available
> [out] oActualKeyArraySize: the number of key bytes calculated
I can access and run the function "GenerateKeyEx" without any error, but it always returns 4, which is the unspecified-error code for this function. I think I am not passing the array values (or not initializing the arrays) correctly between main and the DLL function. const unsigned char* ipSeedArray and unsigned char* iopKeyArray are specified as raw byte data (above). Did I define the arrays incorrectly for passing raw byte data via unsigned char*?
//__stdcall replaced //__cdecl *f_GenerateKey
typedef int(*f_GenerateKey)(const unsigned char* ipSeedArray,unsigned int iSeedArraySize,const unsigned int iSecurityLevel,
const char* ipVariant,unsigned char* iopKeyArray,unsigned int iMaxKeyArraySize,unsigned int& oActualKeyArraySize);
int main()
{
HINSTANCE hGetProcIDDLL = LoadLibrary(L"C:\\Users\\thego\\source\\repos\\ConsoleApplication1cp\\ConsoleApplication1cp\\Debug\\SeednKey.dll");
if (!hGetProcIDDLL) {
std::cout << "could not load the dynamic library" << std::endl;
return EXIT_FAILURE;
}
// resolve function address here
f_GenerateKey GenerateKey = (f_GenerateKey)GetProcAddress(hGetProcIDDLL, "GenerateKeyEx");
if (!GenerateKey) {
std::cout << "could not locate the function" << std::endl;
return EXIT_FAILURE;
}
const int sb = 4; const int kb = 100;
const BYTE seedbuffer[sb] = { 0x0B,0xCF,0xFE,0x10 }; //function in
unsigned int seedbufferSize = sizeof(seedbuffer) / sizeof(seedbuffer[0]); //function in
const unsigned int SecurityLevel = 0x01; //function in
const char Variant[1] = { '0' }; //function in
BYTE keybuffer[kb]; //function out
for (int i = 0; i < kb; ++i) keybuffer[i] = 0x00;
unsigned int MaxKeykeybuffer = sizeof(keybuffer) / sizeof(keybuffer[0]); //function in
unsigned int oSize; //function out
int x = GenerateKey(seedbuffer, seedbufferSize, SecurityLevel, Variant, keybuffer, MaxKeykeybuffer, oSize);
return 0;
}
Here:
GenerateKey(seedbuffer, SeedArraySize, ...
you are using seedbuffer, while (I'm guessing) you need SeedArray.
The bottom line is: you are passing an array of 4 bytes but telling the function that there are 9.
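One way to keep the buffer and the size you report from drifting apart is to take both from the same container. The sketch below is portable and does not load the DLL: MockGenerateKey is a hypothetical stand-in with the same parameter list as f_GenerateKey, and it simply copies the seed into the key buffer. The real algorithm and its return codes are unknown here, and the null-terminated variant string is an assumption, since the parameter description does not say whether ipVariant must be terminated.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in with the same parameter list as f_GenerateKey.
// It only copies the seed into the key buffer; the real DLL's algorithm
// and return codes are not known here.
inline int MockGenerateKey(const unsigned char* ipSeedArray, unsigned int iSeedArraySize,
                           const unsigned int iSecurityLevel, const char* ipVariant,
                           unsigned char* iopKeyArray, unsigned int iMaxKeyArraySize,
                           unsigned int& oActualKeyArraySize) {
    (void)iSecurityLevel;
    (void)ipVariant;
    if (!ipSeedArray || !iopKeyArray || iSeedArraySize > iMaxKeyArraySize) return 4;
    for (unsigned int i = 0; i < iSeedArraySize; ++i) iopKeyArray[i] = ipSeedArray[i];
    oActualKeyArraySize = iSeedArraySize;
    return 0;
}

// Sizes come from the containers themselves, so the byte count passed
// to the function always matches the buffer that is passed with it.
inline int generate(const std::array<unsigned char, 4>& seed,
                    std::array<unsigned char, 100>& key, unsigned int& keyLen) {
    const char variant[] = "0";  // assumption: null-terminated variant string
    return MockGenerateKey(seed.data(), static_cast<unsigned int>(seed.size()),
                           0x01, variant,
                           key.data(), static_cast<unsigned int>(key.size()), keyLen);
}
```

With the real DLL, the same pattern would apply to the pointer obtained from GetProcAddress: pass seed.data() together with seed.size() rather than a separately maintained constant.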

How do I extract little-endian unsigned short from long pointer?

I have a long pointer value that points to a 20 byte header structure followed by a larger array. Dec(57987104)=Hex(0374D020). All the values are stored little endian. 1400 when swapped is 0014 which in decimal is 20.
The question here is how do I get the first value which is a 2 byte unsigned short. I have a C++ dll to convert this for me. I'm running Windows 10.
GetCellData_API unsigned short __stdcall getUnsignedShort(unsigned long ptr)
{
unsigned long *p = &ptr;
unsigned short ret = *p;
return ret;
}
But when I call this from VBA using Debug.Print getUnsignedShort(57987104) I get 30008 when it should be 20.
I might need to do an endian swap but I'm not sure how to incorporate this from CodeGuru: How do I convert between big-endian and little-endian values?
inline void endian_swap(unsigned short& x)
{
x = (x >> 8) |
(x << 8);
}
How do I extract little endian unsigned short from long pointer?
I think I'd be inclined to write your interface function in terms of a general template function that describes the operation:
#include <cstddef>
#include <cstdint>
#include <type_traits> // std::make_unsigned_t
#include <utility>
// Code for the general case
// you'll be amazed at the compiler's optimiser
template<class Integral>
auto extract_be(const std::uint8_t* buffer)
{
using accumulator_type = std::make_unsigned_t<Integral>;
auto acc = accumulator_type(0);
auto count = sizeof(Integral);
while(count--)
{
acc |= accumulator_type(*buffer++) << (8 * count);
}
return Integral(acc);
}
GetCellData_API unsigned short __stdcall getUnsignedShort(std::uintptr_t ptr)
{
return extract_be<std::uint16_t>(reinterpret_cast<const std::uint8_t*>(ptr));
}
As you can see from the demo on godbolt, the compiler does all the hard work for you.
Note that since we know the size of the data, I have used the sized integer types exported from <cstdint> in case this code needs to be ported to another platform.
EDIT:
Just realised that your data is actually LITTLE ENDIAN :)
template<class Integral>
auto extract_le(const std::uint8_t* buffer)
{
using accumulator_type = std::make_unsigned_t<Integral>;
auto acc = accumulator_type(0);
constexpr auto size = sizeof(Integral);
for(std::size_t count = 0 ; count < size ; ++count)
{
acc |= accumulator_type(*buffer++) << (8 * count);
}
return Integral(acc);
}
GetCellData_API unsigned short __stdcall getUnsignedShort(std::uintptr_t ptr)
{
return extract_le<std::uint16_t>(reinterpret_cast<const std::uint8_t*>(ptr));
}
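As a worked example of the byte order involved, here is a small sketch that decodes the two header bytes from the question (14 00) both ways. The names read_u16_le and read_u16_be are hypothetical and exist only for this illustration; they do the same job as the templates above for the 16-bit case.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian: the least significant byte is stored first.
inline std::uint16_t read_u16_le(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Big-endian: the most significant byte is stored first.
inline std::uint16_t read_u16_be(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```

For const unsigned char header[] = {0x14, 0x00}, read_u16_le(header) gives 20, while read_u16_be(header) gives 5120 (0x1400). Both functions read byte by byte, so neither depends on the host's endianness or on the alignment of the pointer.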
Let's say pulong is your pointer to unsigned long; pulong[6] then refers to the element at index 6 of the table.
unsigned short *psh;
unsigned char ptable[4];
unsigned char *puchar = ptable;
ZeroMemory(ptable,4);
// copy the first four bytes of pulong[6] in reversed order
puchar[3]=((unsigned char *)( &pulong[6]))[0];
puchar[2]=((unsigned char *)( &pulong[6]))[1];
puchar[1]=((unsigned char *)( &pulong[6]))[2];
puchar[0]=((unsigned char *)( &pulong[6]))[3];
psh=(unsigned short *) puchar;
//first one
psh[0];
//second one
psh[1];
This is what I had in mind, though I may be mistaken.

C++ What should we pass in MurmurHash3 parameters?

I am confused about which parameters I should provide to MurmurHash3_x86_128(). The MurmurHash3 code can be found at https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp. The function declaration is given below.
void MurmurHash3_x86_128 ( const void * key, const int len,
uint32_t seed, void * out )
I have passed the following values to the function above, but my program crashes with a segmentation fault. What am I doing wrong?
int main()
{
uint64_t seed = 1;
uint64_t *hash_otpt;
const char *key = "hi";
MurmurHash3_x64_128(key, (uint64_t)strlen(key), seed, hash_otpt);
cout << "hashed" << hash_otpt << endl;
return 0;
}
This function writes its hash into 128 bits of memory.
What you are doing is passing it a pointer that does not yet point to any allocated memory.
The correct usage would be something like this:
int main()
{
uint64_t seed = 1;
uint64_t hash_otpt[2]; // allocate 128 bits
const char *key = "hi";
MurmurHash3_x64_128(key, (uint64_t)strlen(key), seed, hash_otpt);
cout << "hashed " << hash_otpt[0] << " " << hash_otpt[1] << endl; // print the two 64-bit halves separately
return 0;
}
You could have noticed this by looking at how MurmurHash3_x64_128 fills its out parameter:
((uint64_t*)out)[0] = h1;
((uint64_t*)out)[1] = h2;
hash_otpt is a pointer to nothing, but the function expects the fourth argument to be a pointer to some memory as it writes its output into this memory. In your example, it attempts a write operation, but fails (there's nowhere to write to as the pointer is not initialized). This gives you a SegmentationFault.
Figure out how many uint64_ts the hash fits into (2, because the output is 128 bits and a uint64_t is 64 bits) and allocate the memory:
hash_otpt = new uint64_t [2];
If you look at the documentation, you can see
MurmurHash3_x64_128 ... It has a 128-bit output.
So, your code can be something like this
uint64_t hash_otpt[2]; // This is 128 bits
MurmurHash3_x64_128(key, (uint64_t)strlen(key), seed, hash_otpt);
Note that you don't have to dynamically allocate the output at all.
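Once the two 64-bit words are in a stack array, printing them as one fixed-width hex string avoids the ambiguity of writing two decimal numbers back to back. This is a minimal sketch; to_hex128 is a hypothetical helper, and showing word 0 first is only a display choice, not something the library prescribes.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a 128-bit hash held in two 64-bit words
// as 32 zero-padded hex digits (word 0 first).
inline std::string to_hex128(const std::uint64_t (&words)[2]) {
    std::ostringstream out;
    out << std::hex << std::setfill('0')
        << std::setw(16) << words[0]   // setw applies to the next item only
        << std::setw(16) << words[1];
    return out.str();
}
```

After the call that fills hash_otpt, the result could then be printed with cout << to_hex128(hash_otpt) << endl;.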

How to cast from char pointer to custom object pointer

I'm using leveldb to store key-value pairs of integers and MyClass objects. Actually, a key can contain more than one of these objects.
The problem appears when retrieving the data from the database. The code compiles, but the values of the MyClass members are not the ones I put into the database.
std::string value;
leveldb::Slice keySlice = ANYKEY;
levelDBObj->Get(leveldb::ReadOptions(), keySlice, &value);
The std::string value can now contain one or more MyClass objects. So how do I get them?
I already tried the following, which didn't work:
1.) directly typecasting and memcpy
std::vector<MyClass> vObjects;
MyClass* obj = (MyClass*)malloc( value.size());
memcpy((void*)obj, (void*) (value.c_str()), value.size());
MyClass dummyObj;
int numValues = value.size()/sizeof(MyClass);
for( int i=0; i<numValues; ++i) {
dummyObj = *(obj+i);
vObjects.push_back(dummyObj);
}
2.) reinterpret_cast to void pointer
MyClass* obj = (MyClass*)malloc( value.size());
const void* vobj = reinterpret_cast<const void*>( value.c_str() );
int numValues = value.size()/sizeof(MyClass);
for( int i=0; i<numValues; ++i) {
const MyClass dummyObj = *(reinterpret_cast<const MyClass*>(vobj)+i);
vObjects.push_back(dummyObj);
}
MyClass is a collection of several public members, e.g. unsigned int and unsigned char, and it has a fixed size.
I know that there are similar questions that deal with a single object. But in my case the vector can contain more than one, and it comes from the leveldb database.
EDIT: SOLUTION
I wrote (de)serialization methods for MyClass, which made it work. Thanks for the hint!
void MyClass::serialize( char* outBuff ) {
memcpy(outBuff, (const void*) &aVar, sizeof(aVar));
unsigned int c = sizeof(aVar);
memcpy(outBuff+c, (const void*) &bVar, sizeof(bVar));
c += sizeof(bVar);
/* and so on */
}
void MyClass::deserialize( const char* inBuff ) {
memcpy((void*) &aVar, inBuff, sizeof(aVar));
unsigned int c = sizeof(aVar);
memcpy((void*) &bVar, inBuff+c, sizeof(bVar)); // second field goes into bVar, not aVar
c += sizeof(bVar);
/* and so on */
}
The get method is as follows (put analogously):
int getValues(leveldb::Slice keySlice, std::vector<MyObj>& values) const {
std::string value;
leveldb::Status status = levelDBObj->Get(leveldb::ReadOptions(), keySlice, &value);
if (!status.ok()) {
values.clear();
return -1;
}
int nValues = value.size()/sizeof(MyObj);
MyObj dummyObj;
for( int i=0; i<nValues; ++i) {
dummyObj.deserialize(value.c_str()+i*sizeof(MyObj));
values.push_back(dummyObj);
}
return 0;
}
You have to serialize your class... otherwise, you're just taking some memory and writing it in leveldb. Whatever you get back is not only going to be different, but it will probably be completely useless too. Check out this question for more info on serialization: How do you serialize an object in C++?
LevelDB does support multiple objects under one key; however, try to avoid doing that unless you have a really good reason. I would recommend that you hash each object with a unique hash (see Google's CityHash if you want a hashing function) and store the serialized objects with their corresponding hash. If your object is itself a collection, then you have to serialize all of its objects to an array of bytes and have some method that allows you to determine where each object begins and ends.
Update
A serializable class would look something like this:
class MyClass
{
private:
int _numeric;
string _text;
public:
// constructors
// mutators
void SetNumeric(int num);
void SetText(string text);
// non-static: the size depends on this object's _text
unsigned int SerializableSize() const
{
// returns the serializable size of the class with the schema:
// 4 bytes for the numeric (integer)
// 4 bytes for the unsigned int (the size of the text)
// n bytes for the text (it has a variable size)
return sizeof(int) + sizeof(unsigned int) + (unsigned int)_text.size();
}
// serialization (the buffer is written to, so it must not be const)
int Serialize(char* buffer, const unsigned int bufferLen, const unsigned int position) const
{
// check if the object can be serialized in the available buffer space
if(position+SerializableSize()>bufferLen)
{
// don't write anything and return -1 signaling that there was an error
return -1;
}
unsigned int finalPosition = position;
// write the numeric value
*(int*)(buffer + finalPosition) = _numeric;
// move the final position past the numeric value
finalPosition += sizeof(int);
// write the size of the text
*(unsigned int*)(buffer + finalPosition) = (unsigned int)_text.size();
// move the final position past the size of the string
finalPosition += sizeof(unsigned int);
// write the string
memcpy((void*)(buffer+finalPosition), _text.c_str(), (unsigned int)_text.size());
// move the final position past the end of the string
finalPosition += (unsigned int)_text.size();
// return the number of bytes written to the buffer
return finalPosition-position;
}
// deserialization
static int Deserialize(MyClass& myObject,
const char* buffer,
const unsigned int buffSize,
const unsigned int position)
{
unsigned int currentPosition = position;
// copy the numeric value
int numeric = *(int*)(buffer + currentPosition);
// increment the current position past the numeric value
currentPosition += sizeof(int);
// copy the size of the text
unsigned int textSize = *(unsigned int*)(buffer + currentPosition);
// increment the current position past the size of the text
currentPosition += sizeof(unsigned int);
// copy the text
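// check that the text lies entirely inside the buffer before reading it
if(currentPosition + textSize > buffSize)
{
// you decide what to do here; this version reports an error
return -1;
}
// copy the text
string text((buffer+currentPosition), textSize);
// increment the current position past the text
currentPosition += textSize;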
// Set your object's values
myObject.SetNumeric(numeric);
myObject.SetText(text);
// return the number of bytes deserialized
return currentPosition - position;
}
};
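Since the original question stores several objects under one key, here is a minimal sketch of the same length-prefixed layout applied to a whole list, using memcpy instead of pointer casts so that unaligned offsets are not a problem. Record, append_record and parse_records are hypothetical names standing in for MyClass and its methods; the fields are copied in host byte order, so this assumes the data is read back on a machine with the same endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record type standing in for MyClass.
struct Record {
    std::uint32_t numeric;
    std::string text;
};

// Append one record as: 4-byte numeric, 4-byte text length, text bytes.
// Values are copied in host byte order (not portable across endianness).
inline void append_record(std::string& out, const Record& r) {
    const std::uint32_t len = static_cast<std::uint32_t>(r.text.size());
    char header[8];
    std::memcpy(header, &r.numeric, 4);
    std::memcpy(header + 4, &len, 4);
    out.append(header, sizeof header);
    out.append(r.text);
}

// Parse every record in the buffer; returns false on truncated input.
inline bool parse_records(const std::string& in, std::vector<Record>& records) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 8) return false;    // incomplete header
        Record r;
        std::uint32_t len = 0;
        std::memcpy(&r.numeric, in.data() + pos, 4);
        std::memcpy(&len, in.data() + pos + 4, 4);
        pos += 8;
        if (in.size() - pos < len) return false;  // incomplete text
        r.text.assign(in, pos, len);
        pos += len;
        records.push_back(r);
    }
    return true;
}
```

The resulting std::string can be stored as a single LevelDB value, and the string filled in by Get can be handed to parse_records directly, because the length prefix tells the parser where each record ends.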

Assigning and retrieving bit-wise memory value for Genetic Algo

I came across this code for developing a class for GA/GP, but I failed to understand it and am therefore unable to debug the program.
typedef struct {
void *dataPointer;
int length;
} binary_data;
typedef struct {
organism *organisms; //This must be malloc'ed
int organismsCount;
int (*fitnessTest)(organism org);
int orgDnaLength;
unsigned int desiredFitness;
void (*progress)(unsigned int fitness);
} evolutionary_algorithm;
The above is straightforward. Then we try to initialize the organisms before testing their fitness etc.
int main(int argc, char *argv[])
{
srand(time(NULL));
int i;
evolutionary_algorithm ea;
ea.progress = progressDisplayer;
ea.organismsCount = 50;
ea.orgDnaLength = sizeof(unsigned int);
organism *orgs =(organism *) malloc(sizeof(organism) * ea.organismsCount);
for (i = 0; i < 50; i++)
{
organism newOrg;
binary_data newOrgDna;
newOrgDna.dataPointer = malloc(sizeof(unsigned int));
memset(newOrgDna.dataPointer, i, 1);
newOrgDna.length = sizeof(unsigned int);
newOrg.dna = newOrgDna;
orgs[i] = newOrg;
}
As far as I understand, memset() writes a byte value into the memory that the void pointer (newOrgDna.dataPointer) points to. But I can't figure out how to reassemble those bytes to get the integer value assigned to the "dna" member of newOrg, so that I can check the integer value assigned to an individual organism and, eventually, to the entire population stored in the memory assigned to "orgs".
As you can guess from the above, I am not very familiar with memory management at this level of detail, so your help is very much appreciated.
Thank you so much
This code looks a bit strange. This line:
newOrgDna.dataPointer = malloc(sizeof(unsigned int));
will allocate sizeof(unsigned int) bytes, which is 4 on most current platforms, 64-bit ones included. The strange part is that the memset on the line just below sets only the first byte.
To get the actual value, you might do:
char val = *((char*) newOrgDna.dataPointer);
But, as I said, this code looks a bit off. I would rewrite it as:
for (i = 0; i < 50; i++)
{
organism newOrg;
binary_data newOrgDna;
unsigned int * data = (unsigned int*) malloc(sizeof(unsigned int));
*data = i;
newOrgDna.length = sizeof(*data);
newOrgDna.dataPointer = (void*) data; // I think that cast can be dropped
newOrg.dna = newOrgDna;
orgs[i] = newOrg;
}
Then everywhere you want to get data from organism * you can do:
void f( organism * o )
{
assert( sizeof(unsigned int) == o->dna.length );
unsigned int data = *((unsigned int*) o->dna.dataPointer);
}
Also, this is a C question rather than a C++ one.
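To make the store and read-back steps concrete, here is a minimal sketch that copies all bytes of the value with memcpy instead of setting a single byte with memset. The binary_data struct is repeated from the question; make_dna and read_dna are hypothetical helper names, and the caller is assumed to free dataPointer.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

struct binary_data {
    void* dataPointer;
    int length;
};

// Allocate storage for one unsigned int and copy all of its bytes in.
// The caller owns dataPointer and must free() it.
inline binary_data make_dna(unsigned int value) {
    binary_data dna;
    dna.dataPointer = std::malloc(sizeof value);
    dna.length = dna.dataPointer ? static_cast<int>(sizeof value) : 0;
    if (dna.dataPointer) std::memcpy(dna.dataPointer, &value, sizeof value);
    return dna;
}

// Reassemble the unsigned int from the stored bytes.
inline unsigned int read_dna(const binary_data& dna) {
    assert(dna.dataPointer != nullptr);
    assert(dna.length == static_cast<int>(sizeof(unsigned int)));
    unsigned int value = 0;
    std::memcpy(&value, dna.dataPointer, sizeof value);
    return value;
}
```

In the loop from the question, newOrg.dna = make_dna(i); would replace the malloc and memset pair, and read_dna(orgs[i].dna) would give back the integer for each organism in the population.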