C++: What should we pass in MurmurHash3 parameters?

I am confused about which parameters I should provide for MurmurHash3_x86_128(). The MurmurHash3 code can be found at https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp. The function declaration is given below.
void MurmurHash3_x86_128 ( const void * key, const int len,
uint32_t seed, void * out )
I have passed the following values to the function, but my program crashes with a segmentation fault. What am I doing wrong?
int main()
{
uint64_t seed = 1;
uint64_t *hash_otpt;
const char *key = "hi";
MurmurHash3_x64_128(key, (uint64_t)strlen(key), seed, hash_otpt);
cout << "hashed" << hash_otpt << endl;
return 0;
}

This function writes its hash into 128 bits of memory.
You are passing it a pointer that does not point to any allocated memory yet.
The correct usage would be something like this:
int main()
{
uint64_t seed = 1;
uint64_t hash_otpt[2]; // allocate 128 bits
const char *key = "hi";
MurmurHash3_x64_128(key, (uint64_t)strlen(key), seed, hash_otpt);
cout << "hashed" << hash_otpt[0] << hash_otpt[1] << endl;
return 0;
}
You could have noticed that by looking at how MurmurHash3_x64_128 fills its out parameter:
((uint64_t*)out)[0] = h1;
((uint64_t*)out)[1] = h2;

hash_otpt is an uninitialized pointer, but the function expects the fourth argument to point to valid memory because it writes its output there. In your example, it attempts the write and fails (there is nowhere to write to, as the pointer is not initialized). This gives you a segmentation fault.
Figure out how many uint64_t values the hash fits into (2, because the output is 128 bits and a uint64_t is 64 bits) and allocate the memory:
hash_otpt = new uint64_t [2];

If you look at the documentation, you can see
MurmurHash3_x64_128 ... It has a 128-bit output.
So, your code can be something like this
uint64_t hash_otpt[2]; // This is 128 bits
MurmurHash3_x64_128(key, (uint64_t)strlen(key), seed, hash_otpt);
Note that you don't have to dynamically allocate the output at all.
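The point about the output buffer can be tried without linking against smhasher. The sketch below uses a hypothetical stand-in, toy_hash_128, that only mimics the parameter list (key, len, seed, out) and writes 16 bytes to out; it is not MurmurHash3 and its values mean nothing. The helper hash_hex is also an illustrative name. What matters is that the caller owns a 128-bit buffer and passes a pointer to it.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-in with the same parameter shape as the real function.
// NOT MurmurHash3: it only demonstrates that `out` must hold 16 bytes.
void toy_hash_128(const void* key, int len, std::uint32_t seed, void* out)
{
    const auto* bytes = static_cast<const unsigned char*>(key);
    std::uint64_t h1 = 14695981039346656037ULL ^ seed;
    std::uint64_t h2 = ~h1;
    for (int i = 0; i < len; ++i) {
        h1 = (h1 ^ bytes[i]) * 1099511628211ULL;
        h2 = (h2 + bytes[i]) * 1099511628211ULL;
    }
    const std::uint64_t words[2] = {h1, h2};
    std::memcpy(out, words, sizeof words);  // writes 16 bytes into the caller's buffer
}

// The caller provides the storage: two 64-bit words, no dynamic allocation.
std::string hash_hex(const std::string& key, std::uint32_t seed)
{
    std::array<std::uint64_t, 2> out{};
    toy_hash_128(key.data(), static_cast<int>(key.size()), seed, out.data());

    // Fixed-width hex keeps the two words from running together ambiguously.
    std::ostringstream os;
    os << std::hex << std::setfill('0')
       << std::setw(16) << out[0] << std::setw(16) << out[1];
    return os.str();
}
```

Calling hash_hex("hi", 1) returns a 32-character hex string. Printing the two words with a fixed width, rather than streaming them back to back in decimal, keeps the boundary between them unambiguous.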

Related

Best practices for large array of integers in C++

I am loading a 123 MB file of unsigned integers that needs to be in memory (for fast lookups in a Monte Carlo simulation) in C++. Right now I have a global array, but I've heard global arrays are frowned upon. What are the best practices for this?
For context, I'm doing Monte Carlo simulations on a poker game and need an array of about 30 million integers to quickly compute the winner of a poker hand. To determine the winner, you first compute the 'handranks' by doing 7 queries of the array. Then to determine the winner, you compare the 'handranks'.
int HR[32487834];
int get_handrank(const std::array<int,7> cards)
{
int p = 53;
for (const auto& c: cards)
p = HR[p + c];
return p;
}
int main()
{
// load the data
memset(HR, 0, sizeof(HR));
FILE * fin = fopen("handranks.dat", "rb");
if (!fin)
std::cout << "error when loading handranks.dat" << std::endl;
size_t bytesread = fread(HR, sizeof(HR), 1, fin);
fclose(fin);
std::cout << "complete.\n\n";
// monte carlo simulations using get_handrank() function
.
.
.
}
Use local variables, and pass them to functions as appropriate. This makes the program easier to reason about.
Use vector instead of an array for this large amount of data, otherwise you might cause stack overflow.
Modify your functions to work with std::span instead of a particular container, as this creates more decoupling.
Create a symbolic constant with a meaningful name for 32487834 instead of using it as a magic constant.
Using local variables and passing them around is always a better practice than having globals IMO. It makes your algorithms more flexible. What if you need to use a different array for some reason later on? You would need to modify all the functions using the global variable. Passing around the array is a bit inconvenient and verbose I agree, but still better than the globals. We will address this problem later on.
So your first option is something like:
// Don't forget to receive the params as references to avoid copying
int get_handrank(const std::array<int,7>& cards, const std::array<int, 32487834>& HR)
{
int p = 53;
for (const auto& c: cards)
p = HR[p + c];
return p;
}
int main()
{
std::array<int, 32487834> HR{}; //zero-inits the array
// or std::vector<int> HR{}; HR.resize(32487834) to avoid stack overflow as @MarkRansom pointed out
FILE * fin = fopen("handranks.dat", "rb");
if (!fin)
std::cout << "error when loading handranks.dat" << std::endl;
size_t bytesread = fread(HR.data(), HR.size() * sizeof(int), 1, fin);
fclose(fin);
}
Second approach, even better: Use classes. You can have a simulation class, and the class can have the HR array as a const private member read and initialized in the constructor. Then get_handrank can be a member function and can access the member HR array:
class Simulation {
public:
int get_handrank(const std::array<int,7>& cards)
{
int p = 53;
for (const auto& c: cards)
p = HR[p + c];
return p;
}
Simulation()
: HR{}
// or HR{readHRFromFileFunction()}
{
FILE * fin = fopen("handranks.dat", "rb");
if (!fin)
std::cout << "error when loading handranks.dat" << std::endl;
size_t bytesread = fread(HR.data(), HR.size() * sizeof(int), 1, fin);
fclose(fin);
}
private:
std::array<int, 32487834> HR;
//or const std::array<int, 32487834> HR; if you use a function to init it
};
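The suggestions above (heap storage instead of a huge std::array, a named constant, and passing the table as a parameter) can be combined as in the following sketch. The names kHandRankTableSize and load_table are illustrative, and the table is passed as const std::vector<int>& so the code builds as C++17; with C++20, std::span<const int> could take its place.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Named constant instead of a magic number (value taken from the question).
constexpr std::size_t kHandRankTableSize = 32487834;

// Reads `count` ints from a binary file into heap storage.
// Throws instead of continuing with a half-initialized table.
std::vector<int> load_table(const std::string& path,
                            std::size_t count = kHandRankTableSize)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    std::vector<int> table(count);  // zero-initialized, lives on the heap
    const auto bytes = static_cast<std::streamsize>(count * sizeof(int));
    in.read(reinterpret_cast<char*>(table.data()), bytes);
    if (in.gcount() != bytes)
        throw std::runtime_error("short read from " + path);
    return table;
}

// Same lookup as in the question, but the table is an explicit parameter.
int get_handrank(const std::vector<int>& table, const std::array<int, 7>& cards)
{
    int p = 53;
    for (int c : cards)
        p = table[static_cast<std::size_t>(p + c)];
    return p;
}
```

A caller would load the table once, for example with load_table("handranks.dat"), and pass it by const reference to the simulation code.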

Qt: from a fixed number of bytes to an integer

Using Qt 5.4, I built the function generateRandomIDOver2Bytes. It generates a random number and stores it in a variable that occupies exactly two bytes.
QByteArray generateRandomIDOver2Bytes() {
QString randomValue = QString::number(qrand() % 65535);
QByteArray x;
x.setRawData(randomValue.toLocal8Bit().constData(), 2);
return x;
}
My issue is converting the generated value back into an integer.
The following minimal example does not work:
QByteArray tmp = generateRandomIDOver2Bytes(); //for example, the value 27458
int value = tmp.toUInt();
qDebug() << value; //it prints always 9
Any idea?
A 16 bit integer can be split into individual bytes by bit operations.
This way, it can be stored into a QByteArray.
From Qt doc. of QByteArray:
QByteArray can be used to store both raw bytes (including '\0's) and traditional 8-bit '\0'-terminated strings.
For recovering, bit operations can be used as well.
The contents of the QByteArray do not necessarily correspond to printable characters, but that should not be required in this case.
testQByteArrayWithUShort.cc:
#include <QtCore>
int main()
{
quint16 r = 65534;//qrand() % 65535;
qDebug() << "r:" << r;
// storing r in QByteArray (little endian)
QByteArray qBytes(2, 0); // reserve space for two bytes explicitly
qBytes[0] = (uchar)r;
qBytes[1] = (uchar)(r >> 8);
qDebug() << "qBytes:" << qBytes;
// recovering r
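// cast to uchar first: char may be signed, and sign extension would corrupt the result
quint16 rr = (uchar)qBytes.at(0) | ((uchar)qBytes.at(1) << 8);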
qDebug() << "rr:" << rr;
}
Output:
r: 65534
qBytes: "\xFE\xFF"
rr: 65534
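The same split-and-recombine logic can be sketched in plain C++ without Qt. Here std::array<unsigned char, 2> stands in for the QByteArray, and to_le_bytes and from_le_bytes are illustrative names, not Qt functions. Keeping the bytes unsigned avoids sign extension when a byte value is 0x80 or above.

```cpp
#include <array>
#include <cstdint>

// Split a 16-bit value into two bytes, least significant byte first.
std::array<unsigned char, 2> to_le_bytes(std::uint16_t v)
{
    return {{ static_cast<unsigned char>(v & 0xFF),
              static_cast<unsigned char>(v >> 8) }};
}

// Recombine the two bytes. They are unsigned, so promotion to int
// cannot produce a negative value that would smear ones into the high bits.
std::uint16_t from_le_bytes(const std::array<unsigned char, 2>& b)
{
    return static_cast<std::uint16_t>(b[0] | (b[1] << 8));
}
```

A value such as 0x01FE is a useful check: its low byte is above 0x7F, which is the case where recombining through signed char goes wrong.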
Given the random value 27458, when you do this:
x.setRawData(randomValue.toLocal8Bit().constData(), 2);
you're filling the array with the first two bytes of this string: "27458".
And here:
int value = tmp.toUInt();
the byte array is implicitly cast to a string ("27"), which in turn is converted to a numeric value (an unsigned integer).
Let's try something different, that maybe suits your need.
First, store the value in a numeric variable, preferably of the desired size (16 bits, 2 bytes):
ushort randomValue = qrand() % 65535;
then just return a byte array, built using a pointer to the ushort, cast to char * (don't use setRawData, because it doesn't copy the bytes you pass in):
return QByteArray(reinterpret_cast<char *>(&randomValue), 2);
To get back to the value:
QByteArray tmp = generateRandomIDOver2Bytes(); //for example, the value 27458
ushort value;
memcpy(&value, tmp.data(), 2);
Please note: types do matter here. You wrote a ushort into the byte array, so you must read a ushort out of it.
All this can be generalized in a class like:
template <typename T>
class Value
{
QByteArray bytes;
public:
Value(T t) : bytes(reinterpret_cast<char*>(&t), sizeof(T)) {}
T read() const
{
T t;
memcpy(&t, bytes.data(), sizeof(T));
return t;
}
};
so you can have a generic function like:
template<typename T>
Value<T> generateRandomIDOverNBytes()
{
T value = qrand() % 65535;
qDebug() << value;
return Value<T>(value);
}
and safely use the type you prefer to store the random value:
Value<ushort> value16 = generateRandomIDOverNBytes<ushort>();
qDebug() << value16.read();
Value<int> value32 = generateRandomIDOverNBytes<int>();
qDebug() << value32.read();
Value<long long> value64 = generateRandomIDOverNBytes<long long>();
qDebug() << value64.read();

How do I extract little-endian unsigned short from long pointer?

I have a long pointer value that points to a 20 byte header structure followed by a larger array. Dec(57987104)=Hex(0374D020). All the values are stored little endian. 1400 when swapped is 0014 which in decimal is 20.
The question here is how do I get the first value which is a 2 byte unsigned short. I have a C++ dll to convert this for me. I'm running Windows 10.
GetCellData_API unsigned short __stdcall getUnsignedShort(unsigned long ptr)
{
unsigned long *p = &ptr;
unsigned short ret = *p;
return ret;
}
But when I call this from VBA using Debug.Print getUnsignedShort(57987104) I get 30008 when it should be 20.
I might need to do an endian swap but I'm not sure how to incorporate this from CodeGuru: How do I convert between big-endian and little-endian values?
inline void endian_swap(unsigned short& x)
{
x = (x >> 8) |
(x << 8);
}
How do I extract little endian unsigned short from long pointer?
I think I'd be inclined to write your interface function in terms of a general template function that describes the operation:
#include <cstddef>
#include <cstdint>
#include <type_traits> // std::make_unsigned_t
// Code for the general case
// you'll be amazed at the compiler's optimiser
template<class Integral>
auto extract_be(const std::uint8_t* buffer)
{
using accumulator_type = std::make_unsigned_t<Integral>;
auto acc = accumulator_type(0);
auto count = sizeof(Integral);
while(count--)
{
acc |= accumulator_type(*buffer++) << (8 * count);
}
return Integral(acc);
}
GetCellData_API unsigned short __stdcall getUnsignedShort(std::uintptr_t ptr)
{
return extract_be<std::uint16_t>(reinterpret_cast<const std::uint8_t*>(ptr));
}
As you can see from the demo on godbolt, the compiler does all the hard work for you.
Note that since we know the size of the data, I have used the sized integer types exported from <cstdint> in case this code needs to be ported to another platform.
EDIT:
Just realised that your data is actually LITTLE ENDIAN :)
template<class Integral>
auto extract_le(const std::uint8_t* buffer)
{
using accumulator_type = std::make_unsigned_t<Integral>;
auto acc = accumulator_type(0);
constexpr auto size = sizeof(Integral);
for(std::size_t count = 0 ; count < size ; ++count)
{
acc |= accumulator_type(*buffer++) << (8 * count);
}
return Integral(acc);
}
GetCellData_API unsigned short __stdcall getUnsignedShort(std::uintptr_t ptr)
{
return extract_le<std::uint16_t>(reinterpret_cast<const std::uint8_t*>(ptr));
}
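As a worked check of the arithmetic in the question (bytes 14 00 read as little-endian give 0x0014, which is 20), here is a non-template sketch for the 16-bit case only. The names read_le16 and read_le16_at are illustrative. The address is treated as the location of the bytes to read, which is the step the original DLL function missed by taking the address of its own parameter.

```cpp
#include <cstdint>

// Read two bytes starting at p, least significant byte first.
std::uint16_t read_le16(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Same, but starting from an integer that holds an address.
// The caller must guarantee that the address is valid and readable.
std::uint16_t read_le16_at(std::uintptr_t address)
{
    return read_le16(reinterpret_cast<const unsigned char*>(address));
}
```

For a header whose first two bytes are 0x14 and 0x00, both functions return 20 regardless of the byte order of the machine running the code.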
Let's say you have an unsigned long pointer pulong, and pulong[6] refers to the element at index 6 of the table:
unsigned short *psh;
unsigned char *puchar;
unsigned char ptable[4];
ZeroMemory(ptable,4);
puchar = ptable;
// copy the four bytes of pulong[6] in reverse order
puchar[3]=((char *)( &pulong[6]))[0];
puchar[2]=((char *)( &pulong[6]))[1];
puchar[1]=((char *)( &pulong[6]))[2];
puchar[0]=((char *)( &pulong[6]))[3];
psh=(unsigned short *) puchar;
//first one
psh[0];
//second one
psh[1];
This is what I had in mind, though I may have misread the question.

Size of an object without using sizeof in C++

This was an interview question:
Say there is a class having only an int member. You do not know how many bytes the int will occupy. And you cannot view the class implementation (say it's an API). But you can create an object of it. How would you find the size needed for the int without using sizeof?
He wouldn't accept using bitset, either.
Can you please suggest the most efficient way to find this out?
The following program demonstrates a valid technique to compute the size of an object.
#include <iostream>
struct Foo
{
int f;
};
int main()
{
// Create an object of the class.
Foo foo;
// Create a pointer to it.
Foo* p1 = &foo;
// Create another pointer, offset by 1 object from p1
// It is legal to compute (p1+1) but it is not legal
// to dereference (p1+1)
Foo* p2 = p1+1;
// Cast both pointers to char*.
char* cp1 = reinterpret_cast<char*>(p1);
char* cp2 = reinterpret_cast<char*>(p2);
// Compute the size of the object.
size_t size = (cp2-cp1);
std::cout << "Size of Foo: " << size << std::endl;
}
Using pointer arithmetic:
#include <iostream>
class A
{
int a;
};
int main() {
A a1;
A * n1 = &a1;
A * n2 = n1+1;
std::cout << int((char *)n2 - (char *)n1) << std::endl;
return 0;
}
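The two programs above use the same idea, which can be wrapped in a small template so it works for any object. This is a sketch: object_size is an illustrative name, and Foo repeats the struct from the first answer.

```cpp
#include <cstddef>

struct Foo
{
    int f;
};

// Size of an object computed from the distance between &obj and &obj + 1.
// Forming the one-past-the-end pointer is allowed; it is never dereferenced.
template <class T>
std::size_t object_size(const T& obj)
{
    const T* first = &obj;
    const T* next = first + 1;
    return static_cast<std::size_t>(reinterpret_cast<const char*>(next) -
                                    reinterpret_cast<const char*>(first));
}
```

Given Foo foo{}, object_size(foo) yields the same number that sizeof(Foo) would report, including any padding.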
Yet another alternative without using pointers. You can use it if in the next interview they also forbid pointers. Your comment "The interviewer was leading me to think on lines of overflow and underflow" might also be pointing at this method or similar.
#include <iostream>
int main() {
unsigned int x = 0, numOfBits = 0;
for(x--; x; x /= 2) numOfBits++;
std::cout << "number of bits in an int is: " << numOfBits;
return 0;
}
It gets the maximum value of an unsigned int (by decrementing zero, which wraps around for unsigned types), then repeatedly divides by 2 until it reaches zero. To get the number of bytes, divide by CHAR_BIT.
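The final division by CHAR_BIT can be folded into a function, as in this sketch. The name bytes_in_unsigned_int is illustrative, and the result matches the storage size only under the assumption that unsigned int has no padding bits, which holds on mainstream platforms.

```cpp
#include <climits>
#include <cstddef>

// Counts the value bits of unsigned int by halving its maximum value,
// then converts bits to bytes. Assumes unsigned int has no padding bits.
std::size_t bytes_in_unsigned_int()
{
    unsigned int x = 0;
    --x;  // unsigned wrap-around is well defined: x is now the maximum value

    std::size_t bits = 0;
    for (; x != 0; x /= 2)
        ++bits;

    return bits / CHAR_BIT;
}
```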
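Pointer arithmetic can be used without actually creating any objects, although arithmetic on a null pointer is formally undefined behavior, so treat this as a trick rather than portable code:
class c {
int member;
};
c *ptr = 0;
++ptr;
// std::uintptr_t (from <cstdint>) is wide enough for a pointer;
// reinterpret_cast<int> does not compile where pointers are wider than int
std::size_t size = reinterpret_cast<std::uintptr_t>(ptr);
Alternatively:
std::size_t size = reinterpret_cast<std::uintptr_t>( static_cast<c*>(0) + 1 );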

C++ MurmurHash3 : how to hash integer

I am confused about how I should call MurmurHash3_x86_128() with an integer key value. Is it even possible? The MurmurHash3 code can be found at https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp. The function declaration is given below.
void MurmurHash3_x86_128 ( const void * key, const int len,
uint32_t seed, void * out )
I am hashing an integer value with len set to 1. Is that correct?
int main()
{
uint64_t seed = 100;
int p = 500; // key to hash
uint64_t hash_otpt[2]= {0};
const int *key = &p;
MurmurHash3_x64_128(key, 1, seed, hash_otpt); // 0xb6d99cf8
cout << *hash_otpt << endl;
}
You are passing key, which is a pointer to (const) int, so you should be passing sizeof(int) as the length.
Passing 1 would only work in case int is 1 byte wide on your platform, which is rarely the case.
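The effect of the wrong length can be demonstrated with any byte-oriented hash. The sketch below uses 64-bit FNV-1a as a hypothetical stand-in (toy_hash is not MurmurHash3, and hash_value is an illustrative helper). It assumes the common case of a 4-byte int: with len set to 1 only the first byte of the key is read, so 500 (0x1F4) and 244 (0xF4) hash identically, while sizeof(int) tells them apart.

```cpp
#include <cstdint>

// Hypothetical stand-in, NOT MurmurHash3: 64-bit FNV-1a over `len` bytes.
std::uint64_t toy_hash(const void* key, int len)
{
    const auto* bytes = static_cast<const unsigned char*>(key);
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (int i = 0; i < len; ++i) {
        h ^= bytes[i];
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hashes every byte of the value by passing sizeof(T) as the length.
template <class T>
std::uint64_t hash_value(const T& value)
{
    return toy_hash(&value, static_cast<int>(sizeof(T)));
}
```

With the real function, the equivalent call would pass &p and sizeof(p) as the first two arguments, as the answer describes.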