Extremely fast hash function with collisions allowed - c++

My key is a 64-bit address and the output is a 1-byte number (0-255). Collisions are allowed, but the probability of them occurring should be low. Also, assume that the number of elements to be inserted is low, let's say no more than 255, so as to minimize the pigeonhole effect.
The addresses are addresses of the functions in the program.

uint64_t addr = ...
uint8_t hash = addr & 0xFF;
I think that meets all of your requirements.

I would XOR together the 2 LSBs (least significant bytes); if this distributes badly, then add a 3rd one, and so forth.
The rationale behind this is the following: function addresses do not distribute uniformly. The problem normally lies in the lower (LSB) bits. Functions usually need to begin at addresses divisible by 4/8/16, so the 2-4 least significant bits are probably meaningless. By XORing with the next byte, you should get rid of most of these problems and it's still pretty fast.
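A minimal sketch of the two-byte variant (the function name is mine, for illustration):

#include <cstdint>

// XOR the two least significant bytes: the low 2-4 bits of a function
// address are often zero due to alignment, and the second byte mixes
// some variation back into them.
uint8_t hash_addr(uint64_t addr) {
    return static_cast<uint8_t>(addr) ^ static_cast<uint8_t>(addr >> 8);
}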

Function addresses are, I think, quite likely to be aligned (see this question, for instance). That seems to indicate that you want to skip least significant bits, depending on the alignment.
So, perhaps take the 8 bits starting from bit 3, i.e. skipping the least significant 3 bits (bits 0 through 2):
const uint8_t hash = (address >> 3);
This should be obvious from inspection of your set of addresses. In hex, watch the rightmost digit.

How about:
uint8_t hash64(uint64_t data) {   // e.g. data = 0x12131212121211B1
    uint32_t d1 = (uint32_t)(data >> 32) ^ (uint32_t)data;
    uint16_t d2 = (uint16_t)(d1 >> 16) ^ (uint16_t)d1;
    uint8_t  d3 = (uint8_t)(d2 >> 8) ^ (uint8_t)d2;
    return d3;
}
It combines all the bits of your 8 bytes with three shifts and three XOR instructions.

Related

c++ combining 2 uint8_t into one uint16_t not working?

So I have a little piece of code that takes 2 uint8_t's and places them next to each other, and then returns a uint16_t. The point is not adding the 2 variables, but putting them next to each other and creating a uint16_t from them.
The way I expect this to work is that when the first uint8_t is 0 and the second uint8_t is 1, I expect the uint16_t to also be 1.
However, this is in my code not the case.
This is my code:
uint8_t *bytes = new uint8_t[2];
bytes[0] = 0;
bytes[1] = 1;
uint16_t out = *((uint16_t*)bytes);
It is supposed to turn the uint8_t pointer bytes into a uint16_t pointer, and then read the value. I expect that value to be 1 since x86 is little endian. However it returns 256.
Setting the first byte to 1 and the second byte to 0 makes it work as expected. But I am wondering why I need to switch the bytes around in order for it to work.
Can anyone explain that to me?
Thanks!
There is no uint16_t or compatible object at that address, and so the behaviour of *((uint16_t*)bytes) is undefined.
I expect that value to be 1 since x86 is little endian. However it returns 256.
Even if the program was fixed to have well defined behaviour, your expectation is backwards. In little endian, the least significant byte is stored in the lowest address. Thus 2 byte value 1 is stored as 1, 0 and not 0, 1.
Does endianness also affect the order of the bits in the byte or not?
There is no way to access a bit by "address" [1], so there is no concept of bit endianness. When converting to text, bits are conventionally shown most significant on the left and least significant on the right, just like the digits of decimal numbers. I don't know if this holds in right-to-left writing systems.
[1] You can sort of create "virtual addresses" for bits using bit-fields. The order of bit-fields, i.e. whether the first bit-field is most or least significant, is implementation-defined and not necessarily related to byte endianness at all.
Here is a correct way to set two octets as uint16_t. The result will depend on endianness of the system:
// no need to complicate a simple example with dynamic allocation
uint16_t out;
// note that there is an exception in language rules that
// allows accessing any object through narrow (unsigned) char
// or std::byte pointers; thus following is well defined
std::byte* data = reinterpret_cast<std::byte*>(&out);
data[0] = std::byte{1};
data[1] = std::byte{0};
Note that assuming that input is in native endianness is usually not a good choice, especially when compatibility across multiple systems is required, such as when communicating over a network, or accessing files that may be shared with other systems.
In these cases, the communication protocol or the file format typically specifies that the data is in a specific endianness, which may or may not be the same as the native endianness of your target system. The de facto standard in network communication is big endian. Data in a particular endianness can be converted to native endianness using bit shifts, as shown in Frodyne's answer for example.
In a little-endian system the least significant bytes are placed first. In other words: the low byte is placed at offset 0, and the high byte at offset 1 (and so on). So this:
uint8_t* bytes = new uint8_t[2];
bytes[0] = 1;
bytes[1] = 0;
uint16_t out = *((uint16_t*)bytes);
Produces the out = 1 result you want.
However, as you can see this is easy to get wrong, so in general I would recommend that instead of trying to place stuff correctly in memory and then cast it around, you do something like this:
uint16_t out = lowByte + (highByte << 8);
That will work on any machine, regardless of endianness.
Edit: Bit shifting explanation added.
x << y means to shift the bits in x y places to the left (>> moves them to the right instead).
If X contains the bit-pattern xxxxxxxx, and Y contains the bit-pattern yyyyyyyy, then (X << 8) produces the pattern: xxxxxxxx00000000, and Y + (X << 8) produces: xxxxxxxxyyyyyyyy.
(And Y + (X<<8) + (Z<<16) produces zzzzzzzzxxxxxxxxyyyyyyyy, etc.)
A single shift to the left is the same as multiplying by 2, so X << 8 is the same as X * 2^8 = X * 256. That means that you can also do: Y + (X*256) + (Z*65536), but I think the shifts are clearer and show the intent better.
Note that again: Endianness does not matter. Shifting 8 bits to the left will always clear the low 8 bits.
You can read more here: https://en.wikipedia.org/wiki/Bitwise_operation. Note the difference between arithmetic and logical shifts: in C/C++, right shifts of unsigned values are logical, while right shifts of negative signed values are typically arithmetic (left shifts behave the same either way).
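Putting the pieces together, a small sketch of the portable round trip described above (the helper names are mine):

#include <cstdint>

// Compose from two bytes and decompose again using only shifts and masks;
// the endianness of the machine never enters the picture.
uint16_t pack(uint8_t lowByte, uint8_t highByte) {
    return static_cast<uint16_t>(lowByte | (highByte << 8));
}
uint8_t low_byte(uint16_t v)  { return static_cast<uint8_t>(v & 0xFF); }
uint8_t high_byte(uint16_t v) { return static_cast<uint8_t>(v >> 8); }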
If p is a pointer to some multi-byte value, then:
"Little-endian" means that the byte at p is the least-significant byte, in other words, it contains bits 0-7 of the value.
"Big-endian" means that the byte at p is the most-significant byte, which for a 16-bit value would be bits 8-15.
Since the Intel is little-endian, bytes[0] contains bits 0-7 of the uint16_t value and bytes[1] contains bits 8-15. Since you are trying to set bit 0, you need:
bytes[0] = 1; // Bits 0-7
bytes[1] = 0; // Bits 8-15
Your code works, but you misinterpreted how to read the bytes:
#include <cstdint>
#include <cstddef>
#include <iostream>

int main()
{
    uint8_t *in = new uint8_t[2];
    in[0] = 3;
    in[1] = 1;
    uint16_t out = *((uint16_t*)in);
    std::cout << "out: " << out << "\n in: " << in[1] * 256 + in[0] << std::endl;
    return 0;
}
By the way, you should take care of alignment when casting this way.
One way to think about numbers is in MSB and LSB order, where MSB is the highest bit and LSB is the lowest bit.
For ex.
(u)int32: MSB: Bit 31 ... LSB: Bit 0
(u)int16: MSB: Bit 15 ... LSB: Bit 0
(u)int8 : MSB: Bit 7 ... LSB: Bit 0
On a little-endian machine, your cast to a 16-bit value arranges the bytes like this:

16-bit value          <=  BYTE[1]      BYTE[0]
MSB ............ LSB      Bit 7 .. 0   Bit 7 .. 0
0000 0001 0000 0000   <=  0000 0001    0000 0000

which is 256 -> the correct value.

General algorithm for reading n bits and padding with zeros

I need a function to read n bits starting from bit x (bit indices start from zero), and if the result is not byte-aligned, pad it with zeros. The function will receive a uint8_t array on input, and should return a uint8_t array as well. For example, I have a file with the following contents:
1011 0011 0110 0000
Read three bits starting from the third bit (x = 2, n = 3); result:
1100 0000
There's no (theoretical) limit on the input and bit-pattern lengths.
Implementing such a bitfield extraction efficiently, beyond the direct bit-serial algorithm, isn't precisely hard but is a tad cumbersome.
Effectively it boils down to an inner loop reading a pair of bytes from the input for each output byte, shifting the resulting word into place based on the source bit offset, and writing back the upper or lower byte. In addition, the final output byte is masked based on the length.
Below is my (poorly-tested) attempt at an implementation:
#include <limits.h>
#include <stddef.h>

void extract_bitfield(unsigned char *dstptr, const unsigned char *srcptr, size_t bitpos, size_t bitlen) {
    // Skip to the source byte covering the first bit of the range
    srcptr += bitpos / CHAR_BIT;
    // Similarly work out the expected, inclusive, final output byte
    unsigned char *endptr = &dstptr[bitlen / CHAR_BIT];
    // Truncate the bit-positions to offsets within a byte
    bitpos %= CHAR_BIT;
    bitlen %= CHAR_BIT;
    // Scan through and write out a correctly shifted version of every
    // destination byte via an intermediate shifter register
    unsigned long accum = *srcptr++;
    while(dstptr <= endptr) {
        accum = accum << CHAR_BIT | *srcptr++;
        *dstptr++ = accum << bitpos >> CHAR_BIT;
    }
    // Mask out the unwanted LSB bits not covered by the length
    *endptr &= ~(UCHAR_MAX >> bitlen);
}
Beware that the code above may read past the end of the source buffer, and somewhat messy special handling is required if you can't set up the buffers to allow this. It also assumes sizeof(long) != 1.
Of course, to get efficiency out of this you will want to use as wide a native word as possible. However, if the target buffer isn't necessarily word-aligned, things get even messier. Furthermore, little-endian systems will need byte-swizzling fix-ups.
Another subtlety to take heed of is the potential inability to shift a whole word; that is, shift counts are frequently interpreted modulo the word length.
Anyway, happy bit-hacking!
Basically it's still a bunch of shift and addition operations.
I'll use a slightly larger example to demonstrate this.
Suppose we are given an input of 4 characters, and x = 10, n = 18:
00101011 10001001 10101110 01011100
First we need to locate the character containing our first bit, by x / 8, which gives us 1 (the second character) in this case. We also need the offset within that character, by x % 8, which equals 2.
Now we can get the first character of the solution in three operations:
Left-shift the second character 10001001 by 2 bits, giving 00100100.
Right-shift the third character 10101110 by 6 bits (from 8 - 2), giving 00000010.
Adding these two characters gives the first character of your return string: 00100110.
Loop this routine for n / 8 rounds. If n % 8 is not 0, extract that many bits from the next character; you can do this in many ways.
So in this example, our second round gives us 10111001, and in the last step we get 01, then pad the remaining bits with 0s.
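Here is a sketch of that routine in C++ (the function name and the use of std::vector are mine; like the answer above, it may read one byte past the last source byte when the bit offset is non-zero, though those bits are masked out of the result):

#include <cstdint>
#include <cstddef>
#include <vector>

// Extract n bits starting at bit index x (bits numbered MSB-first within
// each byte), left-aligned into whole bytes and zero-padded at the end.
std::vector<uint8_t> extract_bits(const uint8_t* src, std::size_t x, std::size_t n) {
    std::vector<uint8_t> out((n + 7) / 8, 0);
    const uint8_t* p = src + x / 8;   // character containing the first bit
    std::size_t off = x % 8;          // offset of that bit within the character
    for (std::size_t i = 0; i < out.size(); ++i) {
        uint8_t b = static_cast<uint8_t>(p[i] << off);
        if (off != 0)                 // pull in the head of the next character
            b |= p[i + 1] >> (8 - off);
        out[i] = b;
    }
    if (n % 8 != 0)                   // zero the padding bits
        out.back() &= static_cast<uint8_t>(0xFF << (8 - n % 8));
    return out;
}

On the four characters above, extract_bits(src, 10, 18) yields 00100110 10111001 01000000.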

Two values in one byte

In a single nibble (0-F) I can store one number from 0 to 15. In one byte, I can store a single number from 0 to 255 (00 - FF).
Can I use a byte (00-FF) to store two different numbers each in the range 0-127 (00 - 7F)?
The answer to your question is NO. You can split a single byte into two numbers, but the sum of the bits in the two numbers must be <= 8. Since the range 0-127 requires 7 bits, the other number in the byte can only have 1 bit, i.e. the range 0-1.
For obvious cardinality reasons, you cannot store two small integers in the 0 ... 127 range in one byte with the 0 ... 255 range. In other words, the cartesian product [0;127]×[0;127] has 2^14 elements, which is bigger than 2^8 (the cardinality of the [0;255] interval, for bytes).
(If you can afford losing precision, which you didn't say, you could, e.g., store only the highest bits ...)
Perhaps your question is: could I store two small integers from [0;15] in a byte? Then of course you could:
#include <cstdint>
#include <cassert>

typedef unsigned unibble_t; // unsigned nibble in [0;15]

uint8_t make_from_two_nibbles(unibble_t l, unibble_t r) {
    assert(l <= 15);
    assert(r <= 15);
    return (l << 4) | r;
}

unibble_t left_nibble(uint8_t x)  { return x >> 4; }
unibble_t right_nibble(uint8_t x) { return x & 0xf; }
But I don't think you should always do that. First, you might use bit-fields in a struct. Then (and most importantly), dealing with nibbles this way can be less efficient and produce less readable code than using whole bytes.
And updating a single nibble, e.g. with
void update_left_nibble(uint8_t *p, unibble_t l) {
    assert(p);
    assert(l <= 15);
    *p = (l << 4) | ((*p) & 0xf);
}
is sometimes expensive (it involves a memory load and a memory store, so it uses the CPU cache and cache-coherence machinery), and most importantly is generally a non-atomic operation (what would happen if two different threads simultaneously called update_left_nibble on the same address p, i.e. with pointer aliasing, is undefined behavior).
As a rule of thumb, avoid packing more than one data item in a byte unless you are sure it is worthwhile (e.g. you have a billion of such data items).
One byte is not enough for two values in 0…127, because each of those values needs log2(128) = 7 bits, for a total of 14, but a byte is only 8 bits.
You can declare variables with bit-packed storage using the C and C++ bitfield syntax:
struct packed_values {
    uint8_t first : 7;
    uint8_t second : 7;
    uint8_t third : 2;
};
In this example, only 16 bits are used despite there being three fields, so sizeof(packed_values) can be as small as 2; the exact size and layout of bit-fields are implementation-defined.
This is simpler than using bitwise arithmetic with << and & operators, but it's still not quite the same as ordinary variables: bit-fields have no addresses, so you can't have a pointer (or C++ reference) to one.
Can I use a byte to store two numbers in the range 0-127?
Of course you can:
uint8_t storeTwoNumbers(unsigned a, unsigned b) {
    return ((a >> 4) & 0x0f) | (b & 0xf0);
}

void retrieveTwoNumbers(uint8_t byte, unsigned *a, unsigned *b) {
    *b = byte & 0xf0;
    *a = (byte & 0x0f) << 4;
}
Numbers are still in range 0...127 (0...255, actually). You just lose some precision, similar to floating-point types: their values increment in steps of 16.
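For instance, a quick usage sketch of the functions above:

unsigned a, b;
uint8_t packed = storeTwoNumbers(100, 100);
retrieveTwoNumbers(packed, &a, &b);
// a == 96 and b == 96: both values were rounded down to a multiple of 16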
You can store two data items in the range 0-15 in a single byte, but you should not (one variable = one datum is a better design).
If you must, you can use bit masks and bit shifts to access the two data items in your variable.
uint8_t var;                        /* range 0-255 */
uint8_t data1 = (var & 0x0F);       /* range 0-15 */
uint8_t data2 = (var & 0xF0) >> 4;  /* range 0-15 */

How do I store a byte into a 4 byte number without changing the bytes around it?

So if I have a 4-byte number (say hex) and want to store a byte, say DD, into hex at the nth byte position without changing the other elements of hex's number, what's the easiest way of going about that? I'm guessing it's some combination of bitwise operations, but I'm still quite new to them and have found them quite confusing thus far.
byte n = 0xDD;
uint i = 0x12345678;
i = (i & ~0x0000FF00) | ((uint)n << 8);
Edit: Forgot to mention, be careful if you're doing this with signed data types, so that things don't get inadvertently sign-extended.
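Generalizing that to an arbitrary byte position n, a sketch in C++ might look like this (set_byte is a hypothetical helper name, not from the answer above):

#include <cstdint>

// Replace byte k (0 = least significant) of a 32-bit value.
uint32_t set_byte(uint32_t value, unsigned k, uint8_t b) {
    const unsigned shift = k * 8;
    return (value & ~(UINT32_C(0xFF) << shift)) |
           (static_cast<uint32_t>(b) << shift);
}

For example, set_byte(0x12345678, 1, 0xDD) yields 0x1234DD78.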
Mehrdad's answer shows how to do it with bit manipulation. You could also use the old byte array trick (assuming C or some other language that allows this silliness):
byte n = 0xDD;
uint i = 0x12345678;
byte *b = (byte*)&i;
b[1] = n;
Of course, that's processor specific in that big-endian machines have the bytes reversed from little-endian. Also, this technique limits you to working on exact byte boundaries whereas the bit manipulation will let you modify any given 8 bits. That is, you might want to turn 0x12345678 into 0x12345DD8, which the technique I show won't do.

Find "edges" in 32 bits word bitpattern

I'm trying to find the most efficient algorithm to count "edges" in a bit pattern, an edge meaning a change from 0 to 1 or from 1 to 0. I am sampling each bit every 250 µs and shifting it into a 32-bit unsigned variable.
This is my algorithm so far
void CountEdges(void)
{
    uint_least32_t feedback_samples_copy = feedback_samples;
    signal_edges = 0;
    while (feedback_samples_copy > 0)
    {
        uint_least8_t flank_information = (feedback_samples_copy & 0x03);
        if (flank_information == 0x01 || flank_information == 0x02)
        {
            signal_edges++;
        }
        feedback_samples_copy >>= 1;
    }
}
It needs to be at least 2 or 3 times as fast.
You should be able to bitwise-XOR the word with a shifted copy of itself to get a bit pattern representing the flipped bits. Then use one of the bit-counting tricks on this page: http://graphics.stanford.edu/~seander/bithacks.html to count how many 1's there are in the result.
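For example, a sketch of that approach, assuming C++20's std::popcount for the final count (any of the linked bit-counting tricks works just as well):

#include <bit>
#include <cstdint>

// XOR the word with itself shifted right by one: each set bit marks a
// transition between adjacent samples. Mask off the top bit so the zero
// shifted in at the end is not counted as an edge.
int count_edges(uint32_t x) {
    uint32_t flips = (x ^ (x >> 1)) & 0x7FFFFFFFu;
    return std::popcount(flips);
}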
One thing that may help is to precompute the edge count for every possible 8-bit value (a 512-entry lookup table, since you have to include the bit that precedes each value) and then sum up the counts one byte at a time.
// prevBit is the last bit of the previous 32-bit word
// edgeLut is a 512 entry precomputed edge count table
// Some of the shifts and &s are extraneous, but they are there for clarity
edgeCount =
    edgeLut[(prevBit << 8) | ((feedback_samples >> 24) & 0xFF)] +
    edgeLut[(feedback_samples >> 16) & 0x1FF] +
    edgeLut[(feedback_samples >> 8) & 0x1FF] +
    edgeLut[(feedback_samples >> 0) & 0x1FF];
prevBit = feedback_samples & 0x1;
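The 512-entry table itself could be precomputed along these lines (a sketch; init_edge_lut is my name for it):

#include <cstdint>

// edgeLut[v] = number of adjacent-bit transitions in the 9-bit value v.
static uint8_t edgeLut[512];

void init_edge_lut() {
    for (unsigned v = 0; v < 512; ++v) {
        unsigned edges = 0;
        for (unsigned bit = 0; bit < 8; ++bit)
            edges += ((v >> bit) ^ (v >> (bit + 1))) & 1u;
        edgeLut[v] = static_cast<uint8_t>(edges);
    }
}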
My suggestion:
copy your input value to a temp variable, left shifted by one
copy the LSB of your input to your temp variable
XOR the two values. Every bit set in the result value represents one edge.
use this algorithm to count the number of bits set.
This might be the code for the first 3 steps:
#include <cstdint>

uint32_t input;  // some value
uint32_t temp = (input << 1) | (input & 0x00000001);
uint32_t result = input ^ temp;
// continue to count the bits set in result
// ...
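The final count can use any of the usual bit-counting tricks; with C++20 it might simply be:

#include <bit>
int edges = std::popcount(result);  // number of edges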
Create a look-up table so you can get the transitions within a byte or 16-bit value in one shot - then all you need to do is look at the differences in the 'edge' bits between bytes (or 16-bit values).
You are looking at only 2 bits during every iteration.
The fastest algorithm would probably be to build a hash table for all possible values. Since there are 2^32 values, that is not the best idea.
But why don't you look at 3, 4, 5, ... bits in one step? You can, for instance, precalculate the edge count for all 4-bit combinations. Just take care of possible edges between the pieces.
You could always use a lookup table for, say, 8 bits at a time; this way you get a speed improvement of around 8 times.
Don't forget to check for edges between those 8-bit chunks though. These then have to be checked 'manually'.