I'm developing an application that involves screen capture and hashing in C/C++. The image I'm capturing is about 250x250 in dimensions, and I'm using the WinAPI HashData function for hashing.
My goal is to compare 2 hashes (i.e. 2 images of 250x250) and instantly tell whether they're equal.
My code:
const int PIXEL_SIZE = (sc_part.height * sc_part.width) * 3;
BYTE* pixels = new BYTE[PIXEL_SIZE];
for(UINT y = 0, b = 0; y < sc_part.height; y++) {
    for(UINT x = 0; x < sc_part.width; x++) {
        COLORREF rgb = sc_part.pixels[(y * sc_part.width) + x];
        pixels[b++] = GetRValue(rgb);
        pixels[b++] = GetGValue(rgb);
        pixels[b++] = GetBValue(rgb);
    }
}
const int MAX_HASH_LEN = 64;
BYTE Hash[MAX_HASH_LEN] = {0};
HashData(pixels, PIXEL_SIZE, Hash, MAX_HASH_LEN);
... I now have my variable-size hash; the example above uses 64 bytes.
delete[] pixels;
I've tested different hash sizes and their approximate completion times, which were roughly:
32 bytes = ~30ms
64 bytes = ~47ms
128 bytes = ~65ms
256 bytes = ~125ms
My question is:
How long should the hash be for a 250x250 image so that there are never any duplicates?
I don't like a hash of 256 characters, since it will cause my app to run slowly (the captures are very frequent). Is there a "safe" hash size per image dimensions for comparison?
Thanks
Assuming, based on your comments, that you're adding the hash calculated "on-the-fly" to the database, so that the hash of every image in the database ends up being compared to the hash of every other image in the database, then you've run into the birthday paradox. The likelihood that there are two identical numbers in a set of randomly selected numbers (e.g. the birthdays of a group of people) is greater than you'd intuitively assume. If there are 23 people in a room, there's a 50:50 chance that two of them share the same birthday.
That means, assuming a good hash function, you can expect a collision (two images having the same hash despite not being identical) after about 2^(N/2) hashes, where N is the number of bits in the hash. If your hash function isn't that good, you can expect a collision even earlier. Unfortunately, only Microsoft knows how good HashData actually is.
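As a back-of-the-envelope illustration (my own sketch, not a statement about HashData specifically), the collision probability after k hashes of N bits is roughly 1 - exp(-k*(k-1)/2^(N+1)):

#include <cmath>
#include <cstdio>

// Approximate birthday-bound collision probability for an N-bit hash after k hashes,
// p ~= 1 - exp(-k*(k-1) / 2^(N+1)). Purely illustrative.
double collision_probability(double k, int hash_bits)
{
    double buckets = std::ldexp(1.0, hash_bits);           // 2^N
    return -std::expm1(-k * (k - 1.0) / (2.0 * buckets));
}

int main()
{
    // A 64-byte hash is 512 bits: even after 2^40 hashes the probability is negligible;
    // a collision only becomes likely around 2^(512/2) = 2^256 hashes.
    std::printf("%g\n", collision_probability(std::ldexp(1.0, 40), 512));
}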
Your comments also bring up a couple of other issues. One is that HashData doesn't produce variable-sized hashes. It produces an array of bytes that's always the same length as the value you passed as the hash length. Your problem is that you're treating it as a string of characters instead. In C++, strings are zero-terminated, meaning that the end of the string is marked with a zero-valued character ('\0'). Since the array of bytes will contain 0-valued elements at random positions, it will appear to be truncated when used as a string. Treating the hash as a string like this makes it much more likely that you'll get a collision.
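A quick way to avoid that pitfall is to compare the raw bytes directly, e.g. with memcmp; a minimal sketch (BYTE is simply the WinAPI typedef for unsigned char):

#include <cstring>

// Compare two HashData outputs as raw byte buffers, never as C strings:
// the hash can legitimately contain 0 bytes, so string functions stop too early.
bool hashes_equal(const unsigned char* a, const unsigned char* b, size_t len)
{
    return std::memcmp(a, b, len) == 0;
}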
The other issue is that you said that you stored the images being compared in your database and that these images must be unique. If this uniqueness is being enforced by the database then checking for uniqueness in your own code is redundant. Your database might very well be able to do this faster than your own code.
GUIDs (Globally Unique IDs) are 16 bytes long, and Microsoft assumes that no GUIDs will ever collide.
Using a 32 byte hash is equivalent to taking two randomly generated GUIDs and comparing them against two other randomly generated GUIDs.
The odds of a collision with a 32-byte hash are vanishingly small: 1 in 2^256, i.e. about 1 in 1.15792089E77.
The universe will reach heat death long before you get a collision.
This comment from Michael Grier more or less encapsulates my beliefs. As a worst-case test, take an image, compute its hash, change the image by 1 byte, and recompute the hash. A good hash should change in more than one byte.
You also need to trade this off against the "birthday effect" (related to the pigeonhole principle): any hash will generate collisions. A quick comparison of the first N bytes, though, will typically reject such false matches.
Cryptographic hashes are typically "better" hashes in the sense that more hash bits change per changed input bit, but they are much slower to compute.
tl;dr:
Is there a way to split the data of one mpz_t into two mpz_t instances while leaving the data in place in memory, giving a constant-time operation no matter how much data is being split in two?
I only need to split at limb boundaries (the elements of the mpz_t's limb array), i.e. no sub-limb bit shifting is required.
Details:
I want to optimize an algorithm that's already working. The algorithm processes numbers that can each be a few GB in size.
My question is similar to Efficiently take N lowest bits of GMP mpz_t. The difference is that I also need to avoid copying large amounts of data. I don't want to copy data from an existing number. Instead, I'm looking for a way to keep the number's data in place but have two mpz_t instances point at different regions of the same data.
Why I want this:
One of my previous performance optimizations was to process the number iteratively, starting from its least significant end. That yields a good performance boost. However, to chop data off that end, my current implementation copies data from the original number (see the other linked question) and then shifts the original number. This has the disadvantages that it takes more time and, most notably, more memory during the operation. That's significant when a single number is a few GB in size.
What I currently do:
mpz_class n = ...; // huge number
mpz_class chunk; // will hold the part of n that we chopped off
size_t limbs_to_chop_off = 12345;
size_t bits_to_chop_off = sizeof(mp_limb_t) * 8 * limbs_to_chop_off;
mpz_tdiv_r_2exp(chunk.get_mpz_t(), n.get_mpz_t(), bits_to_chop_off);
n >>= bits_to_chop_off;
// process chunk ...
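For illustration, roughly the shape of what I'm after: two read-only views over the same limb array. This is only a sketch and assumes GMP >= 6.0 (mpz_limbs_read / mpz_roinit_n); I haven't verified all of its constraints (e.g. limb normalization):

#include <gmp.h>
#include <gmpxx.h>

// Sketch: read-only mpz_t "views" over the low and high limbs of an existing number,
// without copying. The views alias n's storage, so n must stay alive and unmodified
// while they are in use, and the views must never be passed to mpz_clear.
void split_view(const mpz_class& n, size_t low_limbs)
{
    const mp_limb_t* limbs = mpz_limbs_read(n.get_mpz_t());   // pointer to n's limb array
    size_t total = mpz_size(n.get_mpz_t());

    mpz_t low, high;
    mpz_srcptr low_view  = mpz_roinit_n(low,  limbs, (mp_size_t)low_limbs);
    mpz_srcptr high_view = mpz_roinit_n(high, limbs + low_limbs, (mp_size_t)(total - low_limbs));

    gmp_printf("low  = %Zd\n", low_view);    // usable as input to any mpz_* function
    gmp_printf("high = %Zd\n", high_view);
}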
I have an image with dimensions M x N, and each pixel is 14 bits (they are stored in 16-bit integers, but the 2 least significant bits are not used). I want to map each pixel to an 8-bit value using a mapping function that is simply an array of 16384 values. I perform this image tone mapping using pure C++ as follows:
for(int i = 0; i < imageSize; i++)
{
    resultImage[i] = mappingArray[image[i]];
}
However, I want to optimize this operation using ARM NEON intrinsics. Since there are 32 NEON d registers (correct me if I'm wrong), I cannot use the VTBL instruction for a lookup table larger than 8x32 = 256 elements. Moreover, there is another discussion on Stack Overflow about using a lookup table larger than 32 bytes:
ARM NEON: How to implement a 256bytes Look Up table
How can I manage to optimize such a simple-looking operation? I'm thinking of using the pixels of the image as the address parameter of a VLD instruction, something like the following:
VLD1.8 {d1},[d0] ??
Is it possible? Or how can I handle this?
The optimization in the other example works by holding an entire lookup table in registers. You simply cannot do this: your table is 16384 bytes (2^14 -> 2^8), and that is way, way more than you have in register space.
Hence, your table will reside in L1 cache. The obvious C++ code:
unsigned char mappingArray[16384];
fill(mappingArray);                 // populate the 16384-entry table (placeholder)
for(int i = 0; i < imageSize; i++)
{
    resultImage[i] = mappingArray[image[i] >> 2];
}
will probably compile straight to the most efficient code. The problem isn't how you get things into registers; the problem is that you need memory accesses to your input image, the mapping table and the output image.
If speed were a problem, I'd solve this by aggressively trimming the table to perhaps 128 entries and using linear interpolation on the next few bits.
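A scalar sketch of that idea (plain C++, not NEON; the 128-entry layout and the extra endpoint entry are my assumptions):

// Trimmed-table approximation: 128 coarse entries indexed by the top 7 bits of the
// 14-bit sample, linear interpolation on the remaining 7 bits. The table carries one
// extra entry (129 total) so the last interval has an upper endpoint to interpolate to.
unsigned char map_approx(unsigned short pixel, const unsigned char table[129])
{
    unsigned value = pixel >> 2;               // recover the 14-bit sample
    unsigned hi = value >> 7;                  // coarse index, 0..127
    int lo = (int)(value & 0x7F);              // interpolation weight, 0..127
    int a = table[hi], b = table[hi + 1];
    return (unsigned char)(a + ((b - a) * lo) / 128);
}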
Given a large look-up table, the normal approach is to look very closely at it to figure out (or find on the internet) the algorithm for computing each entry. If that algorithm turns out to be simple enough, then you might find it's faster to perform the calculation in parallel rather than to perform scalar table look-ups.
Alternatively, based on the shape of the data, you can try to find approximations which meet your requirements but are easier to compute.
For example, you can use VTBL on the top three or four bits of the input and linear interpolation on the rest. But this only works if the curve is smooth enough that linear interpolation is an adequate approximation.
A common operation matching the stated parameters is linear-to-sRGB conversion, in which case you're looking at raising each input to the power of 5/12. That's a bit hairy, but you might still be able to get some performance gain if you don't need to be too accurate.
So I know functionally what I would like to happen, I just don't know the best way to make a computer do it... in C++...
I would like to implement a C++ function that maps a 10 bit sequence to a 6 bit sequence.
Never mind what the bits stand for right now... There are 2^10 = 1024 possible inputs. There are 2^6 = 64 different outputs. Probably lots of patterns. Obviously lots of patterns. But it's complicated. It's a known mapping, just a complicated mapping.
The output is just one of 64 possibilities. Maybe they all don't get used. They probably won't. But assume they do.
Right now, I'm thinking of a quadruple-nested switch statement that just takes care of each of the 1024 cases and takes care of business inline, assigning appropriate values to whatever pointer to whatever structure I passed to this function. This seems naive and sort of slow. Not that I've implemented it, but that's why I want to ask you first.
This basic function (the mapping) will have to be run at every statement node, often more than once, for as many statements as this system wishes to support. I ask you: how do I map 10 bits to 6 bits as efficiently as possible in C++?
I know what the mapping is; I know which 10-bit inputs go with which 6-bit output... I could totally hard-code that... somehow? A multi-switch is so ugly. How can I map my 10 bits to 6 bits?! Neural net? Memory muffin? What would you do?
Note to self: so here is why I am not a fan of the lookup table. Let's assume all inputs are equally likely (of course they are not, and could be ordered more effectively, but still); then it will take on average 512 memory advances through the array to retrieve the output value... It seems that if you make a (global, why not) binary tree 10 levels deep, you cover the 1024 inputs and can retrieve the output in an average of just 10 steps... and maybe fewer if there are good patterns... Given a deterministic function that is run so often, how best to retrieve known outputs from known inputs?
I would use a lookup table of 1024 elements. So hard-code that and just access it by index.
This saves the need for a massive switch statement and will probably be much more readable.
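A minimal sketch of what that could look like (the table contents below are placeholders, since the actual mapping isn't shown in the question):

// 1024-entry table indexed directly by the 10-bit input; each entry holds a 6-bit value.
// Fill it with the real mapping; an empty initializer is used here as a placeholder.
static const unsigned char kMap[1024] = { /* ... the 1024 known 6-bit outputs ... */ };

unsigned map10to6(unsigned input10)
{
    return kMap[input10 & 0x3FF];   // mask to 10 bits, then a single array access
}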
Depends on your definition of efficiency.
Time-efficient: Look-up table.
Space-efficient: Use a Karnaugh map.
Use a look-up table of size 1024.
If you need to map to some particular 6-bit values, use a lookup table of size 64 (not 1024!) after dividing by 16. This will fit into the cache more easily than a 16-times redundant 1024-entry table (and the cost of a possible cache miss far outweighs the 2 extra cycles for a right shift).
Otherwise, if a simple sequential mapping is fine, just do a divide by 16.
1024/64 = 16, so dividing by 16 (a right shift, with compiler optimizations turned on) maps to 6 bits (sequentially). It can't get more efficient than that.
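In code, that sequential variant is a one-liner:

// Sequential 10-bit -> 6-bit mapping: divide by 16, i.e. a right shift by 4.
unsigned map10to6_sequential(unsigned input10)
{
    return (input10 & 0x3FF) >> 4;   // 0..1023 -> 0..63
}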
Let's assume I have a function that takes a 32-bit integer in and returns a random 32-bit integer out.
Now, I want to see how many and which duplicate values this function will return over all possible input values from 0 to 2^32-1. This would be easy if I had more than 4 gigs of free RAM, but I don't have more than 1 gig.
I tried mapping the calculated values on disk, using a 4-gig file where one byte represented how many duplicates it had got, but I noticed the estimated finishing time would be 25 days in the future at my HDD speeds! (I had to use an SSD for fear of breaking my HDD...)
So now the next step is to calculate all of this in RAM without using the disk at all, but I ran into a wall when thinking about how to solve it elegantly. The only method I could think of was to loop over the function (2^32)*(2^32) times, but this is obviously even slower than my HDD method.
What I need now are some nasty ideas to speed this up!
Edit: The function isn't really a random function, but it behaves like one; the point is that you don't need to know anything about the function, it's not the problem here. I want to see all the duplicates with my own eyes, not just some mathematical guess at how many there could be. Why am I doing this? Out of curiosity :)
To check for 2^32 possible duplicates you only need 4 gigabits which is 512MB, since you need only a single bit per value. The first hit of a zero bit sets it to 1 and on every hit of a 1 bit you know you have a duplicate and can print it out or do whatever you want to do with it.
I.e. you can do something like this:
int value = nextValue(...);                         // output of the function under test
static int* bits = new int[ 0x08000000 ]();         // 2^27 ints = 2^32 bits = 512MB, zero-initialized
unsigned int idx = (unsigned int)value >> 5, bit = 1u << ( value & 31 );
if( bits[ idx ] & bit )
    ;   // duplicate: report value here
else
    bits[ idx ] |= bit;
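For context, a minimal sketch of wrapping that in a full pass over all 2^32 inputs (nextValue below is a placeholder for the function under test):

#include <cstdint>
#include <cstdio>
#include <vector>

// Placeholder for the function being investigated; replace with the real one.
uint32_t nextValue(uint32_t x) { return x * 2654435761u; }

int main()
{
    std::vector<uint32_t> bits(1u << 27, 0);        // 2^27 * 32 bits = 2^32 bits = 512MB
    uint64_t duplicates = 0;

    for (uint64_t i = 0; i < (1ull << 32); ++i) {
        uint32_t v = nextValue((uint32_t)i);
        uint32_t idx = v >> 5, bit = 1u << (v & 31);
        if (bits[idx] & bit)
            ++duplicates;                           // v was already produced by an earlier input
        else
            bits[idx] |= bit;
    }
    std::printf("duplicate outputs: %llu\n", (unsigned long long)duplicates);
}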
In response to your comments:
Yes, putting the duplicates into a map is a good idea if there are not too many duplicates and not too many different duplicates. The worst case here is 2^31 entries, if every 2nd value appears exactly twice. If the map becomes too large to be held in memory at once, you can partition it, i.e. only allow values in a certain range, e.g. a quarter of the entire number space. This would make the map only 1/4 of the size of the full map, provided the duplicates are distributed fairly evenly. You would of course need to run the program 4 times, once per quarter, to find all duplicates.
To also find the 1st duplicate, you can run it in two passes: in the first pass you use the bitmap to find the duplicates and put them into the map. In the 2nd pass you skip the bitmap and add the values into the map if there is already an entry in the map and the value is not yet there.
No, there is no good reason to prefer an int array over an unsigned int array. You could just as well use unsigned int, which would actually be more appropriate here.
The unaskable question: why? What are you trying to achieve?
Is this some kind of Monte-Carlo experiment?
If not, just look up the implementation algorithm of your (P)RNG and it will tell you exactly what the distribution of values is going to be.
Have a look at Boost.Random for more choices than you can fathom; it has e.g. uniform_int<> and variate generators that can limit your output range while still providing well-defined guarantees on the distribution of values across the output domain.
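For example, a minimal sketch using the pieces mentioned above (mt19937 engine, uniform_int<> distribution, variate_generator glue):

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>
#include <iostream>

int main()
{
    boost::mt19937 engine;                                   // the underlying PRNG
    boost::uniform_int<> dist(0, 999);                       // uniform integers in [0, 999]
    boost::variate_generator<boost::mt19937&, boost::uniform_int<> > draw(engine, dist);

    for (int i = 0; i < 5; ++i)
        std::cout << draw() << '\n';                         // well-defined uniform distribution
}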
I want to store lots of information in a block as bits and save it to a file.
To keep my file from getting too big, I want to use a small number of bits to store each piece of information instead of an int.
For example, I want to store Day, Hour, Minute to a file.
I only want 5 bit(day) + 5 bit(Hour) + 6 bit(Minute) = 16 bit of memory for data storage.
I cannot find an efficient way to store it in a block to put into a file.
There are some big problems in my concern:
The length of the data I want to store each time is not constant. It depends on the incoming information, so I cannot use a structure to store it.
There must not be any unused bits in my block. I found some topics mentioning that if I store 30 bits in an int (a 4-byte variable), then the next 3 bits I save will automatically go into the next int, but I do not want that to happen!!
I know I can use shift right and shift left to put a number into a char and then put the char into a block, but it is inefficient.
I want a char array that I can keep putting specified bits into, and then use write to put it into a file.
I think I'd just use the number of bits necessary to store the largest value you might ever need for any given piece of information. Then, Huffman encode the data as you write it (and obviously Huffman decode it as you read it). Most other approaches are likely to be less efficient, and many are likely to be more complex as well.
I haven't seen such a library, so I'm afraid you'll have to write one yourself. It won't be difficult, anyway.
As for efficiency: this kind of operation always needs bit shifting and masking, because few CPUs support operating directly on bits, especially across two machine words. The only difference is whether you or your compiler does the translation.
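For illustration, here is a minimal sketch of such a bit writer (my own sketch, not an existing library): it appends the low bit_count bits of each value to a growing byte buffer, with no unused bits between fields.

#include <cstdint>
#include <cstddef>
#include <vector>

// Minimal bit writer: packs fields of arbitrary bit widths back to back.
class BitWriter {
public:
    void put(uint32_t value, unsigned bit_count)             // write the low bit_count bits of value
    {
        for (unsigned i = 0; i < bit_count; ++i) {
            if (used_ % 8 == 0)
                buffer_.push_back(0);                        // start a new byte when the last is full
            if (value & (1u << i))
                buffer_.back() |= (uint8_t)(1u << (used_ % 8));
            ++used_;
        }
    }
    const std::vector<uint8_t>& bytes() const { return buffer_; }   // hand this to write()/fwrite

private:
    std::vector<uint8_t> buffer_;
    std::size_t used_ = 0;                                   // total bits written so far
};

// Usage: day (5 bits), hour (5 bits) and minute (6 bits) pack into exactly 16 bits:
//   BitWriter w; w.put(day, 5); w.put(hour, 5); w.put(minute, 6);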