Is it possible to decrypt Peter Weinberger's Hash algorithm?
I am attempting to write my own encrypt/decrypt functions. I understand that a hash value is not meant to be decrypted, but because this algorithm is relatively simple, I am thinking it may be possible to reverse this sort of hash. I've already done a simple encrypt/decrypt that uses rotation, and now I want to try something more difficult.
So is it possible to decrypt a hash value produced from Peter Weinberger's Hash algorithm?
The following encrypt function is Peter Weinberger's hash algorithm verbatim; the decrypt function is my own attempt, which is not working:
int encrypt(char *s)
{
    /* Peter Weinberger's */
    char *p;
    unsigned int h, g;
    h = 0;
    for (p = s; *p != '\0'; p++) {
        h = (h << 4) + *p; printf("Step : ");
        if (g = h & 0xF0000000) {
            h ^= g >> 24;
            h ^= g;
        }
    }
    return h % 211;
}
std::string decrypt(int v)
{
    /* Peter Weinberger's */
    unsigned int h, g;
    h = 0;
    v /= 211;
    int s = sqrt(v);
    /* Not sure what to do here
    for (p = s; *p != '\0'; p++) {
    }
    */
    return string(h);
}
Considering the extremely small output size, a brute-force attack is trivial.
Generate a string (for example, randomly)
Hash it
If it matches the known value, you found a first pre-image, else go to step 1
This will take 211 attempts on average to get a string that matches the given hash. It probably won't be the original string, but that's to be expected given the lossy nature of hashing.
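For illustration, here is a minimal sketch of that search in C++, assuming the encrypt() function from the question is available; find_preimage is just an illustrative helper name, and instead of random strings it simply enumerates two-letter uppercase candidates (which is enough to hit every possible value of h % 211):
/* encrypt() is the hash routine from the question */
int encrypt(char *s);

/* Enumerate two-letter uppercase candidates until one hashes to `known` */
int find_preimage(int known, char out[3])
{
    for (char a = 'A'; a <= 'Z'; a++) {
        for (char b = 'A'; b <= 'Z'; b++) {
            out[0] = a; out[1] = b; out[2] = '\0';
            if (encrypt(out) == known)
                return 1;   /* found a first pre-image */
        }
    }
    return 0;               /* should not happen for this hash */
}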
For two-character inputs this hash becomes (16*s[0] + s[1]) % 211, which you can rewrite as (16*(s[0]-'A') + (s[1]-'A') + 50) % 211
Solving for the string you get:
s[0]=(hash+161)/16+'A';
s[1]=(hash+161)%16+'A';
For example, for s == "AB" you get hash == 51. Then, using the above formulas to reverse it:
s[0] = 13 +'A' = 'N'
s[1] = 4 +'A' = 'E'
=> s="NE" which matches the hash 51, but isn't the original string.
If I understand the algorithm correctly, for each character it does:
Multiply the hash by 16 (move it 4 bits to the left)
Add a character of the string
If the result has more than 28 bits, remove the upper 4 bits and XOR them back into a lower part of the hash.
By limiting the string to size 6 (or 7 if the first byte is less than 16), step 3 will never occur. So all that is left is a simple shift-and-add.
When the string has 6 characters, the final result is this sum (h = higher 4 bits of a character, l = lower 4 bits):
pos: bits
0: .hl00000
1: ..hl0000
2: ...hl000
3: ....hl00
4: .....hl0
5: ......hl
----------- +
0******* Result is 32 bits with upper 4 bits zero
We see that the bits 24-27 are determined by the high 4 bits of the character at position 0, plus a possible carry from the addition in the lower bits. Bits 20-23 are a sum of the lower bits of char 0 and the higher bits of char 1 (plus possible carry).
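In code, that short-string case collapses to something like this sketch (assuming at most 6 characters with ASCII values below 0x80, so the overflow branch of the original loop never fires; short_hash is just an illustrative name):
unsigned int short_hash(const char *s, int len)   // len <= 6, plain ASCII
{
    unsigned int h = 0;
    for (int i = 0; i < len; i++)
        h = (h << 4) + (unsigned char)s[i];   // equals the sum of s[i] << 4*(len-1-i)
    return h;                                 // note: this is the value before % 211
}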
If the input characters can have all 255 possible values (with the exception of zero), it is not that hard to create a string that generates the hash.
Look at the highest 4 bits in the hash. This will be the high part of the character at pos 0.
Look at the next 4 bits in the hash. This is the addition of the high part of char 1 and the lower part of char 0.
Pick a value for the highest part. E.g. 'highest part is always zero' or 'highest part is always 0100 (upper case letters)'.
If the bits in the hash are less than the value you picked in step 3, borrow some from the previous bits (if those bits were zero, ripple it through to the next bit group).
Now you have the lower part of char 0 and the high part of char 1.
Go back to step 2 for the next 4 bits, until you reach the end of the hash
Check that there are no characters with value 0 in your string.
The code would be a bit more complex as there are all kinds of edge cases (e.g. hash 01000000), and is left as an exercise to the reader ;).
Edit: I totally missed the h % 211 operation, which makes it even easier, as CodesInChaos demonstrates.
Related
Suppose I have two numbers, a minimum and a maximum, for example 0 and 9999999999.
The maximum could be huge. I also have another number that lies between the minimum and the maximum; let's say 15. What I need to do is take all the multiples of 15 (15, 30, 45 and so on, until it reaches the maximum) and, for each of these numbers, count how many 1 bits there are in its binary representation. For example, 15 has 4 (because it has exactly four 1 bits).
The problem is that I need a nested loop to get the result: the outer loop walks over all multiples of that specific number (in our example, 15), and for each multiple an inner loop counts its 1 bits. My solution takes too much time. Here is how I do it:
unsigned long long int min = 0;
unsigned long long int max = 99999999;
unsigned long long int other_num = 15;
unsigned long long int count = 0;
unsigned long long int other_num_helper = other_num;

while (true) {
    if (other_num_helper > max) break;
    // Test every bit of the current multiple
    for (unsigned int i = 0; i < sizeof(other_num_helper) * 8; i++) {
        unsigned long long int buff = other_num_helper & (1ULL << i);
        if (buff != 0) count++; // if the bit is not 0, it's a 1 bit
    }
    other_num_helper += other_num;
}
cout << count << endl;
Look at the bit patterns for the numbers between 0 and 2^3
000
001
010
011
100
101
110
111
What do you see?
Every bit position is 1 exactly 4 times.
If you generalize, you find that the numbers between 0 and 2^n have n*2^(n-1) bits set in total.
I am sure you can extend this reasoning for arbitrary bounds.
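As a sketch of extending that reasoning to an arbitrary upper bound, the set bits over all integers below n can be counted per bit position instead of per number. Note that this does not yet restrict the count to multiples of 15, so treat it as a starting point rather than a full solution:
#include <cstdint>
#include <iostream>

// Total number of 1 bits over all integers in [0, n), counted per bit position
std::uint64_t total_bits_below(std::uint64_t n)
{
    std::uint64_t total = 0;
    for (int i = 0; i < 64; i++) {
        std::uint64_t block = std::uint64_t(1) << i;   // bit i toggles every 2^i numbers
        std::uint64_t full  = (n >> i) >> 1;           // complete 0...0 / 1...1 cycles
        std::uint64_t rem   = n & (2 * block - 1);     // leftover partial cycle
        total += full * block;                         // each full cycle contributes 2^i ones
        total += rem > block ? rem - block : 0;        // ones in the partial cycle
    }
    return total;
}

int main()
{
    std::cout << total_bits_below(8) << std::endl;     // prints 12, i.e. 3 * 2^(3-1)
}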
Here's how I do it for a 32 bit number.
#include <cstdint>

std::uint16_t bitcount(std::uint32_t n)
{
    // The intermediate must be 32 bits wide (a 16-bit temporary would truncate it);
    // each octal digit of reg holds the 1-bit count of the corresponding 3-bit group of n
    std::uint32_t reg;
    reg = n - ((n >> 1) & 033333333333)
            - ((n >> 2) & 011111111111);
    return ((reg + (reg >> 3)) & 030707070707) % 63;
}
And the supporting comments from the program:
Consider a 3 bit number as being 4a + 2b + c. If we shift it right 1 bit, we have 2a + b. Subtracting this from the original gives 2a + b + c. If we right-shift the original 3-bit number by two bits, we get a, and so with another subtraction we have a + b + c, which is the number of bits in the original number.
The first assignment statement in the routine computes 'reg'. Each digit in the octal representation is simply the number of 1’s in the corresponding three bit positions in 'n'.
The last return statement sums these octal digits to produce the final answer. The key idea is to add adjacent pairs of octal digits together and then compute the remainder modulo 63.
This is accomplished by right-shifting 'reg' by three bits, adding it to 'reg' itself and ANDing with a suitable mask. This yields a number in which groups of six adjacent bits (starting from the LSB) contain the number of 1's among those six positions in n. This number modulo 63 yields the final answer. For 64-bit numbers, we would have to add triples of octal digits and use modulo 1023.
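A couple of quick sanity checks, assuming the bitcount() routine above is in scope:
#include <cstdint>
#include <cstdio>

std::uint16_t bitcount(std::uint32_t n);   // the routine shown above

int main()
{
    std::printf("%u\n", (unsigned)bitcount(0u));            // 0
    std::printf("%u\n", (unsigned)bitcount(0xFFu));         // 8
    std::printf("%u\n", (unsigned)bitcount(0xFFFFFFFFu));   // 32
}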
I need a function to read n bits starting from bit x (the bit index starts from zero), and if the result is not byte-aligned, pad it with zeros. The function will receive a uint8_t array on input and should return a uint8_t array as well. For example, I have a file with the following contents:
1011 0011 0110 0000
Read three bits starting from the third bit (x = 2, n = 3); result:
1100 0000
There's no (theoretical) limit on input and bit pattern lengths
Implementing such a bitfield extraction efficiently, beyond the direct bit-serial algorithm, isn't precisely hard, but it is a tad cumbersome.
Effectively it boils down to an inner loop reading a pair of bytes from the input for each output byte, shifting the resulting word into place based on the source bit offset, and writing back the upper or lower byte. In addition, the final output byte is masked based on the length.
Below is my (poorly-tested) attempt at an implementation:
#include <limits.h>   /* CHAR_BIT, UCHAR_MAX */
#include <stddef.h>   /* size_t */

void extract_bitfield(unsigned char *dstptr, const unsigned char *srcptr, size_t bitpos, size_t bitlen) {
    // Skip to the source byte covering the first bit of the range
    srcptr += bitpos / CHAR_BIT;
    // Similarly work out the expected, inclusive, final output byte
    unsigned char *endptr = &dstptr[bitlen / CHAR_BIT];
    // Truncate the bit-positions to offsets within a byte
    bitpos %= CHAR_BIT;
    bitlen %= CHAR_BIT;
    // Scan through and write out a correctly shifted version of every destination byte
    // via an intermediate shifter register
    unsigned long accum = *srcptr++;
    while (dstptr <= endptr) {
        accum = accum << CHAR_BIT | *srcptr++;
        *dstptr++ = accum << bitpos >> CHAR_BIT;
    }
    // Mask out the unwanted LSB bits not covered by the length
    *endptr &= ~(UCHAR_MAX >> bitlen);
}
Beware that the code above may read past the end of the source buffer and somewhat messy special handling is required if you can't set up the overhead to allow this. It also assumes sizeof(long) != 1.
Of course, to get efficiency out of this you will want to use as wide a native word as possible. However, if the target buffer isn't necessarily word-aligned, then things get even messier. Furthermore, little-endian systems will need byte-swizzling fix-ups.
Another subtlety to take heed of is the potential inability to shift a whole word; that is, shift counts are frequently interpreted modulo the word width.
Anyway, happy bit-hacking!
Basically it's still a bunch of shift and addition operations.
I'll use a slightly larger example to demonstrate this.
Suppose we are given an input of 4 characters, and x = 10, n = 18.
00101011 10001001 10101110 01011100
First we need to locate the character that contains our first bit, by x / 8, which gives us 1 (the second character) in this case. We also need the offset within that character, x % 8, which equals 2.
Now we can get our first character of the solution in three operations.
Left-shift the second character 10001001 by 2 bits, which gives us 00100100.
Right-shift the third character 10101110 by 6 bits (8 - 2), which gives us 00000010.
Adding these two values together gives the first character of the return string: 00100110.
Loop this routine for n / 8 rounds. If n % 8 is not 0, extract that many bits from the next character; this can be done in several ways.
So in this example, our second round will give us 10111001, and in the last step we get 01, then pad the remaining bits with 0s, giving 01000000.
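Here is a sketch of that routine (read_bits is just an illustrative name; it assumes the MSB-first, zero-based bit numbering used in the question, and that dst can hold (n + 7) / 8 bytes):
#include <cstddef>
#include <cstdint>

// Copy n bits starting at bit x from src into dst, zero-padding the last byte
void read_bits(const std::uint8_t *src, std::size_t x, std::size_t n, std::uint8_t *dst)
{
    std::size_t byte = x / 8;              // source byte holding the first requested bit
    std::size_t off  = x % 8;              // offset of that bit within the byte
    std::size_t out_bytes = (n + 7) / 8;

    for (std::size_t i = 0; i < out_bytes; ++i) {
        // The low (8 - off) bits of the current source byte become the high bits
        std::uint8_t b = static_cast<std::uint8_t>(src[byte + i] << off);
        // Pull the top `off` bits of the next source byte, but only when the
        // requested range actually extends into it (avoids reading past src)
        if (off != 0 && 8 * i + (8 - off) < n)
            b |= src[byte + i + 1] >> (8 - off);
        dst[i] = b;
    }
    // Zero-pad the unused low bits of the final output byte
    if (n % 8 != 0)
        dst[out_bytes - 1] &= static_cast<std::uint8_t>(0xFF << (8 - n % 8));
}
With the question's 16-bit example (x = 2, n = 3) this produces 1100 0000, and with the 4-character example above it produces 00100110 10111001 01000000.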
I have the following hash algorithm:
unsigned long specialNum = 0x4E67C6A7;
unsigned int ch;
char inputVal[] = " AAPB2GXG";
for (int i = 0; i < strlen(inputVal); i++)
{
    ch = inputVal[i];
    ch = ch + (specialNum * 32);
    ch = ch + (specialNum / 4);
    specialNum = bitXor(specialNum, ch);
}
unsigned int outputVal = specialNum;
The bitXor function simply does the XOR operation:
int bitXor(int a, int b)
{
    return (a & ~b) | (~a & b);
}
Now I want to find an algorithm that can generate an "inputVal" when the outputVal is given. (The generated inputVal does not necessarily have to be the same as the original inputVal; that's why I want to find a collision.)
This means that I need to find an algorithm that generates a solution which, when fed into the above algorithm, results in the specified "outputVal".
The length of the generated solution should be less than or equal to 32.
Method 1: Brute force. Not a big deal, because your "specialNum" is always in the range of an int, so after trying on average a few billion input values, you find the right one. Should be done in a few seconds.
Method 2: Brute force, but clever.
Consider the specialNum value before the last ch is processed. You first calculate (specialNum * 32) + (specialNum / 4) + ch. Since -128 <= ch < 128 or 0 <= ch < 256 depending on the signedness of char, you know the highest 23 bits of the result, independent of ch. After xor'ing ch with specialNum, you also know the highest 23 bits (if ch is signed, there are two possible values for the highest 23 bits). You check whether those 23 bits match the desired output, and if they don't, you have excluded all 256 values of ch in one go. So the brute force method will end on average after 16 million steps.
Now consider the specialNum value before the last two ch are processed. Again, you can determine the highest possible 14 bits of the result (if ch is signed with four alternatives) without examining the last two characters at all. If the highest 14 bits don't match, you are done.
Method 3: This is how you do it. Consider in turn all strings s of length 0, 1, 2, etc. (however, your algorithm will most likely find a solution much quicker). Calculate specialNum after processing the string s. Following your algorithm, and allowing for char to be signed, find the up to 4 different values that the highest 14 bits of specialNum might have after processing two further characters. If any of those matches the desired output, then examine the value of specialNum after processing each of the 256 possible values of the next character, and find the up to 2 different values that the highest 23 bits of specialNum might have after examining another char. If one of those matches the highest 23 bits of the desired output then examine what specialNum would be after processing each of the 256 possible next characters and look for a match.
This should work below a millisecond. If char is unsigned, it is faster.
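As a concrete starting point, here is a minimal sketch of Method 1. It wraps the question's loop unchanged in a helper (hashValue is just an illustrative name, and the target value is made up) and enumerates 5-character printable-ASCII strings, i.e. a few billion candidates at most:
#include <cstdio>
#include <cstring>

int bitXor(int a, int b) { return (a & ~b) | (~a & b); }

// The question's hashing loop, copied verbatim (including its implicit narrowing)
unsigned int hashValue(const char *inputVal)
{
    unsigned long specialNum = 0x4E67C6A7;
    unsigned int ch;
    for (std::size_t i = 0; i < std::strlen(inputVal); i++) {
        ch = inputVal[i];
        ch = ch + (specialNum * 32);
        ch = ch + (specialNum / 4);
        specialNum = bitXor(specialNum, ch);
    }
    return (unsigned int)specialNum;
}

int main()
{
    const unsigned int outputVal = 0x8A20B93C;   // hypothetical target value
    char s[6] = {0};
    for (unsigned long long n = 0; ; n++) {
        unsigned long long v = n;
        for (int i = 0; i < 5; i++) { s[i] = static_cast<char>(' ' + v % 95); v /= 95; }
        if (v != 0) { std::printf("no 5-character match found\n"); return 1; }
        if (hashValue(s) == outputVal) { std::printf("found: \"%s\"\n", s); return 0; }
    }
}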
I have been following the msdn example that shows how to hash data using the Windows CryptoAPI. The example can be found here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa382380%28v=vs.85%29.aspx
I have modified the code to use the SHA1 algorithm.
I don't understand how the code that displays the hash in hexadecimal (shown below) works; more specifically, I don't understand what the >> 4 operator and the & 0xf operator do.
if (CryptGetHashParam(hHash, HP_HASHVAL, rgbHash, &cbHash, 0)){
    printf("MD5 hash of file %s is: ", filename);
    for (DWORD i = 0; i < cbHash; i++)
    {
        printf("%c%c", rgbDigits[rgbHash[i] >> 4],
                       rgbDigits[rgbHash[i] & 0xf]);
    }
    printf("\n");
}
I would be grateful if someone could explain this for me, thanks in advance :)
x >> 4 shifts x right four bits. x & 0xf does a bitwise and between x and 0xf. 0xf has its four least significant bits set, and all the other bits clear.
Assuming rgbHash is an array of unsigned char, this means the first expression retains only the four most significant bits and the second expression the four least significant bits of the (presumably) 8-bit input.
Four bits is exactly what will fit in one hexadecimal digit, so each of those is used to look up a hexadecimal digit in an array which presumably looks something like this:
char rgbDigits[] = "0123456789abcdef"; // or possibly upper-case letters
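For example, here is the same nibble-to-digit lookup applied to a small, made-up buffer:
#include <cstdio>

int main()
{
    const char rgbDigits[] = "0123456789abcdef";
    const unsigned char bytes[] = { 0xA7, 0x05, 0xFF };

    for (unsigned char b : bytes)
        std::printf("%c%c", rgbDigits[b >> 4],     // high nibble -> first hex digit
                            rgbDigits[b & 0xf]);   // low nibble  -> second hex digit
    std::printf("\n");                             // prints "a705ff"
}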
This code uses simple bit 'filtering' techniques:
">> 4" means shift right by 4 places, which in turn means 'divide by 16'.
"& 0xf" is a bitwise AND operation, which here means 'take the lowest 4 bits'.
Both of these values are used to index into rgbDigits, which presumably maps them to characters in a valid, human-readable range.
I was studying hash-based sorting and I found that using prime numbers in a hash function is considered a good idea, because multiplying each character of the key by a prime number and adding the results up would produce a unique value (because primes are unique), and a prime number like 31 would produce a better distribution of keys.
key(s) = s[0]*31^(len-1) + s[1]*31^(len-2) + ... + s[len-1]
Sample code:
public int hashCode()
{
    int h = hash;
    if (h == 0)
    {
        for (int i = 0; i < chars.length; i++)
        {
            h = MULT * h + chars[i];
        }
        hash = h;
    }
    return h;
}
I would like to understand why the use of even numbers for multiplying each character is a bad idea in the context of this explanation below (found on another forum; it sounds like a good explanation, but I'm failing to grasp it). If the reasoning below is not valid, I would appreciate a simpler explanation.
Suppose MULT were 26, and consider hashing a hundred-character string. How much influence does the string's first character have on the final value of 'h'? The first character's value will have been multiplied by MULT 99 times, so if the arithmetic were done in infinite precision the value would consist of some jumble of bits followed by 99 low-order zero bits -- each time you multiply by MULT you introduce another low-order zero, right? The computer's finite arithmetic just chops away all the excess high-order bits, so the first character's actual contribution to 'h' is ... precisely zero! The 'h' value depends only on the rightmost 32 string characters (assuming a 32-bit int), and even then things are not wonderful: the first of those final 32 bytes influences only the leftmost bit of 'h' and has no effect on the remaining 31. Clearly, an even-valued MULT is a poor idea.
I think it's easier to see if you use 2 instead of 26. They both have the same effect on the lowest-order bit of h. Consider a 33 character string of some character c followed by 32 zero bytes (for illustrative purposes). Since the string isn't wholly null you'd hope the hash would be nonzero.
For the first character, your computed hash h is equal to c[0]. For the second character, you take h * 2 + c[1]. So now h is 2*c[0]. For the third character h is now h*2 + c[2] which works out to 4*c[0]. Repeat this 30 more times, and you can see that the multiplier uses more bits than are available in your destination, meaning effectively c[0] had no impact on the final hash at all.
The end math works out exactly the same with a different multiplier like 26, except that the intermediate hashes will wrap modulo 2^32 every so often during the process. Since 26 is even, it still adds one 0 bit to the low end on each iteration.
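A minimal sketch that makes this concrete (hash_with_mult is just an illustrative helper mirroring the question's loop): with a multiplier of 2, two 33-character strings that differ only in their first character hash to the same value, because that character's contribution has been multiplied by 2^32.
#include <cstdint>
#include <cstdio>
#include <cstring>

std::uint32_t hash_with_mult(const char *s, std::uint32_t mult)
{
    std::uint32_t h = 0;
    for (std::size_t i = 0; i < std::strlen(s); i++)
        h = mult * h + static_cast<unsigned char>(s[i]);
    return h;
}

int main()
{
    // Two 33-character strings differing only in the first character
    char a[34], b[34];
    std::memset(a, 'x', 33); a[0] = 'A'; a[33] = '\0';
    std::memset(b, 'x', 33); b[0] = 'Z'; b[33] = '\0';

    // The first character's factor of 2^32 vanishes modulo 2^32,
    // so both hashes come out identical
    std::printf("%08x %08x\n", (unsigned)hash_with_mult(a, 2), (unsigned)hash_with_mult(b, 2));
}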
This hash can be described like this (here ^ is exponentiation, not xor).
hash(string) = sum_over_i(s[i] * MULT^(strlen(s) - i - 1)) % (2^32).
Look at the contribution of the first character. It's
(s[0] * MULT^(strlen(s) - 1)) % (2^32).
If the string is long enough (strlen(s) > 32) then this is zero.
Other people have posted the answer -- if you use an even multiplier, then only the last characters in the string matter for computing the hash, as the earlier characters' influence will have shifted out of the register.
Now let's consider what happens when you use a multiplier like 31. Well, 31 is 32 - 1, or 2^5 - 1. So when you use that, your final hash value will be:
\sum{c_i 2^{5(len-i)}} - \sum{c_i}
Unfortunately Stack Overflow doesn't understand TeX math notation, so the above is hard to read, but it's two summations over the characters in the string, where the first one shifts each character left by 5 bits for each subsequent character in the string. So on a 32-bit machine, that will shift off the top for all except the last seven characters of the string.
The upshot of this is that using a multiplier of 31 means that while characters other than the last seven have an effect on the hash, it's completely independent of their order. If you take two strings that have the same last 7 characters, and whose other characters are also the same but in a different order, you'll get the same hash for both. You'll also get the same hash for things like "az" and "by" other than in the last 7 chars.
So using a prime multiplier, while much better than an even multiplier, is still not very good. Better is to use a rotate instruction, which shifts the bits back into the bottom when they shift out the top. Something like:
unsigned hashCode(string chars)
{
    unsigned h = 0;
    for (int i = 0; i < chars.length; i++) {
        h = (h << 5) + (h >> 27); // ROL by 5, assuming 32 bits here
        h += chars[i];
    }
    return h;
}
Of course, this depends on your compiler being smart enough to recognize the idiom for a rotate instruction and turn it into a single instruction for maximum efficiency.
This also still has the problem that swapping 32-character blocks in the string will give the same hash value, so it's far from strong, but it is probably adequate for most non-cryptographic purposes.
would produce a unique value
Stop right there. Hashes are not unique. A good hash algorithm will minimize collisions, but the pigeonhole principle assures us that perfectly avoiding collisions is not possible (for any datatype with non-trivial information content).