Right, so I'm trying to keygen this program from crackmes.de
http://crackmes.de/users/lutio/keygenme1_by_lutio/
I'm not asking someone to do this for me, I'm just asking if its possible without brute forcing.
I normally don't have trouble with such things, but for some reason this user doesn't use any variables from the username, only the serial. He also says that brute forcing isn't allowed to solve his keygen. I believe that it is impossible to solve without brute forcing.
Does someone actually know if its possible to solve without bruteforcing? This is basically the code I have:
unsigned int key = 0x1FE339E4; //compute == this for success
unsigned int serialvar = unknown; //gennerated according to serial
unsigned int magic1 = 0x1FE339E7;
for( int i = 0; i < 0x10; i++ ) {
serialvar = (((magic1 * i + serialvar) << 0x10) ^ serialvar) + 0x13371337;
serialvar = (((i * magic1 + serialvar) >> 0x10) ^ serialvar) + 0x73317331;
}
unsigned int computed = 0;
for( int i = 0; i < serialvar; i++ ) {
computed += 0x3C;
}
Right, so at the end of the code, in order to succeed and hit the "goodboy" our unsigned int computed has to be equal to key.
As we know, the "serialvar" is the unknown variable that I have to generate. (Which I will then generate a serial based off of)
At my level of mathematics and boolean algebra, I believe that this equation is impossible to solve.
Now I'm not exactly sure if this math statement is correct... But if I take key and divide it by 0x3C... I don't get an integer!
Doesn't this sort of mean it is impossible to solve? Since key / 0x3C has no integer solution?
if I understand that correctly, you have the final value of computed and you want to know the value of unknown...
first of all you have to solve the equation:
computed = serialvar x 0x3C (mod 2^32)
there you may get multiple values of serialvar.
after that you have to reverse the 16 fold loop..
I don't see any simple way how to solve (((i * magic1 + serialvar) >> 0x10) ^ serialvar) and the other expression for serialvar.
The top 16 bits of serialvar are not changed because >> will make it xor against zeros.
bottom half may be computed by precalculated array. There will probably be multiple solutions (which is very probbable) and your computation will split.
The same with bottom half will hold for the second expression.
This way the total computation expenses will be similar to bruteforcing.
Related
Assume that I have:
unsigned int x = 883621;
which in binary is :
00000000000011010111101110100101
I need the fastest way to swap the two lowest bits:
00000000000011010111101110100110
Note: To clarify: If x is 7 (0b111), the output should be still 7.
If you have few bytes of memory to spare, I would start with a lookup table:
constexpr unsigned int table[]={0b00,0b10,0b01,0b11};
unsigned int func(unsigned int x){
auto y = (x & (~0b11)) |( table[x&0b11]);
return y;
}
Quickbench -O3 of all the answers so far.
Quickbench -Ofast of all the answers so far.
(Plus my ifelse naive idea.)
[Feel free to add yourself and edit my answer].
Please do correct me if you believe the benchmark is incorrect, I am not an expert in reading assembly. So hopefully volatile x prevented caching the result between loops.
I'll ignore the top bits for a second - there's a trick using multiplication. Multiplication is really a convolution operation, and you can use that to shuffle bits.
In particular, assume the two lower bits are AB. Multiply that by 0b0101, and you get ABAB. You'll see that the swapped bits BA are the middle bits.
Hence,
x = (x & ~3U) | ((((x&3)*5)>>1)&3)
[edit] The &3 is needed to strip the top A bit, but with std::uint_32_t you can use overflow to lose that bit for free - multiplication then gets you the result BAB0'0000'0000'0000'0000'0000'0000'0000'0000' :
x = (x & ~3U) | ((((x&3)*0xA0000000)>>30));
I would use
x = (x & ~0b11) | ((x & 0b10) >> 1) | ((x & 0b01) << 1);
Inspired by the table idea, but with the table as a simple constant instead of an array. We just need mask(00)==00, mask(01)==11, mask(10)=11, masK(11)==11.
constexpr unsigned int table = 0b00111100;
unsigned int func(unsigned int x) {
auto xormask = (table >> ((x&3) * 2)) &3;
x ^= xormask;
return x;
}
This also uses the xor-trick from dyungwang to avoid isolating the top bits.
Another idea, to avoid stripping the top bits. Assume x has the bits XXXXAB, then we want to x-or it with 0000(A^B)(A^B). Thus
auto t = x^(x>>1); // Last bit is now A^B
t &=1; // take just that bit
t *= 3; // Put in the last two positions
x ^= t; // Change A to B and B to A.
Just looking from a mathematical point of view, I would start with a rotate_left() function, which rotates a list of bits one place to the left (011 becomes 110, then 101, and then back 011), and use this as follows:
int func(int input){
return rotate_left(rotate_left((input / 4))) + rotate_left(input % 4);
}
Using this on the author's example 11010111101110100101:
input = 11010111101110100101;
input / 4 = 110101111011101001;
rotate_left(input / 4) = 1101011110111010010;
rotate_left(rotate_left(input / 4) = 11010111101110100100;
input % 4 = 01;
rotate_left(input % 4) = 10;
return 11010111101110100110;
There is also a shift() function, which can be used (twice!) for replacing the integer division.
I am reading about sets representing as bits at following location
http://www.brpreiss.com/books/opus4/html/page395.html
class SetAsBitVector : public Set
{
typedef unsigned int Word;
enum { wordBits = bitsizeof (Word) };
Array<Word> vector;
public:
SetAsBitVector (unsigned int);
// ...
};
SetAsBitVector::SetAsBitVector (unsigned int n) :
Set (n),
vector ((n + wordBits - 1U) / wordBits)
{
// Question here?
for (unsigned int i = 0; i < vector.Length (); ++i)
vector [i] = 0;
}
void SetAsBitVector::Insert (Object& object)
{
unsigned int const item = dynamic_cast<Element&> (object);
vector [item / wordBits] |= 1 << item % wordBits;
// Question here
}
To insert an item into the set, we need to change the appropriate bit
in the array of bits to one. The ith bit of the bit array is bit i mod
w of word ceiling(i/w). Thus, the Insert function is implemented using
a bitwise or operation to change the ith bit to one as shown in above
Program . Even though it is slightly more complicated than the
corresponding operation for the SetAsArray class, the running time for
this operation is still O(1). Since w = wordBits is a power of two, it
is possible to replace the division and modulo operations, / and %,
with shifts and masks like this:
vector [item >> shift] |= 1 << (item & mask);
Depending on the compiler and machine architecture, doing so may
improve the performance of the Insert operation by a constant factor
Questions
My question in constructor why author adding wordBits to "n" and subtracting 1, instead we can use directly as n/wordbits?
Second question whay does author mean by statement "ince w = wordBits is a power of two, it is possible to replace the division and modulo operations, / and %, with shifts and masks like this:
vector [item >> shift] |= 1 << (item & mask);
Reequest to give an example in case of above scenario what is value of shift and mask.
Why author mentioned depending on architecture and compiler there is improve in performance?
I re-tagged this as C++, since it's clearly not C.
To round up. Consider what happens if you call it with n equal to something smaller than wordBits for instance. The generic formula is exactly the one being used, i.e. b = (a + Q - 1) / Q makes sure b * Q is at least a.
Basic binary arithmmetic, division by two is equivalent with shifting to the right and so on.
On some machines, bitwise operations like shifts and masks are faster than divisions and modulos.
So I can't figure out how to do this in C++. I need to do a modulus operation and integer conversion on data that is 96 bits in length.
Example:
struct Hash96bit
{
char x[12];
};
int main()
{
Hash96bit n;
// set n to something
int size = 23;
int result = n % size
}
Edit: I'm trying to have a 96 bit hash because i have 3 floats which when combined create a unique combination. Thought that would be best to use as the hash because you don't really have to process it at all.
Edit: Okay... so at this point I might as well explain the bigger issue. I have a 3D world that I want to subdivide into sectors, that way groups of objects can be placed in sectors that would make frustum culling and physics iterations take less time. So at the begging lets say you are at sector 0,0,0. Sure we store them all in array, cool, but what happens when we get far away from 0,0,0? We don't care about those sectors there anymore. So we use a hashmap since memory isn't an issue and because we will be accessing data with sector values rather than handles. Now a sector is 3 floats, hashing that could easily be done with any number of algorithms. I thought it might be better if I could just say the 3 floats together is the key and go from there, I just needed a way to mod a 96 bit number to fit it in the data segment. Anyway I think i'm just gonna take the bottom bits of each of these floats and use a 64 bit hash unless anyone comes up with something brilliant. Thank you for the advice so far.
UPDATE: Having just read your second edit to the question, I'd recommend you use David's jenkin's approach (which I upvoted a while back)... just point it at the lowest byte in your struct of three floats.
Regarding "Anyway I think i'm just gonna take the bottom bits of each of these floats" - again, the idea with a hash function used by a hash table is not just to map each bit in the input (less till some subset of them) to a bit in the hash output. You could easily end up with a lot of collisions that way, especially if the number of buckets is not a prime number. For example, if you take 21 bits from each float, and the number of buckets happens to be 1024 currently, then after % 1024 only 10 bits from one of the floats will be used with no regard to the values of the other floats... hash(a,b,c) == hash(d,e,c) for all c (it's actually a little worse than that - values like 5.5, 2.75 etc. will only use a couple bits of the mantissa....).
Since you're insisting on this (though it's very likely not what you need, and a misnomer to boot):
struct Hash96bit
{
union {
float f[3];
char x[12];
uint32_t u[3];
};
Hash96bit(float a, float b, float c)
{
f[0] = a;
f[1] = b;
f[2] = c;
}
// the operator will support your "int result = n % size;" usage...
operator uint128_t() const
{
return u[0] * ((uint128_t)1 << 64) + // arbitrary ordering
u[1] + ((uint128_t)1 << 32) +
u[2];
}
};
You can use jenkins hash.
uint32_t jenkins_one_at_a_time_hash(char *key, size_t len)
{
uint32_t hash, i;
for(hash = i = 0; i < len; ++i)
{
hash += key[i];
hash += (hash << 10);
hash ^= (hash >> 6);
}
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
return hash;
}
I am trying to convert a binary array to decimal in following way:
uint8_t array[8] = {1,1,1,1,0,1,1,1} ;
int decimal = 0 ;
for(int i = 0 ; i < 8 ; i++)
decimal = (decimal << 1) + array[i] ;
Actually I have to convert 64 bit binary array to decimal and I have to do it for million times.
Can anybody help me, is there any faster way to do the above ? Or is the above one is nice ?
Your method is adequate, to call it nice I would just not mix bitwise operations and "mathematical" way of converting to decimal, i.e. use either
decimal = decimal << 1 | array[i];
or
decimal = decimal * 2 + array[i];
It is important, before attempting any optimisation, to profile the code. Time it, look at the code being generated, and optimise only when you understand what is going on.
And as already pointed out, the best optimisation is to not do something, but to make a higher level change that removes the need.
However...
Most changes you might want to trivially make here, are likely to be things the compiler has already done (a shift is the same as a multiply to the compiler). Some may actually prevent the compiler from making an optimisation (changing an add to an or will restrict the compiler - there are more ways to add numbers, and only you know that in this case the result will be the same).
Pointer arithmetic may be better, but the compiler is not stupid - it ought to already be producing decent code for dereferencing the array, so you need to check that you have not in fact made matters worse by introducing an additional variable.
In this case the loop count is well defined and limited, so unrolling probably makes sense.
Further more it depends on how dependent you want the result to be on your target architecture. If you want portability, it is hard(er) to optimise.
For example, the following produces better code here:
unsigned int x0 = *(unsigned int *)array;
unsigned int x1 = *(unsigned int *)(array+4);
int decimal = ((x0 * 0x8040201) >> 20) + ((x1 * 0x8040201) >> 24);
I could probably also roll a 64-bit version that did 8 bits at a time instead of 4.
But it is very definitely not portable code. I might use that locally if I knew what I was running on and I just wanted to crunch numbers quickly. But I probably wouldn't put it in production code. Certainly not without documenting what it did, and without the accompanying unit test that checks that it actually works.
The binary 'compression' can be generalized as a problem of weighted sum -- and for that there are some interesting techniques.
X mod (255) means essentially summing of all independent 8-bit numbers.
X mod 254 means summing each digit with a doubling weight, since 1 mod 254 = 1, 256 mod 254 = 2, 256*256 mod 254 = 2*2 = 4, etc.
If the encoding was big endian, then *(unsigned long long)array % 254 would produce a weighted sum (with truncated range of 0..253). Then removing the value with weight 2 and adding it manually would produce the correct result:
uint64_t a = *(uint64_t *)array;
return (a & ~256) % 254 + ((a>>9) & 2);
Other mechanism to get the weight is to premultiply each binary digit by 255 and masking the correct bit:
uint64_t a = (*(uint64_t *)array * 255) & 0x0102040810204080ULL; // little endian
uint64_t a = (*(uint64_t *)array * 255) & 0x8040201008040201ULL; // big endian
In both cases one can then take the remainder of 255 (and correct now with weight 1):
return (a & 0x00ffffffffffffff) % 255 + (a>>56); // little endian, or
return (a & ~1) % 255 + (a&1);
For the sceptical mind: I actually did profile the modulus version to be (slightly) faster than iteration on x64.
To continue from the answer of JasonD, parallel bit selection can be iteratively utilized.
But first expressing the equation in full form would help the compiler to remove the artificial dependency created by the iterative approach using accumulation:
ret = ((a[0]<<7) | (a[1]<<6) | (a[2]<<5) | (a[3]<<4) |
(a[4]<<3) | (a[5]<<2) | (a[6]<<1) | (a[7]<<0));
vs.
HI=*(uint32_t)array, LO=*(uint32_t)&array[4];
LO |= (HI<<4); // The HI dword has a weight 16 relative to Lo bytes
LO |= (LO>>14); // High word has 4x weight compared to low word
LO |= (LO>>9); // high byte has 2x weight compared to lower byte
return LO & 255;
One more interesting technique would be to utilize crc32 as a compression function; then it just happens that the result would be LookUpTable[crc32(array) & 255]; as there is no collision with this given small subset of 256 distinct arrays. However to apply that, one has already chosen the road of even less portability and could as well end up using SSE intrinsics.
You could use accumulate, with a doubling and adding binary operation:
int doubleSumAndAdd(const int& sum, const int& next) {
return (sum * 2) + next;
}
int decimal = accumulate(array, array+ARRAY_SIZE,
doubleSumAndAdd);
This produces big-endian integers, whereas OP code produces little-endian.
Try this, I converted a binary digit of up to 1020 bits
#include <sstream>
#include <string>
#include <math.h>
#include <iostream>
using namespace std;
long binary_decimal(string num) /* Function to convert binary to dec */
{
long dec = 0, n = 1, exp = 0;
string bin = num;
if(bin.length() > 1020){
cout << "Binary Digit too large" << endl;
}
else {
for(int i = bin.length() - 1; i > -1; i--)
{
n = pow(2,exp++);
if(bin.at(i) == '1')
dec += n;
}
}
return dec;
}
Theoretically this method will work for a binary digit of infinate length
I have to find log of very large number.
I do this in C++
I have already made a function of multiplication, addition, subtraction, division, but there were problems with the logarithm. I do not need code, I need a simple idea how to do it using these functions.
Thanks.
P.S.
Sorry, i forgot to tell you: i have to find only binary logarithm of that number
P.S.-2
I found in Wikipedia:
int floorLog2(unsigned int n) {
if (n == 0)
return -1;
int pos = 0;
if (n >= (1 <<16)) { n >>= 16; pos += 16; }
if (n >= (1 << 8)) { n >>= 8; pos += 8; }
if (n >= (1 << 4)) { n >>= 4; pos += 4; }
if (n >= (1 << 2)) { n >>= 2; pos += 2; }
if (n >= (1 << 1)) { pos += 1; }
return pos;
}
if I remade it under the big numbers, it will work correctly?
I assume you're writing a bignum class of your own. If you only care about an integral result of log2, it's quite easy. Take the log of the most significant digit that's not zero, and add 8 for each byte after that one. This is assuming that each byte holds values 0-255. These are only accurate within ±.5, but very fast.
[0][42][53] (10805 in bytes)
log2(42) = 5
+ 8*1 = 8 (because of the one byte lower than MSB)
= 13 (Actual: 13.39941145)
If your values hold base 10 digits, that works out to log2(MSB)+3.32192809*num_digits_less_than_MSB.
[0][5][7][6][2] (5762)
log2(5) = 2.321928095
+ 3.32192809*3 = 9.96578427 (because 3 digits lower than MSB)
= 12.28771 (Actual: 12.49235395)
(only accurate for numbers with less than ~10 million digits)
If you used the algorithm you found on wikipedia, it will be IMMENSELY slow. (but accurate if you need decimals)
It's been pointed out that my method is inaccurate when the MSB is small (still within ±.5, but no farther), but this is easily fixed by simply shifting the top two bytes into a single number, taking the log of that, and doing the multiplication for the bytes less than that number. I believe this will be accurate within half a percent, and still significantly faster than a normal logarithm.
[1][42][53] (76341 in bytes)
log2(1*256+42) = ?
log2(298) = 8.21916852046
+ 8*1 = 8 (because of the one byte lower than MSB)
= 16.21916852046 (Actual: 16.2201704643)
For base 10 digits, it's log2( [mostSignificantDigit]*10+[secondMostSignifcantDigit] ) + 3.32192809*[remainingDigitCount].
If performance is still an issue, you can use lookup tables for the log2 instead of using a full logarithm function.
I assume you want to know how to compute the logarithm "by hand". So I tell you what I've found for this.
Have a look over here, where it is described how to logarithmize by hand. You can implement this as an algorithm. Here's an article by "How Euler did it". I also find this article promising.
I suppose there are more sophisticated methods to do this, but they are so involved you probably don't want to implement them.