Where is this encryption/decryption algorithm going wrong? - c++

I've been working on a basic string encryption/decryption algorithm in C++ (the source is here: http://pastebin.com/MLnn8D82)
The problem I'm having is that it doesn't decrypt properly. The encryption equation is:
strInput[nPos]=(((strInput[nPos])+(nPos+1))*2);
And the decryption equation is:
strPassword[nPos]=(((strPassword[nPos])-(nPos+1))/2);
When I try it with just addition/subtraction operators, it works perfectly. But when I multiply in encryption and divide in decryption, I get a seemingly random string outputted.
At first I thought it may be because the password is written to and retrieved from a file before being decrypted, but I tried outputting it directly from the main function and I ended up with the same results.
Is there a problem with dividing/multiplying strings? It worked before with C-style (char array) strings, but I guess this could be different.
Any help is appreciated!
Edit: Thanks for the answers so far. I know that this isn't secure and that I shouldn't use it; I'm only doing it for practice.
Also, it's not a memory problem. I've tried dividing in the encryption stage rather than multiplying, but I still get a random string rather than the original string.

It's quite likely your multiplication is overflowing for some characters, meaning your division will never be able to recover the original.
On a side note, why are you writing the encryption algorithm yourself? If you're going to be using it for anything real, rather than just learning, you would be much better off using a library written by cryptography experts that is known to be secure. Something like Keyczar would be a good idea because it's designed to be difficult to get wrong (which is very easy to do in ways that are very subtle when it comes to cryptography).

There are multiple things wrong with this algorithm:
This is just a basic change to a standard Vigenère Cipher, which is well known to be very insecure. Do not use it for anything more than writing letters to a girlfriend, which other students should not be able to read. Even a somewhat decent math teacher will be able to decipher it easily.
Do not ever try to invent a cryptographic algorithm, unless you have a doctorate in number theory or cryptography. Even with a degree in one of these fields, writing a cryptographic algorithm, which is fairly secure, is a very hard task. And even if you find an algorithm, do not try to implement it yourself, but rather try to find an implementation which is already available. There is a lot you can get wrong, as can be seen by the various security flaws, which were cause by badly implemented cryptographic algorithms.
You do not have any support for a passphrase in your algorithm. This means, anybody who knows the algorithm can easily decipher your encrypted data. Usually a cryptographic algorithm takes a passphrase as an input, which is then used to decipher the data. This way the algorithm can be made public and only the passphrase must be kept secret. If the algorithm is kept secret, this is considered a fatal flaw by the cryptographic community.
Your multiplication might overflow, in case it yields a result, which is bigger than what could be stored in a char. In that case a division will not be able to retrieve the original data. This has been pointed out by others as well.
The order of operation is wrong. In your encryption step you add first then you multiply. Have a look at the resultion equation. Solving that equation for the input means you also have to change the order. In your case this means, you first have to divide and then you have to subtract. However in your code you are first subtracting and then dividing.
These are all the things I can tell you for now. This is not meant to discourage you from trying out this kind of stuff. I wrote a fair amount of similar algorithms when I was much younger. You just need to be very aware, that they will not be very secure.

There are two issues here.
One appears to stem from the use of strings and the input/output streams. If you set a breakpoint and step through this you'll realize that in the fRetrieve function the values of strPassword[nPos] can be negative. You are essentially reading in binary data into a string and trying to act on it.
What you should be doing is processing your strings into a binary data buffer. Such as a char array. That solely stores bytes. Then in your decryption you will get purely binary data back and can convert that into a string. This will insure the integrity of your data when writing/reading from the file. Playing with strings and high ASCII values is asking for the data to be interpreted wrong.
Second, is that your decryption algorithm is not properly reversed. So even if you did decrypt it correctly you would be off by 1 every time. This is an order of operations issue.
Example, assume an A (65) and nPos of 0. Encrypt:
65 + (0+1) = 66 * 2 = 132
Then reverse:
132 - (0+1) = 131 / 2 = 65.5
This may be rounded or truncated since it's an integer data type. The proper reverse is
(strPassword[nPos] / 2) - (nPos+1)

Related

Looking for hash algorithm where small change in input will result in small change on hash

Current hash functions are designed to have big changes on hash even if only a very small portion of input is changed. What I need, is a hash algorithm which output mutation will be directly proportional to input mutation. For example, I need something similar to this:
Hash("STR1") => 1000
Hash("STR2") => 1001
Hash("STR3") => 1002
etc.
I'm not good at algorithms, but never heared of such implementation, although I'm almost sure someone should already come up with this algorithm.
My current requirement is to have large bitrate (512 bits maybe?) to avoid collisions.
Thanks
UPDATE
I think I should clarify my goal, I see that I did a very poor job explaining what I need. Sorry, I'm not a native English speaker and great communicator.
So basically I need this hash algorithm for searching similar binary files. You can think of it as Antivirus hashing algorithm. It calculates file checksum, but unlike traditional hashing functions, even after some small modification in malware binary, it still is able to detect it. This is pretty much what I'm looking for.
Another aspect is to avoid collision. Let me explain what I mean by that. It's not a conflicting goal. I want Hash("STR1") to produce 1000 and Hash("STR2") to produce 1001 or 1010 maybe, doesn't matter as long as the value is close relative to previous hash. But Hash("This is a very large string or maybe even binary data" + 100 random chars) should not produce a value close to 1000. I understand it will not work always and there would be some hash/hash-range collisions, but I think I can introduce another hashing algorithm and verify both to minimize collisions.
So what do you think? Maybe there is a better way to achieve my goal, maybe I'm asking too much, I don't know. I'm not well versed in Cryptograpy, math or algorithms.
Thank you again for your time and effort
How about a simple summation? Your hash can then wrap at the desired size, and if you take this into account during hash comparisons, a small difference in inputs should yield a small difference in hashes.
However, I think "minimal collisions" and "proportional change in output" are conflicting goals.
This is called, in other domains, perceptual hashing.
One approach to this is as follows:
Get a training multiset of n-grams. (E.g. if n=2 and your training data was "This is a test" your training set would be "Th", "hi", "is", "s ", etc)
Sort and calculate the frequencies of said n-grams, decending.
Then the hash of a word is the first bits of "for each n-gram in the database, is this word's frequency said n-gram higher than the average frequency?"
Note that this can and will result in many collisions with similar words, unfortunately, unless the hash length is absurdly long.
MD5 or SHA-x is not what you want.
According to wikipedia, for example the substitution cipher has no avalanche effect (this is the word you mean).
In terms of hashing you could use some kind of a figure total.
For example:
char* hashme = "hallo123";
int result=0;
for(int i = 0; i<8; ++i) {
result += hashme[i];
}
It may be geared towards kids, but the old NSA Kid's section has some really good ideas.
Of course, these algorithms are really insecure, so you cannot use this in place of REAL encryption. (But you can't use a real encryption algorithm when you just want to have fun, either.)
The number grid involves setting up a grid, then using the coordinates of each letter:
Further ideas:
Mix up the letter arangement
Convert numbers to binary to obfuscate
A winding way also uses a grid. Essentially, the letters are packed in the grid left to right, in rows downwards. The output is produced by slicing vertically through the grid:
Typically hash and encryption algorithms oriented towards cryptography will behave in the exact opposite way of what you're looking for (i.e. small changes in the input will cause large changes in the output and vice versa), so this algorithm class is a dead end.
As a quick digression on why these algorithms behave like this: of necessity, they're designed to obscure statistical relationships between the input and output to make them more difficult to crack. For example, in the English language the letter "e" is by far the most commonly-used letter; in some very weak classical ciphers you could simply find the most common letter and figure that that corresponds to "e" (e.g. - if n is the most common letter, then odds are n = e). Actually, a statistical pattern like you describe would likely make the algorithm significantly more vulnerable to chosen-plaintext, known-plaintext, man in the middle, and replay attacks.
The man in the middle and replay attacks would be made significantly easier by the fact that it would be much easier to edit the ciphertext to achieve the desired plaintext without knowing the key (especially if you have access to a couple of chosen plaintexts).
If you know that
7/19/2016 1:35 transfer $10 from account x to account y
(where the datestamp is used to defend against a replay attack) encodes to
12345678910
whereas
7/19/2016 1:40 transfer $10 from account x to account y
encodes to
12445678910
it's a pretty safe guess that
12545678910
will mean something like
7/19/2016 1:45 transfer $10 from account x to account y
Without having access to the original key, you could replay this packet on a regular basis to continue to steal money from someone's account simply by making a trivial edit. Granted, this is a fairly contrived example, but it still illustrates the basic problem.
My understanding of what you're looking for is statistical similarity between files. This might help some: https://en.wikipedia.org/wiki/Semantic_similarity
This does indeed exist. The term is Locality-sensitive hashing. A concrete implementation can be found here.
Depending on the source document you might want to look at digital forensics or VisualRank (from google) for finding similar images and video. For textual data this is commonly used in anti-spam (read more here). For binary files you might want to first run disassembler and then run the algorithm on the text version - but this is just my feeling, I don't have a research to back this statement but it would be an interesting hypothesis to test.

Is there a library that would produce a string that would hash (SHA1) to a given input?

I'm wondering if it's possible to find a block of text that would hash to a known value. In particular, I'm looking for a function CreateDataFromHash() that could be called as follows:
unsigned char myHash[] = "da39a3ee5e6b4b0d3255bfef95601890afd80709";
unsigned int length = 10000;
CreateDataFromHash(myHash, length);
Here CreateDataFromHash would return the string of the length 10000 containing arbitrary data, which would hash to myHash using SHA1.
Thanks.
There's no known easy or even moderately difficult way to do this in general.
The entire point of hashes (or so-called one-way functions), is that it's easy to compute them, but next to impossible to reverse their computation (find input values based on output). That said, for some hash functions, there are known methods that may allow computing inputs for a given hash value in reasonable time.
For example, this MD5 sum technique will find collisions (but not input for a given output) in about 8 hours on a 1.6GHz computer.
For SHA-1 in particular you may be interested in reading this.
One of the purposes of SHA1 is that this should be very hard to do.
hashing is a one way function. you can't get input from the output.
This would be a "preimage attack". No such thing is publicly known against SHA-1.
The only attack known against SHA-1 is a collision attack. That means I find two inputs that produce the same result, but neither of them is pre-ordained, so to speak. Even so, this attack isn't really feasible for most people -- based on the amount of computation involved, the closest I can figure is that you'd have to spend somewhere in the range of a few million dollars to build a machine that would give you about one colliding pair of keys per week (assuming it ran, doing nothing else 24/7).
You have to brute force it. See
PHP brute force password generator
Get string, do hash, compare, repeat

Two-way "Hashing" of string

I want to generate int from a string and be able to generate it back.
Something like hash function but two-way function.
I want to use ints as ID in my application, but want to be able to convert it back in case of logging or debugging.
Like:
int id = IDProvider::getHash("NameOfMyObject");
object * a = createObject(id);
...
if(error)
{
LOG(IDProvider::getOriginalString(a->getId()), "some message");
}
I have heard of slightly modified CRC32 to be fast and 100% reversible, but I can not find it and I am not able to write it by myself.
Any hints what should I use?
Thank you!
edit
I have just founded the source I have the whole CRC32 thing from:
Jason Gregory : Game Engine Architecture
quotation:
"As with any hashing system, collisions are a possibility (i.e., two different strings might end up with the same hash code). However, with a suitable hash function, we can all but guarantee that collisions will not occur for all reasonable input strings we might use in our game. After all, a 32-bit hash chode represents more than four billion possible values. So if our hash function does a good job of distributing strings evently throughout this very large range, we are unlikely to collide. At Naughty Dog, we used a variant of the CRC-32 algorithm to hash our strings, and we didn't encounter a single collision in over two years of development on Uncharted: Drake's Fortune."
Reducing an arbitrary length string to a fixed size int is mathematically impossible to reverse. See Pidgeonhole principle. There is a near infinite amount of strings, but only 2^32 32 bit integers.
32 bit hashes(assuming your int is 32 bit) can have collisions very easily. So it's not a good unique ID either.
There are hashfunctions which allow you to create a message with a predefined hash, but it most likely won't be the original message. This is called a pre-image.
For your problem it looks like the best idea is creating a dictionary that maps integer-ids to strings and back.
To get the likelyhood of a collision when you hash n strings check out the birthday paradox. The most important property in that context is that collisions become likely once the number of hashed messages approaches the squareroot of the number of available hash values. So with a 32 bit integer collisions become likely if you hash around 65000 strings. But if you're unlucky it can happen much earlier.
I have exactly what you need. It is called a "pointer". In this system, the "pointer" is always unique, and can always be used to recover the string. It can "point" to any string of any length. As a bonus, it also has the same size as your int. You can obtain a "pointer" to a string by using the & operand, as shown in my example code:
#include <string>
int main() {
std::string s = "Hai!";
std::string* ptr = &s; // this is a pointer
std::string copy = *ptr; // this retrieves the original string
std::cout << copy; // prints "Hai!"
}
What you need is encryption. Hashing is by definition one way. You might try simple XOR Encryption with some addition/subtraction of values.
Reversible hash function?
How come MD5 hash values are not reversible?
checksum/hash function with reversible property
http://groups.google.com/group/sci.crypt.research/browse_thread/thread/ffca2f5ac3093255
... and many more via google search...
You could look at perfect hashing
http://en.wikipedia.org/wiki/Perfect_hash_function
It only works when all the potential strings are known up front. In practice what you enable by this, is to create a limited-range 'hash' mapping that you can reverse-lookup.
In general, the [hash code + hash algorithm] are never enough to get the original value back. However, with a perfect hash, collisions are by definition ruled out, so if the source domain (list of values) is known, you can get the source value back.
gperf is a well-known, age old program to generate perfect hashes in c/c++ code. Many more do exist (see the Wikipedia page)
Is it not possible. Hashing is not-returnable function - by definition.
As everyone mentioned, it is not possible to have a "reversible hash". However, there are alternatives (like encryption).
Another one is to zip/unzip your string using any lossless algorithm.
That's a simple, fully reversible method, with no possible collision.

AES Limitations and MixColumns

So we all agree keys are a fixed-length of 128bits or 192bits or 256bits. If our context was 50 characters in size (bytes) % 16 = 2 bytes. So we encrypt the context in 3 times, but the remaining two bytes how will they be stored in the State block. Should I pad them, the standard doesn't specify how to handle such conditions.
MixColumns stage is the most complicated aspect in the AES, however I have been unable to understand the mathematical representation. I have an understanding of the matrix multiplication, but I'm surprised of the mathematical results. Multiplying a value by 2, shift left for little endian 1 position and shift right for big endian. If we had the most significant bit was set as 1 (0x80) then we should XOR the shifted result with 0x1B. I thought by multiplying by 3 it would mean to shift the value 2 positions.
I've checked the various sources on Wikipedia, even the tutorial that provides a C implementation. But I'm more interested to complete my own implementation! Thank you for any possible input.
In the mix columns stage the exponents are being multiplied.
take this example
AA*3
10101010*00000011
is
x^7+x^5+x^3+x^1*x^1+x^0
x^1+x^0 is 3 represented in polynomial form
x^7+x^5+x^3+x^1 is AA represented in polynomial form
first take x^1 and dot multiply it by the polynomial for AA.
that results in...
x^8+x^6+x^4+x^2 ... adding one to each exponent
then reduce this to 8 bits by XoRing by 11B
11B is x^8+x^4+x^3+x^1+x^0 in polynomial form.
so...
x^8+x6+x^4+ x^2
x^8+ x^4+x^3+ x^1+x^0
leaves
x^6+x^3+x^2+x^1+x^0 which is AA*2
now take AA and dot multiply by x^0 (basically AA*1)
that gives you
x^7+x^5+x^3+x^1 ... a duplicate of the original value.
then exclusive or AA*2 with AA*1
x^7+ x^5+x^3+ x^1
x^6+ x^3+x^2+x^1+x^0
which leaves
x^7+x^6+x^5+x^2+x^0 or 11100101 or E5
I hope that helps.
here also is a document detailing the specifics of how mix columns works.
mix_columns.pdf
EDIT:Normal matrix multiplication does not apply to this ..so forget about normal matrices.
In response to your questions:
If you want to encrypt a stream of bytes using AES, do not just break it into individual blocks and encrypt them individually. This is not cryptographically secure and a clever attacker can recover a lot of information from your original plaintext. This is called an electronic code book and if you follow the link and see what happens when you use it to encrypt Tux the Linux Penguin you can visually see its insecurities. Instead, consider using a known secure technique like cipher-block chaining (CBC) or counter mode (CTR). These are a bit more complex to implement, but it's well worth the effort so that you can ensure a clever attacker can't break your encryption indirectly.
As for how the MixColumns stage works, I really don't understand much of the operation myself. It's based on a construction that involves fields of polynomials. If I can find a good explanation as to how it works, I'll let you know.
If you want to implement AES to further your understanding, that's perfectly fine and I encourage you to do so (though you are probably better off reading the mathematical intuition as to where the algorithm comes from). However, you should not use your own implementation for any actual cryptographic purposes. Without extreme care, you will render your implementation vulnerable to a side-channel attack that can compromise its security. The most famous example of this involves RSA encryption, in which without careful planning an attacker can actually watch the power draw of the computer as it does the encryption to recover the bits of the key. If you want to use AES to do encryption, consider using a known, tested, open-source implementation of the algorithm.
Hope this helps!
If you want to test the outcome of your own implementation (any internal state during computation) you can check this page :
http://www.keymolen.com/aes.jsp
It displays all internal states for any given plaintext, key and iv, also for the mixcolumns stage.

Decryption with AES and CryptoAPI? When you know the KEY/SALT

Okay so i have a packed a proprietary binary format. That is basically a loose packing of several different raster datasets. Anyways in the past just reading this and unpacking was an easy task. But now in the next version the raster xml data is now to be encrypted using AES-256(Not my choice nor do we have a choice).
Now we basically were sent the AES Key along with the SALT they are using so we can modify our unpackager.
NOTE THESE ARE NOT THE KEYS JUST AN EXAMPLE:
They are each 63 byte long ASCII characters:
Key: "QS;x||COdn'YQ#vs-`X\/xf}6T7Fe)[qnr^U*HkLv(yF~n~E23DwA5^#-YK|]v."
Salt: "|$-3C]IWo%g6,!K~FvL0Fy`1s&N<|1fg24Eg#{)lO=o;xXY6o%ux42AvB][j#/&"
We basically want to use the C++ CryptoAPI to decrypt this(I also am the only programmer here this week, and this is going live tomorrow. Not our fault). I've looked around for a simple tutorial of implementing this. Unfortunately i cannot even find a tutorial where they have both the salt and key separately. Basically all i have really right now is a small function that takes in an array of BYTE. Along with its length. How can i do this?
I've spent most of the morning trying to make heads/tails of the cryptoAPI. But its not going well period :(
EDIT
So i asked for how they encrypt it. They use C#, and use RijndaelManaged, which from my knowledge is not equivalent to AES.
EDIT2
Okay finally got exactly what was going on, and they sent us the wrong keys.
They are doing the following:
Padding = PKCS7
CipherMode = CBC
The Key is defined as a set of 32 Bytes in hex.
The IV is defined as a set of 32 bytes in hex too.
They took away the salt when i asked them.
How hard is it to set these things in CryptoAPI using the wincrypt.h header file.?
AES-256 uses 256 bit keys. Ideally, each key in your system should be equally likely. A 63 byte string would be 504 bits. You first need to figure out how the string of 63 characters needs to be converted to 256 bits (The sample ones you gave are not base64 encoded). Next, "salt" isn't an intrinsic part of AES. You might be referring to either an initialization vector (IV) in Cipher-Block-Chaining mode or you could be referring to somehow updating the key.
If I were to guess, I'm assuming that by "SALT" you mean IV and specifically CBC mode.
You will need to know all of this when using CAPI functions (e.g. decrypt).
If all of this sounds confusing, then it might be best to change your design so that you don't have to worry about getting all of this right. Crypto is hard. One bad step could invalidate all the security. Consider looking at this comment on my Stick Figure Guide to AES.
UPDATE: You can look at this for a rough starting point for C++ CAPI. You'll need a 64 character hex string to get 256 bits ( 256 bits / (4 bits / char) == 64 chars). You can convert the chars to bits yourself.
Again, I must caution that playing fast and loose with IV's and keys can have disastrous consequences. I've studied AES/Rijndael in depth down to the math and gate level and have even written my own implementation. However, in my production code, I stick to using a well-tested TLS implementation if at all possible for data in transit. Even for data at rest, it'd be better to use a higher level library.
Rijndael is the algorithm name for AES