AES Limitations and MixColumns - C++

So we all agree AES keys are a fixed length of 128, 192, or 256 bits. Suppose our content is 50 characters (bytes) in size; 50 % 16 = 2, so we can encrypt three full blocks, but how are the remaining two bytes supposed to be stored in the state block? Should I pad them? The standard doesn't specify how to handle this condition.
The MixColumns stage is, for me, the most complicated aspect of AES, and I have been unable to understand its mathematical representation. I understand the matrix multiplication, but the results surprise me. Multiplying a value by 2 is a left shift by one position; if the most significant bit (0x80) was set, the shifted result is then XORed with 0x1B. I thought multiplying by 3 would therefore mean shifting the value by two positions.
I've checked various sources on Wikipedia, even the tutorial that provides a C implementation, but I'm more interested in completing my own implementation! Thank you for any possible input.

In the MixColumns stage the bytes are multiplied as polynomials over GF(2).
Take this example:
AA*3
10101010*00000011
is
(x^7+x^5+x^3+x^1)*(x^1+x^0)
x^1+x^0 is 3 represented in polynomial form
x^7+x^5+x^3+x^1 is AA represented in polynomial form
First take x^1 and multiply it by the polynomial for AA.
That results in...
x^8+x^6+x^4+x^2 ... adding one to each exponent
then reduce this to 8 bits by XORing with 0x11B.
0x11B is x^8+x^4+x^3+x^1+x^0 in polynomial form.
so...
x^8+x^6+x^4+x^2
x^8+x^4+x^3+x^1+x^0
leaves
x^6+x^3+x^2+x^1+x^0 which is AA*2
Now take AA and multiply it by x^0 (basically AA*1).
That gives you
x^7+x^5+x^3+x^1 ... a duplicate of the original value.
Then exclusive-or AA*2 with AA*1:
x^7+x^5+x^3+x^1
x^6+x^3+x^2+x^1+x^0
which leaves
x^7+x^6+x^5+x^2+x^0 or 11100101 or E5
I hope that helps.
Here is also a document detailing the specifics of how MixColumns works:
mix_columns.pdf
EDIT: Normal matrix multiplication does not apply to this, so forget about normal matrices.
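To make the arithmetic concrete, here is a small C++ sketch of GF(2^8) multiplication as MixColumns uses it (my own illustration, not from the linked document); it reproduces the AA*3 = E5 result worked out above:

#include <cstdint>
#include <cstdio>

// Multiply by x (i.e. by 2) in GF(2^8), reducing with the AES polynomial 0x11B.
static uint8_t xtime(uint8_t a) {
    return static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

// General GF(2^8) multiplication (Russian-peasant style): b's low bit selects
// whether the current multiple of a is XORed in, then a is multiplied by x.
static uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return p;
}

int main() {
    std::printf("%02X\n", gmul(0xAA, 0x03)); // prints E5, as derived above
    return 0;
}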

In response to your questions:
If you want to encrypt a stream of bytes using AES, do not just break it into individual blocks and encrypt them individually. This is not cryptographically secure, and a clever attacker can recover a lot of information from your original plaintext. This is called electronic codebook (ECB) mode, and if you follow the link and see what happens when you use it to encrypt Tux the Linux Penguin, you can visually see its insecurity. Instead, consider using a known secure technique like cipher-block chaining (CBC) or counter (CTR) mode. These are a bit more complex to implement, but it's well worth the effort so that you can ensure a clever attacker can't break your encryption indirectly.
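On the padding part of the question: the AES block cipher itself says nothing about padding; that's left to the surrounding protocol. One very common convention is PKCS#7, where every pad byte stores the pad length. A minimal sketch (my own illustration):

#include <cstdint>
#include <vector>

// Minimal PKCS#7 padding sketch: pad to a 16-byte boundary; each pad byte
// stores the pad length, so an exact multiple still gains a full block.
std::vector<uint8_t> pkcs7_pad(std::vector<uint8_t> data, size_t block = 16) {
    uint8_t pad = static_cast<uint8_t>(block - data.size() % block);
    data.insert(data.end(), pad, pad);
    return data;
}

For the 50-byte example, 14 bytes of value 0x0E get appended, and the decryptor reads the last byte to know how much to strip.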
As for how the MixColumns stage works, I really don't understand much of the operation myself. It's based on arithmetic in a finite field of polynomials (a Galois field, GF(2^8)). If I can find a good explanation of how it works, I'll let you know.
If you want to implement AES to further your understanding, that's perfectly fine and I encourage you to do so (though you are probably better off reading the mathematical intuition as to where the algorithm comes from). However, you should not use your own implementation for any actual cryptographic purposes. Without extreme care, you will render your implementation vulnerable to a side-channel attack that can compromise its security. The most famous example of this involves RSA encryption, in which without careful planning an attacker can actually watch the power draw of the computer as it does the encryption to recover the bits of the key. If you want to use AES to do encryption, consider using a known, tested, open-source implementation of the algorithm.
Hope this helps!

If you want to test the outcome of your own implementation (any internal state during the computation) you can check this page:
http://www.keymolen.com/aes.jsp
It displays all internal states for any given plaintext, key, and IV, including the MixColumns stage.

Related

Using only part of crc32

Will using only 2 upper/lower bytes of a crc32 sum make it weaker than crc16?
Background:
I'm currently implementing a wireless protocol.
I have chunks of 64 bytes each, and according to
Data Length vs CRC Length
I would need at most crc16.
Using crc16 instead of crc32 would free up bandwidth for use in forward error correction (64 bytes is one block in FEC).
However, my hardware is quite low powered but has hardware support for CRC32.
So my idea was to use the hardware crc32 engine and just throw away 2 of the result bytes.
I know that this is not a crc16 sum, but that does not matter because I control both sides of the transmission.
In case it matters: I can use both crc32 (poly 0x04C11DB7) or crc32c (poly 0x1EDC6F41).
Yes, it will be weaker, but only for small numbers of bit errors. By taking half of a CRC-32 you get none of the guarantees of a CRC-16, e.g. the number of bits in a burst that are always detectable.
What is the noise source that you are trying to protect against?
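For reference, the mechanics of the proposed truncation are trivial. A sketch using zlib's crc32 (any CRC-32 routine, including your hardware engine, would do):

#include <cstddef>
#include <cstdint>
#include <zlib.h>

// Sketch of the proposal: compute a full CRC-32, keep only the low 16 bits.
uint16_t crc16_of(const unsigned char* data, size_t len) {
    uLong c = crc32(0L, Z_NULL, 0);             // initial value
    c = crc32(c, data, static_cast<uInt>(len));
    return static_cast<uint16_t>(c & 0xFFFF);
}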

Looking for hash algorithm where small change in input will result in small change on hash

Current hash functions are designed to produce big changes in the hash even if only a very small portion of the input is changed. What I need is a hash algorithm whose output changes in proportion to changes in the input. For example, I need something similar to this:
Hash("STR1") => 1000
Hash("STR2") => 1001
Hash("STR3") => 1002
etc.
I'm not good at algorithms and have never heard of such an implementation, although I'm almost sure someone must already have come up with this algorithm.
My current requirement is a large bit size (512 bits maybe?) to avoid collisions.
Thanks
UPDATE
I think I should clarify my goal; I see that I did a very poor job explaining what I need. Sorry, I'm not a native English speaker nor a great communicator.
So basically I need this hash algorithm for searching for similar binary files. You can think of it as an antivirus hashing algorithm. It calculates a file checksum, but unlike traditional hashing functions, even after some small modification to the malware binary, it is still able to detect it. This is pretty much what I'm looking for.
Another aspect is to avoid collision. Let me explain what I mean by that. It's not a conflicting goal. I want Hash("STR1") to produce 1000 and Hash("STR2") to produce 1001 or 1010 maybe, doesn't matter as long as the value is close relative to previous hash. But Hash("This is a very large string or maybe even binary data" + 100 random chars) should not produce a value close to 1000. I understand it will not work always and there would be some hash/hash-range collisions, but I think I can introduce another hashing algorithm and verify both to minimize collisions.
So what do you think? Maybe there is a better way to achieve my goal, maybe I'm asking too much, I don't know. I'm not well versed in cryptography, math or algorithms.
Thank you again for your time and effort
How about a simple summation? Your hash can then wrap at the desired size, and if you take this into account during hash comparisons, a small difference in inputs should yield a small difference in hashes.
However, I think "minimal collisions" and "proportional change in output" are conflicting goals.
This is called, in other domains, perceptual hashing.
One approach to this is as follows:
Get a training multiset of n-grams. (E.g. if n=2 and your training data was "This is a test" your training set would be "Th", "hi", "is", "s ", etc)
Sort and calculate the frequencies of said n-grams, descending.
Then the hash of a word is the first bits of "for each n-gram in the database, is this word's frequency of said n-gram higher than the average frequency?" (a rough sketch follows below).
Note that this can and will result in many collisions with similar words, unfortunately, unless the hash length is absurdly long.
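Here is a rough C++ sketch of that scheme; the trainedAvg table is a hypothetical stand-in for per-n-gram average frequencies computed from a real training corpus:

#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Rough sketch of the scheme above. "trainedAvg" pairs each tracked n-gram
// with its average frequency from training (hypothetical placeholder data).
uint32_t ngram_hash(const std::string& word,
                    const std::vector<std::pair<std::string, double>>& trainedAvg) {
    std::map<std::string, int> freq;                 // bigram counts, n = 2
    for (size_t i = 0; i + 1 < word.size(); ++i)
        ++freq[word.substr(i, 2)];
    uint32_t h = 0;
    size_t bits = trainedAvg.size() < 32 ? trainedAvg.size() : 32;
    for (size_t i = 0; i < bits; ++i) {
        auto it = freq.find(trainedAvg[i].first);
        double f = (it == freq.end()) ? 0.0 : it->second;
        if (f > trainedAvg[i].second)                // one bit per n-gram
            h |= (1u << i);
    }
    return h;
}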
MD5 or SHA-x is not what you want.
According to Wikipedia, the simple substitution cipher, for example, has no avalanche effect (that is the term you mean).
In terms of hashing, you could use some kind of character sum.
For example:
const char* hashme = "hallo123";
int result = 0;
for (int i = 0; i < 8; ++i) {
    result += hashme[i]; // sum up the character codes
}
It may be geared towards kids, but the old NSA Kids' section has some really good ideas.
Of course, these algorithms are really insecure, so you cannot use this in place of REAL encryption. (But you can't use a real encryption algorithm when you just want to have fun, either.)
The number grid involves setting up a grid, then using the coordinates of each letter.
Further ideas:
Mix up the letter arrangement
Convert numbers to binary to obfuscate
A winding way also uses a grid. Essentially, the letters are packed into the grid left to right, in rows downwards. The output is produced by slicing vertically through the grid.
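As a toy illustration of the winding way (my own sketch, not from the NSA page):

#include <string>

// Toy sketch of the "winding way": pack the text into rows of width w,
// then read it out column by column.
std::string winding(const std::string& text, size_t w) {
    std::string out;
    for (size_t col = 0; col < w; ++col)
        for (size_t i = col; i < text.size(); i += w)
            out += text[i];
    return out;
}
// winding("HELLOWORLD", 5) == "HWEOLRLLOD"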
Typically hash and encryption algorithms oriented towards cryptography will behave in the exact opposite way of what you're looking for (i.e. small changes in the input will cause large changes in the output and vice versa), so this algorithm class is a dead end.
As a quick digression on why these algorithms behave like this: of necessity, they're designed to obscure statistical relationships between the input and output to make them more difficult to crack. For example, in the English language the letter "e" is by far the most commonly used letter; in some very weak classical ciphers you could simply find the most common letter and figure that it corresponds to "e" (e.g. if n is the most common ciphertext letter, then odds are n = e). Actually, a statistical pattern like you describe would likely make the algorithm significantly more vulnerable to chosen-plaintext, known-plaintext, man-in-the-middle, and replay attacks.
The man in the middle and replay attacks would be made significantly easier by the fact that it would be much easier to edit the ciphertext to achieve the desired plaintext without knowing the key (especially if you have access to a couple of chosen plaintexts).
If you know that
7/19/2016 1:35 transfer $10 from account x to account y
(where the datestamp is used to defend against a replay attack) encodes to
12345678910
whereas
7/19/2016 1:40 transfer $10 from account x to account y
encodes to
12445678910
it's a pretty safe guess that
12545678910
will mean something like
7/19/2016 1:45 transfer $10 from account x to account y
Without having access to the original key, you could replay this packet on a regular basis to continue to steal money from someone's account simply by making a trivial edit. Granted, this is a fairly contrived example, but it still illustrates the basic problem.
My understanding of what you're looking for is statistical similarity between files. This might help some: https://en.wikipedia.org/wiki/Semantic_similarity
This does indeed exist. The term is Locality-sensitive hashing. A concrete implementation can be found here.
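To make that concrete, here is a toy SimHash-style sketch (SimHash is one well-known LSH family; this is my own illustration, not the linked implementation):

#include <cstdint>
#include <functional>
#include <string>

// Toy SimHash-style sketch: hash overlapping shingles, vote per bit
// position, and take the sign of each vote as the output bit.
uint64_t simhash(const std::string& data) {
    int weight[64] = {0};
    for (size_t i = 0; i + 4 <= data.size(); ++i) {      // 4-byte shingles
        uint64_t h = std::hash<std::string>{}(data.substr(i, 4));
        for (int b = 0; b < 64; ++b)
            weight[b] += ((h >> b) & 1) ? 1 : -1;
    }
    uint64_t out = 0;
    for (int b = 0; b < 64; ++b)
        if (weight[b] > 0) out |= uint64_t(1) << b;
    return out;
}

Similar inputs then differ in only a few bits, so the Hamming distance between two hashes approximates the similarity of the inputs.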
Depending on the source document you might want to look at digital forensics or VisualRank (from Google) for finding similar images and video. For textual data this is commonly used in anti-spam (read more here). For binary files you might want to first run a disassembler and then run the algorithm on the text version - this is just my feeling, and I don't have research to back this statement, but it would be an interesting hypothesis to test.

Using CRCs as a digest to detect duplicates among files

The primary use of CRCs and similar computations (such as Fletcher and Adler) seems to be for the detection of transmission errors. As such, most studies I have seen seem to address the issue of the probability of detecting small-scale differences between two data sets. My needs are slightly different.
What follows is a very approximate description of the problem. Details are much more complicated than this, but the description below illustrates the functionality I am looking for. This little disclaimer is intended to ward off answers such as "Why are you solving your problem this way when you can more easily solve it this other way I propose?" - I need to solve my problem this way for a myriad of reasons that are not germane to this question or post, so please don't post such answers.
I am dealing with collections of data sets (size ~1MB) on a distributed network. Computations are performed on these data sets, and speed/performance is critical. I want a mechanism to allow me to avoid re-transmitting data sets. That is, I need some way to generate a unique identifier (UID) for each data set of a given size. (Then, I transmit data set size and UID from one machine to another, and the receiving machine only needs to request transmission of the data if it does not already have it locally, based on the UID.)
This is similar to the difference between using CRC to check changes to a file, and using a CRC as a digest to detect duplicates among files. I have not seen any discussions of the latter use.
I am not concerned with issues of tampering, i.e. I do not need cryptographic strength hashing.
I am currently using a simple 32-bit CRC of the serialized data, and that has so far served me well. However, I would like to know if anyone can recommend which 32-bit CRC algorithm (i.e. which polynomial?) is best for minimizing the probability of collisions in this situation?
The other question I have is a bit more subtle. In my current implementation, I ignore the structure of my data set, and effectively just CRC the serialized string representing my data. However, for various reasons, I want to change my CRC methodology as follows. Suppose my top-level data set is a collection of some raw data and a few subordinate data sets. My current scheme essentially concatenates the raw data and all the subordinate data sets and then CRC's the result. However, most of the time I already have the CRC's of the subordinate data sets, and I would rather construct my UID of the top-level data set by concatenating the raw data with the CRC's of the subordinate data sets, and then CRC this construction. The question is, how does using this methodology affect the probability of collisions?
To put it in language that will allow me to discuss my thoughts, I'll define a bit of notation. Call my top-level data set T, and suppose it consists of raw data set R and subordinate data sets Si, i=1..n. I can write this as T = (R, S1, S2, ..., Sn). If & represents concatenation of data sets, my original scheme can be thought of as:
UID_1(T) = CRC(R & S1 & S2 & ... & Sn)
and my new scheme can be thought of as
UID_2(T) = CRC(R & CRC(S1) & CRC(S2) & ... & CRC(Sn))
Then my questions are: (1) if T and T' are very different, what CRC algorithm minimizes prob( UID_1(T)=UID_1(T') ), and what CRC algorithm minimizes prob( UID_2(T)=UID_2(T') ), and how do these two probabilities compare?
My (naive and uninformed) thoughts on the matter are this. Suppose the differences between T and T' are in only one subordinate data set, WLOG say S1!=S1'. If it happens that CRC(S1)=CRC(S1'), then clearly we will have UID_2(T)=UID_2(T'). On the other hand, if CRC(S1)!=CRC(S1'), then the difference between R & CRC(S1) & CRC(S2) & ... & CRC(Sn) and R & CRC(S1') & CRC(S2) & ... & CRC(Sn) is a small difference on 4 bytes only, so the ability of UID_2 to detect differences is effectively the same as a CRC's ability to detect transmission errors, i.e. its ability to detect errors in only a few bits that are not widely separated. Since this is what CRC's are designed to do, I would think that UID_2 is pretty safe, so long as the CRC I am using is good at detecting transmission errors. To put it in terms of our notation,
prob( UID_2(T)=UID_2(T') ) = prob(CRC(S1)=CRC(S1')) + (1-prob(CRC(S1)=CRC(S1'))) * probability of the CRC not detecting an error of a few bits.
Let's call the probability of the CRC not detecting an error of a few bits P, and the probability of it not detecting large differences on a large data set Q. The above can be written approximately as
prob( UID_2(T)=UID_2(T') ) ~ Q + (1-Q)*P
Now I will change my UID a bit more as follows. For a "fundamental" piece of data, i.e. a data set T=(R) where R is just a double, integer, char, bool, etc., define UID_3(T)=(R). Then for a data set T consisting of a vector of subordinate data sets T = (S1, S2, ..., Sn), define
UID_3(T) = CRC(UID_3(S1) & UID_3(S2) & ... & UID_3(Sn))
Suppose a particular data set T has subordinate data sets nested m-levels deep, then, in some vague sense, I would think that
prob( UID_3(T)=UID_3(T') ) ~ 1 - (1-Q)(1-P)^m
Given these probabilities are small in any case, this can be approximated as
1 - (1-Q)(1-P)^m = Q + (1-Q)*(m*P - m*(m-1)/2*P*P + ...) ~ Q + m*P
So if I know my maximum nesting level m, and I know P and Q for various CRCs, what I want is to pick the CRC that gives me the minimum value for Q + m*P. If, as I suspect might be the case, P~Q, the above simplifies: my probability of error for UID_1 is P, and my probability of error for UID_3 is (m+1)P, where m is my maximum nesting (recursion) level.
Does all this seem reasonable?
I want a mechanism to allow me to avoid re-transmitting data sets.
rsync has already solved this problem, using generally the approach you outline.
However, I would like to know if anyone can recommend which 32-bit CRC
algorithm (i.e. which polynomial?) is best for minimizing the
probability of collisions in this situation?
You won't see much difference among well-selected CRC polynomials. Speed may be more important to you, in which case you may want to use a hardware CRC, e.g. the crc32 instruction on modern Intel processors. That one uses the CRC-32C (Castagnoli) polynomial. You can make that really fast by using all three arithmetic units on a single core in parallel, computing the CRC on three buffers in the same loop and then combining them. See below for how to combine CRCs.
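For illustration, a byte-at-a-time sketch using the SSE 4.2 intrinsic (production code would feed 8-byte words to _mm_crc32_u64 and run three streams in parallel as described):

#include <cstddef>
#include <cstdint>
#include <nmmintrin.h> // SSE 4.2 intrinsics; compile with -msse4.2

// CRC-32C via the crc32 instruction mentioned above, one byte at a time.
uint32_t crc32c_hw(const unsigned char* buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i)
        crc = _mm_crc32_u8(crc, buf[i]);
    return crc ^ 0xFFFFFFFFu;
}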
However, most of the time I already have the CRC's of the subordinate
data sets, and I would rather construct my UID of the top-level data
set by concatenating the raw data with the CRC's of the subordinate
data sets, and then CRC this construction.
Or you could quickly compute the CRC of the entire set as if you had done a CRC on the whole thing, but using the already calculated CRCs of the pieces. Look at crc32_combine() in zlib. That would be better than taking the CRC of a bunch of CRCs. By combining, you retain all the mathematical goodness of the CRC algorithm.
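For example, a sketch against zlib's actual API:

#include <zlib.h>

// Sketch: the CRC of the concatenation A & B, computed from the two
// separate CRCs with zlib's crc32_combine() rather than re-hashing A & B.
uLong crc_of_concatenation(const Bytef* a, uInt lenA, const Bytef* b, uInt lenB) {
    uLong crcA = crc32(crc32(0L, Z_NULL, 0), a, lenA);
    uLong crcB = crc32(crc32(0L, Z_NULL, 0), b, lenB);
    return crc32_combine(crcA, crcB, static_cast<z_off_t>(lenB));
}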
Mark Adler's answer was bang on. If I'd taken my programmer's hat off and put on my mathematician's hat, some of it should have been obvious. He didn't have the time to explain the mathematics, so I will here for those who are interested.
The process of calculating a CRC is essentially the process of doing a polynomial division. The polynomials have coefficients mod 2, i.e. the coefficient of each term is either 0 or 1, hence a polynomial of degree N can be represented by an (N+1)-bit number, each bit being the coefficient of a term (and the process of doing a polynomial division amounts to doing a whole bunch of XOR and shift operations). When CRC'ing a data block, we view the "data" as one big polynomial, i.e. a long string of bits, each bit representing the coefficient of a term in the polynomial. We'll call our data-block polynomial A. For each CRC "version", a polynomial has been chosen for the CRC, which we'll call P. For 32-bit CRCs, P is a polynomial of degree 32, so it has 33 terms and 33 coefficients. Because the top coefficient is always 1, it is implicit and we can represent the 32nd-degree polynomial with a 32-bit integer. (Computationally, this is quite convenient actually.) The process of calculating the CRC for a data block A is the process of finding the remainder when A is divided by P. That is, A can always be written
A = Q * P + R
where R is a polynomial of degree less than degree of P, i.e. R has degree 31 or less, so it can be represented by a 32-bit integer. R is essentially the CRC. (Small note: typically one prepends 0xFFFFFFFF to A, but that is unimportant here.) Now, if we concatenate two data blocks A and B, the "polynomial" corresponding to the concatenation of the two blocks is the polynomial for A, "shifted to the left" by the number of bits in B, plus B. Put another way, the polynomial for A&B is A*S+B, where S is the polynomial corresponding to a 1 followed by N zeros, where N is the number of bits in B. (i.e. S = x**N ). Then, what can we say about the CRC for A&B? Suppose we know A=Q*P+R and B=Q'*P+R', i.e. R is the CRC for A and R' is the CRC for B. Suppose we also know S=q*P+r. Then
A * S + B = (Q*P+R)*(q*P+r) + (Q'*P+R')
= Q*(q*P+r)*P + R*q*P + R*r + Q'*P + R'
= (Q*S + R*q + Q') * P + R*r + R'
So to find the remainder when A*S+B is divided by P, we need only find the remainder when R*r+R' is divided by P. Thus, to calculate the CRC of the concatenation of two data streams A and B, we need only know the separate CRC's of the data streams, i.e. R and R', and the length N of the trailing data stream B (so we can compute r). This is also the content of one of Mark's other comments: if the lengths of the trailing data streams B are constrained to a few values, we can pre-compute r for each of these lengths, making the combination of two CRC's quite trivial. (For an arbitrary length N, computing r is not trivial, but it is much faster (O(log N)) than re-doing the division over the entire B.)
Note: the above is not a precise exposition of CRC. There is some shifting that goes on. To be precise, if L is the polynomial represented by 0xFFFFFFFF, i.e. L = x**31 + x**30 + ... + x + 1, and S_n is the "shift left by n bits" polynomial, i.e. S_n = x**n, then the CRC of a data block with polynomial A of N bits is the remainder when ( L * S_N + A ) * S_32 is divided by P, i.e. when (L&A)*S_32 is divided by P, where & is the "concatenation" operator.
Also, I think I disagree with one of Mark's comments, but he can correct me if I'm wrong. If we already know R and R', comparing the time to compute the CRC of A&B using the above methodology with computing it the straightforward way does not depend on the ratio of len(A) to len(B) - to compute it the "straightforward" way, one really does not have to re-compute the CRC on the entire concatenated data set. Using our notation above, one only needs to compute the CRC of R*S+B. That is, instead of pre-pending 0xFFFFFFFF to B and computing its CRC, we prepend R to B and compute its CRC. So it's a comparison of the time to compute B's CRC over again with the time to compute r (followed by dividing R*r+R' by P, which is trivial and likely inconsequential in time).
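For the curious, the combination step translates directly into code. A sketch in the unreflected "textbook" convention (it deliberately ignores the bit reflection and 0xFFFFFFFF conditioning that real CRC-32 implementations use, so it illustrates the algebra rather than matching zlib bit for bit):

#include <cstdint>

// P = x^32 + x^26 + ... + 1 is the standard CRC-32 polynomial; its low
// 32 bits are 0x04C11DB7 (the implicit x^32 term is handled in reduction).
static uint32_t mulmodP(uint32_t a, uint32_t b) {   // a*b mod P over GF(2)
    uint32_t r = 0;
    for (int i = 31; i >= 0; --i) {
        uint32_t carry = r & 0x80000000u;           // coefficient of x^31
        r <<= 1;                                    // r *= x
        if (carry) r ^= 0x04C11DB7u;                // reduce x^32 mod P
        if ((b >> i) & 1) r ^= a;
    }
    return r;
}

static uint32_t x_pow_mod_P(uint64_t n) {           // x^n mod P in O(log n)
    uint32_t result = 1, base = 2;                  // polynomials 1 and x
    while (n) {
        if (n & 1) result = mulmodP(result, base);
        base = mulmodP(base, base);
        n >>= 1;
    }
    return result;
}

// CRC of A & B from R = A mod P, R' = B mod P, and B's length in bits:
uint32_t combine(uint32_t R, uint32_t Rprime, uint64_t lenB_bits) {
    return mulmodP(R, x_pow_mod_P(lenB_bits)) ^ Rprime;  // (R*r + R') mod P
}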
Mark Adler's answer addresses the technical question so that's not what I'll do here. Here I'm going to point out a major potential flaw in the synchronization algorithm proposed in the OP's question and suggest a small improvement.
Checksums and hashes provide a single signature value for some data. However, being of finite length, the number of possible unique values of a checksum/hash is always smaller than the number of possible combinations of the raw data if the data is longer. For instance, a 4-byte CRC can only ever take on 4 294 967 296 unique values, whilst even a 5-byte value, which might be the data, can take on 256 times as many values. This means that for any data longer than the checksum itself, there always exist one or more byte combinations with exactly the same signature.
When used to check integrity, the assumption is that the likelihood of a slightly different stream of data resulting in the same signature is small, so that we can assume the data is the same if the signature is the same. It is important to note that we start with some data d and verify, given a checksum c calculated using a checksum function f, that f(d) == c.
In the OP's algorithm, however, the different use introduces a subtle, detrimental degradation of confidence. In the OP's algorithm, server A would start with the raw data [d1A,d2A,d3A,d4A] and generate a set of checksums [c1,c2,c3,c4] (where dnA is the n-th data item on server A). Server B would then receive this list of checksums and check its own list of checksums to determine if any are missing. Say Server B has the list [c1,c2,c3,c5]. What should then happen is that it requests d4 from Server A and the synchronization has worked properly in the ideal case.
If we recall the possibility of collisions, and that it doesn't always take that much data to produce one (e.g. CRC("plumless") == CRC("buckeroo")), then we'll quickly realize that the best guarantee our scheme provides is that server B definitely doesn't have d4A, but it cannot guarantee that it has [d1A,d2A,d3A]. This is because it is possible that f(d1A) = c1 and f(d1B) = c1 even though d1A and d1B are distinct, and we would like both servers to have both. In this scheme, neither server can ever know about the existence of both d1A and d1B. We can use more and more collision-resistant checksums and hashes, but this scheme can never guarantee complete synchronization. This becomes more important the greater the number of files the network must keep track of. I would recommend using a cryptographic hash like SHA-256, for which no collisions are currently known.
A possible mitigation of this risk is to introduce redundant hashes. One way of doing this is to use a completely different algorithm: whilst it is possible that crc32(d1) == crc32(d2), it is less likely that adler32(d1) == adler32(d2) simultaneously. This paper suggests you don't gain all that much this way, though. To use the OP's notation, it is also less likely that crc32('a' & d1) == crc32('a' & d2) and crc32('b' & d1) == crc32('b' & d2) are simultaneously true, so you can "salt" your way to less collision-prone combinations. However, I think you may just as well use a collision-resistant hash function like SHA-512, which in practice likely won't have that great an impact on your performance.

Where is this encryption/decryption algorithm going wrong?

I've been working on a basic string encryption/decryption algorithm in C++ (the source is here: http://pastebin.com/MLnn8D82)
The problem I'm having is that it doesn't decrypt properly. The encryption equation is:
strInput[nPos]=(((strInput[nPos])+(nPos+1))*2);
And the decryption equation is:
strPassword[nPos]=(((strPassword[nPos])-(nPos+1))/2);
When I try it with just addition/subtraction operators, it works perfectly. But when I multiply in encryption and divide in decryption, I get a seemingly random string outputted.
At first I thought it may be because the password is written to and retrieved from a file before being decrypted, but I tried outputting it directly from the main function and I ended up with the same results.
Is there a problem with dividing/multiplying strings? It worked before with C-style (char array) strings, but I guess this could be different.
Any help is appreciated!
Edit: Thanks for the answers so far. I know that this isn't secure and that I shouldn't use it; I'm only doing it for practice.
Also, it's not a memory problem. I've tried dividing in the encryption stage rather than multiplying, but I still get a random string rather than the original string.
It's quite likely your multiplication is overflowing for some characters, meaning your division will never be able to recover the original.
On a side note, why are you writing the encryption algorithm yourself? If you're going to be using it for anything real, rather than just learning, you would be much better off using a library written by cryptography experts that is known to be secure. Something like Keyczar would be a good idea because it's designed to be difficult to get wrong (which is very easy to do in ways that are very subtle when it comes to cryptography).
There are multiple things wrong with this algorithm:
This is just a basic change to a standard Vigenère Cipher, which is well known to be very insecure. Do not use it for anything more than writing letters to a girlfriend, which other students should not be able to read. Even a somewhat decent math teacher will be able to decipher it easily.
Do not ever try to invent a cryptographic algorithm unless you have a doctorate in number theory or cryptography. Even with a degree in one of these fields, writing a cryptographic algorithm which is fairly secure is a very hard task. And even if you find an algorithm, do not try to implement it yourself, but rather try to find an implementation which is already available. There is a lot you can get wrong, as can be seen from the various security flaws which were caused by badly implemented cryptographic algorithms.
You do not have any support for a passphrase in your algorithm. This means, anybody who knows the algorithm can easily decipher your encrypted data. Usually a cryptographic algorithm takes a passphrase as an input, which is then used to decipher the data. This way the algorithm can be made public and only the passphrase must be kept secret. If the algorithm is kept secret, this is considered a fatal flaw by the cryptographic community.
Your multiplication might overflow in case it yields a result bigger than what can be stored in a char. In that case a division will not be able to retrieve the original data. This has been pointed out by others as well.
The order of operations is wrong. In your encryption step you add first, then you multiply. Have a look at the resulting equation. Solving that equation for the input means you also have to invert the order: you first have to divide and then subtract. However, in your code you are first subtracting and then dividing.
These are all the things I can tell you for now. This is not meant to discourage you from trying out this kind of stuff. I wrote a fair number of similar algorithms when I was much younger. You just need to be very aware that they will not be very secure.
There are two issues here.
One appears to stem from the use of strings and the input/output streams. If you set a breakpoint and step through this you'll realize that in the fRetrieve function the values of strPassword[nPos] can be negative. You are essentially reading in binary data into a string and trying to act on it.
What you should be doing is processing your strings into a binary data buffer, such as a char array that solely stores bytes. Then in your decryption you will get purely binary data back and can convert that into a string. This will ensure the integrity of your data when writing/reading from the file. Playing with strings and high ASCII values is asking for the data to be interpreted wrongly.
Second, your decryption algorithm is not properly reversed, so even if you did decrypt it correctly you would be off every time. This is an order of operations issue.
Example, assume an 'A' (65) and nPos of 0. Encrypt:
(65 + (0+1)) * 2 = 66 * 2 = 132
Then reverse:
(132 - (0+1)) / 2 = 131 / 2 = 65.5
This may be rounded or truncated since it's an integer data type. The proper reverse is
(strPassword[nPos] / 2) - (nPos+1)
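Putting both fixes together, a sketch (function names are mine, not from the pastebin): the ciphertext is kept in ints so the doubling cannot overflow a char, and decryption divides before subtracting:

#include <string>
#include <vector>

// Encrypt: add the position offset first, then multiply; store as int
// so the doubled value cannot overflow a char.
std::vector<int> encrypt(const std::string& in) {
    std::vector<int> out(in.size());
    for (size_t nPos = 0; nPos < in.size(); ++nPos)
        out[nPos] = (in[nPos] + static_cast<int>(nPos + 1)) * 2;
    return out;
}

// Decrypt: reverse the order - divide first, then subtract.
std::string decrypt(const std::vector<int>& in) {
    std::string out(in.size(), ' ');
    for (size_t nPos = 0; nPos < in.size(); ++nPos)
        out[nPos] = static_cast<char>(in[nPos] / 2 - static_cast<int>(nPos + 1));
    return out;
}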

Decryption with AES and CryptoAPI? When you know the KEY/SALT

Okay, so I have a packed proprietary binary format that is basically a loose packing of several different raster datasets. Anyway, in the past, just reading this and unpacking it was an easy task. But now, in the next version, the raster XML data is to be encrypted using AES-256 (not my choice, nor do we have a choice).
Now we were basically sent the AES key along with the salt they are using, so we can modify our unpackager.
NOTE THESE ARE NOT THE KEYS JUST AN EXAMPLE:
They are each 63-byte-long ASCII strings:
Key: "QS;x||COdn'YQ#vs-`X\/xf}6T7Fe)[qnr^U*HkLv(yF~n~E23DwA5^#-YK|]v."
Salt: "|$-3C]IWo%g6,!K~FvL0Fy`1s&N<|1fg24Eg#{)lO=o;xXY6o%ux42AvB][j#/&"
We basically want to use the C++ CryptoAPI to decrypt this (I am also the only programmer here this week, and this is going live tomorrow; not our fault). I've looked around for a simple tutorial on implementing this. Unfortunately I cannot even find a tutorial where they have both the salt and the key separately. Basically all I really have right now is a small function that takes in an array of BYTE, along with its length. How can I do this?
I've spent most of the morning trying to make heads or tails of the CryptoAPI, but it's not going well, period. :(
EDIT
So I asked how they encrypt it. They use C#, with RijndaelManaged, which from my knowledge is not equivalent to AES.
EDIT2
Okay, I finally got exactly what was going on, and they had sent us the wrong keys.
They are doing the following:
Padding = PKCS7
CipherMode = CBC
The Key is defined as a set of 32 Bytes in hex.
The IV is defined as a set of 32 bytes in hex too.
They took away the salt when i asked them.
How hard is it to set these things in CryptoAPI using the wincrypt.h header file?
AES-256 uses 256 bit keys. Ideally, each key in your system should be equally likely. A 63 byte string would be 504 bits. You first need to figure out how the string of 63 characters needs to be converted to 256 bits (The sample ones you gave are not base64 encoded). Next, "salt" isn't an intrinsic part of AES. You might be referring to either an initialization vector (IV) in Cipher-Block-Chaining mode or you could be referring to somehow updating the key.
If I were to guess, I'm assuming that by "SALT" you mean IV and specifically CBC mode.
You will need to know all of this when using CAPI functions (e.g. decrypt).
If all of this sounds confusing, then it might be best to change your design so that you don't have to worry about getting all of this right. Crypto is hard. One bad step could invalidate all the security. Consider looking at this comment on my Stick Figure Guide to AES.
UPDATE: You can look at this for a rough starting point for C++ CAPI. You'll need a 64-character hex string to get 256 bits (256 bits / (4 bits/char) == 64 chars). You can convert the chars to bits yourself.
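To sketch what that looks like in practice (a rough, minimally error-checked illustration, assuming the key and IV have already been hex-decoded to 32 and 16 raw bytes; an AES IV is one 16-byte block, i.e. 32 hex characters):

#include <windows.h>
#include <wincrypt.h>
#include <cstring>
// Link against advapi32. A starting point only, not production code.

bool aes256_cbc_decrypt(const BYTE key[32], const BYTE iv[16],
                        BYTE* data, DWORD* dataLen) {
    // Wrap the raw key in a PLAINTEXTKEYBLOB so CryptImportKey accepts it.
    struct { BLOBHEADER hdr; DWORD cbKey; BYTE key[32]; } blob;
    blob.hdr.bType    = PLAINTEXTKEYBLOB;
    blob.hdr.bVersion = CUR_BLOB_VERSION;
    blob.hdr.reserved = 0;
    blob.hdr.aiKeyAlg = CALG_AES_256;
    blob.cbKey        = 32;
    std::memcpy(blob.key, key, 32);

    HCRYPTPROV hProv = 0;
    HCRYPTKEY  hKey  = 0;
    if (!CryptAcquireContext(&hProv, NULL, MS_ENH_RSA_AES_PROV,
                             PROV_RSA_AES, CRYPT_VERIFYCONTEXT))
        return false;
    if (!CryptImportKey(hProv, (const BYTE*)&blob, sizeof(blob), 0, 0, &hKey)) {
        CryptReleaseContext(hProv, 0);
        return false;
    }
    DWORD mode = CRYPT_MODE_CBC;                    // CBC, as they specified
    CryptSetKeyParam(hKey, KP_MODE, (BYTE*)&mode, 0);
    CryptSetKeyParam(hKey, KP_IV, (BYTE*)iv, 0);    // block-sized (16-byte) IV

    // Final = TRUE makes CryptDecrypt strip the PKCS#7 padding.
    BOOL ok = CryptDecrypt(hKey, 0, TRUE, 0, data, dataLen);
    CryptDestroyKey(hKey);
    CryptReleaseContext(hProv, 0);
    return ok == TRUE;
}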
Again, I must caution that playing fast and loose with IV's and keys can have disastrous consequences. I've studied AES/Rijndael in depth down to the math and gate level and have even written my own implementation. However, in my production code, I stick to using a well-tested TLS implementation if at all possible for data in transit. Even for data at rest, it'd be better to use a higher level library.
Rijndael is the name of the algorithm that was standardized as AES; AES is simply Rijndael with the block size fixed at 128 bits.