Verify OpenPGP-based RSA signature with WinCrypt/CryptoAPI - C++

I have code that parses OpenPGP packets and I have n, e of the public key packet as well as s of the signature packet as byte arrays.
In order to verify a signature, I first acquire a context with CryptAcquireContext (I also tried PROV_RSA_FULL instead of PROV_RSA_AES):
HCRYPTPROV hCryptProv;
CryptAcquireContext(&hCryptProv, nullptr, nullptr, PROV_RSA_AES, CRYPT_VERIFYCONTEXT);
then create a hash
HCRYPTHASH hHash;
CryptCreateHash(hCryptProv, CALG_SHA1, 0, 0, &hHash); // as the digest algorithm of the signature was 2 => SHA1
and populate it using CryptHashData. This works so far as well as parsing and importing the public key using CryptImportKey.
typedef struct _RSAKEY
{
BLOBHEADER blobheader;
RSAPUBKEY rsapubkey;
BYTE n[4096 / 8];
} RSAKEY;
static int verify_signature_rsa(HCRYPTPROV hCryptProv, HCRYPTHASH hHash, public_key_t &p_pkey, signature_packet_t &p_sig)
{
int i_n_len = mpi_len(p_pkey.key.sig.rsa.n); // = 512; p_pkey.key.sig.rsa.n is of type uint8_t n[2 + 4096 / 8];
int i_s_len = mpi_len(p_sig.algo_specific.rsa.s); // = 256; p_sig.algo_specific.rsa.s is of type uint8_t s[2 + 4096 / 8]
HCRYPTKEY hPubKey;
RSAKEY rsakey;
rsakey.blobheader.bType = PUBLICKEYBLOB; // 0x06
rsakey.blobheader.bVersion = CUR_BLOB_VERSION; // 0x02
rsakey.blobheader.reserved = 0;
rsakey.blobheader.aiKeyAlg = CALG_RSA_KEYX;
rsakey.rsapubkey.magic = 0x31415352;// ASCII for RSA1
rsakey.rsapubkey.bitlen = i_n_len * 8; // = 4096
rsakey.rsapubkey.pubexp = 65537;
memcpy(rsakey.n, p_pkey.key.sig.rsa.n + 2, i_n_len); // skip first two byte which are MPI length
std::reverse(rsakey.n, rsakey.n + i_n_len); // need to convert to little endian for WinCrypt
CryptImportKey(hCryptProv, (BYTE*)&rsakey, sizeof(BLOBHEADER) + sizeof(RSAPUBKEY) + i_n_len, 0, 0, &hPubKey); // no error
std::unique_ptr<BYTE[]> pSig(new BYTE[i_s_len]);
memcpy(pSig.get(), p_sig.algo_specific.rsa.s + 2, i_s_len); // skip first two byte which are MPI length
std::reverse(p_sig.algo_specific.rsa.s, p_sig.algo_specific.rsa.s + i_s_len); // need to convert to little endian for WinCrypt
if (!CryptVerifySignature(hHash, pSig.get(), i_s_len, hPubKey, nullptr, 0))
{
DWORD err = GetLastError(); // err=2148073478 = 0x80090006 -> NTE_BAD_SIGNATURE
CryptDestroyKey(hPubKey);
return -1;
}
CryptDestroyKey(hPubKey);
return 0;
}
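For reference, an OpenPGP MPI is a two-byte big-endian bit count followed by the magnitude in big-endian order; that is what the mpi_len calls and the `+ 2` offsets above rely on. A minimal sketch of such a helper (the name and layout are assumptions, not the asker's actual code):

```python
def mpi_len(mpi: bytes) -> int:
    """Byte length of an OpenPGP MPI value from its 2-byte big-endian bit count."""
    bits = (mpi[0] << 8) | mpi[1]
    return (bits + 7) // 8

# A 4096-bit modulus: header 0x10 0x00 (= 4096 bits), then 512 value bytes.
mpi = bytes([0x10, 0x00]) + b"\xff" * 512
assert mpi_len(mpi) == 512

# WinCrypt expects the value little-endian, so reverse a copy of the payload,
# skipping the two header bytes:
n_le = mpi[2:2 + mpi_len(mpi)][::-1]
assert len(n_le) == 512
```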
CryptVerifySignature fails and GetLastError() returns 2148073478 (0x80090006), which is NTE_BAD_SIGNATURE.
On https://www.rfc-editor.org/rfc/rfc4880#section-5.2.2 I read
With RSA signatures, the hash value is encoded using PKCS#1 encoding
type EMSA-PKCS1-v1_5 as described in Section 9.2 of RFC 3447. This
requires inserting the hash value as an octet string into an ASN.1
structure.
Is that needed or is that automatically done by CryptVerifySignature? If not, how to do that?

The PKCS#1 padding is not likely to be the problem. The fact that the provider inserts a hash-algorithm OID by default points to PKCS#1 v1.5 style signatures, so I think you can rest assured that the right padding is used.
More confirmation can be found in the CryptSignHash documentation:
By default, the Microsoft RSA providers use the PKCS #1 padding method for the signature. The hash OID in the DigestInfo element of the signature is automatically set to the algorithm OID associated with the hash object. Using the CRYPT_NOHASHOID flag will cause this OID to be omitted from the signature.
Looking through the API documentation, the following caught my eye:
The native cryptography API uses little-endian byte order while the .NET Framework API uses big-endian byte order. If you are verifying a signature generated by using a .NET Framework API, you must swap the order of signature bytes before calling the CryptVerifySignature function to verify the signature.
This does mean that the API is not PKCS#1 v1.5 compliant as the byte order is explicitly specified therein. This is therefore certainly something to be aware of and could be part of a solution.
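To make the quoted RFC passage concrete, the EMSA-PKCS1-v1_5 encoding that both sides must agree on can be sketched as follows. The 15-byte prefix is the standard DER-encoded DigestInfo header for SHA-1 referenced in RFC 3447, section 9.2 (this sketch builds the encoded message only; the RSA operation and WinCrypt's little-endian handling sit on top of it):

```python
import hashlib

# DER prefix of DigestInfo for SHA-1: SEQUENCE { AlgorithmIdentifier, OCTET STRING (20) }
SHA1_DIGESTINFO_PREFIX = bytes.fromhex("3021300906052b0e03021a05000414")

def emsa_pkcs1_v1_5_encode(digest: bytes, em_len: int) -> bytes:
    """EM = 0x00 || 0x01 || PS (0xFF padding) || 0x00 || DigestInfo."""
    t = SHA1_DIGESTINFO_PREFIX + digest
    ps = b"\xff" * (em_len - len(t) - 3)
    return b"\x00\x01" + ps + b"\x00" + t

em = emsa_pkcs1_v1_5_encode(hashlib.sha1(b"message").digest(), 256)
assert len(em) == 256 and em[:2] == b"\x00\x01"
```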

The error was in this line
std::reverse(p_sig.algo_specific.rsa.s, p_sig.algo_specific.rsa.s + i_s_len); // need to convert to little endian for WinCrypt
which should read
std::reverse(pSig.get(), pSig.get() + i_s_len); // need to convert to little endian for WinCrypt
because reversing the source buffer after the bytes have already been copied does nothing to the copy that is actually passed to CryptVerifySignature.
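The mistake generalizes: reversing the source after copying never touches the copy. A tiny sketch of the pattern (generic buffers, not the asker's types):

```python
src = bytes(range(8))        # big-endian "signature" bytes
copy = bytearray(src)        # the buffer actually handed to the API

# Bug: reversing the source after the copy leaves the copy big-endian.
_ = src[::-1]
assert bytes(copy) == src

# Fix: reverse the copy itself.
copy.reverse()
assert bytes(copy) == src[::-1]
```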

AES ECB known-text attack

I'm trying to perform a known-text attack to obtain a 32-byte key.
The block size is 16 bytes.
Regarding this: https://crypto.stackexchange.com/a/12512
Or this: https://security.stackexchange.com/a/102110
As far as I understood:
1) Encrypt a known 15-byte block
2) Encrypt 256 known 16-byte blocks, each with a different trailing byte
3) Compare the blocks and recover one byte of the secret
void test() {
unsigned char KnownText[15];
memset(KnownText, 'A', 15);
unsigned char EncryptedText[32];
int result_size = AES_ECB.EncryptBlock(EncryptedText, KnownText, 15);
unsigned char CKnownText[16];
for (int i = 0; i < 256; ++i) {
memset(CKnownText, 'A', 16);
CKnownText[15] = i;
unsigned char Encrypted[32];
int enc_result = AES_ECB.EncryptBlock(Encrypted, CKnownText, 16);
if(memcmp(EncryptedText, Encrypted, 16) == 0) {
//match found
}
}
}
I get only one match, when i=0 (I suppose because a 0 was appended to the first 15-byte block), and it is not even one of the secret key bytes.
I can encrypt any length of any known data and get encrypted result.
How can I get the key using this attack?
EncryptBlock probably does what it says it does: encrypt one block. The idea of the 15-byte first message is that you then concatenate the secret block to it. I don't see where this happens (unless EncryptBlock is terribly badly named).
Currently the 16th byte that is encrypted is likely simply set to zero (using zero padding) by the EncryptBlock function. You may need to create a function that mimics what the server should do, including adding the server's secret to the initial message and possibly handling the encryption of multiple blocks (assuming that the function doesn't already do this).
Note that this is not about retrieving the key from the block cipher, but about retrieving a secret from the plaintext. This secret could have been added in some ill-advised attempt to perform message authentication.
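The byte-at-a-time idea from the linked answers can be sketched end to end. Note this is illustration only: the "cipher" below is a stand-in permutation built from SHA-256 so the sketch is self-contained (a real attack would query the server's AES-ECB oracle), and what it recovers is the appended plaintext secret, not the AES key:

```python
import hashlib

KEY = b"k" * 32                    # unknown to the attacker
SECRET = b"SeCrEt!"                # what the attack recovers (NOT the key)
BLOCK = 16

def _prp(block: bytes) -> bytes:
    # Stand-in for one AES-ECB block encryption; any fixed keyed
    # permutation works for demonstrating the attack.
    return hashlib.sha256(KEY + block).digest()[:BLOCK]

def oracle(prefix: bytes) -> bytes:
    """Server-style oracle: ECB-encrypt attacker data || secret, zero-padded."""
    data = prefix + SECRET
    data += b"\x00" * (-len(data) % BLOCK)
    return b"".join(_prp(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK))

def recover_secret(max_len: int = 32) -> bytes:
    known = b""
    for _ in range(max_len):
        pad = b"A" * (BLOCK - 1 - (len(known) % BLOCK))
        idx = len(known) // BLOCK
        target = oracle(pad)[idx * BLOCK:(idx + 1) * BLOCK]
        for guess in range(256):
            probe = pad + known + bytes([guess])
            if oracle(probe)[idx * BLOCK:(idx + 1) * BLOCK] == target:
                known += bytes([guess])
                break
        if known.endswith(b"\x00"):      # reached the zero padding: done
            return known[:-1]
    return known

assert recover_secret() == SECRET
```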

OpenSSL signature length is different for a different message

I have been struggling with a weird problem with RSA_verify. I am trying to RSA_sign using C and RSA_verify using C++. I have generated the private key and certificate using OpenSSL commands.
message = "1.2.0:08:00:27:2c:88:77"
When I use the message above, generate a hash and use RSA_sign to sign the digest, I get a signature of length 256 (strlen(signature)) and also the length returned from RSA_sign is 256. I use this length to verify and verification succeeds.
But when I use a message = "1.2.0:08:00:27:2c:88:08", the signature length is 60 and RSA_sign returns 256. When I use this length 60 to verify it fails. It fails to verify with length 256 as well. Also for some messages (1.2.0:08:00:27:2c:88:12) the signature generated is zero.
I am using SHA256 to hash the message and NID_SHA256 to RSA_sign and RSA_verify this digest. I have used -sha256 while generating the keys using the OpenSSL command.
I am forming the message by parsing an XML file reading some of the tags using some string operation.
Kindly suggest.
Below is the code used to sign.
int main(void)
{
int ret;
RSA *prikey;
char *data ;
unsigned char* signature;
int slen = 0;
FILE * fp_priv = NULL;
char* privfilepath = "priv.pem";
unsigned char* sign = NULL;
ERR_load_crypto_strings();
data = generate_hash();
printf("Message after generate hash %s: %d\n", data, strlen(data));
fp_priv = fopen(privfilepath, "r");
if (fp_priv == NULL)
{
printf("Private key path not found..");
return 1;
}
prikey = RSA_new();
prikey = PEM_read_RSAPrivateKey(fp_priv, &prikey, NULL, NULL);
if (prikey == NULL)
{
printf("Private key returned is NULL\n");
return 1;
}
signature = (unsigned char*)malloc(RSA_size(prikey));
if( signature == NULL )
return 1;
if(RSA_sign(NID_sha256, (unsigned char*)data, strlen(data),
signature, &slen, prikey) != 1) {
ERR_print_errors_fp(stdout);
return 1;
}
printf("Signature length while signing... %d : %d : %d ",
strlen(signature), slen, strlen(data));
FILE * sig_bin = fopen("sig_bin", "w");
fprintf(sig_bin, "%s", signature);
fclose(sig_bin);
system("xxd -p -c256 sig_bin sig_hex");
RSA_free(prikey);
if(signature)
free(signature);
return 0;
}
One very, very important thing to learn about C is it has two distinct types with the same name.
char*: This represents the beginning of a character string. You can do things like strstr or strlen.
You should never strstr or strlen, but rather strnstr and strnlen, but that's a different problem.
char*: This represents the beginning of a data blob (aka byte array, aka octet string), you can't meaningfully apply strlen/etc to it.
RSA_sign uses the latter. It returns "data", not "a message". So, in your snippet
printf("Signature length while signing... %d : %d : %d ",
strlen(signature), slen, strlen(data));
FILE * sig_bin = fopen("sig_bin", "w");
fprintf(sig_bin, "%s", signature);
fclose(sig_bin);
data came from a function called generate_hash(); it's probably non-textual, so strlen doesn't apply. signature definitely is data, so strlen doesn't apply. fprintf also doesn't apply, for the same reasons. These functions identify the end of the character string by the first occurrence of a zero-byte (0x00, '\0', etc). But 0x00 is perfectly legal to have in a signature, or a hash, or lots of "data".
The length of the output of RSA_sign is written into the address passed into the 5th parameter. You passed &slen (address-of slen), so once the function exits (successfully) slen is the length of the signature. Note that it will only very rarely match strlen(signature).
To write your signature as binary, use fwrite, as in fwrite(signature, 1, slen, sig_bin);. If you want it as text, you should Base64-encode your data.
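The truncation is easy to demonstrate with a signature-like blob that contains embedded zero bytes (a sketch; the file name is made up):

```python
import base64, os, tempfile

sig = b"\x30\x00\x45\xfa" * 64          # 256 bytes of "data" with embedded NULs

# What %s / strlen-style handling sees: everything up to the first zero byte.
as_c_string = sig.split(b"\x00")[0]
assert len(as_c_string) == 1            # only the leading 0x30 survives

# Correct: write the full length in binary mode (the fwrite equivalent).
path = os.path.join(tempfile.mkdtemp(), "sig.bin")
with open(path, "wb") as f:
    f.write(sig)
assert os.path.getsize(path) == 256

# Or Base64-encode if a textual representation is needed.
assert base64.b64decode(base64.b64encode(sig)) == sig
```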

Duplicating Windows Cryptographic Service Provider results in Python with PyCrypto

Edits and Updates
3/24/2013:
My output hash from Python is now matching the hash from C++ after converting to UTF-16 and stopping before hitting any 'e' or 'm' bytes. However the decrypted results do not match. I know that my SHA1 hash is 20 bytes = 160 bits and RC4 keys can vary in length from 40 to 2048 bits, so perhaps there is some default salting going on in WinCrypt that I will need to mimic. CryptGetKeyParam KP_LENGTH or KP_SALT
3/24/2013:
CryptGetKeyParam KP_LENGTH is telling me that my key length is 128 bits. I'm feeding it a 160-bit hash. So perhaps it's just discarding the last 32 bits... or 4 bytes. Testing now.
3/24/2013:
Yep, that was it. If I discard the last 4 bytes of my SHA1 hash in python...I get the same decryption results.
Quick Info:
I have a C++ program to decrypt a datablock. It uses the Windows Cryptographic Service Provider, so it only works on Windows. I would like it to work with other platforms.
Method Overview:
In Windows Crypto API
An ASCII-encoded password is converted to a wide-character representation and then hashed with SHA1 to make a key for an RC4 stream cipher.
In Python PyCrypto
An ASCII-encoded byte string is decoded to a Python string. It is truncated based on empirically observed bytes which cause mbstowcs to stop converting in C++. This truncated string is then encoded in UTF-16, effectively padding it with 0x00 bytes between the characters. This new truncated, padded byte string is passed to a SHA1 hash and the first 128 bits of the digest are passed to a PyCrypto RC4 object.
Problem [SOLVED]
I can't seem to get the same results with Python 3.x w/ PyCrypto
C++ Code Skeleton:
HCRYPTPROV hProv = 0x00;
HCRYPTHASH hHash = 0x00;
HCRYPTKEY hKey = 0x00;
wchar_t sBuf[256] = {0};
CryptAcquireContextW(&hProv, L"FileContainer", L"Microsoft Enhanced RSA and AES Cryptographic Provider", 0x18u, 0);
CryptCreateHash(hProv, 0x8004u, 0, 0, &hHash);
//0x8004u is SHA1 flag
int len = mbstowcs(sBuf, iRec->desc, sizeof(sBuf));
//iRec is my "Record" class
//iRec->desc is 33 bytes within header of my encrypted file
//this will be used to create the hash key. (So this is the password)
CryptHashData(hHash, (const BYTE*)sBuf, len, 0);
CryptDeriveKey(hProv, 0x6801, hHash, 0, &hKey);
DWORD dataLen = iRec->compLen;
//iRec->compLen is the length of encrypted datablock
//it's also compressed that's why it's called compLen
CryptDecrypt(hKey, 0, 0, 0, (BYTE*)iRec->decrypt, &dataLen);
// iRec is my record that i'm decrypting
// iRec->decrypt is where I store the decrypted data
//&dataLen is how long the encrypted data block is.
//I get this from file header info
Python Code Skeleton:
from Crypto.Cipher import ARC4
from Crypto.Hash import SHA
#this is the Decipher method from my record class
def Decipher(self):
#get string representation of 33byte password
key_string= self.desc.decode('ASCII')
#so far, these characters fail, possibly others but
#for now I will make it a list
stop_chars = ['e','m']
#slice off anything beyond where mbstowcs will stop
for char in stop_chars:
wc_stop = key_string.find(char)
if wc_stop != -1:
#slice operation
key_string = key_string[:wc_stop]
#make "wide character"
#this is equivalent to padding bytes with 0x00
#Slice off the two byte "Byte Order Mark" 0xff 0xfe
wc_byte_string = key_string.encode('utf-16')[2:]
#slice off the trailing 0x00
wc_byte_string = wc_byte_string[:len(wc_byte_string)-1]
#hash the "wchar" byte string
#this is the equivalent to sBuf in c++ code above
#as determined by writing sBuf to file in tests
my_key = SHA.new(wc_byte_string).digest()
#create a PyCrypto cipher object
RC4_Cipher = ARC4.new(my_key[:16])
#store the decrypted data..these results NOW MATCH
self.decrypt = RC4_Cipher.decrypt(self.datablock)
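A side note on the two slice operations above: encoding with 'utf-16-le' produces the little-endian bytes directly, with no BOM to slice off (a sketch of the equivalence; it assumes a little-endian platform, where Python's 'utf-16' codec emits the 0xFF 0xFE BOM):

```python
key_string = "Monk"
manual = key_string.encode("utf-16")[2:]     # slice off the 0xFF 0xFE BOM
direct = key_string.encode("utf-16-le")      # no BOM in the first place
assert manual == direct == b"M\x00o\x00n\x00k\x00"
```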
Suspected [EDIT: Confirmed] Causes
1. The mbstowcs conversion of the password meant that the "original data" fed to the SHA1 hash was not the same in Python and C++. mbstowcs was stopping conversion at 0x65 and 0x6D bytes, so the original data ended up being a wide-char encoding of only part of the original 33-byte password.
2. RC4 can have variable-length keys. In the Enhanced Windows Crypto Service Provider, the default length is 128 bits. Leaving the key length unspecified was taking the first 128 bits of the 160-bit SHA1 digest of the "original data".
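Cause 2 can be reproduced without WinCrypt using a textbook RC4 implementation (the standard KSA/PRGA, written out here so the sketch needs no third-party module): keying RC4 with the first 16 bytes of the SHA-1 digest behaves like the provider's 128-bit default, while the full 20-byte digest yields a different keystream.

```python
import hashlib

def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key-scheduling algorithm followed by the PRGA."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

digest = hashlib.sha1("Monk".encode("utf-16-le")).digest()   # 20 bytes
ct_128 = rc4(digest[:16], b"plaintext")   # matches a 128-bit derived key
ct_160 = rc4(digest, b"plaintext")        # full digest: different keystream
assert ct_128 != ct_160
assert rc4(digest[:16], ct_128) == b"plaintext"              # RC4 is symmetric
```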
How I investigated
edit: based on my own experimenting and the suggestions of @RolandSmith, I now know that one of my problems was mbstowcs behaving in a way I wasn't expecting. It seems to stop writing to sBuf on "e" (0x65) and "m" (0x6d) (probably others). So the password "Monkey" in my description (ASCII-encoded bytes) would look like "M o n k" in sBuf, because mbstowcs stopped at the e and placed 0x00 between the bytes based on the 2-byte wchar_t typedef on my system. I found this by writing the results of the conversion to a text file.
BYTE pbHash[256]; //buffer we will store the hash digest in
DWORD dwHashLen; //store the length of the hash
DWORD dwCount;
dwCount = sizeof(DWORD); //how big is a dword on this system?
//see above "len" is the return value from mbstowcs that tells how
//many multibyte characters were converted from the original
//iRec->desc and placed into sBuf. In some cases it's 3, 7, 9
//and always seems to stop on "e" or "m"
fstream outFile4("C:/desc_mbstowcs.txt", ios::out | ios::trunc | ios::binary);
outFile4.write((const CHAR*)sBuf, int(len));
outFile4.close();
//now get the hash size from CryptGetHashParam
//and get the actual hash from the hash object hHash
//write it to a file.
if(CryptGetHashParam(hHash, HP_HASHSIZE, (BYTE *)&dwHashLen, &dwCount, 0)) {
if(CryptGetHashParam(hHash, 0x0002, pbHash, &dwHashLen,0)){
fstream outFile3("C:/test_hash.txt", ios::out | ios::trunc | ios::binary);
outFile3.write((const CHAR*)pbHash, int(dwHashLen));
outFile3.close();
}
}
References:
wide characters cause problems depending on environment definition
Difference in Windows Cryptography Service between VC++ 6.0 and VS 2008
convert a utf-8 to utf-16 string
Python - converting wide-char strings from a binary file to Python unicode strings
PyCrypto RC4 example
https://www.dlitz.net/software/pycrypto/api/current/Crypto.Cipher.ARC4-module.html
Hashing a string with Sha256
http://msdn.microsoft.com/en-us/library/windows/desktop/aa379916(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/aa375599(v=vs.85).aspx
You can test the size of wchar_t with a small test program (in C):
#include <stdio.h> /* for printf */
#include <stddef.h> /* for wchar_t */
int main(int argc, char *argv[]) {
printf("The size of wchar_t is %zu bytes.\n", sizeof(wchar_t));
return 0;
}
You could also use printf() calls in your C++ code to write e.g. iRec->desc and the conversion result in sBuf to the screen, if you can run the C++ program from a terminal. Otherwise use fprintf() to dump them to a file.
To better mimic the behavior of the C++ program, you could even use ctypes to call mbstowcs() in your Python code.
Edit: You wrote:
One problem is definitely with mbstowcs. It seems that it's transferring an unpredictable (to me) number of bytes into my buffer to be hashed.
Keep in mind that mbstowcs returns the number of wide characters converted. In other words, a 33-byte buffer in a multi-byte encoding can contain anything from 5 characters (6-byte UTF-8 sequences) up to 33 characters, depending on the encoding used.
Edit2: You are using 0 as the dwFlags parameter for CryptDeriveKey. According to its documentation, the upper 16 bits should contain the key length. You should check CryptDeriveKey's return value to see if the call succeeded.
Edit3: You could test mbstowcs in Python (I'm using IPython here):
In [1]: from ctypes import *
In [2]: libc = CDLL('libc.so.7')
In [3]: monkey = c_char_p(u'Monkey')
In [4]: test = c_char_p(u'This is a test')
In [5]: wo = create_unicode_buffer(256)
In [6]: nref = c_size_t(250)
In [7]: libc.mbstowcs(wo, monkey, nref)
Out[7]: 6
In [8]: print wo.value
Monkey
In [9]: libc.mbstowcs(wo, test, nref)
Out[9]: 14
In [10]: print wo.value
This is a test
Note that in Windows you should probably use libc = cdll.msvcrt instead of libc = CDLL('libc.so.7').

OpenSSL decrypted text length

I am using this simple function for decrypting an AES-encrypted string
unsigned char *aes_decrypt(EVP_CIPHER_CTX *e, unsigned char *ciphertext, int *len)
{
int p_len = *len, f_len = 0;
unsigned char *plaintext = (unsigned char*)malloc(p_len + 128);
memset(plaintext,0,p_len);
EVP_DecryptInit_ex(e, NULL, NULL, NULL, NULL);
EVP_DecryptUpdate(e, plaintext, &p_len, ciphertext, *len);
EVP_DecryptFinal_ex(e, plaintext+p_len, &f_len);
*len = p_len + f_len;
return plaintext;
}
The problem is that len returns a value that does not match the length of the decoded string. What could be the problem?
When you say "string", I assume you mean a zero-terminated textual string. The encryption process depends on a cipher block size, and oftentimes padding. What's actually being encoded and decoded is up to the application; it's all binary data to the cipher. If your textual string is smaller than what's returned from the decrypt process, your application needs to determine the useful part. For example, if you KNOW the string inside the results is zero-terminated, you can get its length with a simple strlen. That's risky, of course, if you can't guarantee the input; it's probably better to search the results for a null up to the decoded length.
If you are using the cipher in ECB, CBC or another chaining mode, the plain text must be padded to a length that is a multiple of the cipher block size. See the PKCS#5 standard for example. High-level functions like those in OpenSSL can perform the padding transparently for the programmer, so the encrypted text can be larger than the plain text by up to one additional cipher block.
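The size relationship described in the second answer can be made concrete with PKCS#7-style padding (the generalization of PKCS#5 to 16-byte blocks); a sketch:

```python
BLOCK = 16  # AES block size

def pkcs7_pad(data: bytes) -> bytes:
    n = BLOCK - (len(data) % BLOCK)      # 1..16: a full block if already aligned
    return data + bytes([n]) * n

def pkcs7_unpad(padded: bytes) -> bytes:
    n = padded[-1]
    if not 1 <= n <= BLOCK or padded[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")
    return padded[:-n]

assert len(pkcs7_pad(b"x" * 5)) == 16    # short input rounds up to one block
assert len(pkcs7_pad(b"x" * 16)) == 32   # aligned input still grows a block
assert pkcs7_unpad(pkcs7_pad(b"hello")) == b"hello"
```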

Wincrypt: Unable to decrypt file which was encrypted in C#. NTE_BAD_DATA at CryptDecrypt

I am trying to decrypt a piece of a file with wincrypt and I cannot seem to make this function decrypt correctly. The bytes are encrypted with the RC2 implementation in C# and I am supplying the same password and IV to both the encryption and decryption process (encrypted in C#, decrypted in c++).
All of my functions along the way are returning true until the final "CryptDecrypt" function. Instead of me typing out any more, here is the function:
static char* DecryptMyFile(char *input, char *password, int size)
{
HCRYPTPROV provider = NULL;
if(CryptAcquireContext(&provider, NULL, MS_ENHANCED_PROV, PROV_RSA_FULL, 0))
{printf("Context acquired.");}
else
{
if (GetLastError() == NTE_BAD_KEYSET)
{
if(CryptAcquireContext(&provider, 0, NULL, PROV_RSA_FULL, CRYPT_NEWKEYSET))
{printf("new key made.");}
else
{
printf("Could not acquire context.");
}
}
else
{printf("Could not acquire context.");}
}
HCRYPTKEY key = NULL;
HCRYPTHASH hash = NULL;
if(CryptCreateHash(provider, CALG_MD5, 0, 0, &hash))
{printf("empty hash created.");}
else
{printf("could not create hash.");}
if(CryptHashData(hash, (BYTE *)password, strlen(password), 0))
{printf("data buffer is added to hash.");}
else
{printf("error. could not add data buffer to hash.");}
if(CryptDeriveKey(provider, CALG_RC2, hash, 0, &key))
{printf("key derived.");}
else
{printf("Could not derive key.");}
DWORD dwKeyLength = 128;
if(CryptSetKeyParam(key, KP_EFFECTIVE_KEYLEN, reinterpret_cast<BYTE*>(&dwKeyLength), 0))
{printf("success");}
else
{printf("failed.");}
BYTE IV[8] = {0,0,0,0,0,0,0,0};
if(CryptSetKeyParam(key, KP_IV, IV, 0))
{printf("worked");}
else
{printf("faileD");}
DWORD dwCount = size;
BYTE *decrypted = new BYTE[dwCount + 1];
memcpy(decrypted, input, dwCount);
decrypted[dwCount] = 0;
if(CryptDecrypt(key,0, true, 0, decrypted, &dwCount))
{printf("succeeded");}
else
{printf("failed");}
return (char *)decrypted;
}
input is the data passed to the function, encrypted. password is the same password used to encrypt the data in C#. size is the size of the data while encrypted.
All of the above functions return true until CryptDecrypt, which I cannot seem to figure out why. At the same time, I'm not sure how the CryptDecrypt function would possibly edit my "decrypted" variable, since I am not passing a reference of it.
Any help or advice onto why this is not working would be greatly appreciated. This is my first endeavour with wincrypt and first time using C++ in years.
If it is of any more help, as well, this is my encryption (in C#):
public static byte[] EncryptString(byte[] input, string password)
{
PasswordDeriveBytes pderiver = new PasswordDeriveBytes(password, null);
byte[] ivZeros = new byte[8];
byte[] pbeKey = pderiver.CryptDeriveKey("RC2", "MD5", 128, ivZeros);
RC2CryptoServiceProvider RC2 = new RC2CryptoServiceProvider();
//using an empty initialization vector for convenience.
byte[] IV = new byte[8];
ICryptoTransform encryptor = RC2.CreateEncryptor(pbeKey, IV);
MemoryStream msEncrypt = new MemoryStream();
CryptoStream csEncrypt = new CryptoStream(msEncrypt, encryptor, CryptoStreamMode.Write);
csEncrypt.Write(input, 0, input.Length);
csEncrypt.FlushFinalBlock();
return msEncrypt.ToArray();
}
I have confirmed that my hash value in C++ is identical to my key in C#, created by PasswordDeriveBytes.CryptDeriveKey
First, as in my comment, use GetLastError() so you know why it failed. I'll assume that you get NTE_BAD_DATA; all the other errors are much easier to deal with, since they basically mean you missed some step in the API call sequence.
The typical reason why CryptDecrypt would fail with NTE_BAD_DATA is that you're decrypting the last block of a block cipher (as you are) and the decrypted padding bytes are incorrect. This can happen if the input is truncated (not all encrypted bytes were saved to the file) or if the key is incorrect.
I would suggest you take this methodically since there are so many places where this can fail that will all manifest only at CryptDecrypt time:
Ensure that the file you encrypt in C# can be decrypted in C#. This would eliminate any file save truncation issues.
Try to encrypt and decrypt with a fixed hard-coded key first (not password-derived); this will ensure that your key setup and IV initialization are correct (as well as the padding mode and cipher chaining mode).
Ensure that the password derivation process arrives at the same hash. Things like ANSI vs. Unicode or a terminating 0 can wreak havoc on the MD5 hash and result in wildly different keys from apparently the same password.
Some people have discovered issues when moving between operating systems.
The CryptDeriveKey call uses a "default key length" based on the operating system and algorithm chosen. For RC2, the default generated key length is 40 bits on Windows 2000 and 128 bits on Windows 2003. This results in a "BAD DATA" return code when the generated key is used in a CryptDecrypt call.
Presumably this is related to "garbage" appearing at the end of the final buffer after trying to apply a 128 bit key to decrypt a 40 bit encrypted stream. The error code typically indicates bad padding bytes - but the root cause may be a key generation issue.
To generate a 40-bit encryption key, use (40 << 16) in the flags field of the CryptDeriveKey call.
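That flag value is just the key length packed into the upper 16 bits of dwFlags, which a small sketch makes explicit (derive_key_flags is a hypothetical helper name, not a WinCrypt API):

```python
def derive_key_flags(key_bits: int, lower_flags: int = 0) -> int:
    """dwFlags for CryptDeriveKey: key size in bits goes in the upper 16 bits."""
    return (key_bits << 16) | lower_flags

assert derive_key_flags(40) == 0x00280000       # the (40 << 16) from the answer
assert derive_key_flags(128) == 0x00800000
assert (derive_key_flags(40) >> 16) == 40       # the provider reads it back out
```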