md5sum of a file that contains the sum itself? - c++

I have written a small app in C++ consisting of a single EXE file.
I want to put the md5sum of the executable itself in its "about" dialog. It should be embedded statically in the executable (so that it can be seen in a hex editor), rather than computed on the fly.

As both @Shi and @matthewdaniel have already said, this can't be done directly.
However a couple of workarounds are possible:
Calculate the MD5 of your application, then package the executable inside a container app that simply extracts it and checks its MD5.
Compile your code and hash only the code segment or other segments (everything except the data segment), then add the MD5 check code. This works because the MD5 string is stored in the data segment, so the precalculated hash of every other segment stays valid.

This is not possible.
If you enter the md5 hash into the binary, the binary will change, so the md5 hash changes as well. If you create a new one, and try to add it to the binary, the binary will change again.
So best is to put the hash into a file, and read that file and display its content.
Another way could be to create the MD5 hash of the binary and then append it to the executable. To fetch the value, you read the last 32 bytes of the binary and display them as the MD5. Of course, a hash of the complete executable (including the appended bytes) won't match the stored hash; you have to hash the executable excluding the last 32 bytes.
If you store the 128-bit MD5 hash in raw format (base 256 instead of base 16), you only need 16 bytes.
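A minimal sketch of the append-and-strip layout described above, assuming the digest is stored as 32 hex characters at the very end of the file. The names `splitTrailer` and `hexToRaw` are hypothetical, and computing the MD5 itself is left to whatever hashing library you already use; this only shows which bytes get hashed and how the 32-character form shrinks to 16 raw bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed layout: [ executable bytes ][ 32 hex characters of digest ].
constexpr std::size_t kHexDigestLen = 32;  // 128-bit MD5 written in base 16

struct SplitImage {
    std::vector<std::uint8_t> payload;  // the bytes the digest must be computed over
    std::string hexDigest;              // the appended 32-character trailer
};

// Separates the trailer from the bytes that were hashed.
// Returns false if the file is too short to contain a trailer.
bool splitTrailer(const std::vector<std::uint8_t>& file, SplitImage& out) {
    if (file.size() < kHexDigestLen) return false;
    const auto cut = file.end() - static_cast<std::ptrdiff_t>(kHexDigestLen);
    out.payload.assign(file.begin(), cut);
    out.hexDigest.assign(cut, file.end());
    return true;
}

// Converts a hex string (base 16) to raw bytes (base 256): 32 chars -> 16 bytes.
// Returns false on odd length or on a non-hex character.
bool hexToRaw(const std::string& hex, std::vector<std::uint8_t>& raw) {
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    if (hex.size() % 2 != 0) return false;
    raw.clear();
    raw.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int hi = nibble(hex[i]);
        const int lo = nibble(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        raw.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
    }
    return true;
}
```

To verify, hash `payload` with your MD5 routine and compare the result with `hexDigest`; the "about" dialog can display `hexDigest` directly.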

As soon as you add the md5 to the file the file will have a different md5. There is no way to get the md5 in the file itself.

The typical method is a signature. A signature is a hash that is then signed with the private half of a public/private key pair. The application can use the public key to verify the hash contained within.
However, this needs to be separate from the executable. As the other answers state, it is impossible to do this with one file. You can merge the signature and the binary and provide instructions to use tools to separate them to compute the verification.
However, this does not stop in-memory attacks against the application. For example, if the application has a buffer overflow, an attacker can rewrite code in memory.
You might not need the hash of the public key. You need to sign the hash of the binary (encrypt it with the private key) so that it cannot be altered. You might use the hash of the public key to verify the instructions to the user, etc. The public key and the verification instructions cannot be distributed in the same bundle as the binary; otherwise an attacker can simply re-create the bundle with an alternate key pair. Including the hash of the public key can prevent some other attacks against the instructions, i.e. the signature carries some verification that the advertised public key matches the one the binary was signed with.
Using established methods is probably better, as users can then use alternate tools to verify the integrity. Also, this way only the public key needs to be distributed through some other channel.
Reference: Digital signature with OpenSSL
The above fixes another attack. Suppose what you asked for were possible: what would stop someone else from doing the same thing, but with a trojan horse binary? Distributing the public key authenticates the source (the legitimate developer). None of the other answers addressed this.

Related

removing code from an exe file with hex editor possible? (c++)

So I was wondering if I could simply use some identification passages in my application to identify the origin of a copyright infringement (not yet implemented, just a thought). But then I figured it's probably possible to simply cut the respective passages out of my code, or edit them to make identification impossible, with the help of a hex editor or the like. Is this possible? Let's assume, for example, that I put a hidden comment into the code which could be accessed in a certain secret way (e.g. by clicking somewhere). Now if someone possessed two program units (i.e. ones sold to two different people), would he be able to delete or edit the "difference" in a hex editor?
You can calculate a hash of (the important parts of) the executable, sign it cryptographically, and embed the hash and signature in the executable. If the executable is modified, the hash will change. If the hash is modified, the signature won't match.
If you'd prefer to prevent infringement, rather than just detecting it, then each time the executable runs, it can validate the hash and the signature, refusing to run if they've been modified.
To identify the source of an application you need to be able to uniquely identify the application.
This is usually done by providing each customer with a unique key that must be present for the application to run. On start-up the application checks the key is present and is valid.
You can prevent simple editing of the key by using cryptographic means of encoding the key. Thus modifying the key with a hex editor will not produce a new key but an invalid key. Just make the program refuse to run when there is an invalid key.

What method/algorithm/library can securely encrypt then decrypt

The following project is done in C++ with WinAPI, for encryption/encoding I am using CryptoC++ but I am open to better libraries. I need to encrypt/encode email data, transmit it, then decrypt it at the other end so privileged users can read the email.
My original idea was just to encrypt the email text using SHA256 with my key (e.g. "MYKEY"). But I think I don't fully understand what hashing is. I understand that a string encrypted with SHA256 or MD5 or AES is impossible to decrypt, BUT I thought that if I encrypt the string with my special key ("MYKEY"), I could then decrypt it as long as I know the special key. Is that correct?
If not can you suggest a library, algorithm or method I can use to achieve my task of encrypting/encoding email text & ONLY being able to decrypt it if I have a key or some shared secret that will allow me to decrypt the data?
As said by Captain Giraffe, a hash algorithm is not an encryption algorithm (though they are both counted in the area of symmetric cryptography). A good hash function has no way to recover a message which fits to the produced hash (other than trying all possible messages to see if they give the same hash). (And also, a hash function has fixed size output, but has a variable size input, which means that there are many messages giving the same hash. It still should be difficult finding even one pair of messages giving the same hash, or a message for a given hash.)
You need an encryption algorithm. Most probably asymmetric encryption (using public keys to encrypt, private keys to decrypt) is a good idea.
Don't invent new cryptographic data formats or protocols. You will make mistakes, which make your product insecure.
For email encryption, use either OpenPGP (RFC 4880) or S/MIME (RFC 3851), or some subsets of one of these.
You can then use any library which supports the necessary algorithms, or some library which supports specifically these file formats.
SHA256 and MD5 are one-way functions, i.e. there is no decryption. See Hashing http://en.wikipedia.org/wiki/Cryptographic_hash_function.
But you really need to read up on encryption procedures before attempting to create a secure communication.
That being said wikipedia has an article dedicated to implementations http://en.wikipedia.org/wiki/AES_implementations

Advice about the Encryption Method I should Use

Ok, so I need some advice on which encryption method I should use for my current project. All the questions about this subject on here are to do with networking and passing encrypted data from one machine to another.
A brief summary of how the system works is:
I have some data that is held in tables that are in text format. I then use a tool to parse this data and serialize it to a dat file. This works fine, but I need to encrypt this data as it will be stored with the application in a public place. The data won't be sent anywhere; it is simply read by the application. I just need it to be encrypted so that if it were to fall into the wrong hands, it would not be possible to read the data.
I am using the crypto++ library for my encryption and I have read that it can perform most types of encryption algorithms. I have noticed however that most algorithms use a public and private key to encrypt/decrypt the data. This would mean I would have to store the private key with the data which seems counter intuitive to me. Are there any ways that I can perform the encryption without storing a private key with the data?
I see no reason to use asymmetric crypto in your case. I see two decent solutions depending on the availability of internet access:
Store the key on a server. Only if the user of the program logs in to the server he gets back the key to his local storage.
Use a Key-Derivation-Function such as PBKDF2 to derive the key from a password.
Of course all of this fails if the attacker is patient and installs a keylogger and waits until you access the files the next time. There is no way to secure your data once your machine has been compromised.
Short answer: don't bother.
Long answer: If you store your .DAT file with the application, you'll have to store the key somewhere too. Most probably in the same place (maybe hidden in the code). So if a malicious user wants to break your encryption all he has to do is to look for that key, and that's it. It doesn't really matter which method or algorithm you use. Even if you don't store the decryption key with the application, it will get there eventually, and the malicious user can catch it with the debugger at run time (unless you're using a dedicated secured memory chip and running on a device that has the necessary protections)
That said, many times the mere fact that the data is encrypted is enough protection because the data is just not worth the trouble. If this is your case - then you can just embed the key in the code and use any symmetric algorithm available (AES would be the best pick).
Common way to solve your issue is:
Use a symmetric key algorithm to encrypt your data; common algorithms are AES and Twofish. Most probably, you want to use CBC chaining.
Use a digest (SHA-256) and sign it with an asymmetric algorithm (RSA), using your private key: this way you embed a signature and a public key to check it, making sure that if your scrambling key is compromised, other people won't be able to forge your personal data. Of course, if the application needs to update the data itself, then you can't use this private key mechanism.
In any case, you should check
symmetric ciphers vs asymmetric ones
signature vs ciphering
mode of operation, meaning how you chain one block to the next one for block ciphers, like AES, 3DES (CBC vs ECB)
As previously said, if your data is read and written by the same application, it will in any case be very hard to prevent malicious users from stealing the data. There are ways to hide keys in the code (you can search for whitebox cryptography), but it will definitely be fairly complex (and obviously cannot rely on a simple external crypto library which can be easily templated to steal the key).
If your application can read the data and people have access to that application, someone with enough motivation and time will eventually figure out (by disassembling your application) how to read the data.
In other words, all the information that is needed to decipher the encrypted data is already in the hand of the attacker. You have the consumer=attacker problem in all DRM-related designs and this is why people can easily decrypt DVDs, BluRays, M4As, encrypted eBooks, etc etc etc...
Using public/private key pairs is called asymmetric encryption.
You could use a symmetric encryption algorithm, that way you would only require one key.
That key will still need to be stored somewhere (it could be in the executable). But if the user has access to the .dat, he probably also has access to the exe. Meaning he could still extract that information. But if he has access to the pc (and the needed rights) he could read all the information from memory anyways.
You could ask the user for a passphrase (aka password) and use that to encrypt symmetrically. This way you don't need to store the passphrase anywhere.

Is RIJNDAEL encryption safe to use with small amounts of text given to users?

I am thinking about making the switch to storing session data in encrypted cookies rather than somewhere on my server. While this will result in more bandwidth used for each request - it will save extra database server load and storage space.
Anyway, I plan on encrypting the cookie contents using RIJNDAEL 256.
function encrypt($text, $key)
{
    return mcrypt_encrypt(MCRYPT_RIJNDAEL_256, $key, $text, MCRYPT_MODE_ECB, mcrypt_create_iv(mcrypt_get_iv_size(MCRYPT_RIJNDAEL_256, MCRYPT_MODE_ECB), MCRYPT_RAND));
}
Which in use would produce something like this (base64 encoded for display)
print base64_encode(encrypt('text', 'key'));
7s6RyMaYd4yAibXZJ3C8EuBtB4F0qfJ31xu1tXm8Xvw=
I'm not worried about a single user's cookie being compromised as much as I am worried that an attacker would discover the key and be able to construct any session for any user, since they would know what I use to sign the data.
Is there a way I can verify estimated cracking times in relation to the parameters used? Or is there a standard measure of time in relation to the size of the text or key used?
I heard someone say that the keys needed to exceed 256bits themselves to be safe enough to be used with RIJNDAEL. I'm also wondering if the length of the text encrypted needs to be a certain length so as not to give away the key.
The data will generally be about 200 characters
a:3{s:7:"user_id";i:345;s:5:"token";s:32:"0c4a14547ad221a5d877c2509b887ee6";s:4:"lang";s:2:"en";}
So is this safe?
Yes Rijndael(AES) is safe, however your implementation is far from safe. There are 2 outstanding issues with your implementation. The use of ECB mode and your IV is a static variable that will be used for all messages. An IV must always be a Cryptographic Nonce. Your code is in clear violation of CWE-329.
ECB mode should never be used; CBC mode must be used, and this is why:
[Figure: three images compared: the original, the same image encrypted with ECB mode, and the same image encrypted using CBC mode.]
Avoid using ECB. It can reveal information about what's encrypted. Any two blocks with the same plaintext will have the same ciphertext. CBC would avoid this, but requires an IV to be generated or saved.
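The leak can be reproduced in a few lines. The sketch below uses a deliberately trivial stand-in for the block cipher (`toyEncryptBlock`, a hypothetical XOR-and-rotate function that is NOT secure) purely to make the chaining visible; with a real cipher such as AES the same structural difference between the two modes holds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlock = 8;
using Block = std::array<std::uint8_t, kBlock>;

// TOY stand-in for a block cipher. It is NOT secure; it only has the one
// property needed here: same key + same input block -> same output block.
Block toyEncryptBlock(const Block& in, const Block& key) {
    Block out{};
    for (std::size_t i = 0; i < kBlock; ++i) {
        const std::uint8_t x = static_cast<std::uint8_t>(in[i] ^ key[i]);
        // Rotate each byte left by one bit and shuffle byte positions.
        out[(i + 3) % kBlock] = static_cast<std::uint8_t>((x << 1) | (x >> 7));
    }
    return out;
}

// ECB: every block is encrypted independently of its neighbours.
std::vector<Block> encryptEcb(const std::vector<Block>& plain, const Block& key) {
    std::vector<Block> out;
    for (const Block& p : plain) out.push_back(toyEncryptBlock(p, key));
    return out;
}

// CBC: each plaintext block is XORed with the previous ciphertext block
// (the IV for the first block) before it is encrypted.
std::vector<Block> encryptCbc(const std::vector<Block>& plain, const Block& key,
                              const Block& iv) {
    std::vector<Block> out;
    Block prev = iv;
    for (const Block& p : plain) {
        Block mixed{};
        for (std::size_t i = 0; i < kBlock; ++i)
            mixed[i] = static_cast<std::uint8_t>(p[i] ^ prev[i]);
        prev = toyEncryptBlock(mixed, key);
        out.push_back(prev);
    }
    return out;
}
```

Feeding two identical plaintext blocks to `encryptEcb` yields two identical ciphertext blocks, which is exactly the pattern an observer can exploit; `encryptCbc` yields different blocks as long as the IV differs from the first ciphertext block, which a random IV makes overwhelmingly likely.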
Avoid simply saving a key and IV. Generate a 256-bit master key using a cryptographically strong random number generator and save it somewhere safe in your application. Use that to generate session keys for use in encryption. The IV can be derived from the session key. When generating the session key, include any and all available data that can be used to narrow the scope of the session key (e.g. the scope of the cookie, the remote host address, a random nonce stored with the encrypted data, and/or a user ID if it isn't within the encrypted data).
Depending on how the data is to be used, you may have to include a MAC. ECB and CBC are not designed to detect changes to the ciphertext, and such changes will simply result in garbage plaintext. You might want to include an HMAC with the encrypted data so that you can authenticate it before taking it as canon. A session HMAC key must be derived from the session encryption key. Alternatively, you could use PCBC mode. PCBC was made to detect changes in the ciphertext, but its ability to do so is limited by the size of the padding, which depends on the data that is encrypted, and not all crypto APIs offer it as an option.
Once you have gone so far as to include a MAC, you should consider taking steps against replay attacks. Any time someone can resend old data within the scope of a session, there is a chance for a replay attack. Making a session key's usage as narrow as possible without causing issues for the user is one way to thwart replay attacks. Another thing you could do is include a date and time in the encrypted data to create a window during which the data is considered valid.
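The validity window mentioned above could be checked along these lines. This is a sketch only: `isFresh` is a hypothetical helper, the issue time is assumed to have been recovered from data whose MAC has already been verified, and the five-minute figure in the usage note is an arbitrary example.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::system_clock;

// Returns true if data stamped with `issuedAt` is still acceptable at `now`.
// Timestamps from the future are rejected as well, since they cannot have
// been produced by a server with a sane clock.
bool isFresh(Clock::time_point issuedAt, Clock::time_point now,
             std::chrono::seconds maxAge) {
    assert(maxAge.count() >= 0);  // a negative window would reject everything
    if (issuedAt > now) return false;
    return (now - issuedAt) <= maxAge;
}
```

A caller would use something like `isFresh(issuedAt, Clock::now(), std::chrono::minutes(5))` and discard the cookie when it returns false. The check is only meaningful after the MAC has been validated; otherwise an attacker can simply rewrite the timestamp.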
In summary, protecting the key is just the tip of the iceberg.
If you use a long key, I'd say the key was pretty safe. Some things to concern yourself with:
You are offloading data storage to the client. NEVER TRUST THE CLIENT. This doesn't mean you can't do this, just that you either have to treat the data in the cookie as untrusted (don't make any decisions more serious than what 'theme' to show the user based on it) or provide for a way to validate the data.
Some examples of how to validate the data would be to:
include a salt (so that people with the same session data don't get the same cookie) and
a checksum (so that someone who changes even one bit of the cookie makes it useless).
Rijndael was renamed AES. Yes, it is safe to use.
That said, you should consider carefully what you put in the cookie. It depends on what you have available in the way of storage on your system, but you could simply choose a random number (say a 64-bit number), and store that in the cookie. In your server-side system, you'd keep a record of who that number was associated with, and the other details. This avoids encryption altogether. You use the other details to validate (to the extent anything can be validated) whether the cookie was sent back from the browser you originally sent it to.
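A rough sketch of that random-identifier approach, with an in-memory map standing in for whatever server-side storage you have. `SessionStore` and the `Session` fields are hypothetical names, and the sketch assumes `std::random_device` is backed by the operating system's secure generator, which the C++ standard does not guarantee; in production you would call a CSPRNG directly.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <unordered_map>

// Details kept on the server; only the random id travels in the cookie.
struct Session {
    std::string userId;
    std::string remoteAddr;  // one of the "other details" used for validation
};

class SessionStore {
public:
    // Creates a session and returns the 64-bit id to place in the cookie.
    std::uint64_t create(const Session& s) {
        std::uint64_t id;
        do {
            id = randomId();
        } while (sessions_.count(id) != 0);  // retry on the rare collision
        sessions_[id] = s;
        return id;
    }

    // Looks up a cookie value; returns nullptr if the id is unknown.
    const Session* find(std::uint64_t id) const {
        const auto it = sessions_.find(id);
        return it == sessions_.end() ? nullptr : &it->second;
    }

private:
    static std::uint64_t randomId() {
        // ASSUMPTION: random_device draws from the OS entropy source here.
        std::random_device rd;
        const std::uint64_t hi = rd();
        const std::uint64_t lo = rd();
        return (hi << 32) | (lo & 0xFFFFFFFFULL);
    }

    std::unordered_map<std::uint64_t, Session> sessions_;
};
```

On each request the server calls `find` with the cookie value and compares the stored `remoteAddr` (or whatever details it recorded) against the incoming request before trusting the session.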
Alternatively, you can use a different encryption key for each session, keeping a track of which key was used with which session.
Even if you go with straight encryption with a fixed key, consider including a random number in with the data to be encrypted - this makes it harder to crack using a known plaintext attack because, by definition, the random number can't be known.
AES-128 should be more than sufficient, with no needs to use longer keys - if the key is chosen randomly.
However, there are other issues. The first is that you should not use ECB. With ECB, a given 128-bit block of plaintext always maps to the same 128-bit block of ciphertext if the key is the same. This means that adversaries can surgically modify the ciphertext by injecting different blocks for which they know the corresponding plaintext. For example, they could mix the data of two different users. With other modes (CBC, for example, is fine), the ciphertext also depends on the IV (initialization vector), which should be different at every execution of the algorithm. This way, the same plaintext is encrypted differently each time and the adversary cannot gain any advantage. You also need to save the IV somewhere with the ciphertext; there is no need to protect it. Whenever the chance of reusing the same IV becomes non-negligible, you should also change the key.
The second issue is that you should also append a message authentication code. Otherwise you would not be able to distinguish the forged cookies from the good ones.

How to prevent a file from being tampered with

I want to store confidential data in a digitally signed file, so that I know when its contents have been tampered with.
My initial thought is that the data will be stored in NVPs (name value pairs), with some kind of CRC or other checksum to verify the contents.
I am thinking of implementing the creating (i.e. writing) and verification (reading) of such a file, using ANSI C++.
Assuming this is the data I want to store:
// Unencrypted, raw data to be stored in file
struct PrivateInfo {
    double age, weight;
    FitnessScale fitness;
    Location loc;
    OtherStuff stuff;
};
// 128-bit encrypted data (payload to be stored in file)
struct EncryptedData {
    // unknown fields/format ??
};
[After I have read a few responses to this question]
Judging by the comments I have received so far, I fear people are getting sidetracked by the word "licensing", which seems to be a red flag to most people. I suspected that might be the case, but in today's atmosphere of heightened security and general nervousness, I thought I'd better detail what I needed to be "hiding" lest someone thought I was thinking of passing on the "Nuke password" to some terrorists or something. I will now remove the word "license" from my question.
View it more as a technical question. Imagine I am a student (which I am), and that I am trying to find out about recommended (or best practices) for encoding information that needs to be secure.
Mindful of the above, I will reformat my questions thus:
Given a struct of different data type fields, what is the "recommended" algorithm to give it a "reasonably secure" encryption? (I still prefer to use 128 bit, but that's just me.)
What is a recommended way of providing a ROBUST check on the encrypted data, so I can use that check value to know if the contents of the file (the payload of encrypted data) differ from the original?
First, note that "signing" data (to notice when it has been tampered with) is a completely separate and independent operation from "encrypting" data (to prevent other people from reading it).
That said, the OpenPGP standard does both. GnuPG is a popular implementation: http://www.gnupg.org/gph/en/manual.html
Basically you need to:
Generate a keypair, but don't bother publishing the public part.
Sign and encrypt your data (this is a single operation in gpg)
... storage ...
Decrypt and check the signature (this is also a single operation).
But, beware that this is only any use if you can store your private key more securely than you store the rest of the data. If you can't guarantee the security of the key, then GPG can't help you against a malicious attempt to read or tamper with your data. And neither can any other encryption/signing scheme.
Forgetting encryption, you might think that you can sign the data on some secure server using the private key, then validate it on some user's machine using the public key. This is fine as far as it goes, but if the user is malicious and clever, then they can invent new data, sign it using their own private key, and modify your code to replace your public key with theirs. Their data will then validate. So you still need the storage of the public key to be tamper-proof, according to your threat-model.
You can implement an equivalent yourself, something along the lines of:
Choose a longish string of random characters. This is your key.
Concatenate your data with the key. Hash this with a secure hash function (SHA-256). Then concatenate the resulting hash with your data, and encrypt it using the key and a secure symmetric cipher (AES).
... storage ...
Decrypt the data, chop off the hash value, put back the key, hash it, and compare the result to the hash value to verify that it has not been modified.
This will likely be faster and use less code in total than gpg: for starters, PGP is public key cryptography, and that's more than you require here. But rolling your own means you have to do some work, and write some of the code, and check that the protocol I've just described doesn't have some stupid error in it. For example, it has potential weaknesses if the data is not of fixed length, which HMAC solves.
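The hash-then-verify part of those steps can be sketched as follows. Two things are deliberately swapped out and should not be copied as-is: `standInHash` is 64-bit FNV-1a, used here only because it fits in a few lines and is NOT a secure replacement for SHA-256, and the AES encryption and decryption steps are omitted entirely. The names `seal` and `unseal` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// STAND-IN for SHA-256: 64-bit FNV-1a. Not cryptographically secure.
std::uint64_t standInHash(const std::string& bytes) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV-1a prime
    }
    return h;
}

constexpr std::size_t kTagLen = 8;  // bytes in the stand-in hash

// Serialises the hash as 8 little-endian bytes.
std::string tagBytes(std::uint64_t h) {
    std::string t(kTagLen, '\0');
    for (std::size_t i = 0; i < kTagLen; ++i)
        t[i] = static_cast<char>((h >> (8 * i)) & 0xFF);
    return t;
}

// Step 2 without the cipher: returns data || hash(data || key).
// In the full scheme this result would then be encrypted before storage.
std::string seal(const std::string& data, const std::string& key) {
    return data + tagBytes(standInHash(data + key));
}

// Step 4 without the cipher: chop off the hash, put back the key,
// re-hash, and compare. Returns false if the content was modified.
bool unseal(const std::string& sealed, const std::string& key,
            std::string& dataOut) {
    if (sealed.size() < kTagLen) return false;
    const std::string data = sealed.substr(0, sealed.size() - kTagLen);
    const std::string tag = sealed.substr(sealed.size() - kTagLen);
    if (tagBytes(standInHash(data + key)) != tag) return false;
    dataOut = data;
    return true;
}
```

With a real library you would replace `standInHash(data + key)` with an HMAC over the data, which also addresses the variable-length weakness noted above.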
Good security avoids doing work that some other, smarter person has done for you. This is the virtuous kind of laziness.
Err, why not use a well known encryption system like GPG?
The answers to the edited question depend on the specific scenario.
For q1 (encryption): if you encrypt and decrypt at your servers you can use a symmetric key algorithm. Otherwise you may want to use public key cryptography.
For q2, if you simply want to check whether a file has changed, you can use any cryptographic hash such as SHA-1, assuming that you can make sure that the hash itself wasn't changed.
If the data generator and the verifier are both secure, you can use a MAC algorithm such as HMAC to verify that the data and the MAC match. But this works only if the secret key remains secret.
Otherwise, you may be able to use digital signatures.
I'm going to change the phrasing of the question and see if it makes people happier (or I get downvoted). There are really two types of questions being asked:
You are making some computer game and you want to know if someone has been messing with your save files. (data signing)
You are writing a messaging program and want to keep people's message logs private. (data encryption)
I will deal with the second one (data encryption). It's a massively difficult topic and you should be looking for pre-built programs (such as PGP/GPG); even then it's going to take you a lot of time to understand and use them properly. Think about encryption like this: it will be broken; your job is to make it not worth the effort. In other words, make the effort required to break it greater than the value of the information.
As for the first one, again, it can be broken. But a checksum is a good idea. See Amnon's answer for some links on that.
Hope this points you in the right direction. I'm not an expert on either topic, but I hope this gives you a starting point. (You might want to re-phrase the question and see if you get some better answers.)