CBC MAC: message length and length prepending - c++

I want to use CBC MAC in C++. First I hope to find an implementation of a block cipher that I can run in CBC mode, which as I understand it is how CBC MAC is built. But I have two questions:
1) If the length of the message to be authenticated is not multiple of block cipher block length, what shall I do?
2) To strengthen CBC MAC, one recommended way mentioned on Wiki is to put the length of the message in the first block. But how should I encode the length: as a string, or in binary? If the block length of the cipher is, say, 64 bits, do I encode the number as a 64-bit number? E.g. if the message length is 230, should I use the following value as the first block:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 11100110
?

It depends on the 2nd question. You must "pad" the message with something until it is a multiple of the block size. The pad bytes are added to the message before the MAC is calculated but only the original message is transmitted/stored/etc.
For a MAC, the easiest thing to do is pad with zeros. However this has a vulnerability - if the message part ends in one or more zeros, an attacker could add or remove zeros and not change the MAC. But if you do step 2, this and another attack are both mitigated.
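To make the zero-padding weakness concrete, here is a minimal sketch. The helper name zero_pad is made up for this illustration; it only shows that two different messages can produce identical padded input, and therefore identical MACs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pad msg with zero bytes up to the next multiple
// of block_size. A message that is already aligned is returned unchanged.
std::vector<std::uint8_t> zero_pad(std::vector<std::uint8_t> msg,
                                   std::size_t block_size) {
    assert(block_size > 0);
    const std::size_t rem = msg.size() % block_size;
    if (rem != 0) {
        msg.resize(msg.size() + (block_size - rem), 0x00);
    }
    return msg;
}
```

With an 8-byte block, the two-byte message "hi" and the three-byte message "hi" followed by 0x00 pad to the same 8 bytes, so any MAC computed over the padded data cannot tell them apart.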
If you prepend the length of the message to the message (i.e. not just somewhere in the first block, but as the very first thing in the first block), it mitigates the ability to sometimes add or remove zeros. It also mitigates an attacker's ability to forge a message with an entire arbitrary extra block added. So this is a good thing to do. It is also a good idea for purely practical reasons: you know how many bytes the message is without relying on any external means.
It does not matter what the format of the length is - some encoded ASCII version or binary. However as a practical matter it should always be simple binary.
There is no reason that the number of bits in the length must match the cipher block size. The size of the length field must be large enough to represent the message sizes. For example, if the message size could range from 0 to 1000 bytes long, you could prepend an unsigned 16 bit integer.
This is done first before the MAC is calculated on both the sender and receiver. In essence the length is verified at the same time as the rest of the message, eliminating the ability for an attacker to forge a longer or shorter message.
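As a rough sketch of the length-prepending step, the function below builds the byte string that would be fed to the block cipher. The name build_mac_input, the 8-byte length field and the big-endian byte order are all assumptions made for the example; as said above, a 16-bit field would do for small messages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: 8-byte big-endian length, then the message,
// then zero padding up to a multiple of block_size.
std::vector<std::uint8_t> build_mac_input(const std::vector<std::uint8_t>& msg,
                                          std::size_t block_size) {
    assert(block_size > 0);
    std::vector<std::uint8_t> out;
    const std::uint64_t len = msg.size();
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>(len >> shift));
    }
    out.insert(out.end(), msg.begin(), msg.end());
    const std::size_t rem = out.size() % block_size;
    if (rem != 0) {
        out.resize(out.size() + (block_size - rem), 0x00);
    }
    return out;
}
```

For a 230-byte message and a 64-bit block cipher, the first block comes out as seven zero bytes followed by 0xE6, which is the bit pattern from the question. Because the length is part of the MAC input, "hi" and "hi" plus a trailing zero byte no longer produce the same input.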
There are many open source C implementations of block ciphers like AES that would be easy to find and get working.
Caveat
Presumably the purpose of the question is just for learning. Any serious use should consider a stronger MAC, as suggested by other comments, and a good crypto library. There are other weaknesses and attacks that can be very subtle, so you should never attempt to implement your own crypto. Neither of us is a crypto expert, and this should be done only for learning.
BTW, I recommend both of the following books by Bruce Schneier:
http://www.amazon.com/Applied-Cryptography-Protocols-Algorithms-Source/dp/0471117099/ref=asap_bc?ie=UTF8
http://www.amazon.com/Cryptography-Engineering-Principles-Practical-Applications/dp/0470474246/ref=asap_bc?ie=UTF8

Related

Can a file end with a 0x80 char?

I'm implementing my own version of a Blowfish encoder/decoder. I use a standard padding of 0x80 if necessary.
My question is whether I need to add a padding character even when I don't need one. If a file naturally ends with 0x80, the decoding step will remove this character, which is wrong in that case, since the 0x80 is part of the file itself.
This can of course be solved by always adding a final character, even if the total number of characters is a multiple of the encoding block (64 bits in this case). I can implement this countermeasure, but first I'd like to know if I really need it.
A natural follow-up is whether this character was chosen because it never occurs in a file (so the wrong situation above never happens), but I'm not sure at all.
Thanks, and sorry for the dummy question.
On Linux and other systems, a file can contain any sequence of bytes. So you cannot depend on any particular byte to mark the end of the file. (There is an end-of-file condition, but that is not a byte stored in the file.)
What I am suggesting is what most file formats do.
You can start your file format with a header of 4-5 specific magic bytes, followed by the size of the rest of the data. That way you know exactly where your last byte is.
Edit:
With the suggestion above, the encoder needs to update the stored size whenever it adds new data to the file.
If you do not want that, you can split your data into chunks and encode them packet by packet. Your file will then be a sequence of packets. This kind of scheme is used in NAL units.
Blowfish is a block cipher. It always takes 64 bit input and outputs 64 bit output. If you want to encrypt a stream that is not a multiple of 64 bit long you will need to add some padding bytes. When you decrypt the encrypted stream you always get a multiple of 64 bit. But you have no information if the encrypted stream contained 'real' data or padding bytes. You need to keep track of that yourself. A simple approach would be to store the set of 'data length' and 'encrypted stream'. Another approach would be to prepend the clear text stream with a data length value, for example a 64 bit unsigned integer. Then after decrypting the encrypted stream you will have that length as the first value and then you know how many bytes of the last block are real data and how many are just padding.
And regarding your question about what bytes can be at the end of a file: any. You can have files with arbitrary content. Each byte in the file can be of any value, there is no restriction.
A regular binary file can contain any byte sequence, so a file can end with 0x80, with NUL, or with any other byte.
If you are talking about some specific standard, then it depends. However, I don't think there is any file type that forbids a specific character at the end. I know of file types that ignore any trailing bytes beyond what is needed (because the header determines the size), so you should do the same, but I have never heard of illegal file data (except cracked files).
So, as mentioned, use a header: reserve, for example, 8 bytes that hold the size. That is an easy solution.
Also, before asking such a question, ask yourself why a file should end with some special character at all.
The answer is Yes. On every operating system in current use, a file can end with any possible sequence of bytes. In fact, you should generate such a file to test your implementation.
In the general case you cannot recognise trailing padding characters or remove them reliably without knowing the length of the file. Therefore encoding the length of the file must be part of your cryptographic protocol.
Simply put the length of the file at the beginning and encrypt the whole thing, including any padding bytes you like (random is probably best). Once decrypted, you will have the file length to tell you where to truncate.
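A minimal sketch of that framing, with the cipher itself left out. The names frame and unframe, the 8-byte big-endian length field and the use of std::random_device for the filler bytes are assumptions made for the example; std::random_device is not guaranteed to be a cryptographic-quality source on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

constexpr std::size_t kBlock = 8;     // Blowfish block size in bytes
constexpr std::size_t kLenField = 8;  // assumed: 64-bit big-endian length prefix

// Build the plaintext to encrypt: length, data, random filler to a block boundary.
std::vector<std::uint8_t> frame(const std::vector<std::uint8_t>& data) {
    std::vector<std::uint8_t> out;
    const std::uint64_t len = data.size();
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>(len >> shift));
    }
    out.insert(out.end(), data.begin(), data.end());
    std::random_device rd;
    while (out.size() % kBlock != 0) {
        out.push_back(static_cast<std::uint8_t>(rd()));
    }
    return out;
}

// After decryption: read the length and cut off the filler.
std::vector<std::uint8_t> unframe(const std::vector<std::uint8_t>& buf) {
    assert(buf.size() >= kLenField);
    std::uint64_t len = 0;
    for (std::size_t i = 0; i < kLenField; ++i) {
        len = (len << 8) | buf[i];
    }
    assert(len <= buf.size() - kLenField);
    const auto first = buf.begin() + static_cast<std::ptrdiff_t>(kLenField);
    return std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(len));
}
```

A file whose last byte is 0x80 survives the round trip unchanged, because the truncation point comes from the stored length and not from the value of any byte.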

Encryption Initialization Vector

Using the aes_cfb_encrypt and aes_cfb_decrypt functions, I have the following questions.
What is unsigned char *iv (Initialization Vector) in an encryption?
Is it required to preserve the *iv for decryption?
Each time I encrypt a block of data the *iv is modified. What do I have to do with this modified *iv?
I am encrypting a large file of around 100 MB and passing a random *iv the very first time. Do I have to use the same *iv for the rest of the loop, or do I have to use the updated *iv from the last call to encrypt a block?
Lastly, I am dealing with a structured file, so do I have to use sizeof(struct) as the length of the buffer, or sizeof(struct)*8, for encryption or decryption?
Please guide.
AES_RETURN aes_cfb_encrypt(const unsigned char *ibuf, unsigned char *obuf, int len, unsigned char *iv, aes_encrypt_ctx cx[1]);
AES_RETURN aes_cfb_decrypt(const unsigned char *ibuf, unsigned char *obuf, int len, unsigned char *iv, aes_encrypt_ctx cx[1]);
In answering your questions, please note the following:
PT(x) = Plain Text representation of 'x'
CT(x) = Cipher Text representation of 'x'
Bn = Logical Data Block 'n' in a sequence of multiple blocks.
1. What is an IV?
IV is short notation for Initialization Vector. It is used in symmetric block-encryption algorithms that perform their encryption in what are called chained or feedback modes. In either, the previous block of encrypted data is used as a piece of functional data "goo" to alter the next block of data to be encrypted. Each successive block of data that is encrypted is fed the prior already-encrypted data block as their blob of goo to use. But what about the first block of plaintext? What does it use for its special sauce? Answer: the IV provided to the function. Pictorially, it looks like the following:
CT(B1) = Encrypt(IV + PT(B1))
CT(B2) = Encrypt(CT(B1) + PT(B2))
CT(B3) = Encrypt(CT(B2) + PT(B3))
...
CT(Bn) = Encrypt(CT(Bn-1) + PT(Bn))
Note: '+' in the above denotes the application of the prior cipher block to the next plaintext block. It is not to be thought of as mathematical-addition. Think of it as "combined with".
The size of the IV must be the same as the block size of the symmetric algorithm being used. Both AES-128-CFB and AES-256-CFB use a 128-bit block size (16 bytes). Therefore, your IV should be 16 bytes of random goo for your purposes in this question, and should be generated on the encryption-side using a secure FIPS-compliant random-source algorithm.
2. Is it required to preserve the IV for decryption?
Yes, but not necessarily in the fashion you may first think. The first IV (provided by you) must be retained somehow. Traditionally, it is sent right where you would think it should be; as the first block of encrypted data. This often freaks people out, they think "But if I send the IV with the data, it isn't as secure, is it?" Think about it this way. How many "IV's" are you sending, anyway? Remember, each data block is encrypted using the prior block of encrypted data as its IV. Therefore, you're actually sending an entire stream of IVs, each encrypted block the IV for the next encrypted block, etc. Where the initial IV is in your output ciphertext is a data-representation question, but where it goes is ultimately irrelevant to the question. It must be preserved. It is possible your API does this for you as part of its output stream (it is not uncommon at all, in fact).
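If your API does not store the IV for you, one common layout is simply the IV followed by the ciphertext. The sketch below assumes that layout; the function names are invented for the example and are not part of any AES library.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Iv = std::array<std::uint8_t, 16>;  // one AES block

// Sender side: store the initial IV in front of the ciphertext.
std::vector<std::uint8_t> pack_iv_and_ciphertext(
        const Iv& iv, const std::vector<std::uint8_t>& ciphertext) {
    std::vector<std::uint8_t> blob(iv.begin(), iv.end());
    blob.insert(blob.end(), ciphertext.begin(), ciphertext.end());
    return blob;
}

// Receiver side: recover the IV, return the remaining ciphertext.
std::vector<std::uint8_t> unpack_iv_and_ciphertext(
        const std::vector<std::uint8_t>& blob, Iv& iv_out) {
    assert(blob.size() >= iv_out.size());
    const auto split = blob.begin() + static_cast<std::ptrdiff_t>(iv_out.size());
    std::copy(blob.begin(), split, iv_out.begin());
    return std::vector<std::uint8_t>(split, blob.end());
}
```

The important detail is to copy the IV into the output before calling an encrypt function that modifies the IV buffer in place.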
3. Each time I encrypt a block of data the *iv is modified, What i have to do with this modified *iv?
I'm not familiar with the API you're using, but it sounds like you're given the IV to use for the next encryption, which makes perfect sense when you consider how feedback or chaining works for block-mode encryption. You should NOT use the same IV repeatedly. Use the one returned last as the next one. Since your API is modifying the IV in place, it appears the only thing you may need to do is preserve the initial IV somewhere else before sending. I would compare the first ciphertext block against your IV. If they are not the same, you probably need to send your IV, then the ciphertext chain in your data stream, and have the receiver aware that the first block is the IV for the decryption.
4. I am encrypting a large file around 100mb, and passing a random *iv for the very first time, do i have to use the same *iv for the rest of the loop, or i have to use the updated *iv from the last call of encrypt block.
See (3). Use the updated IV for each successive block.
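To see why the updated IV is the right one to carry forward, here is a toy model of full-block CFB. Everything in it is an assumption for illustration: toy_encrypt_block is a stand-in with no security at all, and toy_cfb_encrypt / toy_cfb_decrypt only mimic the "IV updated in place" behaviour described in the question, not the real aes_cfb_encrypt.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::array<std::uint8_t, 16>;

// Stand-in for a real block cipher. NOT secure, demonstration only.
Block toy_encrypt_block(const Block& in) {
    Block out{};
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(in[(i + 1) % in.size()] * 7 + 0x5A + i);
    }
    return out;
}

// Full-block CFB: C_i = P_i XOR E(C_{i-1}), with C_0 = IV.
// On return, iv holds the last ciphertext block.
void toy_cfb_encrypt(const std::uint8_t* in, std::uint8_t* out,
                     std::size_t len, Block& iv) {
    assert(len % 16 == 0);
    for (std::size_t off = 0; off < len; off += 16) {
        const Block ks = toy_encrypt_block(iv);
        for (std::size_t i = 0; i < 16; ++i) {
            out[off + i] = static_cast<std::uint8_t>(in[off + i] ^ ks[i]);
            iv[i] = out[off + i];
        }
    }
}

void toy_cfb_decrypt(const std::uint8_t* in, std::uint8_t* out,
                     std::size_t len, Block& iv) {
    assert(len % 16 == 0);
    for (std::size_t off = 0; off < len; off += 16) {
        const Block ks = toy_encrypt_block(iv);
        for (std::size_t i = 0; i < 16; ++i) {
            const std::uint8_t c = in[off + i];
            out[off + i] = static_cast<std::uint8_t>(c ^ ks[i]);
            iv[i] = c;  // feedback is always the ciphertext
        }
    }
}
```

Encrypting a buffer in one call, or in several calls that keep passing the same updated iv variable, gives the same ciphertext. Resetting iv to the initial value for each chunk would instead restart the chain every time.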
5. Lastly, I am dealing with a structured file, so do I use sizeof(struct) as the length of the buffer, or do I have to use sizeof(struct)*8 as the length of the buffer for encryption or decryption?
Use the size of your structure in bytes (not bits). The C/C++ sizeof(yourstruct) should compute this for you, but note if you're encrypting each structure as an independent entity (and not the entire file in one mass), each encryption will carry with it a minimum amount of data added to account for (a) the IV used for that structure, and (b) padding the last block out to an even block boundary, assuming you're using PKCS5 padding. The exact size of an encrypted structure, therefore, would be:
IV size + ((sizeof(struct) + 15) / 16) * 16 bytes.
Again, this is if you're independently encrypting, and storing, each structure as a singular encryption, and again, your API may account for some of this for you.
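The size formula can be written as a small compile-time helper. This is a sketch of the round-up formula as given above, with a 16-byte IV; the names are made up for the example. Note that PKCS-style padding proper always adds at least one byte, so with that padding a structure whose size is already a multiple of 16 would grow by one more full block than this formula says.

```cpp
#include <cstddef>

constexpr std::size_t kAesBlock = 16;  // AES block size and IV size in bytes

// Round n up to the next multiple of the block size.
constexpr std::size_t round_up_to_block(std::size_t n) {
    return ((n + kAesBlock - 1) / kAesBlock) * kAesBlock;
}

// Bytes stored per independently encrypted structure: IV plus padded data.
constexpr std::size_t stored_size(std::size_t struct_size) {
    return kAesBlock + round_up_to_block(struct_size);
}

// A 20-byte structure rounds up to 32 bytes, plus 16 bytes of IV.
static_assert(stored_size(20) == 48, "20-byte struct stored as 48 bytes");
```

You would call it as stored_size(sizeof(yourstruct)), which passes a byte count, not a bit count.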
For more information on symmetric AES, see the AES entry on Wiki. For information on the CFB block cipher mode, see the Block Cipher Modes of Operation article on the same site.
I hope this helps. Do some homework and above all, learn exactly how your API works, which is something I cannot, unfortunately, help you with.
The initialization vector (IV) in a cryptographic system is a random value that is included as part of the encryption system's initialization to ensure that if the same data is encrypted multiple times, it always comes back looking different. This is a requirement of secure cryptographic systems to ensure that an attacker looking at multiple different encrypted messages cannot easily determine whether any two of those messages are the same. Ideally, the IV should be chosen completely randomly.
You do need the IV for decryption, but you do not need to keep it secret. Typically, the IV is sent in plaintext along with the encrypted data. That's not a security concern - it's by design.
The IV is changed on each iteration of encryption because internally the cryptographic system is iteratively applying a block cipher to the data, then using the output of that cipher, combined with some extra data, as the new IV for the next application of the block cipher. This process is then iterated as many times as necessary. I suspect (but am not sure) that the IV is handed back to you so that you can encrypt more data where you left off from before. You should definitely double-check this!
As for whether to use the size of your structure or eight times that - I can't say without seeing more of your code. However, you should probably be providing the total number of bytes to encrypt, so if you're encrypting eight copies of the struct, pass in eight times the sizeof the struct.
Hope this helps!

Are blowfish and twofish encrypted byte by byte?

I am receiving packets over the network. Some of those packets have a dynamic length, so starting at the second byte there is a 2-byte WORD that contains the length. So I receive the packet number first, then receive the rest according to the length. Everything is okay here when there is no encryption. Will it be the same if I use Twofish or Blowfish encryption? What I mean is, 'A' is encrypted as 'B', but will 'AA' be encrypted as 'BB'? Can I extract a byte and decrypt it from a packet that was encrypted with TF/BF as a whole?
What i mean is, 'A' is encrypted as 'B' but will 'AA' be encrypted as 'BB' ?
A sensible encryption algorithm will never do this, otherwise the encrypted info can be easily broken by frequency analysis. (This is known as substitution cipher, by the way). This is of course true* for blowfish and twofish.
Even if you want to extract a byte in the middle, you have to decrypt the whole packet first.
*: unless you use the weak ECB mode, which only reduces the two encryption algorithms into substitution ciphers over 64-bit/128-bit blocks.
Generally the answer is to pad the encrypted data. Don't just pad by adding 0's until you get to the block length, however; padding can give away a bit too much information.
As far as extracting a byte goes, depending on the cipher mode used (i.e. how the cipher state changes between blocks), you should not be able to do this. You'll need to decrypt all the way up to the byte you'd like to read. It is general practice for the encryption to be "transparent" - i.e. you do your network programming, then slap SSL onto it, so that SSL handles encrypting everything, dealing with variable lengths, etc. and you just get to deal with plain old data.
As to whether throwing SSL at it is a good idea with your design, I have no idea, but you can use the concept.
Twofish, at its base level, encodes 16-byte blocks. So the minimum piece of Twofish-encrypted data you can have is 16 bytes long. If your data contains the length, then you can decrypt it then throw away any extra bytes in the last block.
So to encrypt 'A' you need to pad it (somehow - there are various ways - all zeroes is apparently not the best way) to 16 bytes, then encrypt your one byte of data and your 15 unwanted bytes. You get a 16-byte encrypted block. On decryption you can throw away the extra bytes.
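For the packet layout from the question, the receive side could look roughly like the sketch below. The header layout (1-byte packet number, then a 2-byte length) comes from the question; the little-endian byte order, the assumption that the header itself is sent unencrypted, and all the names are guesses made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Header {
    std::uint8_t id;            // packet number
    std::uint16_t payload_len;  // length of the real (unpadded) payload
};

// Parse a 3-byte header; the WORD is assumed to be little-endian.
Header parse_header(const std::uint8_t* p) {
    return Header{p[0], static_cast<std::uint16_t>(p[1] | (p[2] << 8))};
}

// Number of encrypted bytes to read: the payload rounded up to whole blocks.
std::size_t encrypted_len(std::uint16_t payload_len, std::size_t block = 16) {
    assert(block > 0);
    return ((payload_len + block - 1) / block) * block;
}

// After decrypting whole blocks, keep only the real payload bytes.
std::vector<std::uint8_t> strip_padding(const std::vector<std::uint8_t>& decrypted,
                                        std::uint16_t payload_len) {
    assert(payload_len <= decrypted.size());
    return std::vector<std::uint8_t>(decrypted.begin(),
                                     decrypted.begin() + payload_len);
}
```

So a 1-byte payload such as 'A' still means reading and decrypting 16 bytes with Twofish (pass block = 8 for Blowfish), and only then dropping the 15 extra bytes.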
I suggest half an hour reading these Wikipedia articles:
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
http://en.wikipedia.org/wiki/Padding_%28cryptography%29
Both of them have been helpful to me.

two AES implementations generated different encryption results

I have an application that uses the open-source "libgcrypt" to encrypt/decrypt a data block (32 bytes). Now I am going to use the Microsoft CryptoAPI to replace it. My problem is that the libgcrypt and CryptoAPI approaches generate different ciphertext even though I use the same AES-256 algorithm in CFB mode, the same key, and the same IV. Each ciphertext can still be decrypted correctly by the library that produced it.
Could someone tell me what the problem is? Thanks.
Do the two assume different endianness, or assign the bytes in the key/IV in different orders?
If the endianness assumptions are different, you may need to re-order the bytes in the key, IV and/or plaintext to get matching results. For example, if you are supplying bytes in the order abcdefgh, you may need to switch this to 'dcbahgfe' to get things to work.
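If you want to try that reordering quickly, a small helper like the following does it. The function name and the 4-byte word size are assumptions taken from the example above; whether either library actually needs this is something you would have to test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Reverse the bytes inside each word-sized group, e.g. for 4-byte words
// "abcdefgh" becomes "dcbahgfe".
std::string swap_each_word(std::string s, std::size_t word = 4) {
    assert(word > 0 && s.size() % word == 0);
    const auto w = static_cast<std::ptrdiff_t>(word);
    for (auto it = s.begin(); it != s.end(); it += w) {
        std::reverse(it, it + w);
    }
    return s;
}
```

Applying it twice gives back the original bytes, so the same helper converts in both directions.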
There is an additional parameter for CFB, namely the "shift amount" at each iteration. The Wikipedia page on CFB has some information. Namely, you encrypt x bits for every block encryption, where x is any value between 1 and the block size (128 for AES). I suspect that in your code, the Microsoft CryptoAPI and libgcrypt do not use the same value for x.
As explained in the documentation for CryptSetKeyParam(), Windows defaults to x=8 (i.e. one byte at a time). This is the KP_MODE_BITS parameter. On the other hand, libgcrypt defaults to x=n for a n-bit block cipher (i.e. x=128 for AES). I am not sure libgcrypt can be convinced to use another value.
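The effect of the shift amount is easy to reproduce with a toy model. In the sketch below, toy_encrypt_block is a stand-in with no security at all and toy_cfb_encrypt is an invented name; neither is the API of libgcrypt or the CryptoAPI. The seg parameter is the segment size in bytes, so seg = 1 models CFB-8 and seg = 16 models CFB-128.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::array<std::uint8_t, 16>;

// Stand-in for a real block cipher. NOT secure, demonstration only.
Block toy_encrypt_block(const Block& in) {
    Block out{};
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(in[(i + 1) % in.size()] * 7 + 0x5A + i);
    }
    return out;
}

// CFB with a configurable segment size. sr is the shift register,
// initialised with the IV (passed by value, so the caller's IV is untouched).
std::vector<std::uint8_t> toy_cfb_encrypt(const std::vector<std::uint8_t>& pt,
                                          Block sr, std::size_t seg) {
    assert(seg >= 1 && seg <= 16 && pt.size() % seg == 0);
    std::vector<std::uint8_t> ct(pt.size());
    const auto s = static_cast<std::ptrdiff_t>(seg);
    for (std::size_t off = 0; off < pt.size(); off += seg) {
        const Block ks = toy_encrypt_block(sr);
        for (std::size_t i = 0; i < seg; ++i) {
            ct[off + i] = static_cast<std::uint8_t>(pt[off + i] ^ ks[i]);
        }
        // Shift the register left by seg bytes, then append the new ciphertext.
        std::copy(sr.begin() + s, sr.end(), sr.begin());
        for (std::size_t i = 0; i < seg; ++i) {
            sr[16 - seg + i] = ct[off + i];
        }
    }
    return ct;
}
```

With the same key, IV and plaintext, the first byte is the same for both segment sizes, but the outputs diverge from the second byte on, because CFB-8 runs the block cipher again after every byte.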
I think the problem is with the block size. You said you are using 32 bytes as the block size; make sure both libraries use the same block size and actually support it, because some libraries fix the AES block size at 16 bytes.
What is the length of your key and IV?
Are the ciphertexts different if the length of the plaintext is exactly 256 bits?
I have the same problem, but with a different library. I noticed one thing with this library: if I pass input shorter than 32 bytes, both produce the same encrypted data.
Is that what's happening in your case? If so, it means the problem is with the padding mechanism.

Decryption with AES and CryptoAPI? When you know the KEY/SALT

Okay, so I have a packed proprietary binary format that is basically a loose packing of several different raster datasets. In the past, reading and unpacking this was an easy task. But in the next version the raster XML data is to be encrypted using AES-256 (not my choice, nor do we have a choice).
We were sent the AES key along with the SALT they are using so we can modify our unpackager.
NOTE THESE ARE NOT THE KEYS JUST AN EXAMPLE:
Each is a string of 63 ASCII characters:
Key: "QS;x||COdn'YQ#vs-`X\/xf}6T7Fe)[qnr^U*HkLv(yF~n~E23DwA5^#-YK|]v."
Salt: "|$-3C]IWo%g6,!K~FvL0Fy`1s&N<|1fg24Eg#{)lO=o;xXY6o%ux42AvB][j#/&"
We want to use the C++ CryptoAPI to decrypt this (I am also the only programmer here this week, and this is going live tomorrow. Not our fault). I've looked around for a simple tutorial on implementing this. Unfortunately I cannot even find a tutorial that uses both a salt and a key separately. All I really have right now is a small function that takes in an array of BYTE along with its length. How can I do this?
I've spent most of the morning trying to make heads or tails of the CryptoAPI, but it's not going well :(
EDIT
So i asked for how they encrypt it. They use C#, and use RijndaelManaged, which from my knowledge is not equivalent to AES.
EDIT2
Okay finally got exactly what was going on, and they sent us the wrong keys.
They are doing the following:
Padding = PKCS7
CipherMode = CBC
The Key is defined as a set of 32 Bytes in hex.
The IV is defined as a set of 32 bytes in hex too.
They took away the salt when i asked them.
How hard is it to set these things in CryptoAPI using the wincrypt.h header file?
AES-256 uses 256 bit keys. Ideally, each key in your system should be equally likely. A 63 byte string would be 504 bits. You first need to figure out how the string of 63 characters needs to be converted to 256 bits (The sample ones you gave are not base64 encoded). Next, "salt" isn't an intrinsic part of AES. You might be referring to either an initialization vector (IV) in Cipher-Block-Chaining mode or you could be referring to somehow updating the key.
If I were to guess, by "SALT" you mean the IV, and specifically CBC mode.
You will need to know all of this when using CAPI functions (e.g. decrypt).
If all of this sounds confusing, then it might be best to change your design so that you don't have to worry about getting all of this right. Crypto is hard. One bad step could invalidate all the security. Consider looking at this comment on my Stick Figure Guide to AES.
UPDATE: You can look at this for a rough starting point for C++ CAPI. You'll need a 64 character hex string to get 256 bits ( 256 bits / (4 bits / char) == 64 chars). You can convert the chars to bits yourself.
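Converting the characters yourself is only a few lines. The sketch below turns a hex string into raw bytes; the function name is made up for the example, and it assumes the key string contains nothing but hex digits (no spaces, no "0x" prefix).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Convert a hex string to bytes, two characters per byte.
// A 64-character string yields the 32 bytes of an AES-256 key.
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex) {
    assert(hex.size() % 2 == 0);
    auto nibble = [](char c) -> std::uint8_t {
        if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
        if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
        if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
        throw std::invalid_argument("non-hex character in key string");
    };
    std::vector<std::uint8_t> out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        out.push_back(static_cast<std::uint8_t>((nibble(hex[i]) << 4) |
                                                nibble(hex[i + 1])));
    }
    return out;
}
```

The resulting byte vector is what you would hand to the key import step, rather than the ASCII text of the hex string.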
Again, I must caution that playing fast and loose with IV's and keys can have disastrous consequences. I've studied AES/Rijndael in depth down to the math and gate level and have even written my own implementation. However, in my production code, I stick to using a well-tested TLS implementation if at all possible for data in transit. Even for data at rest, it'd be better to use a higher level library.
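Rijndael is the name of the algorithm that was standardized as AES. AES fixes the block size at 128 bits, while Rijndael itself also allows larger block sizes.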