My application needs to use a couple of hard-coded symmetric cryptographic keys (I know that storing only a public key would be the only perfect solution, but this is non-negotiable). We want the keys to be stored obfuscated, so that they won't be recognizable by analyzing the executable, and to be "live" in memory for as little time as possible, so as to increase the difficulty of retrieving them in clear text from a memory dump. I'm interested in using C++ features (some sort of scoped_key comes to mind). The solution must be portable (Windows, Linux, macOS), so it cannot take advantage of the operating system crypto API.
How would you go about designing such a system? Thanks a lot.
All you're going for here is security through obscurity. If you have one of us come up with an idea, you won't even have that.
Jon Skeet has a good article on this too.
Do something random is all I can say.
Your scoped_key can simply be a KeyHolder object on the stack. Its constructor takes the obfuscated buffer and makes a real key out of it, and its destructor zeros out the memory and deallocates it.
As for how to actually obfuscate the key in the binary, one silly choice you might try is to put it inside a much larger random binary block, remember its offset and size, and probably XOR it with some short random sequence.
If you do the XORing thing, you can actually avoid ever having the complete real key in memory: simply modify the decryption to read a byte from the key and XOR it with the appropriate value just before using it.
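A minimal sketch of the KeyHolder idea, under a few assumptions: the names ScopedKey and key_byte_at are made up for illustration, the key is obfuscated with a plain XOR mask of the same length, and the volatile write loop is only a best-effort way to keep the compiler from dropping the wipe. It is not a hardened implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: holds the clear key only for the lifetime of the object.
class ScopedKey {
public:
    // Rebuilds the clear key from an obfuscated buffer and an XOR mask of equal length.
    ScopedKey(const std::uint8_t* obfuscated, const std::uint8_t* mask, std::size_t len)
        : key_(len) {
        assert(obfuscated != nullptr && mask != nullptr);
        for (std::size_t i = 0; i < len; ++i) {
            key_[i] = static_cast<std::uint8_t>(obfuscated[i] ^ mask[i]);
        }
    }

    // Zero the clear key before the vector releases its memory.
    ~ScopedKey() {
        volatile std::uint8_t* p = key_.data();
        for (std::size_t i = 0; i < key_.size(); ++i) {
            p[i] = 0;
        }
    }

    // Copying would leave extra clear-text copies around, so forbid it.
    ScopedKey(const ScopedKey&) = delete;
    ScopedKey& operator=(const ScopedKey&) = delete;

    const std::uint8_t* data() const { return key_.data(); }
    std::size_t size() const { return key_.size(); }

private:
    std::vector<std::uint8_t> key_;
};

// Variant for the "never hold the whole key" idea: recover one key byte on demand.
inline std::uint8_t key_byte_at(const std::uint8_t* obfuscated,
                                const std::uint8_t* mask,
                                std::size_t i) {
    return static_cast<std::uint8_t>(obfuscated[i] ^ mask[i]);
}
```

You would create a ScopedKey in the narrowest scope that needs the key, so the destructor wipes it as soon as the block exits. A debugger breakpoint inside that scope still sees the key, so this only raises the bar.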
*Add here disclaimer on how foolish security through obscurity is*
Have you considered if this is actually the weak point in your program? Let's assume for the sake of argument that you're doing a license check - though other checks apply equally well. No matter how well hidden your key, and how well obfuscated your algorithm, at some point you have to do something like this:
if (!passesCheck()) {
    exit(1);
}
Your potential adversary doesn't have to find the key, decrypt it, figure out the algorithm, or anything else. All they have to do is find the location in the code where you determine if the check succeeded, and replace the 'jnz' instruction with a 'jmp' to make the test pass unconditionally.
You should look into Tamper Resistance software, such as Cloakware and Arxan.
TR is NOT cheap :)
If you do not take advantage of whatever platform you will run on, you cannot guarantee that you can effectively hide a symmetric cryptographic key in your program. You can assume that an attacker will have a debugger and will eventually figure out how to set a breakpoint in the function that has to use the key to decrypt. Then it's game over.
Why not look at steganography? You can hide the key in an image. It is still possible to find the key, but if the image doesn't follow an obvious pattern (as an image of space would), it may be harder. This can be cross-platform.
I have a text file containing a list of words (about 35 MB of data). I wrote an application that works pretty much like a Scrabble helper. I find it inefficient to load the whole file into a set, since it takes about 10 minutes to do it. I am not very experienced in C++, so I want to ask what would be a better way to achieve this. In the first version of my application I just binary searched through the file (without loading it, just moving the file pointer using seekg). But this solution isn't as fast as using maps of maps. When searching for a word, I look up its first letter in a map. Then I retrieve a map of possible second letters and do another search (for the second letter), and so on. That way I am able to tell if the word is in the dictionary much faster. How can I achieve this without loading the whole file into the program to build these maps? Can I save them in a database and read them? Would that be faster?
35MB of data is tiny. There's no problem with loading it all into memory, and no reason for it to take 10 minutes to load. If it takes so long, I suspect your loading scheme recopies maps.
However, instead of fixing this, or coming up with your own scheme, perhaps you should try something ready.
Your description sounds like you could use a database of nested structures. MongoDB, which has a C++ interface, is one possible solution.
For improved efficiency, you could go a bit fancy with the scheme. Say up to 5 letter words, you could use a multikey index. Beyond that, you could go with completely nested structure.
Just don't do it yourself. Concentrate on your program logic.
First, I agree with Ami that 35 MB shouldn't in principle take that long to load and store in memory. Could there be a problem with your loading code (for example, accidentally copying maps, causing lots of allocation/deallocation)?
If I understand your intention correctly, you are building a kind of trie structure (trie, not tree) using maps of maps as you described. This can be very nice if it is in memory, but if you want to load only part of the maps into memory, it becomes very difficult (not technically, but in determining which maps to load and which not to load). You would then risk reading much more data from disk than actually needed, although there are some implementations of persistent tries around.
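For reference, a minimal in-memory sketch of the maps-of-maps trie described above. The names Trie and TrieNode are just for illustration, and it assumes the whole word list fits in memory. Each node owns its children through std::unique_ptr, so inserting never copies a sub-map.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// One node per letter position; children are keyed by the next letter.
struct TrieNode {
    bool is_word = false;
    std::map<char, std::unique_ptr<TrieNode>> children;
};

class Trie {
public:
    // Walks or creates one node per character, then marks the end of the word.
    void insert(const std::string& word) {
        assert(!word.empty());  // empty strings are not dictionary words
        TrieNode* node = &root_;
        for (char c : word) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) {
                child = std::make_unique<TrieNode>();
            }
            node = child.get();
        }
        node->is_word = true;
    }

    // True only if the exact word was inserted (a mere prefix does not count).
    bool contains(const std::string& word) const {
        const TrieNode* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) {
                return false;
            }
            node = it->second.get();
        }
        return node->is_word;
    }

private:
    TrieNode root_;
};
```

Loading is then one insert per line of the file. If that still takes minutes, the time is probably going somewhere other than the data structure.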
If your intent is to have the indexing scheme on disk, I'd rather advise you to use a traditional B-tree data structure, which is designed to optimize the loading of partial indexes. You can write your own, but there are already a couple of implementations around (see this SO question).
You could also use something like sqlite, which is a lightweight DBMS that you can easily embed in your application.
I need a program that reads the contents of a file and writes them into another file, but only the characters that are valid UTF-8 characters. The problem is that the file may come in any encoding, and the contents of the file may or may not correspond to that encoding.
I know it's a mess, but that's the data I get to work with. The files I need to "clean" can be as big as a couple of terabytes, so I need the program to be as efficient as humanly possible. Currently I'm using a program I wrote in Python, but it takes as long as a week to clean 100 GB.
I was thinking of reading the characters with the w_char functions, then handling them as integers and discarding all the numbers that are not in some range. Is this the optimal solution?
Also, what's the most efficient way to read and write in C/C++?
EDIT: The problem is not the IO operations. That part of the question is intended as extra help to get an even quicker program, but the real issue is how to identify non-UTF-8 characters quickly. Also, I have already tried parallelization and RAM disks.
UTF-8 is just a nice way of encoding characters and has a very clearly defined structure, so fundamentally it is reasonably simple to read a chunk of memory and validate that it contains UTF-8. Mostly this consists of verifying that certain bit patterns do NOT occur, such as the bytes C0, C1 and F5 to FF, which are never valid, plus others that are invalid only in certain positions.
It is reasonably simple in C (sorry, I don't speak Python) to code something that is a simple fopen/fread and checks the bit patterns of each byte, although I would recommend finding some code to cut and paste (e.g. http://utfcpp.sourceforge.net/, but I haven't used these exact routines), as there are some caveats and special cases to handle. Just treat the input bytes as unsigned char and bitmask them directly. I would paste what I use, but I'm not in the office.
A C program will rapidly become IO bound, so suggestions about IO will then apply if you want ultimate performance; however, direct byte inspection like this will be hard to beat in performance if you do it right. UTF-8 is nice in that you can find character boundaries even if you start in the middle of the file, so this leads nicely to parallel algorithms.
If you build your own, watch for the BOM (byte order mark) that might appear at the start of some files.
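A rough sketch of that byte inspection, assuming the rules in RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). The function names are made up, and it works on an in-memory buffer rather than on fread chunks. A real cleaner would also have to carry a partial sequence across chunk boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Continuation bytes have the form 10xxxxxx.
static bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Returns the length (1 to 4) of the valid UTF-8 sequence starting at p,
// or 0 if the bytes at p do not start a valid sequence.
std::size_t utf8_sequence_length(const unsigned char* p, std::size_t n) {
    assert(p != nullptr);
    if (n == 0) return 0;
    const unsigned char b0 = p[0];
    if (b0 < 0x80) return 1;                       // plain ASCII
    if (b0 >= 0xC2 && b0 <= 0xDF) {                // two-byte sequence
        return (n >= 2 && is_continuation(p[1])) ? 2 : 0;
    }
    if (b0 >= 0xE0 && b0 <= 0xEF) {                // three-byte sequence
        if (n < 3 || !is_continuation(p[1]) || !is_continuation(p[2])) return 0;
        if (b0 == 0xE0 && p[1] < 0xA0) return 0;   // overlong encoding
        if (b0 == 0xED && p[1] > 0x9F) return 0;   // UTF-16 surrogate range
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {                // four-byte sequence
        if (n < 4 || !is_continuation(p[1]) || !is_continuation(p[2]) ||
            !is_continuation(p[3])) return 0;
        if (b0 == 0xF0 && p[1] < 0x90) return 0;   // overlong encoding
        if (b0 == 0xF4 && p[1] > 0x8F) return 0;   // above U+10FFFF
        return 4;
    }
    return 0;  // 0x80..0xC1 and 0xF5..0xFF can never start a sequence
}

// Copies only the valid sequences; each invalid byte is dropped individually.
std::string strip_invalid_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    const unsigned char* p = reinterpret_cast<const unsigned char*>(in.data());
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t len = utf8_sequence_length(p + i, in.size() - i);
        if (len == 0) {
            ++i;  // skip one bad byte and resynchronize
        } else {
            out.append(in, i, len);
            i += len;
        }
    }
    return out;
}
```

Dropping one bad byte at a time and resynchronizing is a design choice. Replacing bad bytes with U+FFFD would be another reasonable policy.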
Links
http://en.wikipedia.org/wiki/UTF-8 nice clear overview with table showing valid bit patterns.
https://www.rfc-editor.org/rfc/rfc3629 the rfc describing utf8
http://www.unicode.org/ homepage for the Unicode Consortium.
Your best bet, in my opinion, is to parallelize. If you can parallelize the cleaning and clean many pieces of content simultaneously, the process will be more efficient. I'd look into a framework for parallelization, e.g. MapReduce, where you can multithread the task.
I would look at memory mapped files. This is something in the Microsoft world, not sure if it exists in unix etc., but likely would.
Basically you open the file and point the OS at it and it loads the file (or a chunk of it) into memory which you can then access using a pointer array. For a 100 GB file, you could load perhaps 1GB at a time, process and then write to a memory mapped output file.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366556(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366542(v=vs.85).aspx
This should, I would think, be the fastest way to perform big I/O, but you would need to test in order to say for sure.
HTH, good luck!
Unix/Linux and any other POSIX-compliant OSes support memory mapping (mmap) too.
Each time I write another one of my small c++ toy programs, I come across the need for a small, easy-to-use options/parameter class. Here is what it should be able to do:
accept at least ints, doubles, string parameters
easy way to add new options
portable, small and fast
free
read options from a file and/or command line
upper and lower bounds for parameters
and all the other neat useful things I am not thinking of right now
What I want to do is pass a pointer to this class to the builder and all of my strategy objects, so they can read the parameters of the algorithm I am running (e.g. which algorithm, maximum number of iterations etc.)
Can someone point me to an implementation that achieves at least some of these things?
Boost Program-Options is pretty slick. I think it does all the things on your list, apart from maybe bounds validation. But even then, you can provide custom validators pretty easily.
Update: As @stefan rightly points out in a comment, this also fails on "small"! It adds quite a significant chunk to your binary if you statically link it in.
You might want to consider storing your configuration in JSON format. While reading JSON from the command-line is slightly awkward, it's still perfectly doable and even reasonably legible. Other than that you get a whole lot of flexibility, including nested configuration options, facilities for deserializing complicated data types etc.
There are numerous libraries for de-serializing JSON into C++, see e.g. this discussion and comparison of a few of them. Some are small, many are fast (although you don't actually need them to be fast - configuration data is very small), most are very portable. A long list and some benchmark results (although not a feature comparison) can be found here; some of these libraries might actually be geared towards use for reading configuration options, although that's just a wild guess.
Hello all,
I need to encrypt text. What is the best encryption to use programmatically?
In general, I have an input file containing a string that I need to encrypt; the application then reads the file
and decrypts it as part of the application flow.
I am using C++.
The strongest encryption is to use a one-time pad (with XOR for example). The one time pad algorithm (unlike most other commonly used algorithms) is provably secure when used correctly.
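To make the XOR part concrete, here is a small sketch. The names Bytes and otp_xor are made up for the example, and it assumes the pad is truly random, at least as long as the message, and never reused. Those assumptions are where the proof of security comes from, and the code cannot enforce them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// XORs each input byte with the matching pad byte.
// The same call both encrypts and decrypts, since (m ^ k) ^ k == m.
Bytes otp_xor(const Bytes& input, const Bytes& pad) {
    assert(pad.size() >= input.size());  // the pad must cover the whole message
    Bytes out(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(input[i] ^ pad[i]);
    }
    return out;
}
```

Calling otp_xor on a message gives the ciphertext, and calling it again on the ciphertext with the same pad gives the message back.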
One serious problem with this algorithm is that the distribution of the one-time pad must be done securely and this is often impractical. If it were possible to transmit the one time pad securely then it would typically also be possible to send the message securely using the same channel.
In situations where it is not possible to send information securely via another channel, public key cryptography is used. Generally the strength of these algorithms increases as the key length increases, unless some critical weakness is found in the algorithm. RSA is a commonly used public key algorithm.
To get strong encryption with public key cryptography the keys tend to be large (thousands of bits is not uncommon) and the algorithms are slow to compute. An alternative is to use a symmetric key algorithm instead. These can often get the same strength encryption with shorter keys and can be faster to encrypt and decrypt. Like one-time-pads this also has the problem of key distribution, but this time the key is very short so it is more feasible to be able to transfer it securely. An example of a commonly used symmetric key algorithm is AES.
The one-time pad is the strongest, but you are probably looking for something that you can use easily in your application. Check this page to learn about the strength of algorithms: http://security.resist.ca/crypt.shtml. And here is a C++ library, crypto++ (the link points to a benchmark that compares the performance of different algorithms): http://www.cryptopp.com/benchmarks.html.
The answer depends on what you mean by "strong encryption".
When cryptographers talk about strong encryption modes, they usually expect that it has at least two properties:
confidentiality: I.e. it is not possible to find any information about the plaintext given the ciphertext (with the possible exception of the plaintext length).
integrity: It must not be possible for an adversary to modify the ciphertext, without the receiver of the message noticing the modification.
When cryptographers call a cryptosystem "provably secure under some assumption", they typically mean that the cryptosystem is secure against chosen ciphertext attacks as long as the assumptions hold (e.g. that there is no efficient algorithm for some well-known problem).
In particular, some of the other answers claim that the one-time pad is the most secure algorithm. However, the one-time pad alone does not provide any integrity. Unless the scheme is modified, it is easy to alter a ciphertext without the receiver noticing the change. That means that the one-time pad only satisfies a rather weak security notion called "perfect secrecy". Nowadays it is therefore quite misleading to call the one-time pad "provably secure" without mentioning that this only holds under a security model that does not include message integrity.
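A small illustration of that integrity problem, with made-up names (xor_with, tamper) and a made-up message. It assumes the attacker knows or guesses the plaintext character at one position, which is all that is needed to rewrite it without ever seeing the pad.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// One-time pad over std::string: the same call encrypts and decrypts.
std::string xor_with(const std::string& data, const std::string& pad) {
    assert(pad.size() >= data.size());
    std::string out(data.size(), '\0');
    for (std::size_t i = 0; i < data.size(); ++i) {
        out[i] = static_cast<char>(data[i] ^ pad[i]);
    }
    return out;
}

// Attacker's view: no pad needed. Flipping the bits in which `expected`
// and `wanted` differ turns one plaintext character into the other.
std::string tamper(std::string ciphertext, std::size_t pos, char expected, char wanted) {
    assert(pos < ciphertext.size());
    ciphertext[pos] = static_cast<char>(ciphertext[pos] ^ expected ^ wanted);
    return ciphertext;
}
```

For example, encrypting "PAY 100" and passing the ciphertext through tamper(ct, 4, '1', '9') yields a ciphertext that decrypts to "PAY 900", and the receiver has no way to tell. This is the gap that a MAC such as HMAC is meant to close.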
To select a strong encryption mode, one might also look at practical aspects, e.g. how much cryptanalysis has gone into an encryption mode, or how well the cryptographic library that implements the algorithm has been analyzed. With that in mind, selecting a well-known cryptographic library, properly encrypting with AES, and authenticating with HMAC is going to be close to optimal.
I am just wondering: if you had to write a really secure application whose data is transmitted over insecure networks, what kind of encryption algorithm would you use to make it safe? I know several C++ libraries for encryption providing nice functions with different algorithms, but I'm not quite sure which cipher to use: AES, DES, RSA, Blowfish, or maybe something else?
Please provide your ideas and suggestions.
While some encryption algorithms are easier to break than others, the weak spot is more often key generation. If your key generation algorithm is predictable, it will be easier to figure out your key and decrypt the packet.
So, I wouldn't sweat which encryption algorithm (AES is fine). But make sure you have a good source of real randomness to generate your keys.
If you are on any of the common POSIX OS, look into using /dev/random to generate your key.
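As a sketch of what reading a key from the device could look like, assuming a POSIX system where the device exists. The function name read_random_key and the 32-byte size in the usage note are choices made for this example, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads n_bytes of key material from a random device (default /dev/random).
// Throws if the device cannot be opened or returns fewer bytes than requested.
std::vector<std::uint8_t> read_random_key(std::size_t n_bytes,
                                          const std::string& device = "/dev/random") {
    assert(n_bytes > 0);
    std::ifstream dev;
    // Ask for an unbuffered stream so the read does not pull more bytes than
    // requested (must be set before open; the effect is implementation-defined).
    dev.rdbuf()->pubsetbuf(nullptr, 0);
    dev.open(device, std::ios::binary);
    if (!dev) {
        throw std::runtime_error("cannot open " + device);
    }
    std::vector<std::uint8_t> key(n_bytes);
    dev.read(reinterpret_cast<char*>(key.data()),
             static_cast<std::streamsize>(key.size()));
    if (dev.gcount() != static_cast<std::streamsize>(key.size())) {
        throw std::runtime_error("short read from " + device);
    }
    return key;
}
```

Something like read_random_key(32) would give enough material for a 256-bit AES key. Depending on the OS and kernel version, reads from /dev/random may block until enough entropy is available.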
Use AES for data encryption, and use RSA to exchange the AES key between the parties.
You have listed too few requirements for your encryption needs. It depends on circumstances.
For example, if both end-points of the communication link are trusted, you could just worry about encryption and have both of them produce public-keys for the other end to encrypt information with. In this case, RSA would be my personal choice.
However, if you do not trust the other end-point, and are using the encryption to determine whether it "has the key", then you would be counting on preset keys rather than private/public encryption. In this case, Triple DES (DES is considered a little weak these days) may be a good choice.
Hope this helps!
Any of the well-known ciphers have well-understood levels of security, so that part is easy. I'd be more concerned about the quality and trust level of the library you use, and about bugs in your use of it. Have you considered using external programs like ssh to do the key generation and handle the connection, and driving that with an API library?
Crypto++, perhaps the best known, largest, and best crypto library for C++, contains several algorithms you can use. It can also give you a good cryptographically secure random library for use with these algorithms.
According to their FAQ, that depends quite a bit on what you want to do. From their recommended list of algorithms:
block cipher: DES-EDE3, AES, Serpent (Serpent is slower than AES but has a larger security margin and is not vulnerable to timing attacks.)
stream cipher: any of the above block ciphers in CTR mode
fast stream cipher: Salsa20, Panama, Sosemanuk (available in version 5.5)
hash function: SHA-256, SHA-512, Whirlpool
message authentication code: HMAC/SHA1 or HMAC with one of the above hash functions
public key encryption: RSA/OAEP/SHA1, ECIES
signature: RSA/PSS/H, ECDSA/H, where H is one of the above hash functions
key agreement: DH, ECDH
random number generator: RandomPool, AutoSeededRandomPool
We can't give an exact answer because there isn't one without knowing more about what you're trying to accomplish.
First of all, you have to distinguish symmetric block ciphers from public key ciphers:
their performance is really different
their implementation is really different
their strengths/weaknesses are based upon different factors
By the way:
DES is not considered secure anymore, you should always avoid using it
AES is still considered safe (you could use 256-bit keys)
The security of RSA (and of other public key algorithms such as ElGamal and elliptic curve schemes) rests on the hardness of the underlying mathematical problems, for example integer factorization. If the keys are large enough, you can consider them safe enough.
However, things can change with the power of CPUs in a few years, so you shouldn't consider them safe forever.
One thing to consider is that public key ciphers are usually slower than block ciphers. That's why common protocols usually use the former to exchange a symmetric key, which is then used with an algorithm like AES.
The answer is: don't. If it has to be secure and you ask this question it means that you need to find a security expert to do it. You are not going to design a secure protocol by asking for help on SO. You can [maybe] use an existing protocol such as ssh or TLS, but if you roll your own you will fail.
If you want to transmit data over an insecure network, you need more than a cipher, you need a secure protocol, potentially including key distribution and authentication.
If you're really serious about crypto implementation, not just doing it to understand the basic mathematics of cryptography, then you need to do more than implement the number-crunching correctly. You also need to worry about side-channel attacks. For example, if your implementation takes a different amount of time depending on the key, as is common, then an attacker can deduce information about the key from your various response times. And that's just the basic algorithm, never mind putting it all together.
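One small, classic instance of this is comparing a received MAC or key against the expected value. A memcmp-style comparison returns at the first mismatch, so its running time leaks how many leading bytes were right. A sketch of the usual countermeasure follows. The function name is made up, and the compiler and CPU can still undermine the constant-time property, so vetted libraries ship their own versions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compares two buffers of equal length without an early exit:
// every byte is always examined, so the loop's duration does not
// depend on where (or whether) the buffers differ.
bool constant_time_equal(const std::uint8_t* a, const std::uint8_t* b, std::size_t len) {
    assert(a != nullptr && b != nullptr);
    std::uint8_t diff = 0;
    for (std::size_t i = 0; i < len; ++i) {
        diff = static_cast<std::uint8_t>(diff | (a[i] ^ b[i]));  // accumulate any difference
    }
    return diff == 0;
}
```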
This is in general an unsolved problem and an area of ongoing research. Most or all implementations are flawed, although for the latest versions of well-used libraries probably not in a way that anyone has publicly announced they can exploit. Timing-based attacks on OpenSSL have been demonstrated in the past, albeit only on a highly predictable local network AFAIK. You can basically spend as long as you like on it, up to and including your entire career.
In practice, just use SSL, in whatever implementation comes with your platform.
Actually, we may not need to know all the details. But, IMHO, cascade these algorithms with Chain Of Responsibility Pattern, or Composite + Strategy.