Fast asymmetric cypher for C++ application

I'm looking for a fast asymmetric cypher algorithm to be used in a C++ program.
Our application accesses read-only data stored in an archive (custom format, somewhat similar to tar), and I would like to prevent any modification of that archive by asymmetrically encrypting the archive index (I'm aware that this isn't a perfect solution and that data can still be extracted and repacked using certain techniques).
Some individual files within the archive are encrypted with a symmetric cypher, and the encryption keys for them are stored within the archive index (header), which is why I want to encrypt the archive header asymmetrically.
Cypher requirements:
1) Algorithm implementation should be platform-independent.
2) Algorithm should be either easy to implement myself or it should be available in library (with source code) that allows static linking with proprietary application, which means that GPL/LGPL/viral licenses cannot be used. MIT/BSD-licensed code, or public domain code is acceptable.
3) If cypher is available in library, ideally it should have small memory footprint, and implementation should be compact. I would prefer to use a C/C++ library that implements only one cipher instead of full-blown all-purpose cipher collection.
Originally I wanted to use RSA, but it looks like it is simply too slow to be useful, and there aren't many alternatives.
So, any advice on what I can use?

Okay, I've found what I've been looking for, and I think it is better than OpenSSL (for my purposes, at least).
There are two libraries:
libtomcrypt, which implements several cyphers (including RSA), and libtommath, which implements bignum arithmetic. Both libraries are in the public domain, are easy to hack/modify, and have a simpler programming interface and (much) better documentation than OpenSSL.
Unlike older public domain rsa code I found before, libtomcrypt can generate new keys very quickly, can import OpenSSL-generated keys, and supports padding. Another good thing about libtomcrypt is that it doesn't have extra dependencies (OpenSSL for windows wants gdi32, for example) and is smaller than OpenSSL.
I've decided to use RSA for encryption, after all, because (to me it looks like) there are no truly asymmetric alternatives. It looks like most of the other ciphers (elgamal, elliptic curves) are more suitable for symmetric encryption where session key is being encrypted asymmetrically. Which isn't suitable for me. Such ciphers are suitable for network communications/session keys, but it wouldn't be good to use that for static unchanging data on disk.
As for "RSA being slow", I've changed the archive format a bit, so now only a small chunk of data is asymmetrically encrypted. Failure to decrypt this chunk will make reading the archive index very difficult, if not impossible. Also, I must admit that the slowness of RSA was partially a wrong impression given by the older code I'd tried to use before.
Which means, question solved. Solution is RSA + libtomcrypt. RSA - because there aren't many alternatives to RSA, and libtomcrypt - because it is small and in public domain.

OpenSSL should do the job for you. It's open source (Apache License 2.0 since OpenSSL 3.0; earlier releases used the BSD-style OpenSSL/SSLeay license), so it meets your license requirements.
It's widely used and well tested.

Use a custom RSA to sign the archive. Store the public key in the application and keep the private key in house. Now anyone could modify the read only archive, but your application would refuse to load the modified archive.

Check out Curve25519, which is elliptic curve cryptography implemented efficiently, and which avoids patent problems.
It meets all of your requirements.
You can use it to encrypt, or to simply sign.
As a side note:
For integrity checking, a MAC should suffice unless you really need asymmetric encryption.

How about MD5?
Yes, I am aware that MD5 has been 'broken', but for most practical applications this is irrelevant.
This is especially so if the modified data would also have to be valid in the particular data format as well as have the correct MD5.
EDIT:
MD5 is appropriate if you want to just ensure that data stored can't be changed (or at least you can detect it) but it doesn't hide the data. Note that if you must have the key in your app alongside the data it can always be extracted. There are techniques for hiding the key - a popular one is simply to put it inside a static resource such as an icon that can be linked easily.

Related

Why should I call EVP_DigestInit rather than calling hash functions directly?

The OpenSSL documentation for SHA512 contains the following recommendation:
Applications should use the higher level functions EVP_DigestInit(3) etc. instead of calling the hash functions directly.
What is the reason for that? Is it safer? There is no explanation of why I should use it.
I want to compute a SHA512 hash and, according to this recommendation, I should use the EVP_* functions for that instead of the SHA512_* functions. Or did I misunderstand it?
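// Variant 1: low-level API (deprecated since OpenSSL 3.0)
SHA512_CTX m_context;
SHA512_Init(&m_context);
SHA512_Update(&m_context, data, size);
SHA512_Final(hash, &m_context);

// Variant 2: high-level EVP API
auto m_context = EVP_MD_CTX_create();
EVP_DigestInit_ex(m_context, EVP_get_digestbyname("sha512"), NULL);
EVP_DigestUpdate(m_context, data, size);
EVP_DigestFinal_ex(m_context, hash, NULL);
EVP_MD_CTX_destroy(m_context);  // release the context when done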

I asked the same thing and here is their answer
https://github.com/openssl/openssl/issues/12260
For a number of reasons. For example:
In some cases you get sub-optimal implementations with the low-level APIs vs the EVP APIs. An example from the "cipher" world (but the same concept applies to digests) is the low-level function AES_encrypt. If you call that you will never get the AESNI optimized version which may be available on your platform.
We want to encourage applications to use a consistent API for all types of digests/ciphers etc. This makes it much easier for everyone to update code as security advice changes. For example you mention the MD5 digest APIs. MD5 is no longer recommended. It's a lot harder to update your code to use some other digest if you've used the low-level APIs vs the high level ones. Looking into the future imagine some algorithm has some major flaw discovered in it that requires a rapid migration away from it to some other algorithm. We want the OpenSSL ecosystem to be as agile as possible to be able to deal with that.
The old APIs no longer fits architecturally with how OpenSSL 3.0 works. All algorithm implementations are now made available by "providers" in OpenSSL 3.0 - for example we have the "default" provider, the "fips" provider and a "legacy" provider. The low level APIs circumvent all of that which means we have to maintain 2 ways of doing everything (3 actually when you bring ENGINEs into the picture as well). This leads to unnecessary code bloat and complexity....which is definitely something you want to avoid in a crypto library. Ideally we would have removed the old ways completely - but that would have been too big a breaking change.

Fast embedded database

I am working on an application which will need to store metadata associated with music files (artist, title, play count, etc.), as well as sets of integers (in particular, SHA-1 hashes).
The solution I pick needs to:
Provide "fast" storage & retrieval (when viewing a list of potentially thousands of songs I need to be able to retrieve metadata more or less interactively).
Be cross-platform (to Linux, Windows and OSX).
Provide an interface I can interact with from C++.
Be open-source (or, at the very least, be free as in beer).
Provide fast set operations (union, intersection, difference) - if the solution doesn't provide this, but it will allow me to store binary data, I could implement this myself using a technique like "Fast Set Operations Using Treaps".
Be "embedded" - that is, operate without me having to fork another process, or at least provide an easy interface to do so (like libmysqld).
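If the chosen store only offers binary storage, one simple way to get the set operations from the list above is to keep each set as a sorted array of hashes and lean on the standard algorithms. The sketch below is only an illustration of that idea, not the treap technique mentioned in the question; the names `Sha1`, `HashSet`, `make_set`, `union_of`, `intersection_of` and `difference_of` are made up for this example.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <iterator>
#include <vector>

// A SHA-1 digest is 20 bytes; std::array compares lexicographically.
using Sha1 = std::array<std::uint8_t, 20>;

// Hypothetical representation: a sorted vector without duplicates.
using HashSet = std::vector<Sha1>;

// Normalizes arbitrary input into the sorted, duplicate-free form.
HashSet make_set(std::vector<Sha1> hashes) {
    std::sort(hashes.begin(), hashes.end());
    hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
    return hashes;
}

// Each operation is a single linear pass over both sorted inputs.
HashSet union_of(const HashSet& a, const HashSet& b) {
    HashSet out;
    out.reserve(a.size() + b.size());
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

HashSet intersection_of(const HashSet& a, const HashSet& b) {
    HashSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

HashSet difference_of(const HashSet& a, const HashSet& b) {
    HashSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

Because the elements are fixed-size, such a vector can be written to and read from a binary column or file as one contiguous block. Whether this is fast enough depends on set sizes and update frequency, which the question does not specify.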
Solutions I have considered include:
Flat files. This is extremely simple, but doesn't provide any features besides flat data storage.
SQLite. This seems to be a very popular option, but it seems to have some issues regarding performance and concurrency (see KDE's Akonadi, for some example issues).
Embedded MySQL/MariaDB. This seems to be a reasonable option, but it also might be a bit heavyweight considering I won't be needing a lot of complicated SQL features.
A hypothetical solution I'm thinking would be perfect would be something like Redis, but which persists data to the disk, and only stores some portion of the data in memory to make retrieval fast. Redis itself might not be a good option because 1) I would need to fork it manually, 2) its Windows port seems less than rock-solid, and 3) storing all of my data in RAM would be less than ideal.
Are there any other solutions for this type of problem, or is one of the solutions I have already listed far better than the others?

In the end, I've decided to use SQLite for metadata. It seems to be as fast as, if not faster than, e.g. libmysqld, and it has a really simple, clean C interface. According to benchmarks, it should be way more than fast enough to suit my needs.
For larger data structures, I'm planning on just storing them in separate binary files (the SQLite website says it can store binary data, but that if your data size exceeds a certain amount it is faster to store it in flat files instead - see this page).

Don't store your binary files as BLOBs inside SQLite, unless you want an elephant-sized database. Just store a string with the file's path on the file system. The only downside of SQLite is that it does not allow remote (web) access, but you can embed it inside a small TCP/HTTP server.

Loading and storing encryption keys from a config source

I am writing an application which has an authenticity mechanism, using HMAC-sha1, plus a CBC-blowfish pass over the data for good measure. This requires 2 keys and one ivec.
I have looked at Crypto++ but the documentation is very poor (for example the HMAC documentation). So I am going old-school and using OpenSSL. What's the best way to generate and load these keys using library functions and tools? I don't require a secure socket, therefore an X.509 certificate probably does not make sense, unless, of course, I am missing something.
So, do I need to write my own config file, or is there any infrastructure in OpenSSL for this? If so, could you direct me to some documentation or examples for this?

Although it doesn't answer your question directly, if you are looking at this as a method of copy protection for your program, the following related questions may make for interesting reading.
Preventing the Circumvention of Copy Protection
What copy protection technique do you use?
Software protection by encryption
How do you protect your software from illegal distribution?

This is the solution I am going for atm. Unless of course someone comes up with a better one, or one that solves my specific problem.
I will put three files in /etc/acme/auth (file1, file2 and file3), binary files with randomly generated numbers for the 2 keys and the ivec, and do the same on Windows but under c:\etc\acme\auth.
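A loader for such files could look roughly like the sketch below. It uses only the standard library; the function name `load_secret` is made up for this example, and the sizes passed to it are assumptions the caller has to supply (Blowfish has a 64-bit block, so its CBC ivec is 8 bytes; the key lengths are whatever was chosen when the files were generated).

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a whole binary file and insists on an exact length, so a truncated
// or swapped file is rejected before it reaches the crypto code.
// Hypothetical helper, not part of OpenSSL.
std::vector<unsigned char> load_secret(const std::string& path,
                                       std::size_t expected_size) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    if (bytes.size() != expected_size) {
        throw std::runtime_error("unexpected size for " + path);
    }
    return bytes;
}
```

A call such as `load_secret("/etc/acme/auth/file3", 8)` would then fetch the ivec, assuming file3 is the one holding it. File permissions on the directory still matter, since anyone who can read these files can forge or decrypt data.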

boost serialization vs google protocol buffers? [closed]

Does anyone with experience with these libraries have any comment on which one they preferred? Were there any performance differences or difficulties in using?

I've been using Boost Serialization for a long time and just dug into protocol buffers, and I think they don't have the exact same purpose. BS (didn't see that coming) saves your C++ objects to a stream, whereas PB is an interchange format that you read to/from.
PB's datamodel is way simpler: you get all kinds of ints and floats, strings, arrays, basic structure and that's pretty much it. BS allows you to directly save all of your objects in one step.
That means with BS you get more data on the wire but you don't have to rebuild all of your objects structure, whereas protocol buffers is more compact but there is more work to be done after reading the archive. As the name says, one is for protocols (language-agnostic, space efficient data passing), the other is for serialization (no-brainer objects saving).
So what is more important to you: speed/space efficiency or clean code?

I've played around a little with both systems, nothing serious, just some simple hackish stuff, but I felt that there's a real difference in how you're supposed to use the libraries.
With boost::serialization, you write your own structs/classes first, and then add the archiving methods, but you're still left with some pretty "slim" classes, that can be used as data members, inherited, whatever.
With protocol buffers, the amount of code generated for even a simple structure is pretty substantial, and the generated structs and code seem meant to be operated on through their own API, with protocol buffers' functionality used to transport data to and from your own internal structures.

There are a couple of additional concerns with boost.serialization that I'll add to the mix. Caveat: I don't have any direct experience with protocol buffers beyond skimming the docs.
Note that while I think boost, and boost.serialization, is great at what it does, I have come to the conclusion that the default archive formats it comes with are not a great choice for a wire format.
It's important to distinguish between versions of your class (as mentioned in other answers, boost.serialization has some support for data versioning) and compatibility between different versions of the serialization library.
Newer versions of boost.serialization may not generate archives that older versions can deserialize. (the reverse is not true: newer versions are always intended to deserialize archives made by older versions). This has led to the following problems for us:
Both our client & server software create serialized objects that the other consumes, so we can only move to a newer boost.serialization if we upgrade both client and server in lockstep. (This is quite a challenge in an environment where you don't have full control of your clients).
Boost comes bundled as one big library with shared parts, and both the serialization code and other parts of the Boost library (e.g. shared_ptr) may be in use in the same file, so I can't upgrade any part of Boost because I can't upgrade boost.serialization. I'm not sure if it's possible/safe/sane to attempt to link multiple versions of Boost into a single executable, or if we have the budget/energy to refactor out bits that need to remain on an older version of Boost into a separate executable (DLL in our case).
The old version of boost we're stuck on doesn't support the latest version of the compiler we use, so we're stuck on an old version of the compiler too.
Google seem to actually publish the protocol buffers wire format, and Wikipedia describes them as forwards-compatible, backwards-compatible (although I think Wikipedia is referring to data versioning rather than protocol buffer library versioning). Whilst neither of these is a guarantee of forwards-compatibility, it seems like a stronger indication to me.
In summary, I would prefer a well-known, published wire format like protocol buffers when I don't have the ability to upgrade client & server in lockstep.
Footnote: shameless plug for a related answer by me.

Boost Serialisation
is a library for writing data into a stream.
does not compress data.
does not support data versioning automatically.
supports STL containers.
properties of data written depend on streams chosen (e.g. endian, compressed).
Protocol Buffers
generates code from interface description (supports C++, Python and Java by default. C, C# and others by 3rd party).
optionally compresses data.
handles data versioning automatically.
handles endian swapping between platforms.
does not support STL containers.
Boost serialisation is a library for converting an object into a serialised stream of data. Protocol Buffers do the same thing, but also do other work for you (like versioning and endian swapping). Boost serialisation is simpler for "small simple tasks". Protocol Buffers are probably better for "larger infrastructure".
EDIT:24-11-10: Added "automatically" to BS versioning.

I have no experience with boost serialization, but I have used protocol buffers. I like protocol buffers a lot. Keep the following in mind (I say this with no knowledge of boost).
Protocol buffers are very efficient so I don't imagine that being a serious issue vs. boost.
Protocol buffers provide an intermediate representation that works with other languages (Python and Java... and more in the works). If you know you're only using C++, maybe boost is better, but the option to use other languages is nice.
Protocol buffers are more like data containers... there is no object oriented nature, such as inheritance. Think about the structure of what you want to serialize.
Protocol buffers are flexible because you can add "optional" fields. This basically means you can change the structure of protocol buffer without breaking compatibility.
Hope this helps.

boost.serialization just needs the C++ compiler and gives you some syntax sugar like
archive << serialize_obj;
// ...
archive >> unserialize_obj;
for saving and loading. If C++ is the only language you use you should give boost.serialization a serious shot.
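To show the idea without pulling in Boost, here is a rough imitation in plain standard C++ of the pattern where one `serialize` member serves both saving and loading through the archive's `&` operator. `TextOut`, `TextIn`, `Song`, `save_song` and `load_song` are invented for this sketch; they are not Boost classes, and real boost.serialization archives handle far more (versioning, pointers, containers).

```cpp
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical output archive: operator& writes a value to the stream.
class TextOut {
public:
    explicit TextOut(std::ostream& os) : os_(os) {}
    TextOut& operator&(const std::string& s) { os_ << std::quoted(s) << ' '; return *this; }
    TextOut& operator&(int v) { os_ << v << ' '; return *this; }
private:
    std::ostream& os_;
};

// Hypothetical input archive: the same operator& reads a value back.
class TextIn {
public:
    explicit TextIn(std::istream& is) : is_(is) {}
    TextIn& operator&(std::string& s) { is_ >> std::quoted(s); return *this; }
    TextIn& operator&(int& v) { is_ >> v; return *this; }
private:
    std::istream& is_;
};

// An ordinary class; the only addition is one serialize() template that
// lists the members once and works with either archive type.
struct Song {
    std::string title;
    int play_count = 0;

    template <class Archive>
    void serialize(Archive& ar) { ar & title & play_count; }
};

std::string save_song(Song song) {
    std::ostringstream os;
    TextOut ar(os);
    song.serialize(ar);
    return os.str();
}

Song load_song(const std::string& text) {
    std::istringstream is(text);
    TextIn ar(is);
    Song song;
    song.serialize(ar);
    return song;
}
```

The point of the design is that the member list lives in one place, so saving and loading cannot drift apart.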
I took a quick look at Google protocol buffers. From what I see I'd say it's not directly comparable to boost.serialization. You have to add a compiler for the .proto files to your toolchain and maintain the .proto files themselves. The API doesn't integrate into C++ as boost.serialization does.
boost.serialization does the job it's designed for very well: to serialize C++ objects :)
OTOH a query API like the one Google protocol buffers has gives you more flexibility.
Since I only used boost.serialization so far I cannot comment on performance comparison.

Correction to above (guess this is that answer) about Boost Serialization :
It DOES allow supporting data versioning.
If you need compression - use a compressed stream.
Can handle endian swapping between platforms as encoding can be text, binary or XML.

I never implemented anything using boost's library, but I found Google's protobufs to be more thought-out, and the code is much cleaner and easier to read. I would suggest having a look at the various languages you want to use it with, reading through the code and the documentation, and making up your mind.
The one difficulty I had with protobufs was they named a very commonly used function in their generated code GetMessage(), which of course conflicts with the Win32 GetMessage macro.
I would still highly recommend protobufs. They're very useful.

I know that this is an older question now, but I thought I'd throw my 2 pence in!
With boost you get the opportunity to write some data validation in your classes; this is good because the data definition and the checks for validity are all in one place.
With GPB the best you can do is to put comments in the .proto file and hope against all hope that whoever is using it reads it, pays attention to it, and implements the validity checks themselves.
Needless to say this is unlikely and unreliable if you're relying on someone else at the other end of a network stream to do this with the same vigour as oneself. Plus if the constraints on validity change, multiple code changes need to be planned, coordinated and done.
Thus I consider GPB to be inappropriate for developments where there is little opportunity to regularly meet and talk with all team members.
==EDIT==
The kind of thing I mean is this:
message Foo
{
int32 bearing = 1;
}
Now who's to say what the valid range of bearing is? We can have
message Foo
{
int32 bearing = 1; // Valid between 0 and 359
}
But that depends on someone else reading this and writing code for it. For example, if you edit it and the constraint becomes:
message Foo
{
int32 bearing = 1; // Valid between -180 and +180
}
you are completely dependent on everyone who has used this .proto updating their code. That is unreliable and expensive.
At least with Boost serialisation you're distributing a single C++ class, and that can have data validity checks built right into it. If those constraints change, then no one else need do any work other than making sure they're using the same version of the source code as you.
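As a rough illustration of what "validity checks built right into the class" can mean, here is a minimal sketch of a bearing type that refuses out-of-range values at construction. The class name `Bearing` and its members are invented for this example; the serialisation hook itself is left out, on the assumption that loading would go through the same checked constructor.

```cpp
#include <stdexcept>

// Hypothetical domain class: the valid range lives next to the data.
class Bearing {
public:
    static constexpr int kMin = 0;
    static constexpr int kMax = 359;

    // Single source of truth for the constraint.
    static constexpr bool is_valid(int degrees) {
        return degrees >= kMin && degrees <= kMax;
    }

    // Throws std::out_of_range instead of storing a bad value.
    explicit Bearing(int degrees) : degrees_(checked(degrees)) {}

    int degrees() const { return degrees_; }

private:
    static int checked(int degrees) {
        if (!is_valid(degrees)) {
            throw std::out_of_range("bearing must be within 0..359");
        }
        return degrees;
    }

    int degrees_;
};
```

If the range later changes to -180..180, only `kMin` and `kMax` (and the message) change, and everyone who rebuilds against the updated class gets the new check.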
Alternative
There is an alternative: ASN.1. This is ancient, but has some really, really, handy things:
Foo ::= SEQUENCE
{
bearing INTEGER (0..359)
}
Note the constraint. So whenever anyone consumes this .asn file, generates code, they end up with code that will automatically check that bearing is somewhere between 0 and 359. If you update the .asn file,
Foo ::= SEQUENCE
{
bearing INTEGER (-180..180)
}
all they need to do is recompile. No other code changes are required.
You can also do:
bearingMin INTEGER ::= 0
bearingMax INTEGER ::= 360
Foo ::= SEQUENCE
{
bearing INTEGER (bearingMin..<bearingMax)
}
Note the <. And also in most tools the bearingMin and bearingMax can appear as constants in the generated code. That's extremely useful.
Constraints can be quite elaborate:
Garr ::= INTEGER (0..10 | 25..32)
Look at Chapter 13 in this PDF; it's amazing what you can do;
Arrays can be constrained too:
Bar ::= SEQUENCE (SIZE(1..5)) OF Foo
Sna ::= SEQUENCE (SIZE(5)) OF Foo
Fee ::= SEQUENCE
{
boo SEQUENCE (SIZE(1..<6)) OF INTEGER (-180<..<180)
}
ASN.1 is old fashioned, but still actively developed, widely used (your mobile phone uses it a lot), and far more flexible than most other serialisation technologies. About the only deficiency that I can see is that there is no decent code generator for Python. If you're using C/C++, C#, Java, ADA then you are well served by a mixture of free (C/C++, ADA) and commercial (C/C++, C#, JAVA) tools.
I especially like the wide choice of binary and text based wireformats. This makes it extremely convenient in some projects. The wireformat list currently includes:
BER (binary)
PER (binary, aligned and unaligned. This is ultra bit efficient. For example, an INTEGER constrained between 0 and 15 will take up only 4 bits on the wire)
OER
DER (another binary)
XML (also XER)
JSON (brand new, tool support is still developing)
plus others.
Note the last two? Yes, you can define data structures in ASN.1, generate code, and emit / consume messages in XML and JSON. Not bad for a technology that started off back in the 1980s.
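The 4-bit figure quoted for PER in the list above follows from the size of the constrained range. As I understand unaligned PER, a constrained INTEGER is sent as an offset from the lower bound in the fewest bits that can hold the largest offset; the sketch below just computes that count. The function name `constrained_bits` is made up, and this is arithmetic only, not an encoder.

```cpp
#include <cassert>
#include <cstdint>

// Bits needed to carry any value of INTEGER (lo..hi) as an offset from lo.
// Assumption: unaligned PER, both bounds present, lo <= hi.
unsigned constrained_bits(std::int64_t lo, std::int64_t hi) {
    assert(lo <= hi);
    // Largest offset that must be representable (computed without overflow).
    std::uint64_t span = static_cast<std::uint64_t>(hi) - static_cast<std::uint64_t>(lo);
    unsigned bits = 0;
    while (span != 0) {   // bit length of the largest offset
        ++bits;
        span >>= 1;
    }
    return bits;
}
```

Under that assumption, `constrained_bits(0, 15)` gives 4, while the bearing examples (0..359 and -180..180) both come out at 9 bits, compared with 32 for a plain fixed-width int.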
Versioning is done differently to GPB. You can allow for extensions:
Foo ::= SEQUENCE
{
bearing INTEGER (-180..180),
...
}
This means that at a later date I can add to Foo, and older systems that have this version can still work (but can only access the bearing field).
I rate ASN.1 very highly. It can be a pain to deal with (tools might cost money, the generated code isn't necessarily beautiful, etc). But the constraints are a truly fantastic feature that has saved me a whole ton of heart ache time and time again. Makes developers whinge a lot when the encoders / decoders report that they've generated duff data.
Other links:
Good intro
Open source C/C++ compiler
Open source compiler, does ADA too AFAIK
Commercial, good
Commercial, good
Try it yourself online
Observations
To share data:
Code first approaches (e.g. Boost serialisation) restrict you to the original language (e.g. C++), or force you to do a lot of extra work in another language
Schema first is better, but
A lot of these leave big gaps in the sharing contract (i.e. no constraints). GPB is annoying in this regard, because it is otherwise very good.
Some have constraints (e.g. XSD, JSON), but suffer patchy tool support.
For example, Microsoft's xsd.exe actively ignores constraints in xsd files (MS's excuse is truly feeble). XSD is good (from the constraints point of view), but if you cannot trust the other guy to use a good XSD tool that enforces them for him/her then the worth of XSD is diminished
JSON validators are ok, but they do nothing to help you form the JSON in the first place, and aren't automatically called. There's no guarantee that someone sending you a JSON message has run it through a validator. You have to remember to validate it yourself.
ASN.1 tools all seem to implement the constraints checking.
So for me, ASN.1 does it. It's the one that is least likely to result in someone else making a mistake, because it's the one with the right features and where the tools all seemingly endeavour to fully implement those features, and it is language neutral enough for most purposes.
To be honest, if GPB added a constraints mechanism that'd be the winner. XSD is close but the tools are almost universally rubbish. If there were decent code generators of other languages, JSON schema would be pretty good.
If GPB had constraints added (note: this would not change any of the wire formats), that'd be the one I'd recommend to everyone for almost every purpose. Though ASN.1's uPER is very useful for radio links.

As with almost everything in engineering, my answer is... "it depends."
Both are well tested, vetted technologies. Both will take your data and turn it into something friendly for sending someplace. Both will probably be fast enough, and if you're really counting a byte here or there, you're probably not going to be happy with either (let's face it both created packets will be a small fraction of XML or JSON).
For me, it really comes down to workflow and whether or not you need something other than C++ on the other end.
If you want to figure out your message contents first and you're building a system from scratch, use Protocol Buffers. You can think of the message in an abstract way and then auto-generate the code in whatever language you want (3rd party plugins are available for just about everything). Also, I find collaboration simplified with Protocol Buffers. I just send over a .proto file and then the other team has a clear idea of what data is being transferred. I also don't impose anything on them. If they want to use Java, go ahead!
If I already have built a class in C++ (and this has happened more often than not) and I want to send that data over the wire now, Boost Serialization obviously makes a ton of sense (especially where I already have a Boost dependency somewhere else).

You can use boost serialization in tight conjunction with your "real" domain objects, and serialize the complete object hierarchy (inheritance). Protobuf does not support inheritance, so you will have to use aggregation. People argue that Protobuf should be used for DTOs (data transfer objects), and not for core domain objects themselves. I have used both boost::serialization and protobuf. The performance of boost::serialization should be taken into account; cereal might be an alternative.
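To make the inheritance-versus-aggregation point concrete, here is a small sketch in plain C++ with hand-written structs standing in for generated message types. All names (`Track`, `LiveTrack`, `TrackDto`, `LiveTrackDto`, `to_dto`, `from_dto`) are hypothetical; real protobuf-generated classes expose accessors rather than public fields.

```cpp
#include <string>
#include <utility>

// Domain side: a normal class hierarchy.
class Track {
public:
    Track(std::string title, int seconds)
        : title_(std::move(title)), seconds_(seconds) {}
    virtual ~Track() = default;
    const std::string& title() const { return title_; }
    int seconds() const { return seconds_; }
private:
    std::string title_;
    int seconds_;
};

class LiveTrack : public Track {
public:
    LiveTrack(std::string title, int seconds, std::string venue)
        : Track(std::move(title), seconds), venue_(std::move(venue)) {}
    const std::string& venue() const { return venue_; }
private:
    std::string venue_;
};

// Transfer side: no inheritance. The "derived" DTO holds the "base" DTO
// as a member, which is how a nested message would model it.
struct TrackDto {
    std::string title;
    int seconds = 0;
};

struct LiveTrackDto {
    TrackDto base;
    std::string venue;
};

// Explicit mapping between the two worlds.
LiveTrackDto to_dto(const LiveTrack& track) {
    LiveTrackDto dto;
    dto.base.title = track.title();
    dto.base.seconds = track.seconds();
    dto.venue = track.venue();
    return dto;
}

LiveTrack from_dto(const LiveTrackDto& dto) {
    return LiveTrack(dto.base.title, dto.base.seconds, dto.venue);
}
```

The mapping functions are the extra work the DTO approach costs; in exchange the wire schema stays independent of how the domain classes evolve.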

RSA algo in symbian c++

I have implemented the RSA algorithm on Symbian using the classes CRSAPublicKey and CRSAPKCS1v15Encryptor.
Is this the right way to implement encryption?

With encryption, the devil is in the details.
There are many, many ways to get encryption incorrect. For example, with RSA, you need to use padding, or a number of easy attacks become possible. Even if you get the padding right (which, if you're using CRSAPKCS1v15Encryptor correctly, you might be), you have to defend against replay attacks, and find some way to get a trusted key. Or you might flub the comparison in the end like Nintendo did.
Rather than asking about how to do RSA, you need to consider the entire context of what you're trying to do. What are you using RSA for, and how are you using it?
