C++ Serialization Library for Existing Protocol

I'm writing a C++ library for an existing networking protocol (one with a document specifying the exact packet layout). As there are a considerable number of packet definitions, rather than writing all the serialization/deserialization methods manually, are there any serialization libraries that let me specify a packet layout explicitly?
I've been looking at things like Google Protobuf and Apache Thrift, but they seem to be focused on developing a server and client in tandem, where the packet layout does not matter as long as it is consistent across a single release of the software. I need to serialize to an existing specification, so I need to determine the field ordering, length, endianness, etc. explicitly. Is there anything that can help make this less of a chore?

There is a set of tools called PADS that should be ideal for this. See this SO answer here, the project home page here, some GitHub-ish stuff here. There seems to be some Haskell-related stuff here. I've just tried and succeeded in downloading PADS/C from the homepage (note that the download server's username and password are given at the bottom of their license agreement).
It's a bit like writing a Google Protocol Buffer schema, except you're specifying bits/bytes in an arbitrary data stream, which is what you have.
I tried to get PADS/ML downloaded from https://github.com/yitzhakm/PADS-ML working some time ago, but ran into a lot of trouble and ultimately failed.
As you're interested in C (which is about as close to C++ as you're going to get) you might try the PADS/C library.
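For comparison, this is roughly the hand-written code that such a tool would replace. It is only a minimal sketch: the `LoginPacket` layout (one type byte, a 16-bit sequence number, a 32-bit user id, all big-endian) is a made-up example, and the real fields, widths, and byte order would come from the protocol document.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packet, for illustration only: the real field list, widths,
// and byte order come from the protocol document.
struct LoginPacket {
    std::uint8_t  type;
    std::uint16_t sequence;
    std::uint32_t user_id;
};

// Append unsigned integers as big-endian bytes, independent of host order.
inline void put_u16_be(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
}

inline void put_u32_be(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

inline std::uint16_t get_u16_be(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t get_u32_be(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Fields are written one by one in specification order, so struct padding
// and host endianness never leak onto the wire.
inline std::vector<std::uint8_t> serialize(const LoginPacket& p) {
    std::vector<std::uint8_t> out;
    out.reserve(7);
    out.push_back(p.type);
    put_u16_be(out, p.sequence);
    put_u32_be(out, p.user_id);
    return out;
}

// Returns false if the buffer is too short for the fixed 7-byte layout.
inline bool deserialize(const std::uint8_t* data, std::size_t size, LoginPacket& p) {
    if (size < 7) return false;
    p.type     = data[0];
    p.sequence = get_u16_be(data + 1);
    p.user_id  = get_u32_be(data + 3);
    return true;
}
```

Writing each field explicitly, rather than copying the struct's memory with `memcpy`, is what keeps the wire format under your control; the cost is that every packet definition needs its own pair of functions, which is the chore a layout-description tool is meant to remove.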

Related

How to Encrypt a Folder Using C++?

I'm creating a program using C++ that relies on sensitive information contained within a folder located on my Ubuntu 14.04 desktop. I need some way to protect this information.
Essentially I have two buttons set up in my application: one to encrypt the folder and one to decrypt it. However, I have no experience with encryption and don't even know if you can encrypt a folder itself. Most tutorials I have found only talk about encrypting text. A friend recommended using AES encryption, but again, I can only find tutorials that show how to encrypt text.
Does anyone know of any way to protect these folders? They contain a large amount of images (.bmp and .png file types) concerning patient information along with a few text files. Obviously the quickest method would be best, as long as they aren't easily accessible without pressing the buttons.

Encryption is not some magic wand one can wave over some data to encrypt it. If your application has a button that automatically "decrypts" the data, it means that anyone else can do it as well. For this button to work as you described, your application must logically know everything that's needed to decrypt the data. If so, a determined attacker can simply obtain a copy of your application, debug it, figure out how it decrypts the data, and game over.
At the very minimum, a passphrase will be required in order to decrypt the data; so that the application alone is not sufficient to effect encryption and decryption.
As far as the actual technology goes, the two primary software libraries on Linux that provide generic encryption facilities are OpenSSL and GnuTLS. Both provide comparable implementations of all standard symmetric and asymmetric cipher suites.
I believe that GnuTLS is a better API, and that's what I recommend. The design of GnuTLS's C API naturally lends itself to a light C++ OO wrapper facade. The GnuTLS library provides extensive documentation, so your first step is to read through the documentation; at which point you should have all sufficient information to implement encryption in your application.

Just a simple point.
You are going to have to make a blob, which you someway mount as a filesystem. You are also going to have to decide how to control access to that filesystem while people are using it. Also how people are going to synchronize access. Do it wrong and two people will write to the same area at the same time and create something that no one will ever decrypt!
Look at the source code for dm-crypt and TrueCrypt, but if you want to limit access beyond the permission system that your OS supports you may find yourself way in over your head.

You need to build a private filesystem, so that every file operation must pass through your application. You can then encrypt the file contents transparently to the user.

"Best" Input File Formats for C++? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am starting work on a new piece of software that will end up needing some robust and expandable file IO. There are a lot of formats out there. XML, JSON, INI, etc. However, there are always plusses and minuses so I thought I would ask for some community input.
Here are some rough requirements:
The format is a "standard"...I don't want to reinvent the wheel if I don't have to. It doesn't have to be a formal IEEE standard, but something you could Google and get some information on as a new user, may have some support tools (editors) beyond vi. (Though the software users will generally be computer savvy and happy to use vi.)
Easily integrates with C++. I don't want to have to pull along a 100mb library and three different compilers to get it up and running.
Supports tabular input (2d, n-dimensional)
Supports POD types
Can expand as more inputs are required, binds well to variables, etc.
Parsing speed is not terribly important
Ideally, as easy to write (reflect) as it is to read
Works well on Windows and Linux
Supports compositing (one file referencing another file to read, and so on.)
Human Readable
In a perfect world, I would use a header-only library or some clean STL implementation, but I'm fine with leveraging Boost or some small external library if it works well.
So, what are your thoughts on various formats? Drawbacks? Advantages?
Edit
Options to consider? Anything else to add?
XML
YAML
SQLite
Google Protocol Buffers
Boost Serialization
INI
JSON

There is one excellent format that meets all your criteria:
SQLite!
Please read the article about using SQLite as an application file format. Also, please watch the Google Tech Talk by D. Richard Hipp (the SQLite author) about this very topic.
Now, let's see how SQLite meets your requirements:
The format is a "standard"
SQLite has become the format of choice for most mobile environments, and for many desktop apps (Firefox, Thunderbird, Google Chrome, Adobe Reader, you name it).
Easily integrates with C++
SQLite has a standard C interface, which is only one source file and one header file. There are C++ wrappers too.
Supports tabular input (2d, n-dimensional)
An SQLite table is as tabular as you could possibly imagine. To represent, say, 3-dimensional data, create a table with columns x,y,z,value and store your data as a set of rows like this:
x1,y1,z1,value1
x2,y2,z2,value2
...
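
As a rough illustration of that row layout, here is a minimal sketch that flattens a dense 3-D grid into (x, y, z, value) rows. It does not touch SQLite itself; the `Row` struct, the `to_rows` helper, and the x-major storage order are assumptions made for the example. Each resulting row is what you would bind to a prepared INSERT statement.

```cpp
#include <cstddef>
#include <vector>

// One row of the hypothetical (x, y, z, value) table.
struct Row {
    int x, y, z;
    double value;
};

// Flatten a dense nx * ny * nz grid stored in x-major order
// (index = (x * ny + y) * nz + z) into one row per cell.
// The caller must pass a grid holding exactly nx * ny * nz values.
inline std::vector<Row> to_rows(const std::vector<double>& grid,
                                int nx, int ny, int nz) {
    std::vector<Row> rows;
    rows.reserve(grid.size());
    for (int x = 0; x < nx; ++x)
        for (int y = 0; y < ny; ++y)
            for (int z = 0; z < nz; ++z) {
                std::size_t i = (static_cast<std::size_t>(x) * ny + y) * nz + z;
                rows.push_back(Row{x, y, z, grid[i]});
            }
    return rows;
}
```
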
Supports POD types
I assume by POD you meant Plain Old Data, or BLOB. SQLite lets you store BLOB fields as is.
Can expand as more inputs are required, binds well to variables
This is where it really shines.
Parsing speed is not terribly important
But SQLite speed is superb. In fact, parsing is basically transparent.
Ideally, as easy to write (reflect) as it is to read
Just use INSERT to write and SELECT to read - what could be easier?
Works well on Windows and Linux
You bet, and all other platforms as well.
Supports compositing (one file referencing another file to read)
You can ATTACH one database to another.
Human Readable
Not in binary, but there are many excellent SQLite browsers/editors out there. I like SQLite Expert Personal on Windows and sqliteman on Linux. There is also an SQLite editor plugin for Firefox.
There are other advantages that SQLite gives you for free:
Data is indexable, which makes it very fast to search. You just cannot do this using XML, JSON or any other text-only format.
Data can be edited partially, even when the amount of data is very large. You do not have to rewrite a few gigabytes just to edit one value.
SQLite is fully transactional: it guarantees that your data is consistent at all times. Even if your application (or the whole computer) crashes, your data will be automatically restored to the last known consistent state on the next attempt to connect to the database.
SQLite stores your data verbatim: you do not need to worry about escaping junk characters in your data (including zero bytes embedded in your strings) - simply always use prepared statements, that's all it takes to make it transparent. This can be a big and annoying problem when dealing with text data formats, XML in particular.
SQLite stores all strings in Unicode: UTF-8 (default) or UTF-16. In other words, you do not need to worry about text encodings or international support for your data format.
SQLite allows you to process data in small chunks (row by row in fact), thus it works well in low memory conditions. This can be a problem for any text-based format, because often the parser needs to load all the text into memory. Granted, there are a few efficient stream-based XML parsers out there, but in general any XML parser will be quite memory greedy compared to SQLite.

Having worked quite a bit with both XML and json, here's my rather subjective opinion of both as extendable serialization formats:
The format is a "standard": Yes for both
Easily integrates with C++: Yes for both. In each case you'll probably wind up with some kind of library to handle it. On Linux, libxml2 is a standard, and libxml++ is a C++ wrapper for it; you should be able to get both of those from your distro's package manager. It will take some small effort to get those working on Windows. There appears to be some support in Boost for json, but I haven't used it; I've always dealt with json using libraries. Really, the library route is not very onerous for either.
Supports tabular input (2d, n-dimensional): Yes for both
Supports POD types: Yes for both
Can expand as more inputs are required: Yes for both - that's one big advantage to both of them.
Binds well to variables: If what you mean is some way inside the file itself to say "This piece of data must be automatically deserialized into this variable in my program", then no for both.
As easy to write (reflect) as it is to read: Depends on the library you use, but in my experience yes for both. (You can actually do a tolerable job of writing json using printf().)
Works well on Windows and Linux: Yes for both, and ditto Mac OS X for that matter.
Supports one file referencing another file to read: If you mean something akin to a C #include, then XML has some ability to do this (e.g. document entities), while json doesn't.
Human readable: Both are typically written in UTF-8, and permit line breaks and indentation, and thus can be human-readable. However, I've just been working with a 479 KB XML file that's all on one line, so I had to run it through a prettyprinter to make sense of it. json can also be pretty unreadable, but in my experience is often formatted better than XML.
When starting new projects, I generally prefer json; it's more compact and more human-readable. The main reason I might select XML over json would be if I were worried about receiving badly-formed documents, since XML supports automated document format validation, while you have to write your own validation code with json.
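To illustrate the remark above about writing json with printf(), here is a minimal sketch using `snprintf`. The field names (`name`, `width`, `scale`) and the `to_json` helper are made up for the example, and it assumes the string contains nothing that needs JSON escaping; anything beyond a flat, trusted record is better left to a library.

```cpp
#include <cstdio>
#include <string>

// Format a flat JSON object with snprintf. The field names are hypothetical.
// Assumes `name` contains no characters that need JSON escaping
// (quotes, backslashes, control characters); real input needs an escaper.
inline std::string to_json(const std::string& name, int width, double scale) {
    char buf[256];
    int n = std::snprintf(buf, sizeof buf,
                          "{\"name\": \"%s\", \"width\": %d, \"scale\": %.3f}",
                          name.c_str(), width, scale);
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof buf)
        return std::string();  // formatting failed or output was truncated
    return std::string(buf, static_cast<std::size_t>(n));
}
```

An empty return value signals that the record did not fit in the fixed buffer, so the caller never sees a truncated (and therefore invalid) document.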

Check out Google Protocol Buffers. They handle most of your requirements.
From their documentation, the high level steps are:
Define message formats in a .proto file.
Use the protocol buffer compiler.
Use the C++ protocol buffer API to write and read messages.

For my purposes, I think the way to go is XML.
The format is a standard, but allows for modification and flexibility for the schema to change as the program requirements evolve.
There are several library options. Some are larger (Xerces-C) some are smaller (ezxml), but there are many options, so we won't be locked in to a single provider or very specific solution.
It can support tabular input (2d, n-dimensional). This requires more parsing work on "our" end, and is likely the weakest point for XML.
Supports POD types: Absolutely.
Can expand as more inputs are required, binds well to variables, etc. through schema modifications and parser modifications.
Parsing speed is not terribly important, so processing a text file or files is not an issue.
XML can be programmatically written just as easily as read.
Works well on Windows and Linux or any other OS that supports C and text files.
Supports compositing (one file referencing another file to read, and so on.)
Human Readable with many text editors (Sublime, vi, etc.) supporting syntax highlighting out of the box. Many web browsers display the data well.
Thanks for all the great feedback! I think if we wanted a purely binary solution, Protocol Buffers or boost::serialization is likely the way that we would go.

prioritizing torrent download sequences using libtorrent

Suppose I have 2+ clients (developed by me) ALL using libtorrent ( http://www.rasterbar.com/products/libtorrent/manual.html#queuing )
Can I prioritize download of a file from other clients effectively so that they download the file's pieces/chunks (whatever is torrent terminology here) from beginning of the file towards its end and not quite in random order?
(of course I'm allowing some "multiplexing" / "intertwining" pieces for reasons of availability and performance, but the goal here is to download as linearly and quickly from the start of the file towards the end as possible)
The goal I'm thinking about here is obviously previewing the file quickly. How to do this most effectively using libtorrent / possibly other C++ torrent library?
(I'm not quite interested in torrent implementations using non-binary languages, like Java or Python - I need machine code for reasons of performance and security, so, C, C++ or possibly D would all fit the bill)

You can certainly prioritize pieces and files with torrent_handle::prioritize_pieces() and torrent_handle::prioritize_files(). See the documentation.
This won't be enough to download in-order though. To do that, you can enable sequential download with torrent_handle::set_sequential_download(). This will issue new piece requests in-order. Keep in mind that the time a request takes to be satisfied varies a lot depending on which peer you talk to. Making the requests in-order does not necessarily mean receiving the pieces in order.
There is another mechanism to attempt to do that. torrent_handle::set_piece_deadline() is used to set a target completion time for a piece. Such pieces are considered time-critical pieces, and they are ordered by their deadline and the fastest peers are used to request blocks from those pieces, attempting to download them in deadline-order.
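As a rough sketch of how deadlines could be laid out for a preview, the helper below builds a staggered schedule for the first pieces of a file. It does not call libtorrent; `preview_deadlines`, `base_ms`, and `step_ms` are hypothetical names and tuning values chosen for illustration. Each resulting pair would be passed to torrent_handle::set_piece_deadline(), which, as far as I know, takes a piece index and a deadline in milliseconds.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Build (piece index, deadline in ms) pairs for `count` consecutive pieces
// starting at `first_piece`. Earlier pieces get earlier deadlines, so the
// time-critical logic is asked to fetch them in file order.
// `base_ms` and `step_ms` are hypothetical tuning values, not libtorrent defaults.
inline std::vector<std::pair<int, int>> preview_deadlines(int first_piece, int count,
                                                          int base_ms, int step_ms) {
    std::vector<std::pair<int, int>> out;
    if (count <= 0) return out;
    out.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i)
        out.emplace_back(first_piece + i, base_ms + i * step_ms);
    return out;
}
```

Only a window of pieces near the playback position needs deadlines at any time; the rest of the torrent can stay on sequential or default piece picking.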
Now, I also get the impression that you want two separate clients (presumably running on different machines) to coordinate which pieces they download. Is that right? It's not entirely clear what you're asking about, but there's no simple way of asking libtorrent to do that.
You could write a plugin for libtorrent that implements a new extension message for these clients to chat and coordinate, which could de-select certain pieces the other client is downloading by setting their priority to 0.

Methods for encrypting an archive in C++

I'm writing a game that will have a lot of information (configuration, some content, etc) inside of some xml documents, as well as resource files. This will make it easier for myself and others to edit the program without having to edit the actual C++ files, and without having to recompile.
However, as the program is starting to grow there is an increase of files in the same directory as the program. So I thought of putting them inside a file archive (since they are mostly text, it goes great with compression).
My question is this: Will it be easier to compress all the files and:
Set a password to it (like a password-protected ZIP), then provide the password when the program needs it
Encrypt the archive with Crypto++ or similar
Modify the file header slightly as a "makeshift" encryption, and fix the file's headers while the file is loaded
I think numbers 1 and 2 are similar, but I couldn't find any information on whether zlib could handle password-protected archives.
Also note that I don't want the files inside the archive to be "extracted" into the folder while the program is using it. It should only be in the system's memory.

I think you misunderstand the possibilities brought up by encryption.
As long as the program is executed on an untrusted host, it's impossible to guarantee anything.
At most, you can make it difficult (encryption, code obfuscation), or extremely difficult (self-modifying code, debug/hooks detection), for someone to reverse engineer the code, but you cannot prevent cracking. And with the Internet, it'll be available to all as soon as it's cracked by a single individual.
The same goes, truly, for preventing an individual from tampering with the configuration. Whatever the method (CRC, Hash --> by the way encryption is not meant to prevent tampering) it is still possible to reverse engineer it given sufficient time and means (and motivation).
The only way to guarantee an untampered-with configuration would be to store it somewhere YOU control (a server), sign it (Asymmetric) and have the program check the signature. But it would not, even then, prevent someone from coming up with a patch that lets your program run with a user-supplied (unsigned) configuration file...
And you know the worst of it? People will probably prefer the cracked version because, freed from the burden of all those "security" measures, it'll run faster...
Note: yes it is illegal, but let's be pragmatic...
Note: regarding motivation, the more clever you are with protecting the program, the more attractive it is to hackers --> it's like a brain teaser to them!
So how do you provide a secured service ?
You need to trust the person who executes the program
You need to trust the person who stores the configuration
It can only be done if you offer a thin client and execute everything on a server you trust... and even then you'll have trouble making sure that no-one finds doors in your server that you didn't think about.
In your shoes, I'd simply make sure to detect light tampering with the configuration (treat it as hostile and make sure to validate the data before running anything). After all file corruption is equally likely, and if a corrupted configuration file meant a ruined client's machine, there would be hell to pay :)
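A minimal sketch of that kind of light check, assuming a checksum of the configuration is stored alongside it: a plain bitwise CRC-32 over the file contents. The helper names `crc32_of` and `config_intact` are made up for the example. This catches accidental corruption and casual edits only; as said above, it is not a security measure, since anyone who can edit the file can also recompute the checksum.

```cpp
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, the variant used by
// zlib and PNG). Slow but table-free, which keeps the sketch short.
inline std::uint32_t crc32_of(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// True if the contents still match the checksum recorded when the
// configuration was written. A mismatch means "reject and fall back to
// defaults", not "someone attacked us".
inline bool config_intact(const std::string& contents, std::uint32_t expected) {
    return crc32_of(contents) == expected;
}
```

Even when the checksum matches, the values themselves should still be range-checked before use.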

If I had to choose among your three options, I'd go for Crypto++, as it fits in nicely with C++ iostreams.
But: you are
serializing your data to XML
compressing it
encrypting it
all in memory, and back again. I'd really reconsider this choice. Why not use eg. SQLite to store all your data in a file-based database (SQLite doesn't require any external database process)?
Encryption can be added through various extensions (SEE or SQLCipher). It's safe, quick, and completely transparent.
You don't get compression, but then again, by using SQLite instead of XML, this won't be an issue anyway (or so I think).

Set a password to it (like a password-protected ZIP), then provide the password when the program needs it
Firstly, you can't do this unless you are going to ask a user for the password. If that encryption key is stored in the code, don't bet against a determined reverse engineer finding it and decrypting the archive.
The one big rule is: you cannot store encryption keys in your software, because if you do, what is the point of using encryption? I can find your key.
Now, onto other points. zlib does not support encryption and as they point out, PKZip is rather broken anyway. I suspect if you were so inclined to find one, you'd probably find a zip/compression library capable of handling encryption. (ZipArchive I believe handles Zip+AES but you need to pay for that).
But I second Daniel's answer that's just displayed on my screen. Why? Encryption/compression isn't going to give you any benefit unless the user presents some form of token (password, smartcard etc) not present in your compiled binary or related files. Similarly, if you're not using up masses of disk space, why compress?

How do I extract the network protocol from the source code of the server?

I'm trying to write a chat client for a popular network. The original client is proprietary, and is about 15 GB larger than I would like. (To be fair, others call it a game.)
There is absolutely no documentation available for the protocol on the internet, and most search results only come back with the client's scripting interface. I can understand that, since used in the wrong way, it could lead to ruining other people's experience.
I've downloaded the source code of a couple of alternative servers, including the one I want to connect to, but those
contain no documentation other than install instructions
are poorly commented (I did a superficial browsing)
are HUGE (the src folder of the target server contains 12 MB worth of .cpp and .h files), and grep didn't find anything related
I've also tried searching their forums and contacting the maintainers of the server, but so far, no luck.
Packet sniffing isn't likely to help, as the protocol relies heavily on encryption.
At this point, all my hope is my ability to chew through an ungodly amount of code. How do I start?
Edit: A related question.

If the original code encrypts its traffic with a well-known library like OpenSSL or Crypto++, it might be useful to write your own wrappers for the main entry points of that library, which then delegate the call to the actual library. If you make this substitution and build the project successfully, you will be able to trace everything that goes out as plain text.
If your project is not using third party encryption libs, hopefully it is still possible to substitute the encryption routines with some wrappers which trace their input and then delegate encryption to the actual code.
Your best bet is that encryption is usually implemented in a separate, relatively small number of source files, so it should be easier for you to track input/output in these files.
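The wrapper idea might look roughly like the sketch below. Everything in it is hypothetical: `real_encrypt` is a XOR placeholder standing in for whatever routine the server actually calls (it is not real encryption), and `traced_encrypt` is the wrapper you would substitute at the call sites.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Stand-in for the server's real encryption routine (hypothetical: a XOR
// placeholder so the sketch compiles on its own; it is not real encryption).
inline Bytes real_encrypt(const Bytes& plain) {
    Bytes out(plain);
    for (auto& b : out) b ^= 0x5A;
    return out;
}

// In-memory log of every plaintext buffer that passed through the wrapper.
inline std::vector<Bytes>& trace_log() {
    static std::vector<Bytes> log;
    return log;
}

// Wrapper with the same signature: record the plaintext, hex-dump it to
// stderr, then delegate to the real routine unchanged.
inline Bytes traced_encrypt(const Bytes& plain) {
    trace_log().push_back(plain);
    std::fprintf(stderr, "encrypt(%zu bytes):", plain.size());
    for (auto b : plain) std::fprintf(stderr, " %02x", static_cast<unsigned>(b));
    std::fprintf(stderr, "\n");
    return real_encrypt(plain);
}
```

Because the wrapper returns exactly what the real routine returns, the server keeps working while the hex dumps show each outgoing message before it is encrypted; a matching wrapper around the decryption routine would show incoming messages.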
Good luck!

I'd say
find the command that is used to send data through the socket (the call depends on the network library)
find references of this command and unroll from there. If you can modify-recompile the server code, it might help.
On the way, you will be able to log decrypted (or, more likely, not yet encrypted) network activity.

IMO, the best answer is to read the source code of the alternative server. Try using a good C++ IDE to help you. It will make a lot of difference.
It is likely that the protocol related material you need to understand will be limited to a subset of the files. These will contain references to network sockets and things. Start from there and work outwards as far as you need to.

A viable approach is to tackle this as a crypto challenge. That makes it easy, because you control so much.
For instance, you can use a current client to send a known message to the server, and then check server memory for that string. Once you've found out in which object the string ends, it also becomes possible to trace its ancestry through the code. Set a breakpoint on any non-const method of the object, and find the stacktraces. This gives you a live view of how messages arrive at the server, and a list of core functions essential to message processing. You can next find related functions (caller/callee of the functions on your list).
