Protection against cracking - specifically ways to make a program harder to decompile - c++

I'm making a commercial product that will have a client side and a server side. The client is totally dependent on the server, just to make it harder to crack or pirate. The problem is that, even so, there is a chance that someone will reverse engineer the protocol and write their own server.
I've thought about encrypting the connection, either with SSL or with another algorithm, so that it won't be so easy to figure out the protocol just by sniffing the traffic between the client and the server.
Now the only thing I can think of that pirates would do is decompile the program, remove the encryption, and try to see the "plain text" protocol in order to reverse engineer it.
I have read previous topics and I know that it's impossible to make a program impossible to crack, but what tweaks can we programmers make to our code to turn cracking it into a huge headache?

Read how Skype did it:
The binary is decrypted into memory at startup.
The import table is overwritten.
The startup code is erased from memory.
Code integrity checks bust most debuggers: in random points in the code it computes a checksum of some other chunk of code and uses the checksum for an indirect jump to the next instruction. (Explanation: most debuggers implement breakpoints by changing the instruction at the breakpoint address. This check detects that.)
If a debugger is detected, it scrambles the registers and jumps to a random page.
Obfuscates code: call destination addresses are dynamically computed; dummy branches that are never executed; raises SEH where the handler sets some registers and resumes execution.
Keep in mind that these and other techniques make reverse engineering harder, but not impossible. Also, you should never rely on any of them for security.
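To make the integrity-check idea a bit more concrete, here is a minimal sketch on an ordinary byte buffer. It is not Skype's code: `checksum` and `region_is_intact` are hypothetical names, the vector merely stands in for a chunk of machine code, and the checksum is a simple non-cryptographic one chosen for brevity. The point is only that a debugger writing a breakpoint opcode (0xCC, `int3` on x86) into the region changes the computed value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simple rolling checksum over a byte range (illustrative, not cryptographic).
std::uint32_t checksum(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum = (sum << 5) + sum + data[i];  // sum * 33 + byte
    }
    return sum;
}

// True if the region still produces the checksum that was recorded earlier.
// The vector is a stand-in for a block of code (assumption for this sketch).
bool region_is_intact(const std::vector<std::uint8_t>& region,
                      std::uint32_t expected) {
    return checksum(region.data(), region.size()) == expected;
}
```

A real protector would go further and feed the checksum into a computed jump, as described above, so that there is no single comparison to patch out.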
IMO your best option is to design your servers to provide some useful functionality (SaaS). Your clients will essentially be paying to use that functionality. If your client app is dumb enough, you won't care about it being open source.

One thing you need to be aware of is that most packers/cryptors cause false positives with virus scanners. That can be pretty annoying, because people complain all the time that your software contains a virus (they don't get the concept of false positives).
And for protocol obfuscation, don't rely on SSL alone. It is trivial for an attacker to intercept the plaintext at the point where you call Send with it. Use SSL for securing the connection, and obfuscate the data before sending it. The obfuscation algorithm doesn't need to be cryptographically secure.
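As a rough illustration of "obfuscate before sending", here is a minimal sketch using a rolling XOR. The function name `obfuscate` and the key are made up for this example, and this is deliberately not encryption: it only ensures that the buffer handed to the send call is not the readable protocol text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// XOR each byte with a repeating key. Applying it twice restores the input.
// NOT encryption: it only keeps the payload from being readable at a glance.
std::string obfuscate(const std::string& data, const std::string& key) {
    assert(!key.empty());
    std::string out = data;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}
```

The client would call `obfuscate` on the message just before handing it to the SSL layer, and the server would call it again with the same key after decryption.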

This might be helpful: http://www.woodmann.com/crackz/Tutorials/Protect.htm

IMHO, it's difficult to hide the actual plain code. What most packers do is to make it difficult to patch. However, in your case, Themida could do the trick.
Here are some nice tips about writing a good protection: http://www.inner-smile.com/nocrack.phtml

Related

Speed of HTTP GET Method in C++

I have been using certain libraries in a C++ program to connect to and fetch different websites. Mainly I used Chilkat and Curl. However, I recently started programming my own HTTP fetcher, using the help of MSDN and the Winsock2 library.
I programmed my software to open a socket of type SOCK_STREAM for IPv4, then establish a connection with the required website and send a GET request with "Host:" and "Connection: close" headers to the server.
Everything seems to work fine. However, the performance is not what I expected. The bundled Chilkat library still performs better than mine, even though I have optimized mine as much as I can.
I notice that when I send the request, some servers take longer to respond. Once they do, they send everything at once, chunked. So how can I make a request whose headers trigger a fast response? Speed matters a lot for my program.
If you are seeing performance differences on a modern machine with low volumes, the most likely problem is that you have forgotten to turn off the Nagle algorithm. Use setsockopt() to set TCP_NODELAY to 1. HTTP is not Telnet.
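For reference, a minimal sketch of that call. The wrapper name `disable_nagle` and the `socket_t` alias are just for this example; `setsockopt`, `IPPROTO_TCP` and `TCP_NODELAY` are the standard names on both Winsock and POSIX. The socket is assumed to have been created already (and, on Windows, `WSAStartup` already called).

```cpp
#include <cassert>
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
using socket_t = SOCKET;
#else
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
using socket_t = int;
#endif

// Turn off Nagle's algorithm on an existing TCP socket.
// Returns true on success, false if setsockopt reported an error.
bool disable_nagle(socket_t sock) {
    int flag = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                      reinterpret_cast<const char*>(&flag),
                      sizeof(flag)) == 0;
}
```

Call it once, right after creating the socket and before sending the request.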
I wouldn't worry about explicit flushing, buffer management, or anything like that until you see a performance problem and have enough volume to notice, other than making sure you write your request in a single write call.
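One way to get the "single write call" part is to assemble the whole request in one buffer first. This is a minimal sketch; `build_get_request` is a hypothetical helper, and the headers are just the two mentioned in the question.

```cpp
#include <cassert>
#include <string>

// Assemble the complete GET request in one buffer so it can be passed to a
// single send() call instead of several small writes.
std::string build_get_request(const std::string& host, const std::string& path) {
    assert(!host.empty() && !path.empty() && path[0] == '/');
    std::string req;
    req.reserve(64 + host.size() + path.size());
    req += "GET " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Connection: close\r\n";
    req += "\r\n";  // blank line terminates the header section
    return req;
}
```

The returned string is then sent with one `send(sock, req.data(), req.size(), 0)` call (checking the return value for a short write).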
For download speed, window size makes a difference. You can tune SO_SNDBUF and SO_RCVBUF. Bear in mind that the values that make your benchmarks go fast might make your real-world performance slow.
Honestly, HTTP is a complex standard, and there are many ways to optimize an implementation. However, it is highly unlikely that you will have enough time to optimize yours better than an already packaged library such as Chilkat or Curl. If you do want to go about it, I would suggest reducing the number of headers you send, and flushing the socket buffer (bypassing Nagle's algorithm) after writing the request line to the socket. This will give a properly coded server slightly longer (several ms tops) to respond to your request. But even that may blow up in your face if your network configuration is not "ideal".
Finally, keep in mind that when it comes to networks, there is a very large margin of error, and you may get different results using different tactics with different servers, networks, and even OSs.

Permanent Memory Address

With my basic knowledge of C++, I've managed to whip together a simple program that reads some data from a program (using ReadProcessMemory) and sends it to my web server every five minutes, so I can see the status of said program while I'm not at home.
I found the memory addresses to read from using a program designed to hack games called "Memory Hacking Software." The problem is, the addresses change whenever I move the program to another machine.
My question is: is there a way to find a 'permanent' address that is the same on any machine? Or is this simply impossible? Excuse me if this is a dumb question, but I don't know a whole lot on the subject. Or is there perhaps another means of accessing information from a running program?
Thanks for any and all help!
There are ways to do it such as being able to recognise memory patterns around the thing you're looking for. Crackers can use this to find memory locations to patch even with software that "moves around", so to speak (as with operating systems that provide randomisation of address spaces).
For example, if you know that there are fixed character strings always located X bytes beyond the area of interest, you can scan the whole address space to find them, then calculate the area of interest from that.
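A minimal sketch of that marker idea, working on a buffer that has already been copied out of the target (for instance with ReadProcessMemory). The function name `locate_by_marker`, the `kNotFound` sentinel and the layout assumption (value sits a fixed distance before the marker) are all made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Search `memory` for `marker` and return the offset of the value that sits
// `distance_before` bytes ahead of the marker, or kNotFound if the marker is
// missing or too close to the start of the buffer.
std::size_t locate_by_marker(const std::vector<std::uint8_t>& memory,
                             const std::vector<std::uint8_t>& marker,
                             std::size_t distance_before) {
    assert(!marker.empty());
    auto it = std::search(memory.begin(), memory.end(),
                          marker.begin(), marker.end());
    if (it == memory.end()) return kNotFound;
    std::size_t marker_pos = static_cast<std::size_t>(it - memory.begin());
    if (marker_pos < distance_before) return kNotFound;
    return marker_pos - distance_before;
}
```

If the marker can occur more than once, you would need to keep searching and apply some extra sanity check to each candidate, which is part of why this is less reliable than it looks.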
However, it's not always as reliable as you might think.
I would instead be thinking of another way to achieve your ends, one that doesn't involve battling the features that are protecting such software from malicious behaviour.
Think of questions like:
Why exactly do you need access to the address space at all?
Does the program itself provide status information in a more workable manner?
If the program is yours, can you modify it to provide that information?
If you only need to know if the program is doing its job, can you simply "ping" the program (e.g., for a web page, send an HTTP request and ensure you get a valid response)?
As a last resort, can you convince the OS to load your program without address space randomisation then continue using your (somewhat dubious) method?
Given your comment that:
I use the program on four machines and I have to "re-find" the addresses (8 of them) on all of them every time they update the program.
I would simply opt for automating this process. This is what some cracking software does. It scans files or in-memory code and data looking for markers that it can use for locating an area of interest.
If you can do it manually, you should be able to write a program that can do it. Have that program locate the areas of interest (by reading the process address space) and, once they're found, just read your required information from there. If the methods of finding them change with each release (instead of just the actual locations), you'll probably need to update your locator routines with each release of their software but, unfortunately, that's the price you pay for the chosen method.
It's unlikely the program you're trying to read will be as secure as some - I've seen some move their areas of interest around as the program is running, to try and confuse crackers.
What you are asking for is impossible by design. ASLR is designed specifically to prevent this kind of snooping.
What kind of information are you getting from the remote process?
Sorry, this isn't possible. The memory layout of processes isn't going to be reliably consistent.
You can achieve your goal in a number of ways:
Add a client/server protocol that you can connect to and ask "what's your status?" (this also lends itself nicely to asking for more info).
Have the process periodically touch a file; the "monitor" can then check the modification time of that file to see if the process is dead.
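The file-touching option could look roughly like this. It is a minimal sketch assuming C++17 (`std::filesystem`); the function names `touch_heartbeat` and `is_alive`, and the choice of a maximum age, are assumptions for the example.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Called periodically by the monitored process: rewriting the file advances
// its modification time.
void touch_heartbeat(const fs::path& file) {
    std::ofstream out(file, std::ios::trunc);
    out << "alive\n";
}

// Called by the monitor: the process counts as alive if the heartbeat file
// exists and was modified no longer than `max_age` ago.
bool is_alive(const fs::path& file, std::chrono::seconds max_age) {
    assert(max_age.count() >= 0);
    std::error_code ec;
    auto last = fs::last_write_time(file, ec);
    if (ec) return false;  // missing or unreadable file
    auto age = fs::file_time_type::clock::now() - last;
    return age <= max_age;
}
```

Pick `max_age` comfortably larger than the interval at which the process touches the file, so one slow cycle does not raise a false alarm.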

How to get a debug flow of execution in C++

I work on a global trading system which supports many users. Each user can book, amend, edit, and delete trades. The system is regulated by a central deal capture service. The deal capture service informs all the users of any updates that occur.
The problem comes when we have crashes. As the production environment is impossible to re-create on a test system, I have to rely on crash dumps and log files.
However this doesn't tell me what the user has been doing.
I'd like a system that would (at the time of crashing) dump out a history of what the user has been doing. Anything that I add has to go into the live environment so it can't impact performance too much.
Ideas-wise, I was thinking of a macro at the top of each function which acts like a stack trace (only I could supply additional user information, like trade IDs, user dialog choices, etc.). The system would record stack traces (on a per-thread basis) and keep a history in a cyclic buffer (varying in size, depending on how much history you wanted to capture). Then, on a crash, I could dump this history stack.
I'd really like to hear if anyone has a better solution, or if anyone knows of an existing framework?
Thanks
Rich
Your solution sounds pretty reasonable, though perhaps rather than relying on viewing your audit trail in the debugger you could trigger it being printed with atexit() handlers. Something as simple as a stack of strings that have __FILE__, __LINE__, and pthread_self() in them might be good enough.
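A minimal sketch of the per-thread cyclic history described in the question might look like this. All names (`TraceEntry`, `TraceRing`, `thread_trace`, `TRACE`) are invented for the example, the capacity is tiny on purpose, and `snapshot()` allocates, so a real crash handler would more likely write the entries out directly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct TraceEntry {
    const char* file = nullptr;
    int line = 0;
    std::string note;  // e.g. a trade id or a dialog choice
};

// Fixed-size cyclic history; the oldest entry is overwritten when full.
class TraceRing {
public:
    void record(const char* file, int line, std::string note) {
        assert(file != nullptr);
        TraceEntry& slot = entries_[next_ % kCapacity];
        slot.file = file;
        slot.line = line;
        slot.note = std::move(note);
        ++next_;
    }
    // Returns the retained entries from oldest to newest.
    std::vector<TraceEntry> snapshot() const {
        std::size_t count = next_;
        if (count > kCapacity) count = kCapacity;
        std::vector<TraceEntry> out;
        for (std::size_t i = next_ - count; i < next_; ++i) {
            out.push_back(entries_[i % kCapacity]);
        }
        return out;
    }
private:
    static constexpr std::size_t kCapacity = 4;  // tiny, for illustration
    std::array<TraceEntry, kCapacity> entries_{};
    std::size_t next_ = 0;
};

// One ring per thread, so recording needs no locking.
inline TraceRing& thread_trace() {
    thread_local TraceRing ring;
    return ring;
}

#define TRACE(note) thread_trace().record(__FILE__, __LINE__, (note))
```

Usage would be a line such as `TRACE("amend trade 1234");` at the top of each function of interest, with the dump routine iterating over `thread_trace().snapshot()`.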
You could possibly use some existing undo framework, as it's similar to an audit trail, but it's going to be more heavyweight than you want. It will likely be based on the command pattern and expect you to implement execute() methods, though I suppose you could just leave them blank.
Trading systems usually don't suffer the performance hit of instrumentation of that level. C++ based systems, in particular, tend to sacrifice the ease of debugging for performance. Otherwise, more companies would be developing such systems in Java/C#.
I would avoid an attempt to introduce stack traces into C++. I am also not confident that you could introduce such a system in a way that would not affect the behavior of the program in some way (e.g., affect threading behavior).
It might, IMHO, be preferable to log the external inputs (e.g., user GUI actions and message traffic) rather than attempt to capture things internally in the program. In that case, you might have a better chance of replicating the failure and debugging it.
Are you currently logging all network traffic to/from the client? Many FIX based systems record this for regulatory purposes. Can you easily log your I/O?
I suggest creating another (circular) log file that contains your detailed information. Beware that this file will grow much faster than your other log files.
Another method is to save the last N transactions. Write a program that reads the transaction log and feeds the data into your virtual application. This may help recreate the cause. I've used this technique with embedded systems before.

Options for a message passing system for a game

I'm working on an RTS game in C++ targeted at handheld hardware (Pandora). For reference, the Pandora has a single ARM processor at ~600 MHz and runs Linux. We're trying to settle on a good message passing system (both internal and external), and this is new territory for me.
It may help to give an example of a message we'd like to pass. A unit may make this call to load its models into memory:
sendMessage("model-loader", "load-model", my_model.path, model_id );
In return, the unit could expect some kind of message containing a model object for the particular model_id, which can then be passed to the graphics system. Please note that this sendMessage function is in no way final. It just reflects my current understanding of message passing systems, which is probably not correct :)
From what I can tell there are two pretty distinct choices. One is to pass messages in memory, and only pass through the network when you need to talk to an external machine. I like this idea because the overhead seems low, but the big problem here is it seems like you need to make extensive use of mutex locking on your message queues. I'd really like to avoid excess locking if possible. I've read a few ways to implement simple queues without locking (by relying on atomic int operations) but these assume there is only one reader and one writer for a queue. This doesn't seem useful to our particular case, as an object's queue will have many writers and one reader.
The other choice is to go completely over the network layer. This has some fun advantages like getting asynchronous message passing pretty much for free. Also, we gain the ability to pass messages to other machines using the exact same calls as passing locally. However, this solution rubs me the wrong way, probably because I don't fully understand it :) Would we need a socket for every object that is going to be sending/receiving messages? If so, this seems excessive. A given game will have thousands of objects. For a somewhat underpowered device like the Pandora, I fear that abusing the network like that may end up being our bottleneck. But, I haven't run any tests yet, so this is just speculation.
MPI seems to be popular for message passing but it sure feels like overkill for what we want. This code is never going to touch a cluster or need to do heavy calculation.
Any insight into what options we have for accomplishing this is much appreciated.
The network will be using locking as well. It will just be where you cannot see it, in the OS kernel.
What I would do is create your own message queue object that you can rewrite as you need to. Start simple and make it better as needed. That way you can make it use any implementation you like behind the scenes without changing the rest of your code.
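As a starting point, the "start simple" version could be a plain mutex-protected queue behind a small interface. This is a minimal sketch: `Message`, `MessageQueue`, `push` and `try_pop` are names invented here, and the fields are guessed from the `sendMessage` example in the question.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

struct Message {
    std::string target;   // e.g. "model-loader"
    std::string type;     // e.g. "load-model"
    std::string payload;  // e.g. a model path
};

// Many writers, one reader. Callers only see push/try_pop, so the locking
// inside can later be swapped for something else without touching them.
class MessageQueue {
public:
    void push(Message msg) {
        assert(!msg.type.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(msg));
    }
    // Non-blocking, suited to polling from a game loop.
    // Returns false if the queue is empty.
    bool try_pop(Message& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<Message> queue_;
};
```

Because the lock is held only for a push or a pop, contention should stay low; if profiling says otherwise, the internals can be replaced with a lock-free structure or a socket without changing the callers.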
Look at several possible implementations that you might like to do in the future and design your API so that you can handle them all efficiently if you decide to implement in those terms.
If you want really efficient message passing look at some of the open source L4 microkernels. Those guys put a lot of time into fast message passing.
Since this is a small platform, it might be worth timing both approaches.
However, barring some kind of big speed issue, I'd always go for the approach that is simpler to code. That is probably going to be using the network stack, as it will be the same code no matter where the recipient is, and you won't have to manually code and debug your mutual exclusions, message buffering, allocations, etc.
If you find out it is too slow, you can always recode the local stuff using memory later. But why waste the time doing that up front if you might not have to?
I agree with Zan's recommendation to pass messages in memory whenever possible.
One reason is that you can pass complex C++ objects without needing to marshal and unmarshal (serialize and de-serialize) them.
The cost of protecting your message queue with a semaphore is most likely going to be less than the cost of making networking code calls.
If you protect your message queue with some lock-free algorithm (using atomic operations as you alluded to yourself) you can avoid a lot of context switches into and out of the kernel.

Protecting API Secret Keys in a Thick Client application

Within an application, I've got Secret Keys used to calculate a hash for an API call. In a .NET application it's fairly easy to use a program like Reflector to pull information out of the assembly, including these keys.
Is obfuscating the assembly a good way of securing these keys?
Probably not.
Look into cryptography and Windows' built-in information-hiding mechanisms (DPAPI and storing the keys in an ACL-restricted registry key, for example). That's as good as you're going to get for security you need to keep on the same system as your application.
If you are looking for a way to stop someone physically sitting at the machine from getting your information, forget it. If someone is determined, and has unrestricted access to a computer that is not under your control, there is no way to be 100% certain that the data is protected under all circumstances. Someone who is determined will get at it if they want to.
I wouldn't think so, as obfuscating (as I understand it at least) will simply mess around with the method names to make it hard (but not impossible) to understand the code. This won't change the data of the actual key (which I'm guessing you have stored in a constant somewhere).
If you just want to make it somewhat harder to see, you could run a simple cipher on the plaintext (like ROT-13 or something) so that it's at least not stored in the clear in the code itself. But that's certainly not going to stop any determined hacker from accessing your key. A stronger encryption method won't help because you'd still need to store the key for THAT in the code, and there's nothing protecting that.
The only really secure thing I can think of is to keep the key outside of the application somehow, and then restrict access to the key. For instance, you could keep the key in a separate file and then protect the file with an OS-level user-based restriction; that would probably work. You could do the same with a database connection (again, relying on the user-based access restriction to keep non-authorized users out of the database).
I've toyed with the idea of doing this for my apps but I've never implemented it.
DannySmurf is correct that you can't hide keys from the person running an application; if the application can get to the keys, so can the person.
However, what exactly are you trying to accomplish?
Depending on what it is, there are often ways to accomplish your goal that don't simply rely on keeping a secret "secret", on your user's machine.
Late to the game here...
The approach of storing the keys in the assembly / assembly config is fundamentally insecure. There is no possible ironclad way to store it as a determined user will have access. I don't care if you use the best / most expensive obfuscation product on the planet. I don't care if you use DPAPI to secure the data (although this is better). I don't care if you use a local OS-protected key store (this is even better still). None are ideal as all suffer from the same core issue: the user has access to the keys, and they are there, unchanging for days, weeks, possibly even months and years.
A far more secure approach would be to secure your API calls with tried and true PKI. This has obvious performance overhead if your API calls are chatty, but for the vast majority of applications this is a non-issue.
If performance is a concern, you can use Diffie-Hellman over asymmetric PKI to establish a shared secret symmetric key for use with a cipher such as AES. "shared" in this case means shared between client and server, not all clients / users. There is no hard-coded / baked in key. Anywhere.
The keys are transient, regenerated every time the user runs the program, or if you are truly paranoid, they could time-out and require recreation.
The computed shared secret symmetric keys themselves get stored in memory only, in SecureString. They are hard to extract, and even if someone does extract them, they are only good for a very short time, and only for communication between that particular client and the server (i.e., that session). In other words, even if somebody does hack their local keys, they are only good for interfering with local communication. They can't use this knowledge to affect other users, unlike a baked-in key shared by all users via code / config.
Furthermore, the entire keys themselves are never, ever passed over the network. The client Alice and server Bob independently compute them. The information they pass in order to do this could in theory be intercepted by third party Charlie, allowing him to independently calculate the shared secret key. That is why you use that (significantly more costly) asymmetric PKI to protect the key generation between Alice and Bob.
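To show what "independently compute them" means, here is a toy Diffie-Hellman sketch in C++ (the one language used for the added examples here, although the question is about .NET). The function names are invented, and the numbers used with it are far too small to be secure; a real system would use a vetted crypto library and proper parameters, not hand-rolled arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Modular exponentiation: (base^exp) mod m, by repeated squaring.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    assert(m > 1 && m < (1ULL << 32));  // keeps every product within 64 bits
    std::uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

// Value each side sends to the other: g^private mod p.
std::uint64_t dh_public(std::uint64_t g, std::uint64_t priv, std::uint64_t p) {
    return mod_pow(g, priv, p);
}

// Secret each side derives from the other's public value and its own
// private value. The secret itself never crosses the network.
std::uint64_t dh_shared(std::uint64_t other_public, std::uint64_t priv,
                        std::uint64_t p) {
    return mod_pow(other_public, priv, p);
}
```

With the textbook toy parameters p = 23 and g = 5, Alice (private 6) sends 8, Bob (private 15) sends 19, and both sides arrive at the same shared value from what they received plus their own private number.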
In these systems, the key generation is quite often coupled with authentication and thus session creation. You "login" and create your "session" over PKI, and after that is complete, both the client and the server independently have a symmetric key which can be used for order-of-magnitude faster encryption for all subsequent communication in that session. For high-scale servers, this is important to save compute cycles on decryption over using say TLS for everything.
But wait: we're not secure yet. We've only prevented reading the messages.
Note that it is still necessary to use a message digest mechanism to prevent man-in-the-middle manipulation. While nobody can read the data being transmitted, without an MD there is nothing preventing them from modifying it. So you hash the message before encryption, then send the hash along with the message. The server then re-hashes the payload upon decryption and verifies that it matches the hash that was part of the message. If the message was modified in transit, the hashes won't match, and the entire message is discarded / ignored.
The final thing to guard against is replay attacks. At this point, you have prevented people from reading your data, as well as modifying your data, but you haven't prevented them from simply sending it again. If this is a problem for your application, its protocol must carry data, and both client and server must keep enough state, to detect a replay. This could be something as simple as a counter that is part of the encrypted payload. Note that if you are using a transport such as UDP, you probably already have a mechanism to deal with duplicated packets, and thus can already deal with replay attacks.
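The counter idea can be sketched in a few lines. `ReplayGuard` is a hypothetical name, the sketch is in C++ rather than .NET, and it assumes messages within a session arrive in order; a transport that reorders packets would need a sliding window instead.

```cpp
#include <cassert>
#include <cstdint>

// Per-session state on the receiving side: remembers the highest counter
// seen so far and rejects anything that is not strictly newer.
class ReplayGuard {
public:
    // Returns true if the message is fresh and should be processed.
    bool accept(std::uint64_t counter) {
        if (seen_any_ && counter <= last_) return false;  // replay or stale
        last_ = counter;
        seen_any_ = true;
        return true;
    }
private:
    std::uint64_t last_ = 0;
    bool seen_any_ = false;
};
```

The counter has to travel inside the encrypted, digest-protected payload; otherwise an attacker could simply bump it before resending.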
What should be obvious is getting this right is not easy. Thus, use PKI unless you ABSOLUTELY cannot.
Note that this approach is used heavily in the games industry where it is highly desirable to spend as little compute on each player as possible to achieve higher scalability, while at the same time providing security from hacking / prying eyes.
So in conclusion, if this is really something that is a concern, instead of trying to find a way to securely store the API keys, don't. Instead, change how your app uses this API (assuming you have control of both sides, naturally). Use a PKI, or use a PKI-shared symmetric hybrid if PKI will be too slow (which is RARELY a problem these days). Then you won't have anything stored that is a security concern.