Protecting a program from unauthorised use/"crackers" - C++

I am writing a piece of software in C++ which is targeted at a market in which software is traditionally heavily cracked (or at least, attempted to be). I realise that nothing can be completely protected; however, I feel that trying would be a good idea, and I also think some of the specifics of my situation might be helpful.
Firstly, it would not be annoying to the user that they must have an internet connection to use the software. I hate it when games etc. do this too, but the software requires an internet connection to function anyway due to its purpose, so this wouldn't hinder a normal user.
Secondly, it depends fairly heavily on external scripts written by me and/or supplied by third parties, so I can have these stored on some website somewhere, meaning that people who crack the software will also have to track down new copies of the scripts, which may annoy them into going legit.
Thirdly, new versions will, by definition due to what the app does, have to be released very often - weekly, or every two weeks at most. The program will obviously have an autoupdater, but since I am churning out (required-to-function) updates so often, any key-based scheme could have its keys or method changed with every update, so I am in a position to break existing cracks whenever they do appear.
Does anyone know of any available solutions or techniques I could implement which fit the bill?

If your application is doing some sort of data processing or analysis, you can protect it by putting that part into a web service (maybe in a cloud) that your client application connects and authenticates to, and then receives results from. So even if your client application is reverse engineered, it will be missing that important piece of processing.
If your application is web based, you get the same effect too.
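As a rough illustration of the client side (the endpoint URL, the parameter names and the use of libcurl are all assumptions for the sketch, not a prescription), the shipped binary then only contains plumbing, never the algorithm itself:

#include <curl/curl.h>
#include <string>

// libcurl write callback: append whatever the server sends into a std::string.
static size_t collect(char *data, size_t size, size_t nmemb, void *out) {
    static_cast<std::string *>(out)->append(data, size * nmemb);
    return size * nmemb;
}

// Hypothetical endpoint; the interesting computation lives behind it, server-side.
std::string process_remotely(const std::string &licence_key, const std::string &input) {
    std::string response;
    CURL *curl = curl_easy_init();          // real code would call curl_global_init once at startup
    if (!curl) return response;

    std::string body = "key=" + licence_key + "&payload=" + input;   // real code would URL-encode
    curl_easy_setopt(curl, CURLOPT_URL, "https://api.example.com/process");
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_perform(curl);                // error handling omitted for brevity
    curl_easy_cleanup(curl);
    return response;
}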

I've previously used CrypKey successfully.

I'm going to guess that older copies of the software are far less useful than the latest copy.
If that's the case, then you already have a powerful anti-cracker technology in place: your update mechanism. When you become aware of a hacked version of your software, then you can immediately check for it, and cause trouble for users of the hacked software.
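One cheap way to piggyback on that (purely illustrative; the "minimum allowed build" scheme is an assumption, not something from the answer): since the app needs your server anyway, the server can simply refuse to serve builds it knows are circulating in cracked form, and the client only has to honour the reply.

#include <cstdlib>
#include <string>

// 'server_reply' is whatever the existing update check returned; assumed here to
// carry the lowest build number the service will still talk to, so cracked or
// stale builds simply stop being served.
bool build_still_served(long my_build, const std::string &server_reply) {
    long min_allowed = std::strtol(server_reply.c_str(), nullptr, 10);
    return my_build >= min_allowed;
}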

Crossplatform library for uniquely identifying the machine my app is currently running on?

I have the following situation: a shared file system across N similar machines, with my app running on all of them. Each instance of my app needs to know which machine it is running on - some unique ID... Does such a thing exist, or is it possible to emulate one? Is there any cross-platform library that would help with that?
There are two concerns here, security and stability of your matching.
Hardware characteristics are a good place to start. Things like MAC address, CPU, hdd identifiers.
These things can change, in theory. If a hard drive failed, you would probably lose whatever configuration you had on the system as well. A system that sends a hash of each characteristic separately could work all right: if 4 out of 5 matched, you could reasonably guess that the network card caught fire and was replaced.
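A minimal sketch of that "hash each piece separately, then fuzzy-match" idea (collecting the raw strings is platform-specific and left out here; also note std::hash is only stable within one standard-library build, so a fingerprint persisted across versions or platforms would want a fixed hash such as MD5 or SHA instead):

#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One hash per characteristic (MAC address string, CPU string, disk serial, ...).
std::vector<std::size_t> fingerprint(const std::vector<std::string> &characteristics) {
    std::vector<std::size_t> hashes;
    for (const auto &c : characteristics)
        hashes.push_back(std::hash<std::string>{}(c));
    return hashes;
}

// Same machine if enough individual pieces still match, e.g. 4 out of 5.
bool same_machine(const std::vector<std::size_t> &a, const std::vector<std::size_t> &b,
                  std::size_t required_matches) {
    std::size_t matches = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        if (a[i] == b[i]) ++matches;
    return matches >= required_matches;
}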
If you just need a head count, you may not even be interested that this new system with a different signature used to be another one.
Usually, people aren't too concerned with security with these systems; they just want to track resources on a network. If someone wanted to spoof the hardware identifiers they could. For simple cases, I would look into an installer that registered a salted identifier. If you really need something terribly secure you might start looking at commercial products (or ask another question about the security aspects specifically).
Both of these are error-prone, obviously. I'm not sure you should even fully automate it in those cases. Think about a case where the network cards were behaving weirdly and you swapped them with another machine.
Human eyes are pretty good, let an administrator use them. At worst, they can probably figure things out with a quick email. Just give them enough information to make an informed decision when something does go wrong. Really, if you just log everything a human should be able to recreate the scenario and make a decision. Most of these things won't change daily. There is more work when hardware fails, not every day.

Embedded Lua - timing out rogue scripts (e.g. infinite loop) - an example anyone?

I have embedded Lua in a C++ application. I need to be able to kill rogue (i.e. badly written) scripts to stop them from hogging resources.
I know I will not be able to cater for EVERY type of condition that causes a script to run indefinitely, so for now, I am only looking at the straightforward Lua side (i.e. scripting side problems).
I also know that this question has been asked (in various guises) here on SO. Probably the reason why it is constantly being re-asked is that as yet, no one has provided a few lines of code to show how the timeout (for the simple cases like the one I described above), may actually be implemented in working code - rather than talking in generalities, about how it may be implemented.
If anyone has actually implemented this type of functionality in a C++ application with embedded Lua, I (as well as many other people, I'm sure) will be very grateful for a little snippet that shows:
How a timeout can be set (in the C++ side) before running a Lua script
How to raise the timeout event/error (C++ /Lua?)
How to handle the error event/exception (C++ side)
Such a snippet (even pseudocode) would be VERY, VERY useful indeed
You need to address this with a combination of techniques. First, you need to establish a suitable sandbox for the untrusted scripts, with an environment that provides only those global variables and functions that are safe and needed. Second, you need to provide for limitations on memory and CPU usage. Third, you need to explicitly refuse to load pre-compiled bytecode from untrusted sources.
The first point is straightforward to address. There is a fair amount of discussion of sandboxing Lua available at the Lua users wiki, on the mailing list, and here at SO. You are almost certainly already doing this part if you are aware that some scripts are more trusted than others.
The second point is the question you are asking. I'll come back to that in a moment.
The third point has been discussed at the mailing list, but may not have been made very clearly in other media. It has turned out that there are a number of vulnerabilities in the Lua core that are difficult or impossible to address, but which depend on "incorrect" bytecode to exercise. That is, they cannot be exercised from Lua source code, only from pre-compiled and carefully patched byte code. It is straightforward to write a loader that refuses to load any binary bytecode at all.
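One way to write such a loader (a sketch only; it assumes Lua 5.2 or newer for the mode string, with a cruder signature-byte check as a 5.1 fallback):

#include <lua.hpp>

// Load a chunk but refuse pre-compiled bytecode.
int load_text_only(lua_State *L, const char *buf, size_t len, const char *name) {
#if LUA_VERSION_NUM >= 502
    // 5.2+: the mode string "t" tells the loader to accept text chunks only.
    return luaL_loadbufferx(L, buf, len, name, "t");
#else
    // 5.1: no mode argument, so reject anything starting with the bytecode
    // signature byte (ESC, 0x1B) before handing it to the loader.
    if (len > 0 && buf[0] == 0x1B) {
        lua_pushliteral(L, "refusing to load pre-compiled bytecode");
        return LUA_ERRSYNTAX;
    }
    return luaL_loadbuffer(L, buf, len, name);
#endif
}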
With those points out of the way, that leaves the question of a denial of service attack either through CPU consumption, memory consumption, or both. First, the bad news. There are no perfect techniques to prevent this. That said, one of the most reliable approaches is to push the Lua interpreter into a separate process and use your platform's security and quota features to limit the capabilities of that process. In the worst case, the run-away process can be killed, with no harm done to the main application. That technique is used by recent versions of Firefox to contain the side-effects of bugs in plugins, so it isn't necessarily as crazy an idea as it sounds.
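If you go the separate-process route on a POSIX system, a rough sketch of the shape of it (this just exec's a standalone lua interpreter on a script file and lets the kernel enforce a CPU budget; the names and limits are illustrative, and whether a standalone interpreter fits depends on how tightly your scripts are tied to the host application):

#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run a script in a child process with a hard CPU limit; the kernel, not the
// host application, kills it if it overruns.
int run_script_in_child(const char *script_path, int cpu_seconds) {
    pid_t pid = fork();
    if (pid == 0) {                                    // child
        rlimit cpu{};
        cpu.rlim_cur = cpu.rlim_max = static_cast<rlim_t>(cpu_seconds);
        setrlimit(RLIMIT_CPU, &cpu);                   // SIGXCPU/SIGKILL on overrun
        execlp("lua", "lua", script_path, static_cast<char *>(nullptr));
        _exit(127);                                    // exec failed
    }
    int status = 0;
    waitpid(pid, &status, 0);                          // parent: wait and inspect the result
    return status;
}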
One interesting complete example is the Lua Live Demo. This is a web page where you can enter Lua sample code, execute it on the server, and see the results. Since the scripts can be entered anonymously from anywhere, they are clearly untrusted. This web application appears to be as secure as can be asked for. Its source kit is available for download from one of the authors of Lua.
Snippet is not a proper use of terminology for what an implementation of this functionality would entail, and that is why you have not seen one. You could use debug hooks to provide callbacks during execution of Lua code. However, interrupting that process after a timeout is non-trivial and dependent upon your specific architecture.
You could consider using a longjmp to a jump buffer set just prior to the lua_call or lua_pcall after catching a timeout in a Lua hook. Then close that Lua context and handle the exception. The timeout could be implemented numerous ways and you likely already have something in mind that is used elsewhere in your project.
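For what it's worth, here is a rough C++ sketch of that hook-based approach. It uses luaL_error from inside the hook, which performs the longjmp back to the enclosing lua_pcall for you, rather than a hand-rolled jump buffer; the 10000-instruction interval and the clock()-based deadline are arbitrary choices, not requirements:

#include <lua.hpp>
#include <cstdio>
#include <ctime>

static std::clock_t g_deadline;     // crude global deadline, fine for a sketch

// Count hook: invoked every N VM instructions while the script runs.
static void timeout_hook(lua_State *L, lua_Debug *) {
    if (std::clock() > g_deadline)
        luaL_error(L, "script timed out");   // longjmps to the enclosing lua_pcall
}

// Returns true if the chunk loaded and ran to completion within 'seconds'.
bool run_with_timeout(lua_State *L, const char *script, double seconds) {
    g_deadline = std::clock() + static_cast<std::clock_t>(seconds * CLOCKS_PER_SEC);
    lua_sethook(L, timeout_hook, LUA_MASKCOUNT, 10000);   // check every 10000 instructions

    int status = luaL_loadstring(L, script);              // compile only, runs nothing yet
    if (status == 0)                                      // 0 == LUA_OK
        status = lua_pcall(L, 0, 0, 0);                   // run; the timeout error lands here

    lua_sethook(L, nullptr, 0, 0);                        // remove the hook again

    if (status != 0) {
        const char *msg = lua_tostring(L, -1);            // "script timed out" or other error
        std::fprintf(stderr, "Lua error: %s\n", msg ? msg : "(unknown)");
        lua_pop(L, 1);
        return false;
    }
    return true;
}

Calling run_with_timeout(L, script_text, 2.0) then covers the three bullet points above in one place: the deadline is set on the C++ side, the hook raises the error, and lua_pcall is where you handle it.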
The best way to accomplish this task is to run the interpreter in a separate process. Then use the provided operating system facilities to control the child process. Please refer to RBerteig's excellent answer for more information on that approach.
A very naive and simple, but all-Lua, method of doing it is:
-- limit may be in the millions range, depending on your needs
setfenv(code, sandbox)
pcall(function()
    debug.sethook(function() error("Timeout!") end, "", limit)  -- count hook: errors out after 'limit' VM instructions
    code()
    debug.sethook()  -- clear the hook again if the chunk finished in time
end)
I expect you can achieve the same through the C API.
However, there's a good number of problems with this method. Set the limit too low, and it can't do its job. Too high, and it's not really effective. (Can the chunk get run repeatedly?) Allow the code to call a function that blocks for a significant amount of time, and the above is meaningless. Allow it to do any kind of pcall, and it can trap the error on its own. And whatever other problems I haven't thought of yet. Here I'm also plain ignoring the warnings against using the debug library for anything (besides debugging).
Thus, if you want it reliable, you should probably go with RB's solution.
I expect it will work quite well against accidental infinite loops, the kind that beginning Lua programmers are so fond of :P
For memory overuse, you could do the same with a function checking for increases in collectgarbage("count") at far smaller intervals; you'd have to merge them to get both.
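Translated to the C API, that merged check might look something like this (the 64 MB budget is an arbitrary assumption); lua_gc with LUA_GCCOUNT reports the interpreter's current heap size in KB:

#include <lua.hpp>

static const int kMaxScriptKB = 64 * 1024;   // assumed budget: 64 MB for the script's state

// Meant to be installed with lua_sethook(L, resource_hook, LUA_MASKCOUNT, interval).
static void resource_hook(lua_State *L, lua_Debug *) {
    if (lua_gc(L, LUA_GCCOUNT, 0) > kMaxScriptKB)        // current Lua heap size, in KB
        luaL_error(L, "script exceeded its memory budget");
    // ...elapsed-time check as in the earlier hook sketch...
}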

Source code protection at Microsoft

Here's another question about source code protection... So far I haven't been convinced by the answers to similar questions found on this site (NDAs on the legal side, trusting employees vs. protecting code, etc.), so I'd like to formulate it in a different manner:
What do large companies do to protect their source code? For example, I have never heard of the Windows or MS-DOS source code ever being stolen or reverse engineered. What steps does a large company like Microsoft take to protect its code?
One very important factor is that working with complex source code requires solid domain knowledge. So complex code becomes largely useless without the people that wrote it. Even if some third party steals all the code it will likely be unable to make alterations to it or use it.
One good example is SQLite - all its code is public domain and published. How much time will someone without solid knowledge of its inner workings need to make any alterations or analysis of that code? And SQLite is not a very big piece of software. Yet people developing it support it and publish updates all the time.
I have never heard of the Windows or MS-DOS source code ever being stolen or reverse engineered.
Well, then you haven't been listening very carefully. Reverse engineering Microsoft's operating system code happens all the time. Go read books like "Undocumented Windows 2000 Secrets: A Programmer's Cookbook" or "Windows NT/2000 Native API Reference" by Gary Nebbett.
Or remember what Cogswell and Russinovich did before being bought by Microsoft.
Also, around six years ago, parts of the Windows 2000 source code were leaked:
http://www.wired.com/science/discoveries/news/2004/02/62282
First, they pay enough and have big enough legal and security teams to make it not worth it for most employees to think of taking the risk of leaking it. Second, they limit the access to their source control systems based on the portions of the codebase that particular developers need access to.

Does three-tier architecture ever work?

We have been building three-tier architectures for over a decade now. Dividing the presentation, logic and data tiers is supposed to allow us to exchange each layer individually, should the need ever arise, be it through changed requirements or new technologies.
I have never seen it working in practice...
Mostly because of (at least) one of the following reasons:
The three-tier concept was only visible in the source code (e.g. package naming in Java), which was then deployed as one tied-together package.
The code representing each layer was nicely bundled in its own deployable format but then thrown into the same process (e.g. an "enterprise container").
Each layer was run in its own process, sometimes even on different machines, but they were so statically wired to each other that replacing one of them meant breaking all of them.
Thus what you usually end up with is a monolithic, tightly coupled system that does not deliver what its architecture promised.
I therefore think "three-tier architecture" is a total misnomer. The true benefit it brings is that the code is logically sound. But that's at "write time", not at "run time". A better name would be something like "layered by responsibility". In any case, the "architecture" word is misleading.
What are your thoughts on this? How could a working three-tier architecture be achieved? By that I mean one which keeps its promise: allowing a layer to be swapped out without affecting the others. The system should survive that and be in a well-defined state afterwards.
Thanks!
The true purpose of layered architectures (both logical and physical tiers) isn't to make it easy to replace a layer (which is quite rare), but to make it easy to make changes within a layer without affecting the others (and as Ben notes, to facilitate scalability, consistency, and security) - which works all the time all around us.
One example of a 3-tier architecture is a typical database-driven web application:
End-user's web browser
Server-side web application logic
Database engine
In every system, there is the nice, elegant architecture dreamed up at the beginning, and then the hairy mess once it's finally in production, full of hundreds of bug fixes, special-case handlers, and other typical nasty changes made to address specific issues not realized during the design.
I don't think the problems you've described are specific to three-tier architecture at all.
If you haven't seen it working, you may just have bad luck. I've worked on projects that serve several UIs (presentation) from one web service (logic). In addition, we swapped data providers via configuration (data) so we could use a low-cost database while developing and Oracle in higher environments.
Sure, there's always some duplication - maybe you add validation in the UI for responsiveness and then validate again in the logic layer - but overall, a clean separation is possible and nice to work with.
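For a flavour of the kind of seam that makes such a provider swap possible (the names here are illustrative, not from that project): the logic layer codes against an interface, and configuration, not code, decides which concrete provider gets built.

#include <memory>
#include <stdexcept>
#include <string>

// The logic tier sees only this interface.
struct DataProvider {
    virtual ~DataProvider() = default;
    virtual std::string loadCustomer(int id) = 0;
};

// Cheap provider used while developing.
struct SqliteProvider : DataProvider {
    std::string loadCustomer(int id) override { return "customer " + std::to_string(id) + " (sqlite)"; }
};

// Provider used in the higher environments.
struct OracleProvider : DataProvider {
    std::string loadCustomer(int id) override { return "customer " + std::to_string(id) + " (oracle)"; }
};

// Chosen from configuration at startup.
std::unique_ptr<DataProvider> makeProvider(const std::string &configured) {
    if (configured == "sqlite") return std::make_unique<SqliteProvider>();
    if (configured == "oracle") return std::make_unique<OracleProvider>();
    throw std::runtime_error("unknown data provider: " + configured);
}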
Once you accept that n-tier's major benefits (namely scalability, logical consistency, and security) could not easily be achieved through other means, the question of whether or not any of the tiers can be replaced outright without breaking the others becomes more like asking whether there's any icing on the cake.
Any operating system will have a similar kind of architecture, or else it won't work. The presentation layer is independent of the hardware layer, which is abstracted into drivers that implement a certain interface. The data is handled using logic that changes depending on the type of data being read (think NTFS vs. FAT32 vs. EXT3 vs. CD-ROM). Linux can run on just about any hardware you can throw at it and it will still look and behave the same because the abstractions between the layers insulate each other from changes within a single layer.
One of the biggest practical benefits of the 3-tier approach is that it makes it easy to split up work. You can easily have a DBA and a business analyst or two building a data layer, a traditional programmer building the server-side app code, and a graphic designer/web designer building the UI. The three teams still need to communicate, of course, but this allows for much smoother development in most cases. In this regard, I see the 3-tier approach working reliably every day, and this is enough for me, even if I cannot count on "interchangeable parts", so to speak.

Machine ID for Mac OS?

I need to calculate a machine ID for computers running Mac OS, but I don't know where to retrieve the information - stuff like HDD serial numbers, etc. The main requirement for my particular application is that the user mustn't be able to spoof it. Before you start laughing, I know that's far-fetched, but at the very least, the spoofing method must require a reboot.
The best solution would be one in C/C++, but I'll take Objective-C if there's no other way. The über-best solution would not need root privileges.
Any ideas? Thanks.
Erik's suggestion of system_profiler (and its underlying, but undocumented SystemProfiler.framework) is your best hope. Your underlying requirement is not possible, and any solution without hardware support will be pretty quickly hackable. But you can build a reasonable level of obfuscation using system_profiler and/or SystemProfiler.framework.
I'm not sure of your actual requirements here, but these posts may be useful:
Store an encryption key in Keychain while application installation process (this was related to network authentication, which sounds like your issue)
Obfuscating Cocoa (this was more around copy-protection, which may not be your issue)
I'll repeat here what I said in the first posting: It is not possible, period, not possible, to securely ensure that only your client can talk to your server. If that is your underlying requirement, it is not a solvable problem. I will expand that by saying it's not possible to construct your program such that people can't take out any check you put in, so if the goal is licensing, that also is not a completely solvable problem. The second post above discusses how to think about that problem, though, from a business rather than engineering point of view.
EDIT: Regarding your request to require a reboot, remember that Mac OS X has kernel extensions. By loading a kernel extension, it is always possible to modify how the system sees itself at runtime without a reboot. In principle, this would be a Mac rootkit, which is not fundamentally any more complex than a Linux rootkit. You need to carefully consider who your attacker is, but if your attackers include Mac kernel hackers (which is not an insignificant group), then even a reboot requirement is not plausible. This isn't to say that you can't make spoofing annoying for the majority of users. It's just always possible by a reasonably competent attacker. This is true on all modern OSes; there's nothing special here about Mac.
The tool /usr/sbin/system_profiler can provide you with a list of serial numbers for various hardware components. You might consider using those values as text to generate an MD5 hash or something similar.
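A sketch along those lines (macOS-only; it shells out to system_profiler and hashes the raw hardware-overview text with CommonCrypto's SHA-256 rather than MD5, which is the same idea; all the earlier caveats about spoofing still apply):

#include <CommonCrypto/CommonDigest.h>
#include <cstdio>
#include <string>

// Shell out to system_profiler, hash the hardware overview, return it as hex.
std::string machine_id() {
    FILE *p = popen("/usr/sbin/system_profiler SPHardwareDataType", "r");
    if (!p) return "";

    std::string text;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, p)) > 0)
        text.append(buf, n);
    pclose(p);

    unsigned char digest[CC_SHA256_DIGEST_LENGTH];
    CC_SHA256(text.data(), static_cast<CC_LONG>(text.size()), digest);

    static const char hex[] = "0123456789abcdef";
    std::string id;
    for (unsigned char b : digest) {
        id += hex[b >> 4];
        id += hex[b & 0x0f];
    }
    return id;
}

Note that the output is only as stable as system_profiler's formatting; parsing out individual serial numbers before hashing would survive OS updates better.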
How about getting the MAC address of a network card attached to the computer, using ifconfig?