Related
I have embedded Lua in a C++ application. I need to be able to kill rogue (i.e. badly written scripts) from hogging resources.
I know I will not be able to cater for EVERY type of condition that causes a script to run indefinitely, so for now, I am only looking at the straightforward Lua side (i.e. scripting side problems).
I also know that this question has been asked (in various guises) here on SO. Probably the reason why it is constantly being re-asked is that as yet, no one has provided a few lines of code to show how the timeout (for the simple cases like the one I described above), may actually be implemented in working code - rather than talking in generalities, about how it may be implemented.
If anyone has actually implemented this type of functionality in a C++ with embedded Lua application, I (as well as many other people - I'm sure), will be very grateful for a little snippet that shows:
How a timeout can be set (in the C++ side) before running a Lua script
How to raise the timeout event/error (C++ /Lua?)
How to handle the error event/exception (C++ side)
Such a snippet (even pseudocode) would be VERY, VERY useful indeed
You need to address this with a combination of techniques. First, you need to establish a suitable sandbox for the untrusted scripts, with an environment that provides only those global variables and functions that are safe and needed. Second, you need to provide for limitations on memory and CPU usage. Third, you need to explicitly refuse to load pre-compiled bytecode from untrusted sources.
The first point is straightforward to address. There is a fair amount of discussion of sandboxing Lua available at the Lua users wiki, on the mailing list, and here at SO. You are almost certainly already doing this part if you are aware that some scripts are more trusted than others.
The second point is question you are asking. I'll come back to that in a moment.
The third point has been discussed at the mailing list, but may not have been made very clearly in other media. It has turned out that there are a number of vulnerabilities in the Lua core that are difficult or impossible to address, but which depend on "incorrect" bytecode to exercise. That is, they cannot be exercised from Lua source code, only from pre-compiled and carefully patched byte code. It is straightforward to write a loader that refuses to load any binary bytecode at all.
With those points out of the way, that leaves the question of a denial of service attack either through CPU consumption, memory consumption, or both. First, the bad news. There are no perfect techniques to prevent this. That said, one of the most reliable approaches is to push the Lua interpreter into a separate process and use your platform's security and quota features to limit the capabilities of that process. In the worst case, the run-away process can be killed, with no harm done to the main application. That technique is used by recent versions of Firefox to contain the side-effects of bugs in plugins, so it isn't necessarily as crazy an idea as it sounds.
One interesting complete example is the Lua Live Demo. This is a web page where you can enter Lua sample code, execute it on the server, and see the results. Since the scripts can be entered anonymously from anywhere, they are clearly untrusted. This web application appears to be as secure as can be asked for. Its source kit is available for download from one of the authors of Lua.
Snippet is not a proper use of terminology for what an implementation of this functionality would entail, and that is why you have not seen one. You could use debug hooks to provide callbacks during execution of Lua code. However, interrupting that process after a timeout is non-trivial and dependent upon your specific architecture.
You could consider using a longjmp to a jump buffer set just prior to the lua_call or lua_pcall after catching a time out in a luaHook. Then close that Lua context and handle the exception. The timeout could be implemented numerous ways and you likely already have something in mind that is used elsewhere in your project.
The best way to accomplish this task is to run the interpreter in a separate process. Then use the provided operating system facilities to control the child process. Please refer to RBerteig's excellent answer for more information on that approach.
A very naive and simple, but all-lua, method of doing it, is
-- Limit may be in the millions range depending on your needs
setfenv(code,sandbox)
pcall (function() debug.sethook(
function() error ("Timeout!") end,"", limit)
code()
debug.sethook()
end);
I expect you can achieve the same through the C API.
However, there's a good number of problems with this method. Set the limit too low, and it can't do its job. Too high, and it's not really effective. (Can the chunk get run repeatedly?) Allow the code to call a function that blocks for a significant amount of time, and the above is meaningless. Allow it to do any kind of pcall, and it can trap the error on its own. And whatever other problems I haven't thought of yet. Here I'm also plain ignoring the warnings against using the debug library for anything (besides debugging).
Thus, if you want it reliable, you should probably go with RB's solution.
I expect it will work quite well against accidental infinite loops, the kind that beginning lua programmers are so fond of :P
For memory overuse, you could do the same with a function checking for increases in collectgarbage("count") at far smaller intervals; you'd have to merge them to get both.
If you've got a codebase which is a bit messy in respect to coding standards - a mix of different conventions from different people - is it reasonable to give one person the task of going through every file and bringing it up to meet standards?
As well as being tremendously dull, you're going to get a mass of changes in SVN (or whatever) which can make comparing versions harder. Is it sensible to set someone on the whole codebase, or is it considered stupid to touch a file only to make it meet standards? Should files be left alone until some 'real' change is needed, and then updated?
Tagged as C++ since I think different languages have different automated tools for this.
Should files be left alone until some 'real' change is needed, and then updated?
This is what I would do.
Even if it's primarily text layout changes, doing it by a manual process on a large scale risks breaking code that was working.
Treat it as a refactor and do it locally whenever code has to be touched for some other reason. Add tests if they're missing to improve your chances of not breaking the code.
If your code is already well covered by tests, you might get away with something global, but I still wouldn't advocate it.
I also think this is pretty much language-agnostic.
It also depends on what kind of changes you are planning to make in order to bring it up to your coding standard. Everyone's definition of coding standard is different.
More specifically:
Can your proposed changes be made to the project with 100% guarantee that the entire project will work identically the same as before? For example, changes that only affect comments, line breaks and whitespaces should be fine.
If you do not have 100% guarantee, then there is a risk that should not be taken unless it can be balanced with a benefit. For example, is there a need to gain a deeper understanding of the current code base in order to continue its development, or fix its bugs? Is the jumble of coding conventions preventing these initiatives? If so, evaluate the costs and benefits and decide whether a makeover is justified.
If you need to understand the current code base, here is a technique: tracing.
Make a copy of the code base. Note that tracing involves adding code, so it should not be performed on the production copy.
In the new copy, insert many fprintf (trace) statements into any functions considered critical. It may be possible to automate this.
Run the project with various inputs and collect those tracing results. This will help everyone understand the current project's design.
Another technique for understanding the current code base is to document the dependencies in the project.
Some kinds of dependencies (interface dependency, C++ include dependency, C++ typedef / identifier dependency) can be extracted by automated tools.
Run-time dependency can only be extracted through tracing, or by profiling tools.
I was thinking it's a task you might give a work-experience kid or put out onto RentaCoder
This depends mainly on the codebase's size.
I've seen three trainees given the task to go through a 2MLoC codebase (several thousand source files) in order to insert one new line into the standard disclaimer at the top of all the source files (with the line's content depending on the file's name and path). It took them several days. One of the three used most of that time to write a script that would do it and later only fixed the files where the script had failed to insert the line correctly, the other two just ploughed through the files. (The one who wrote the script later got a job at that company.)
The job of manually adapting all those files in that codebase to certain coding standards would probably have to be measured in man-years.
OTOH, if it's just a few dozen files, it's certainly doable.
Your codebase is very likely somewhere in between, so your best bet might be to set a "work-experience kid" to find out whether there's a tool that can do this to your satisfaction and, if so, make it work.
Should files be left alone until some 'real' change is needed, and then updated?
I'd strongly advice against this. If you do this, you will have "real" changes intermingled with whatever reformatting took place, making it nigh impossible to see the "real" changes in the diff.
You can address the formatting aspect of coding style fairly easily. There are a number of tools that can auto-format your code. I recommend hooking one of these up to your version control tool's "check in" feature. This way, people can use whatever format they want while editing their code, but when it gets checked in, it's reformatted to the official style.
In general, I think it's best if you can do the big change all at once. In the past, we've done the following:
1. have a time dedicated to the reformatting when most people aren't working (e.g. at night or on the weekend
2. have a person check out as many files as possible at that time, reformat them, and check them in again
With a reformatting-only revision, you don't have to figure out what has changed in addition to the formatting.
I am writing a piece of software in C++ which is targeted at a market in which software is traditionally heavily cracked (or at least, attempted to be). I realise that nothing can be completely protected, however I feel that trying would be a good idea and also I think some of the specifics of the situation that I'm in might be helpful.
Firstly, it would not be annoying to the user that they must have an internet connection to use the software. I hate it when games etc. do this too, but the software requires an internet connection to function anyway due to its purpose, so this wouldn't hinder a normal user.
Secondly, it depends fairly heavily on external scripts written by me and/or supplied by third-parties, so I can have these stored on some website somewhere meaning that people who crack the software will have to also track down new copies of the scripts, which may annoy them into becoming legit.
Thirdly, new versions will, by definition due to what the app does, have to be released very often, weekly or every two weeks max. The program will obviously have an autoupdater, but since I am churning out (required to function) updates so often, any sort of key-based encryption or whatever could possibly have the keys/method change every update, and I am capable of breaking existing cracks when they do happen.
Does anyone know of any available solutions or techniques I could implement which fit the bill?
If you application is doing some sort of data processing or analysis, you can protect it by putting that part into a web service (maybe in a cloud) that your client application connects and authenticate to and then receive results from. So even if your client application is reversed engineered, it would be missing that important piece of processing.
If your application is web based, you get the same effect too.
I've previously used CrypKey successfully.
I'm going to guess that older copies of the software are far less useful than the latest copy.
If that's the case, then you already have a powerful anti-cracker technology in place: your update mechanism. When you become aware of a hacked version of your software, then you can immediately check for it, and cause trouble for users of the hacked software.
I will soon be shipping a paid-for static library, and I am wondering if it is possible to build in any form of copy protection to prevent developers copying the library.
Ideally, I would like to prevent the library being linked into an executable at all, if (and only if!) the library has been illegitimately copied onto the developer's machine. Is this possible?
Alternatively, it might be acceptable if applications linked to an illegitimate copy of the library simply didn't work; however, it is very important that this places no burden on the users of these applications (such as inputting a license key, using a dongle, or even requiring an Internet connection).
The library is written in C++ and targets a number of platforms including Windows and Mac.
Do I have any options?
I agree with other answers that a fool-proof protection is simply impossible. However, as a gentle nudge...
If your library is precompiled, you could discourage excessive illegitimate use by requiring custom license info in the API.
Change a function like:
jeastsy_lib::init()
to:
jeastsy_lib::init( "Licenced to Foobar Industries", "(hex string here)" );
Where the first parameter identifies the customer, and the second parameter is an MD5 or other hash of the first parameter with a salt.
When your library is purchased, you would supply both of those parameters to the customer.
To be clear, this is an an easily-averted protection for someone smart and ambitious enough. Consider this a speed bump on the path to piracy. This may convince potential customers that purchasing your software is the easiest path forward.
A C++ static library is a terribly bad redistributable.
It's a bot tangential, but IMO should be mentioned here. There are many compiler options that need to match the caller:
Ansi/Unicode,
static/dynamic CRT linking,
exception handling enabled/disabled,
representation of member function pointers
LTCG
Debug/Release
That's up to 64 configurations!
Also they are not portable across platforms even if your C++ code is platform independent - they might not even work with a future compiler version on the same platform! LTCG creates huge .lib files. So even if you can omit some of the choices, you have a huge build and distribution size, and a general PITA for the user.
That's the main reason I wouldn't consider buying anything that comes with static libraries only, much less somethign that adds copy protection of any sort.
Implementation ideas
I can't think of any better fundamental mechanism than Shmoopty's suggestion.
You can additionally "watermark" your builds, so that if you detect a library "in the wild", you can determine whom you sold that one to. (However, what are you going to do? Write angry e-mails to an potentially innocent customer?) Also, this requires some effort, using an easily locatable sequence of bytes not affecting execution won't help much.
You need to protect yourself agains LIB "unpacker" tools. However, the linker should still be able to remove unused functions.
General thoughts
Implementing a decent protection mechanism takes great care and some creativity, and I haven't yet seen a single one that does not create additional support cost and requires tough social decisions. Every hour spent on copy protection is an hour not spent improving your product. The market for C++ code isn't exactly huge, I see a lot of work that your customers have to pay for.
When I buy code, I happily pay for documentation, support, source code and other signs of "future proofness". Not so much for licencing.
Ideally, I would like to prevent the library being linked into an executable at all, if (and only if!) the library has been illegitimately copied onto the developer's machine. Is this possible?
How would you determine whether your library has been "illegitimately copied" at link time?
Remembering that none of your code is running when the linker does its work.
So, given that none of your code is running, we can't do anything at compile or link time. That leaves trying to determine whether the library was illegitimately copied onto the linking machine, from a completely unrelated target machine. And I'm still not seeing any way of making the two situations distinguishable, even if you were willing to impose burdens like "requires internet access" on the end-user.
My conclusion is that fuzzy lollipop's suggestion of "make something so useful that people want to buy it" is the best way to "copy-protect" your code library.
copy protection and in this case, execution protection by definition "places a burden on the user". no way to get around that. best form of copy protection is write something so useful people feel compelled to buy it.
You can't do what you want (perfect copy protection that places no burden on anyone except the people illegally copying the work).
There's no way for you to run code at link time with the standard linkers, so there's no way to determine at that point whether you're OK or not.
That leaves run-time, and that would mean requiring the end-users to validate somehow, which you've already determined is a non-starter.
Your only options are: ship it as-is and hope developers don't copy it too much, OR write your own linker and try to get people to use that (just in case it isn't obvious: That's not going to work. No developer in their right mind is going to buy a library that requires a special linker).
If you are planning to publish an expensive framework you might look into using FLEXlm.
I'm not associated with them but have seen it in various expensive frameworks often targeted Silicon Graphics hardware.
A couple ideas... (these have some major draw backs though which should be obvious)
For at compile time: put the library file on a share, and give it file permissions only for the developers you've sold it to.
For at run time: compile the library to work only on certain machines, eg. check the UIDs or MAC ids or something
I will soon be shipping a paid-for static library
The correct answer to your question is: don't bother with copy protection until you prove that you need it.
You say that you are "soon to be shipping a paid-for static library." Unless you have proven that you have people who are willing to steal your technology, implementing copy protection is irrelevant. An uneasy feeling that "there are people out there who will steal it" is not proof it will be stolen.
The hardest part of starting up a business is creating a product people will pay for. You have not yet proven that you have done that; ergo copy protection is irrelevant.
I'm not saying that your product has no value. I am saying that until you try to sell it, you will not know whether it has value or not.
And then, even if you do sell it, you will not know whether people steal it or not.
This is the difference between being a good programmer and being a good business owner.
First, prove that someone wants to steal your product. Then, if someone wants to steal it, add copy protection and keep improving your product.
I have only done this once. This was the method I used. It is far from foolproof, but I felt it was a good compromise. It is similar to the answer of Drew Dorman.
I would suggest providing an initialisation routine that requires the user to provide their email and a key linked to that email. Then have a way that anyone using the product can view the email information.
I used this method on a library that I use when writing plugins for AfterEffects. The initialisation routine builds the message shown in the "About" dialog for the plugin, and I made this message display the given email.
The advantages of this method in my eyes are:
A client is unlikely to pass on their email and key because they don't want their email associated with products they didn't write.
They could circumvent this by signing up with a burner email, but then they don't get their email associated with products they do write, so again this seems unlikely.
If a version with a burner email gets distributed then people might try it, then decide they want to use it, but need a version associated to their email so might buy a copy. Free advertising. You may even wish to do this yourself.
I also wanted to ensure that when I provide plugins to a company, they can't give my library to their internal programmers to write plugins themselves, based on my years of expertise. To do this I also linked the plugin name to the key. So a key will only work for a specific plugin name and developer email.
To expand on Drew's answer - to do this you take the users email when they sign up, you tag a secret set of characters on the end and then hash it. You give the user the hash. The secret set of characters is the same for all users and is known to your library, but the email makes the hash unique. When a user initialises the library with their email and the hash, your library appends the characters, hashes it and checks the result against the hash the user provided. This way you do not need a custom build for every user.
In the end I felt anything more complex than this would be futile as someone who really wanted to crack my library would probably be better at it than I would be at defending it. This method just stops a casual pirater from easily taking my library.
I had an idea I was mulling over with some colleagues. None of us knew whether or not it exists currently.
The Basic Premise is to have a system that has 100% uptime but can become more efficient dynamically.
Here is the scenario: * So we hash out a system quickly to a
specified set of interfaces, it has
zero optimizations, yet we are
confident that it is 100% stable
though (dubious, but for the sake of
this scenario please play
along) * We then profile
the original classes, and start to
program replacements for the
bottlenecks.
* The original and the replacement are initiated simultaneously and
synchronized.
* An original is allowed to run to completion: if a replacement hasnĀ“t
completed it is vetoed by the system
as a replacement for the
original.
* A replacement must always return the same value as the original, for a
specified number of times, and for a
specific range of values, before it is
adopted as a replacement for the
original.
* If exception occurs after a replacement is adopted, the system
automatically tries the same operation
with a class which was superseded by
it.
Have you seen a similar concept in practise? Critique Please ...
Below are comments written after the initial question in regards to
posts:
* The system demonstrates a Darwinian approach to system evolution.
* The original and replacement would run in parallel not in series.
* Race-conditions are an inherent issue to multi-threaded apps and I
acknowledge them.
I believe this idea to be an interesting theoretical debate, but not very practical for the following reasons:
To make sure the new version of the code works well, you need to have superb automatic tests, which is a goal that is very hard to achieve and one that many companies fail to develop. You can only go on with implementing the system after such automatic tests are in place.
The whole point of this system is performance tuning, that is - a specific version of the code is replaced by a version that supersedes it in performance. For most applications today, performance is of minor importance. Meaning, the overall performance of most applications is adequate - just think about it, you probably rarely find yourself complaining that "this application is excruciatingly slow", instead you usually find yourself complaining on the lack of specific feature, stability issues, UI issues etc. Even when you do complain about slowness, it's usually an overall slowness of your system and not just a specific applications (there are exceptions, of course).
For applications or modules where performance is a big issue, the way to improve them is usually to identify the bottlenecks, write a new version and test is independently of the system first, using some kind of benchmarking. Benchmarking the new version of the entire application might also be necessary of course, but in general I think this process would only take place a very small number of times (following the 20%-80% rule). Doing this process "manually" in these cases is probably easier and more cost-effective than the described system.
What happens when you add features, fix non-performance related bugs etc.? You don't get any benefit from the system.
Running the two versions in conjunction to compare their performance has far more problems than you might think - not only you might have race conditions, but if the input is not an appropriate benchmark, you might get the wrong result (e.g. if you get loads of small data packets and that is in 90% of the time the input is large data packets). Furthermore, it might just be impossible (for example, if the actual code changes the data, you can't run them in conjunction).
The only "environment" where this sounds useful and actually "a must" is a "genetic" system that generates new versions of the code by itself, but that's a whole different story and not really widely applicable...
A system that runs performance benchmarks while operating is going to be slower than one that doesn't. If the goal is to optimise speed, why wouldn't you benchmark independently and import the fastest routines once they are proven to be faster?
And your idea of starting routines simultaneously could introduce race conditions.
Also, if a goal is to ensure 100% uptime you would not want to introduce untested routines since they might generate uncatchable exceptions.
Perhaps your ideas have merit as a harness for benchmarking rather than an operational system?
Have I seen a similar concept in practice? No. But I'll propose an approach anyway.
It seems like most of your objectives would be meet by some sort of super source control system, which could be implemented with CruiseControl.
CruiseControl can run unit tests to ensure correctness of the new version.
You'd have to write a CruiseControl builder pluggin that would execute the new version of your system against a series of existing benchmarks to ensure that the new version is an improvement.
If the CruiseControl build loop passes, then the new version would be accepted. Such a process would take considerable effort to implement, but I think it feasible. The unit tests and benchmark builder would have to be pretty slick.
I think an Inversion of Control Container like OSGi or Spring could do most of what you are talking about. (dynamic loading by name)
You could build on top of their stuff. Then implement your code to
divide work units into discrete modules / classes (strategy pattern)
identify each module by unique name and associate a capability with it
when a module is requested it is requested by capability and at random one of the modules with that capability is used.
keep performance stats (get system tick before and after execution and store the result)
if an exception occurs mark that module as do not use and log the exception.
If the modules do their work by message passing you can store the message until the operation completes successfully and redo with another module if an exception occurs.
For design ideas for high availability systems, check out Erlang.
I don't think code will learn to be better, by itself. However, some runtime parameters can easily adjust onto optimal values, but that would be just regular programming, right?
About the on-the-fly change, I've shared the wondering and would be building it on top of Lua, or similar dynamic language. One could have parts that are loaded, and if they are replaced, reloaded into use. No rocket science in that, either. If the "old code" is still running, it's perfectly all right, since unlike with DLL's, the file is needed only when reading it in, not while executing code that came from there.
Usefulness? Naa...