Self Testing Systems - unit-testing

I had an idea I was mulling over with some colleagues. None of us knew whether or not it exists currently.
The basic premise is to have a system that has 100% uptime but can become more efficient dynamically.
Here is the scenario:

* So we hash out a system quickly to a specified set of interfaces; it has zero optimizations, yet we are confident that it is 100% stable (dubious, but for the sake of this scenario please play along).
* We then profile the original classes and start to program replacements for the bottlenecks.
* The original and the replacement are initiated simultaneously and synchronized.
* An original is allowed to run to completion; if a replacement hasn't completed, it is vetoed by the system as a replacement for the original.
* A replacement must always return the same value as the original, for a specified number of times and for a specific range of values, before it is adopted as a replacement for the original.
* If an exception occurs after a replacement is adopted, the system automatically retries the same operation with the class that it superseded.
Have you seen a similar concept in practice? Critique, please...
Below are comments written after the initial question in response to the posts:
* The system demonstrates a Darwinian approach to system evolution.
* The original and replacement would run in parallel not in series.
* Race conditions are an inherent issue in multi-threaded apps, and I acknowledge them.

I believe this idea makes for an interesting theoretical debate, but it is not very practical, for the following reasons:
To make sure the new version of the code works well, you need superb automated tests, which are very hard to achieve and which many companies never manage to develop. You can only go ahead with implementing such a system after those automated tests are in place.
The whole point of this system is performance tuning, that is, a specific version of the code is replaced by a version that supersedes it in performance. For most applications today, performance is of minor importance: the overall performance of most applications is adequate. Just think about it - you probably rarely find yourself complaining that "this application is excruciatingly slow"; instead you usually find yourself complaining about the lack of a specific feature, stability issues, UI issues, etc. Even when you do complain about slowness, it's usually the overall slowness of your system and not just a specific application (there are exceptions, of course).
For applications or modules where performance is a big issue, the way to improve them is usually to identify the bottlenecks, write a new version and test it independently of the system first, using some kind of benchmarking. Benchmarking the new version of the entire application might also be necessary, of course, but in general I think this process would only take place a very small number of times (following the 80/20 rule). Doing this process "manually" in these cases is probably easier and more cost-effective than the described system.
What happens when you add features or fix non-performance-related bugs? You don't get any benefit from the system.
Running the two versions in conjunction to compare their performance has far more problems than you might think - not only might you have race conditions, but if the input is not a representative benchmark, you might get the wrong result (e.g. if you happen to get loads of small data packets when, 90% of the time, the input is large data packets). Furthermore, it might just be impossible (for example, if the actual code changes the data, you can't run them in conjunction).
The only "environment" where this sounds useful and actually "a must" is a "genetic" system that generates new versions of the code by itself, but that's a whole different story and not really widely applicable...

A system that runs performance benchmarks while operating is going to be slower than one that doesn't. If the goal is to optimise speed, why wouldn't you benchmark independently and import the fastest routines once they are proven to be faster?
And your idea of starting routines simultaneously could introduce race conditions.
Also, if a goal is to ensure 100% uptime you would not want to introduce untested routines since they might generate uncatchable exceptions.
Perhaps your ideas have merit as a harness for benchmarking rather than an operational system?

Have I seen a similar concept in practice? No. But I'll propose an approach anyway.
It seems like most of your objectives would be met by some sort of super source control system, which could be implemented with CruiseControl.
CruiseControl can run unit tests to ensure correctness of the new version.
You'd have to write a CruiseControl builder plugin that would execute the new version of your system against a series of existing benchmarks to ensure that the new version is an improvement.
If the CruiseControl build loop passes, then the new version would be accepted. Such a process would take considerable effort to implement, but I think it feasible. The unit tests and benchmark builder would have to be pretty slick.

I think an Inversion of Control container like OSGi or Spring could do most of what you are talking about (dynamic loading by name).
You could build on top of their stuff. Then implement your code to:

* divide work units into discrete modules/classes (strategy pattern)
* identify each module by a unique name and associate a capability with it
* when a module is requested, request it by capability and use one of the modules with that capability, chosen at random
* keep performance stats (get the system tick before and after execution and store the result)
* if an exception occurs, mark that module as "do not use" and log the exception

If the modules do their work by message passing, you can store the message until the operation completes successfully and redo it with another module if an exception occurs. (A rough C++ sketch of such a registry appears below.)
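For what it's worth, here is a minimal C++ sketch of the registry idea above. Everything in it (the Module/Registry names, the int-to-int work signature, banning on any std::exception) is an assumption made for illustration, not something OSGi or Spring gives you out of the box.

    // Sketch of a capability-keyed module registry (strategy pattern).
    // All names here are hypothetical.
    #include <chrono>
    #include <functional>
    #include <iostream>
    #include <map>
    #include <random>
    #include <stdexcept>
    #include <string>
    #include <vector>

    struct Module {
        std::string name;
        std::function<int(int)> work;   // the unit of work this module performs
        long long totalMicros = 0;      // accumulated execution time
        long long calls = 0;
        bool banned = false;            // set when the module throws
    };

    class Registry {
    public:
        void add(const std::string& capability, Module m) {
            byCapability_[capability].push_back(std::move(m));
        }

        // Pick a random, non-banned module for the capability, time it,
        // and ban it (then retry with another) if it throws.
        int invoke(const std::string& capability, int input) {
            auto& candidates = byCapability_.at(capability);
            std::vector<Module*> usable;
            for (auto& m : candidates)
                if (!m.banned) usable.push_back(&m);
            if (usable.empty()) throw std::runtime_error("no usable module");

            std::uniform_int_distribution<size_t> pick(0, usable.size() - 1);
            Module* m = usable[pick(rng_)];

            auto start = std::chrono::steady_clock::now();
            try {
                int result = m->work(input);
                auto end = std::chrono::steady_clock::now();
                m->totalMicros += std::chrono::duration_cast<
                    std::chrono::microseconds>(end - start).count();
                ++m->calls;
                return result;
            } catch (const std::exception& e) {
                m->banned = true;                 // mark as "do not use"
                std::cerr << m->name << " failed: " << e.what() << "\n";
                return invoke(capability, input); // redo with another module
            }
        }

    private:
        std::map<std::string, std::vector<Module>> byCapability_;
        std::mt19937 rng_{std::random_device{}()};
    };

The adoption rule from the question (same results over a range of inputs for N runs before a replacement is trusted) would sit on top of this, comparing results before flipping the "banned"/preferred flags.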

For design ideas for high availability systems, check out Erlang.

I don't think code will learn to be better by itself. However, some runtime parameters can easily be adjusted towards optimal values, but that would be just regular programming, right?
As for the on-the-fly change, I've wondered about the same thing and would build it on top of Lua, or a similar dynamic language. One could have parts that are loaded and, if they are replaced, reloaded into use. No rocket science in that, either. If the "old code" is still running, that's perfectly all right, since unlike with DLLs, the file is needed only when reading it in, not while executing the code that came from it.
Usefulness? Naa...

Related

Embedded Lua - timing out rogue scripts (e.g. infinite loop) - an example anyone?

I have embedded Lua in a C++ application. I need to be able to kill rogue (i.e. badly written) scripts to stop them from hogging resources.
I know I will not be able to cater for EVERY type of condition that causes a script to run indefinitely, so for now, I am only looking at the straightforward Lua side (i.e. scripting side problems).
I also know that this question has been asked (in various guises) here on SO. Probably the reason it keeps being re-asked is that, as yet, no one has provided a few lines of code showing how the timeout (for simple cases like the one I described above) may actually be implemented in working code, rather than talking in generalities about how it might be implemented.
If anyone has actually implemented this type of functionality in a C++ application with embedded Lua, I (as well as many other people, I'm sure) will be very grateful for a little snippet that shows:
How a timeout can be set (in the C++ side) before running a Lua script
How to raise the timeout event/error (C++ /Lua?)
How to handle the error event/exception (C++ side)
Such a snippet (even pseudocode) would be VERY, VERY useful indeed
You need to address this with a combination of techniques. First, you need to establish a suitable sandbox for the untrusted scripts, with an environment that provides only those global variables and functions that are safe and needed. Second, you need to provide for limitations on memory and CPU usage. Third, you need to explicitly refuse to load pre-compiled bytecode from untrusted sources.
The first point is straightforward to address. There is a fair amount of discussion of sandboxing Lua available at the Lua users wiki, on the mailing list, and here at SO. You are almost certainly already doing this part if you are aware that some scripts are more trusted than others.
The second point is the question you are asking. I'll come back to that in a moment.
The third point has been discussed on the mailing list, but may not have been made very clear elsewhere. It has turned out that there are a number of vulnerabilities in the Lua core that are difficult or impossible to address, but which depend on "incorrect" bytecode to exercise. That is, they cannot be exercised from Lua source code, only from precompiled and carefully patched bytecode. It is straightforward to write a loader that refuses to load any binary bytecode at all.
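As an aside on that third point, here is a minimal sketch of such a refusal. It assumes Lua 5.2 or newer, where luaL_loadbufferx accepts a mode string ("t" means text-only chunks); on Lua 5.1 you would instead check whether the chunk begins with the bytecode signature byte (0x1B) before loading it.

    // Sketch: load untrusted source as text only, rejecting precompiled bytecode.
    // Assumes Lua 5.2+ (luaL_loadbufferx with a mode string).
    #include <lua.hpp>
    #include <string>

    // Returns true on success and leaves the compiled chunk on the Lua stack;
    // on failure the error message is on the stack instead.
    bool load_untrusted(lua_State *L, const std::string &src, const char *chunkname)
    {
        // "t" = text only: binary (precompiled) chunks are refused outright.
        return luaL_loadbufferx(L, src.data(), src.size(), chunkname, "t") == LUA_OK;
    }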
With those points out of the way, that leaves the question of a denial of service attack either through CPU consumption, memory consumption, or both. First, the bad news. There are no perfect techniques to prevent this. That said, one of the most reliable approaches is to push the Lua interpreter into a separate process and use your platform's security and quota features to limit the capabilities of that process. In the worst case, the run-away process can be killed, with no harm done to the main application. That technique is used by recent versions of Firefox to contain the side-effects of bugs in plugins, so it isn't necessarily as crazy an idea as it sounds.
One interesting complete example is the Lua Live Demo. This is a web page where you can enter Lua sample code, execute it on the server, and see the results. Since the scripts can be entered anonymously from anywhere, they are clearly untrusted. This web application appears to be as secure as can be asked for. Its source kit is available for download from one of the authors of Lua.
"Snippet" rather undersells what an implementation of this functionality would entail, and that is why you have not seen one. You could use debug hooks to provide callbacks during execution of Lua code. However, interrupting that process after a timeout is non-trivial and depends on your specific architecture.
You could consider using a longjmp to a jump buffer set just prior to the lua_call or lua_pcall after catching a timeout in a Lua hook. Then close that Lua context and handle the exception. The timeout could be implemented in numerous ways, and you likely already have something in mind that is used elsewhere in your project.
The best way to accomplish this task is to run the interpreter in a separate process. Then use the provided operating system facilities to control the child process. Please refer to RBerteig's excellent answer for more information on that approach.
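If you do stay in-process despite the caveats above, a rough sketch of the hook approach in the C API might look like the following. It is a sketch under assumptions, not a complete solution: the 1000-instruction hook granularity and the logging are placeholders, the global deadline assumes a single lua_State, and as noted it only interrupts pure Lua code, not C functions that block.

    // Sketch: abort a Lua script from C++ once a wall-clock deadline passes.
    // The count hook fires every N VM instructions; if the deadline has passed
    // it raises a Lua error, which the surrounding lua_pcall catches.
    #include <lua.hpp>
    #include <chrono>
    #include <iostream>

    static std::chrono::steady_clock::time_point g_deadline;

    static void timeout_hook(lua_State *L, lua_Debug *)
    {
        if (std::chrono::steady_clock::now() > g_deadline)
            luaL_error(L, "script timed out");   // longjmps out to lua_pcall
    }

    bool run_with_timeout(lua_State *L, const char *script, std::chrono::seconds limit)
    {
        g_deadline = std::chrono::steady_clock::now() + limit;

        // Check the clock every 1000 VM instructions (tune to taste).
        lua_sethook(L, timeout_hook, LUA_MASKCOUNT, 1000);

        bool ok = (luaL_loadstring(L, script) == 0) &&
                  (lua_pcall(L, 0, 0, 0) == 0);

        lua_sethook(L, nullptr, 0, 0);           // clear the hook again
        if (!ok) {
            std::cerr << "lua error: " << lua_tostring(L, -1) << "\n";
            lua_pop(L, 1);
        }
        return ok;
    }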
A very naive and simple, but all-Lua, method of doing it is:
    -- Limit may be in the millions range depending on your needs
    setfenv(code, sandbox)
    pcall(function()
        -- the hook runs after every `limit` VM instructions and raises an
        -- error, which the enclosing pcall catches
        debug.sethook(function() error("Timeout!") end, "", limit)
        code()
        debug.sethook()   -- clear the hook again on normal completion
    end)
I expect you can achieve the same through the C API.
However, there are a good number of problems with this method. Set the limit too low, and legitimate code can't do its job; too high, and it's not really effective. (Can the chunk get run repeatedly?) Allow the code to call a function that blocks for a significant amount of time, and the above is meaningless. Allow it to do any kind of pcall, and it can trap the error on its own. And there are whatever other problems I haven't thought of yet. Here I'm also plainly ignoring the warnings against using the debug library for anything besides debugging.
Thus, if you want it reliable, you should probably go with RB's solution.
I expect it will work quite well against accidental infinite loops, the kind that beginning Lua programmers are so fond of :P
For memory overuse, you could do the same with a function checking for increases in collectgarbage("count") at far smaller intervals; you'd have to merge them to get both.

Tool for model checking large, distributed C++ projects such as KDE?

Is there a tool which can handle model checking large, real-world, mostly-C++, distributed systems, such as KDE?
(KDE is a distributed system in the sense that it uses IPC, although typically all of the processes are on the same machine. Yes, by the way, this is a valid usage of "distributed system" - check Wikipedia.)
The tool would need to be able to deal with intraprocess events and inter-process messages.
(Let's assume that if the tool supports C++, but doesn't support other stuff that KDE uses such as moc, we can hack something together to workaround that.)
I will happily accept less general (e.g. static analysers specialised for finding specific classes of bugs) or more general static analysis alternatives, in lieu of actual model checkers. But I am only interested in tools that can actually handle projects of the size and complexity of KDE.
You're obviously looking for a static analysis tool that can:

* parse C++ at scale
* locate code fragments of interest
* extract a model
* pass that model to a model checker
* report the result to you

A significant problem is that everybody has a different idea about what model they'd like to check. That alone likely kills your chance of finding exactly what you want, because each model extraction tool has generally made a choice as to what it wants to capture as a model, and the chances that it matches what you want precisely are, IMHO, close to zero.
You aren't clear on what specifically you want to model, but I presume you want to find the communication primitives and model the process interactions to check for something like deadlock?
The commercial static analysis tool vendors seem like a logical place to look, but I don't think they are there, yet. Coverity would seem like the best bet, but it appears they only have some kind of dynamic analysis for Java threading issues.
This paper claims to do this, but I have not looked at it in any detail: Compositional analysis of C/C++ programs with VeriSoft. Related is [PDF] Computer-Assisted Assume/Guarantee Reasoning with VeriSoft. It appears you have to hand-annotate the source code to indicate the modelling elements of interest. The VeriSoft tool itself appears to be proprietary to Bell Labs and is likely hard to obtain.
Similarly this one: Distributed Verification of Multi-threaded C++ Programs.
This paper also makes interesting claims, but doesn't process C++ in spite of the title: Runtime Model Checking of Multithreaded C/C++ Programs.
While all the parts of this are difficult, one issue they all share is parsing C++ (as exemplified by the previously cited paper) and finding the code patterns that provide the raw information for the model. You also need to parse the specific dialect of C++ you are using; it's not nice that the C++ compilers all accept different languages. And, as you have observed, processing large C++ codebases is necessary. Model checkers (SPIN and friends) are relatively easy to find.
Our DMS Software Reengineering Toolkit provides general-purpose parsing with customizable pattern matching and fact extraction, and has a robust C++ front end that handles many dialects of C++ (EDIT Feb 2019: including C++17 in ANSI, GCC and MS flavors). It could likely be configured to find and extract the facts that correspond to the model you care about. But it doesn't do this off the shelf.
DMS with its C front end has been used to process extremely large C applications (19,000 compilation units!). The C++ front end has been used in anger on a variety of large-scale C++ projects (EDIT Feb 2019: including large-scale refactoring of APIs across 3000+ compilation units). Given DMS's general capability, I think it is likely capable of handling fairly big chunks of code. YMMV.
Static code analyzers, when used against a large code base for the first time, usually produce so many warnings and alerts that you won't be able to analyze all of them in a reasonable amount of time. It is hard to single out real problems from code that merely looks suspicious to the tool.
You can try to use automatic invariant discovery tools like Daikon, which capture perceived invariants at run time. You can later validate whether the discovered invariants (equivalence of variables, "a == b + 1" for example) make sense, and then insert permanent asserts into your code. This way, when an invariant is violated as a result of your change, you will get a warning that you may have broken something. This method helps you avoid restructuring or changing your code just to add tests and mocks.
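For example, if Daikon reports an invariant such as "cursor == base + 1" at the entry of some function, you can pin it down with an ordinary assert (a hypothetical example; the names and the invariant are invented for illustration):

    #include <cassert>

    // Hypothetical: Daikon observed that 'cursor' is always 'base + 1'
    // at the entry of advance(); make that observation permanent.
    void advance(int base, int cursor)
    {
        assert(cursor == base + 1 && "invariant discovered by Daikon violated");
        // ... existing logic ...
    }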
The usual way of applying formal techniques to large systems is to modularise them and write specifications for the interfaces of each module. Then you can verify each module independently (while verifying a module, you import the specifications - but not the code - of the other modules it calls). This approach makes verification scalable.

Morphing multiple executables into a single application

I have three legacy applications that share a lot of source code and data. Multiple instances of each of these applications can be executed by a user at any time; e.g. a dozen mixed application executions can be active at a time. These applications currently communicate through shared memory and messaging techniques so that they can maintain common cursor positioning, etc. The applications are written primarily in C++, use Qt, and total about 5 million lines of code. Only some of the existing code is thread-safe.
I want to consolidate these three executables into a single executable and use multi-threading functionality to allow multiple instances of each of the three functionality branches to execute at the same time. It has been suggested that I look into some of the features provided by Boost, e.g. shared pointers, and use OpenMP to orchestrate the overall execution of the multiple threads.
Any comments on how to proceed will be appreciated, particularly references on the best way to tackle this kind of a refactoring problem.
My suggestion would be to design the desired solution first (assuming, to start with, that the requirements are the same) and then build a phased migration path from the existing code base, basing it on the requirements that third-party functionality may introduce.
Refactoring should be one small step at a time - but knowing where you are going.
Working Effectively with Legacy Code (Robert C. Martin Series) would be a good read, I'd suggest.
Trust me (I've got the t-shirts): don't attempt to refactor unless you know how to prove functionality - automated verification tests will be your saviour.
I don't think anyone will be able to give you an informed answer. There are too many things to take into consideration. Only after careful analysis of the processes involved and the code behind them would anyone (you included) be able to provide you with a project path.
That said, there are however a few things you should take into consideration.
Your applications and the way they seem to communicate with each other strike me as an already sound solution. This is especially so because you have 5 million lines of code to tackle if you choose to refactor your current system, which is a strong deterrent. If there is a need to provide threading support in order to optimize current application messaging, I'd suggest you consider introducing MT into the shared memory and messaging routines instead of merging the applications.
The act of merging your applications could be seen at first glance as the "simple" act of gluing your current code together and removing the routines currently responsible for memory sharing and messaging, replacing them with normal function calls to the glued objects. My gut tells me it will not be that simple. You have to consider that almost certainly your current backbone code was designed for the current messaging solution; your current abstractions were conceived with it in mind. You will almost certainly find that you also need to make significant changes to the existing backbone code. And here again you will find that wall of 5 million lines a huge problem, because changes will quickly propagate across your entire system and leave you with 5 million lines of code to handle and god knows how many new bugs.
I'd suggest the first option above (introducing MT into the messaging routines). But it may be that you do need, for some reason, to consolidate your applications. That being so, I'd still suggest something else before you attempt the merge: create an interface application responsible for firing up, displaying and maintaining the current three applications. Your users will be none the wiser if you do this right. You'll still be able to apply the first option to your current system and join your applications under a common interface, despite them being in fact separate entities.
"...about 5 million lines of code..." Hmm... It is hard to say for sure without knowing your system, but since it is a "legacy" system, you can probably gain a lot by removing code duplication. Check out Simian and CPD.
5 million lines is a lot of code. Of course, you may need it, but my guess is that you don't.
You definitely need a good reason to change a code base like that to be multi-threaded, especially in C++. Doing it without cleaning it up completely first is a recipe for disaster.

Are some logical paths inherently untestable?

I have been using TDD to drive the project that I am currently working on and the results have been fairly satisfying. I did run into a problem (described here; still without a solution or any suggestions!) where there are some aspects of a particular method which may not be able to be tested (as in my example; briefly, I want to be able to handle a ManagementException which has a specific ErrorCode - but it doesn't seem possible for me to set up a test which throws a ManagementException like that).
So, how does one deal with that? Do we simply accept the fact that some logical paths are untestable (because of the framework that we are working in or limitations in the testing framework(s) that are currently available)?
Some designs do not lend themselves to testability, especially ones that do not have testability as one of the design goals. Generally, TDDed designs do not fall into this category.
To answer your original question, I've posted a response which involves using reflection to slot in the requested error code. However this may not work in all situations and is not a general solution.
The tradeoff here is the effort of writing the test vs the benefit of having that particular piece of code under automated tests. If you feel that the cost-to-benefit ratio is huge and the probability of failure is minuscule, you may write it up as an exceptional manual test, leave a comment for future developers, and verify it manually for now. I'd say be pragmatic: if you've spent 30-40 minutes of a couple of developers' brain time trying to get it under test, maybe you need to step back and rethink your strategy. Have a look at Michael Feathers' 'Working Effectively with Legacy Code' for suggestions on overcoming barriers to testability.
I don't think you could say that anything is logically untestable, but you will certainly find areas of code where the effort required to test them would be better spent elsewhere.
This is a great question, and one which I also found myself contemplating recently.
So first, I wouldn't say some logical paths are "untestable" - at most they are probably very hard to test with automatic unit testing. You could probably still test most of these problematic paths with some serious heavy duty system tests.
Consider this - anything you test can be thought to run inside a virtual machine under your control and you can (theoretically) simulate every aspect of its operation in order to test your software. Whether or not this is practical for most applications is another question.
I've just tried answering your original question (and collided mid-flight with somebody else saying the same thing more concisely, or most of it at least ;-)). Anyway, there surely exist frameworks that are way too rigid (thanks to private and friends), and if you can't use introspection to get around that (despite having performed all the proper incantations), then you're using a language that's too rigid as well as a framework that is.
I'd be astonished if that was the case in an overall system that supports dynamic languages (as .NET now does) such as IronRuby and IronPython -- maybe if C# won't let you go around accessibility limitations via introspection, the dynamic languages could serve.
That said, it is surely possible for the overall environment to be designed so badly and so rigidly as to make it impossible to unit-test certain things - even though I'm not entirely convinced that this is the case in your current situation.
Some things cannot be tested in an automated unit test because the language/framework/situation is just not open to it. The way to handle that is to reduce that area as much as possible and keep it so simple that it is highly unlikely to be a source of bugs or behavior changes later on.
There is also more to testing than just unit testing, and those other areas (such as acceptance testing, QA, etc.) are not covered by unit tests either.

Simplifying algorithm testing for researchers.

I work in a group that does a large mix of research development and full shipping code.
Half the time I develop processes that run on our real-time system (somewhere between soft real-time and hard real-time; medium real-time?).
The other half I write or optimize processes for our researchers who don't necessarily care about the code at all.
Currently I'm working on a process which I have to fork into two different branches.
There is a research version for one group, and a production version that will need to occasionally be merged with the research code to get the latest and greatest into production.
To test these processes you need to set up a semi-complicated testing environment that will send the data we analyze to the process at the correct time (it is a real-time system).
I was thinking about how I could make the:

1. Idea
2. Implement
3. Test
4. GOTO #1

cycle as easy, fast and pain-free as possible for my colleagues.
One idea I had was to embed a scripting language inside these long-running processes, so that as the process runs they could tweak the actual algorithm and its parameters.
Off the bat I looked at embedding:
Lua (useful guide)
Python (useful guide)
These both seem doable and might actually fully solve the given problem.
Any other bright ideas out there?
Recompiling after a 1-2 line change, redeploying to the test environment and restarting just sucks.
The system is fairly complicated and hopefully I explained it half decently.
If you can change enough of the program through a script to be useful, without a full recompile, maybe you should think about breaking the system up into smaller parts. You could have a "server" that handles data loading etc and then the client code that does the actual processing. Each time the system loads new data, it could check and see if the client code has been re-compiled and then use it if that's the case.
I think there would be a couple of advantages here, the largest of which is that the whole system would be much less complex. Now you're working in one language instead of two. There is less of a chance that people will mess things up when switching from Python or Lua mode to C++ mode in their heads. By embedding another language in the system you also run the risk of becoming dependent on it. If you use Python or Lua to tweak the program, those languages either become a dependency when it comes time to deploy, or you need to port things back to C++. If you choose to port things to C++, there's another chance for bugs to crop up during the switch.
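One way to realize the "check whether the client code has been recompiled" idea on a POSIX system is to build the processing code as a shared library and reload it between data loads. A rough sketch, where the library path and the process_batch symbol are made up for illustration:

    // Sketch: reload processing code between data loads if its .so was rebuilt.
    // Assumes POSIX dlopen/dlsym; "libclient.so" and "process_batch" are made up.
    #include <dlfcn.h>
    #include <sys/stat.h>
    #include <iostream>

    using ProcessFn = void (*)(const double *data, int n);

    struct ClientCode {
        void *handle = nullptr;
        ProcessFn process = nullptr;
        time_t loadedMTime = 0;

        bool reloadIfChanged(const char *path) {
            struct stat st;
            if (stat(path, &st) != 0) return false;
            if (handle && st.st_mtime == loadedMTime) return true;  // unchanged

            // Make sure nothing is still executing the old code before this.
            if (handle) dlclose(handle);
            handle = dlopen(path, RTLD_NOW);
            if (!handle) { std::cerr << dlerror() << "\n"; return false; }

            process = reinterpret_cast<ProcessFn>(dlsym(handle, "process_batch"));
            if (!process) { std::cerr << dlerror() << "\n"; return false; }

            loadedMTime = st.st_mtime;
            return true;
        }
    };

The "server" loop would call reloadIfChanged("libclient.so") each time new data arrives and then hand the data to process.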
Embedding Lua is much easier than embedding Python.
Lua was designed from the start to be embedded; Python's embeddability was grafted on after the fact.
Lua is about 20x smaller and simpler than Python.
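To make the "tweak parameters while the process runs" idea concrete, here is a rough sketch of a long-running C++ loop that re-reads a small Lua file on each iteration. The file name, the parameter names and the commented-out step() call are all invented for illustration; in a real-time loop you would probably rate-limit the reload or check the file's mtime first.

    // Sketch: re-read a Lua parameter file on each iteration so researchers
    // can tweak values without a recompile or restart.
    #include <lua.hpp>
    #include <iostream>

    struct Params { double gain = 1.0; double threshold = 0.5; };

    static Params load_params(lua_State *L, const char *path, Params fallback)
    {
        if (luaL_dofile(L, path) != 0) {          // reload the whole file
            std::cerr << "params: " << lua_tostring(L, -1) << "\n";
            lua_pop(L, 1);
            return fallback;                      // keep the last good values
        }
        lua_settop(L, 0);                         // discard anything it returned

        Params p = fallback;
        lua_getglobal(L, "gain");
        if (lua_isnumber(L, -1)) p.gain = lua_tonumber(L, -1);
        lua_getglobal(L, "threshold");
        if (lua_isnumber(L, -1)) p.threshold = lua_tonumber(L, -1);
        lua_pop(L, 2);
        return p;
    }

    int main()
    {
        lua_State *L = luaL_newstate();
        luaL_openlibs(L);

        Params p;
        for (;;) {                                // the long-running process loop
            p = load_params(L, "params.lua", p);
            // step(p);   // hand the current values to the real algorithm
            // ... wait for the next real-time tick / batch of data here ...
        }
    }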
You don't say much about your build process, but building and testing can be simplified significantly by using a really powerful version of make. I use Andrew Hume's mk, but you would probably be even better off investing the time to master Glenn Fowler's nmake, which can add dependencies on the fly and eliminate the need for a separate configuration step. I don't ordinarily recommend nmake because it is somewhat complicated, but it is very clear that Fowler and his group have built into nmake solutions for lots of scaling and portability problems. For your particular situation, it may be worth the effort to master it.
Not sure I understand your system, but if the build and deployment is too complicated, maybe you could automate it? If deployment is completely automatic, would that solve the problem?
I don't understand how a scripting language would solve the problem? If you change your algorithm, you still need to restart calculation from the beginning, don't you?
It kind of sounds like what you need is CruiseControl or something similar; every time you touch the baseline code, it rebuilds and reruns the tests.