Embedding Python into C++ application - c++

Context:
An ongoing problem we have been facing is unit testing our market data applications. These applications sit and observe data being retrieved from feeds and does something. Some critical events which are hard to trigger rarely occur and it is are difficult for the Testers to verify our applications perform correctly under all situations, hence we have to rely on unit tests.
These systems generally work by issuing callbacks (into our application) when an event has occurred, then our task to deal with this.
Solution I envision:
Is it possible to embed Python, or extend (not 100% clear on this), so that a tester could fire up a Python REPL and issue function calls that are akin to callbacks which are then handled by our C++ classes. Some form of dynamic manipulation of our objects at runtime.

I do something similar to this in one of my projects by using SWIG to generate python bindings for the relevant parts of the C++ code. Then I embed the interpreter as others have suggested. Having done that I can execute python code at will (e.g. PyRun_SimpleString), which can access C++ code. Normally I end up using something like a Singleton to make accessing specific C++ objects from python easier.
Also worth a mention is directors in swig python modules, which allow virtual functions to be handled much more intuitively. Depending on quite what you're doing you might find these very helpful.

What you want to do is possible, though not trivial to get right. It sounds like you want to embed (rather than extend) Python. Both topics are covered in the tutorial here.
There's quite a lot of work in mapping from C++ classes to Python classes, and there are a number of things that can go wrong in subtle ways, particularly with memory leaks and multithreading (if your existing code is multi-threaded). However, if it's only for use in a testing situation and stability is not mission-critical then it might be less of a problem.

Yes, it is possible. See this for the how.

Related

What features are standard for a testing framework?

So, I've been developing a few programs for AutoCAD 2005, and I've been constantly running into problems--specifically, I've been working on a program that needs to draw lines based on absolute angles ("azimuths") and distances, converting from a special input format to degrees to radians and back, and, like many other programmers, my code has gotten especially bulky and buggy; I've been stuck for almost a week and a half for a program/script that should've taken about three or four days.
I've been thinking about implementing a testing framework for making development much smoother, but unlike other languages, I'm working in a language that supposedly has absolutely no libraries for it and, even better, it's a embedding scripting environment.
I have a few ideas about how the design might work out, but I need to explain a few things:
Executing commands/AutoLISP background
Most of the programs I write are in the form of console-like commands, much like a shell. For example, say I write a function x. In AutoLISP, it's expressed as (the slash is also literally there): (defun x (arguments / local variables) body of function). To make it exposed to the console, I would need to change the name from x to C:x.
So, most of my testing must be done directly from the console; I tend to avoid the inbuilt Visual Lisp editor in AutoCAD because it seems to be disjointed from the actual working environment that the program will most likely be used in, and it doesn't seem to have an actual debugger. So, I frequently have to use (print "string") or other methods to debug my code.
Ideas/Thoughts/Questions
1. What types of functions might I want to expose for a testing framework? I've heard about multiple paradigms myself, such as compile-time testing, asserts, test classes in Java, etc. Should I try to code an assert? Perhaps I should create reflection-using tests?
2. How and where might I want to inject tests? I've thought about writing a different function and turning all local variables into globals to expose it to a possible testing context, but I'm still unsure of how I might want to do this. AutoLISP is lacking in regular Lisp macros, I believe, but I think it still has very nice reflection capabilities, so it is possible for me to actually feed in commands to the console in order to do things. I feel that an external, non-intrusive framework would make the most sense, but I'd like to get a more experienced answer on this.
Base functionality: with the same inputs, the system gives the same outputs.
Useful functionality: asserts. Given some setup in testing code, you then run part of the program to be tested, and make assertations about the output. If all of the assertations are as-expected, print something minimal. If an assertation fails, print something more verbose, to help track back what went wrong.
Incremental functionality. if something sneaks by your tests, and you have to manually find a bug, write a test that will cover that bug next time.
Continual functionality. Have the tests run at least once-per-submission to your source control system. They can run as a presubmit if failures are common but testing itself is quick, or can run as a postsubmit if failures are rare but testing is slow.

Removing dependencies from statechart framework

I've got lots of problems with project i am currently working on. The project is more than 10 years old and it was based on one of those commercial C++ frameworks which were very populary in the 90's. The problem is with statecharts. The framework provides quite common implementation of state pattern. Each state is a separate class, with action on entry, action in state etc. There is a switch which sets current state according to received events.
Devil is hidden in details. That project is enormous. It's something about 2000 KLOC. There is definitely too much statecharts (i've seen "for" loops implemented using statecharts). What's more ... framework allows to embed statechart in another statechart so there are many statecherts with seven or even more levels of nesting. Because statecharts run in different threads, and it's possible to send events between statecharts we have lots of synchronization problems (and big mess in interfaces).
I must admit that scale of this problem is overwhelming and I don't know how to touch it. My first idea was to remove as much code as I can from statecharts and put it into separate classes. Then delegate these classes from statechart to do a job. But in result we will have many separate functions, which logically don't have any specific functionality and any change in statechart architecture will need also a change of that classes and functions.
So I asking for help:
Do you know any books/articles/magic artefacts which can help me to fix this ? I would like to at least separate as much code as I can from statechart without introducing any hidden dependencies and keep separated code maintainable, testable and reusable.
If you have any suggestion how to handle this, please let me know.
The statechart pattern is intended to be used specifically to remove switch statements, so this sounds like a horrid abuse. Additionally, states should only change on asynchronous events. If you are processing an event and you change through multiple states (or for loop, etc.), then this is also a horrid abuse of the pattern.
I would start from these two points, as they will solve much of your concurrency issues just fixing them up. What you need to determine is:
What are your external, asynchronous events to the system? These are the only things that should be determining state transitions, not things that happen during event processing. An event may cause 0 or 1 state transitions. Once you have a list of these state transitions, you can reconstruct the actual states of your system. If you are aware of UML State diagrams, this would be a perfect time to sketch one up in a charting program, not just for yourself (though it will help you immensely), but also for everyone in the future that has to return to the project. As you have learned, this happens.
Now that you know what are really states, list what are states in the code that shouldn't be. This usually indicates that something can be "functionally decomposed". Instead of a state object for each of these, likely all that is needed is a separate function. This will cut down on a lot of the overhead of state objects and should clean up the code immensely.
Now it's time to tackle those horrendous switch statements you mentioned. If they were truly based on state, you shouldn't need one at all. Instead, you should be able to call the state machine directly.
Something like:
myStateMachine->myEvent();
and it should work without any switch. But notice, this may be the case even for some of those objects that don't work across asynchronous events. This is also an indication of where you may just use inheritance to get the same effect. If you have:
switch (someTypeIdentifier)
{
case type1:
doSomething();
break;
case type2:
doSomethingElse();
break;
}
usually the correct OOP method to do is to create two actual types Type1, Type2, both derived from an abstract base TypeBase, with a virtual method doSomething() that does what you need. The reason this is useful is because it means you can "close" the handling (in the meaning of the Open/Closed Principle), and still extend the functionality by adding new derived types as needed (leaving it open to extension). This saves bugs like crazy because it gets developers hands out of those switch statements, which can get quite ugly and convoluted, instead encapsulating each separate behavior in separate classes.
4 - Now look to fix up your thread issues. Identify all objects used from multiple threads. Make a list. Now, how are these used? Are some of them always used together? Start making groups. The goal here is to find the level of encapsulation that best works for these objects, separate the objects into individual classes that control their own synchronisation, figure out the atomic level of actual "transactions" for the objects, and make methods of the classes that expose those meaningful transactions, wrapped behind the scenes with the appropriate mutexes, condition variables, etc.
You might be saying "that sounds like a lot of work! Why do all that instead of just writing it all over myself?" Good question! :) The reason is actually straightforward: if you are going to do it all by yourself, those are the steps you should be doing anyway. You should be identifying your states, your dynamic polymorphism, and getting a handle on the multithreaded transactions. But, if you start with the existing code, you also have all of those unspoken business rules that were never documented and may cause all sorts of unexpected bugs down the line. You don't have to bring everything over - if you suspect it's a bug, discuss the logic with the people who have worked with the system in the past (if available), QA, or whoever might identify bugs, and see if it really should be carried over. But you need to actually evaluate what the bugs are either way, or you may not code something that actually needed coding.
In the end, this is a manual process that is a part of software engineering. There are CASE tools that can help draw up the state diagrams and even publish them to code, there are refactoring tools, like those found in many IDEs, that can help move code between functions and classes, and similar tools which can help identify threading needs. However, those things shouldn't be picked up for a single project. They need to be learned throughout your career, picking them up and learning them more deeply over years of work, as they are a part of being a software engineer. They don't do it for you. You still need to know the whys and hows, and they just help get it done more efficiently.
Statecharts (including nested Statecharts) are a powerful way to specify, understand and even simulate/validate complex control flow. But to gain the benefit, you need the statechart model in a suitable tool (I used Statemate way back in the day, not sure if it's still available), plus a reliable mapping from the chart to the code (Statemate used to generate the code) - then you can forget about the state management code (mostly)! In your situation, if you don't have the model, I would try to reverse one from the code - as Ira says, chances are high that the original developers had a model in some form, and you may find the code making a lot of sense as the model emerges. If this works out, you will have a really good spec/model of the code which should make future code edits much easier (even if you don't want to go to automatic code generation, and maintain the code/model mapping manually (but you'll need to be meticulous!!))
Sounds to me like your best bet is (gulp!) likely to start from scratch if it's as horrifically broken as you make out. Is there any documentation? Could you begin to build some saner software based on the docs?
If a complete re-write isn't an option (and they never are in my experience) I'd try some of the following:
If you don't already have it, draw an architectural picture of the whole system. Sketch out how all the bits are supposed to work together and that will help you break the system down into potentially manageable / testable parts.
Do you have any kind of requirements or testing plan in place? If not, can you write one and start to put unit tests in place for the various chunks of code / functionality which exist already? If you can do that, you can start to refactor things without breaking as much of whatever does currently work.
Once you've broken things down a bit, start building your unit tests into integration tests which pull together more of the functionality.
I've not read them myself, but I've heard good things about these books which may have some advice you can use:
Refactoring: Improving the Design of Existing Code (Object Technology Series).
Working Effectively with Legacy Code (Robert C. Martin)
Good luck! :-)

Embedded Lua - timing out rogue scripts (e.g. infinite loop) - an example anyone?

I have embedded Lua in a C++ application. I need to be able to kill rogue (i.e. badly written scripts) from hogging resources.
I know I will not be able to cater for EVERY type of condition that causes a script to run indefinitely, so for now, I am only looking at the straightforward Lua side (i.e. scripting side problems).
I also know that this question has been asked (in various guises) here on SO. Probably the reason why it is constantly being re-asked is that as yet, no one has provided a few lines of code to show how the timeout (for the simple cases like the one I described above), may actually be implemented in working code - rather than talking in generalities, about how it may be implemented.
If anyone has actually implemented this type of functionality in a C++ with embedded Lua application, I (as well as many other people - I'm sure), will be very grateful for a little snippet that shows:
How a timeout can be set (in the C++ side) before running a Lua script
How to raise the timeout event/error (C++ /Lua?)
How to handle the error event/exception (C++ side)
Such a snippet (even pseudocode) would be VERY, VERY useful indeed
You need to address this with a combination of techniques. First, you need to establish a suitable sandbox for the untrusted scripts, with an environment that provides only those global variables and functions that are safe and needed. Second, you need to provide for limitations on memory and CPU usage. Third, you need to explicitly refuse to load pre-compiled bytecode from untrusted sources.
The first point is straightforward to address. There is a fair amount of discussion of sandboxing Lua available at the Lua users wiki, on the mailing list, and here at SO. You are almost certainly already doing this part if you are aware that some scripts are more trusted than others.
The second point is question you are asking. I'll come back to that in a moment.
The third point has been discussed at the mailing list, but may not have been made very clearly in other media. It has turned out that there are a number of vulnerabilities in the Lua core that are difficult or impossible to address, but which depend on "incorrect" bytecode to exercise. That is, they cannot be exercised from Lua source code, only from pre-compiled and carefully patched byte code. It is straightforward to write a loader that refuses to load any binary bytecode at all.
With those points out of the way, that leaves the question of a denial of service attack either through CPU consumption, memory consumption, or both. First, the bad news. There are no perfect techniques to prevent this. That said, one of the most reliable approaches is to push the Lua interpreter into a separate process and use your platform's security and quota features to limit the capabilities of that process. In the worst case, the run-away process can be killed, with no harm done to the main application. That technique is used by recent versions of Firefox to contain the side-effects of bugs in plugins, so it isn't necessarily as crazy an idea as it sounds.
One interesting complete example is the Lua Live Demo. This is a web page where you can enter Lua sample code, execute it on the server, and see the results. Since the scripts can be entered anonymously from anywhere, they are clearly untrusted. This web application appears to be as secure as can be asked for. Its source kit is available for download from one of the authors of Lua.
Snippet is not a proper use of terminology for what an implementation of this functionality would entail, and that is why you have not seen one. You could use debug hooks to provide callbacks during execution of Lua code. However, interrupting that process after a timeout is non-trivial and dependent upon your specific architecture.
You could consider using a longjmp to a jump buffer set just prior to the lua_call or lua_pcall after catching a time out in a luaHook. Then close that Lua context and handle the exception. The timeout could be implemented numerous ways and you likely already have something in mind that is used elsewhere in your project.
The best way to accomplish this task is to run the interpreter in a separate process. Then use the provided operating system facilities to control the child process. Please refer to RBerteig's excellent answer for more information on that approach.
A very naive and simple, but all-lua, method of doing it, is
-- Limit may be in the millions range depending on your needs
setfenv(code,sandbox)
pcall (function() debug.sethook(
function() error ("Timeout!") end,"", limit)
code()
debug.sethook()
end);
I expect you can achieve the same through the C API.
However, there's a good number of problems with this method. Set the limit too low, and it can't do its job. Too high, and it's not really effective. (Can the chunk get run repeatedly?) Allow the code to call a function that blocks for a significant amount of time, and the above is meaningless. Allow it to do any kind of pcall, and it can trap the error on its own. And whatever other problems I haven't thought of yet. Here I'm also plain ignoring the warnings against using the debug library for anything (besides debugging).
Thus, if you want it reliable, you should probably go with RB's solution.
I expect it will work quite well against accidental infinite loops, the kind that beginning lua programmers are so fond of :P
For memory overuse, you could do the same with a function checking for increases in collectgarbage("count") at far smaller intervals; you'd have to merge them to get both.

C++ Implementing scripting (Repeatable Actions)

I have a desktop app written in C++. It does a variety of different things and interacts with a database. I have made most actions that need to be performed selectable from a list. Actions are performed serially on data sets and need to be saved and played back at a later time on a different result set.
The actions are typically performed from a drop down menus and there are no load/save to disk operations. I dont' necessarily need scripting capability, but if thats the easiest way to go that's fine.
How would you approach this?
Added, This application is NOT written in OOP style
Lua
It's a great, simple scripting language, easily embedded into your apps.
Sounds like the GoF Command design pattern:
http://en.wikipedia.org/wiki/Command_pattern
http://www.oodesign.com/command-pattern.html
lua with LuaBind is quite widely used
Python with boost::python is also widely used but heavier on space
Now I know less know scripting languages that are useful in game programming, so useful in whatever application you need scripting for:
Falcon : the more versatile and flexible solution I know and use
ChaiScript : the easier to integrate I know
AngelScript : didn't try yet
GameMonkey : didn't try yet
IO : didn't try yet

How do you glue Lua to C++ code?

Do you use Luabind, toLua++, or some other library (if so, which one) or none at all?
For each approach, what are the pro's and con's?
I can't really agree with the 'roll your own' vote, binding basic types and static C functions to Lua is trivial, yes, but the picture changes the moment you start dealing with tables and metatables; things go trickier very quickly.
LuaBind seems to do the job, but I have a philosophical issue with it. For me it seems like if your types are already complicated the fact that Luabind is heavily templated is not going to make your code any easier to follow, as a friend of mine said "you'll need Herb Shutter to figure out the compilation messages". Plus it depends on Boost, plus compilation times get a serious hit, etc.
After trying a few bindings, Tolua++ seems the best. Tolua doesn't seem to be very much in development, where as Tolua++ seems to work fine (plus half the 'Tolua' tutorials out there are, in fact, 'Tolua++' tutorials, trust me on that:) Tolua does generate the right stuff, the source can be modified and it seems to deal with complicated cases (like templates, unions, nameless structs, etc, etc)
The biggest issue with Tolua++ seems to be the lack of proper tutorials, pre-set Visual Studio projects, or the fact that the command line is a bit tricky to follow (you path/files can't have white spaces -in Windows at least- and so on) Still, for me it is the winner.
To answer my own question in part:
Luabind: once you know how to bind methods and classes via this awkward template syntax, it's pretty straightforward and easy to add new bindings. However, luabind has a significant performance impact and shouldn't be used for realtime applications. About 5-20 times more overhead than calling C functions that manipulate the stack directly.
I don't use any library. I have used SWIG to expose a C library some time ago, but there was too much overhead, and I stop using it.
The pros are better performance and more control, but its takes more time to write.
Use raw Lua API for your bindings -- and keep them simple. Take inspiration in the API itself (AUX library) and libraries by Lua authors.
With some practice raw API is the best option -- maximum flexibility and minimum of unneeded overhead. You've got what you want and no more, the way you need it to be.
If you must bind large third-party libraries use automated generators like tolua, tolua++ (or even roll your own for the specific case). It would free you from manual work.
I would not recommend using Luabind. At the moment it's development stalled (however starting to come back to life), and if you would meet some corner case, you may be on your own. Also Luabind heavily uses template metaprogramming. This may (and may not) be unacceptable, depending on the point of view.