Using ptrace to write a program supervisor in userspace - c++

I'm looking for advice/resources on writing a program that can intercept system calls from another program in order to supervise its filesystem, network, and other access.
The aim of this is to write an online judge, so that untrusted code can be run safely on a server.
This is on Linux, and I would prefer to write it in C++ or a scripting language (Ruby, Python, etc.) - a library would be great!
Thanks.

This looks like a good place to start.
http://www.linuxjournal.com/article/6100

You can't safely use ptrace() to sandbox a hostile application.
The application can always use multiple threads with deliberate race conditions to alter syscall arguments passed via pointers (e.g. a filename) after you've inspected them but before the kernel looks at them.
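For reference, the basic supervision loop the question is asking about looks roughly like the sketch below (Linux on x86-64 assumed; the traced binary path is a placeholder). The PTRACE_GETREGS inspection point is exactly where the race described above can bite, so treat this as illustrative rather than a safe sandbox.

    // Minimal sketch of a ptrace-based syscall supervisor (Linux, x86-64).
    // "/path/to/untrusted" is a placeholder; most error handling is omitted.
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        pid_t child = fork();
        if (child == 0) {
            ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);   // ask to be traced
            execl("/path/to/untrusted", "untrusted", nullptr);
            _exit(1);                                      // reached only if exec fails
        }

        int status;
        waitpid(child, &status, 0);                        // child stops at exec
        while (!WIFEXITED(status) && !WIFSIGNALED(status)) {
            ptrace(PTRACE_SYSCALL, child, nullptr, nullptr);  // run to next syscall entry/exit
            waitpid(child, &status, 0);
            if (WIFSTOPPED(status)) {
                struct user_regs_struct regs;
                ptrace(PTRACE_GETREGS, child, nullptr, &regs);
                // On x86-64 the syscall number is in orig_rax and the arguments in
                // rdi, rsi, rdx, ... - this is where you would inspect (and possibly
                // deny) the call, and also where the TOCTOU race above lives.
                if (regs.orig_rax == SYS_open || regs.orig_rax == SYS_openat)
                    std::fprintf(stderr, "open-like syscall attempted\n");
            }
        }
        return 0;
    }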

Related

Process Isolation in Rust

I want to implement a server for a protocol. For security reasons the parser should be isolated in its own thread from the rest of the program and only a bidirectional channel should be held open for communication.
The parser thread should lose any possibility to modify the other thread's memory and lose its power to do syscalls (using seccomp).
Is there an easy way to achieve this behavior for the parser thread in Rust?
If you're concerned about issues beyond what Rust's strong safety and type system can protect against (e.g. bugs in those, or in third-party libraries etc.) then you really want separate processes rather than just threads; even if you use seccomp on an untrusted thread, at the OS/CPU level it still has full write access to other threads' memory in the same process.
Either way you'll need to write code designed to run in seccomp carefully (for example allocating extra heap memory might not work) - but the good news is that Rust is a great language for having that control!
There's a reasonably useful discussion on seccomp in Rust which has some suggestions.
The best bet looks like gaol from the Servo project, which is a more general process sandbox (including seccomp). There are also some other lower level seccomp wrappers like this one.
I haven't tried any of this yet, so I'd be interested to hear any other viewpoints/experience.
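For what it's worth, the kernel facility that gaol and those wrapper crates build on can be sketched in a few lines; shown here via libseccomp from C/C++ since that is the underlying mechanism. The allowed-syscall list is purely an example of how restrictive a parser sandbox might be.

    // Illustrative only: a strict seccomp whitelist via libseccomp (link with -lseccomp).
    #include <seccomp.h>

    int enter_sandbox() {
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);   // default: kill on any syscall
        if (!ctx) return -1;

        // Whitelist the bare minimum a parser thread might need.
        // Real code would likely also need brk/mmap for the allocator, etc.
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);

        int rc = seccomp_load(ctx);                          // applies to the calling thread
        seccomp_release(ctx);
        return rc;
    }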

How to run a C++ program inside another C++ program?

I will sketch the scenario I would like to get working below.
I have one main application.
That application, based on user interactions, can load other applications inside a secure environment/shell. This means these child applications cannot interact with the OS anymore, nor with each other.
The parent program can at any time call functions of these child programs.
The child program can at any time call functions of the parent program.
Does anyone know how to implement this in C++? Preferably both parent and child should be written in C++.
The performance of loading the child applications inside the parent application doesn't matter. The only thing that matters is the performance of the communication between child and parent when calling functions of each other.
You will have to write your own compiler.
Consider: No normal OS supports what you want. You want both executables to run inside a single process, yet that process may or may not make OS calls depending on some weirdness inside the process which the OS doesn't understand at all.
This is no longer a problem with your custom compiler, as it simply will not create the offending instructions. It's similar to Java and .Net, which also prevent such OS calls outside their control.
A portable solution: Google Native Client
One possible Linux solution:
Make an AppArmor profile with "hats" (a "hat" is a sandboxing configuration the application can switch into programmatically with libapparmor),
have the main application create a "pipe",
have the main application "fork",
change into a "hat" corresponding to the child application,
"exec" the child application,
the main application and the child application communicate via the "pipe" created earlier.
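A rough, untested sketch of that sequence (the hat name and the child binary path are placeholders; link with -lapparmor):

    #include <sys/apparmor.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        int pipefd[2];
        if (pipe(pipefd) != 0) return 1;      // 1. create the pipe

        pid_t pid = fork();                   // 2. fork
        if (pid == 0) {
            // Child: switch into the sandboxing "hat", then exec the child app.
            // A magic token of 0 means the child can never switch back.
            if (aa_change_hat("child_app_hat", 0) != 0)
                _exit(1);
            dup2(pipefd[1], STDOUT_FILENO);   // talk to the parent via the pipe
            close(pipefd[0]);
            execl("/opt/child_app", "child_app", nullptr);
            _exit(1);
        }

        // Parent: read whatever the child writes on the pipe.
        close(pipefd[1]);
        char buf[256];
        read(pipefd[0], buf, sizeof buf);
        waitpid(pid, nullptr, 0);
        return 0;
    }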
If you want a (semi) cross-platform way to do this you can use RPC to call functions in another process. It will work on anything that supports the Distributed Computing Environment (DCE). It's been around for some time, and the MSDN documentation states that parts of Windows use it for inter-process communication, so it's probably fast enough. Here's a tutorial on MSDN that should get you up and running: http://msdn.microsoft.com/en-us/library/windows/desktop/aa379010.aspx The bad part is that I haven't been able to find a tutorial about using it on Linux.
If you don't want to use RPC, or find it too hard to track down good documentation on the subject, you can use the standard IPC (inter-process communication) mechanisms from Unix systems to signal to your process that it should call a certain function. I'd recommend a message queue because it's very fast and lightweight. You can find a tutorial here: http://www.cs.cf.ac.uk/Dave/C/node25.html
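A minimal System V message-queue sketch of that idea (the key, message type, and message text are arbitrary examples):

    #include <sys/ipc.h>
    #include <sys/msg.h>
    #include <cstring>
    #include <cstdio>

    struct msgbuf_t {
        long mtype;          // must be > 0
        char mtext[64];      // e.g. the name of the function to call
    };

    int main() {
        key_t key = ftok("/tmp", 'Q');                 // both processes derive the same key
        int qid = msgget(key, IPC_CREAT | 0600);

        // Sender side: ask the other process to call a certain function.
        msgbuf_t out{1, {0}};
        std::strcpy(out.mtext, "do_reload");
        msgsnd(qid, &out, sizeof out.mtext, 0);

        // Receiver side (normally in the other process): block until a request arrives.
        msgbuf_t in{};
        msgrcv(qid, &in, sizeof in.mtext, 1, 0);
        std::printf("requested: %s\n", in.mtext);
        return 0;
    }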
I am not familiar with the OS restrictions discussed in the answers above. However, I found an easy way to solve this problem; I hope it helps and doesn't have a technical issue. I used Linux. Suppose I want to call C++ program B from inside another C++ program A. I wrote a Perl script (say PerlScript.pl) that contains a system call to run program B. Then in A, I made a system call like system("perl PerlScript.pl") that asks Perl to run B for me.

Simple but fast IPC method for a Python and C++ application?

I have a GNU Radio application which uses both Python and C++ code. I want to be able to signal the C++ code when an event occurs. If they were in the same scope I would normally use a simple boolean, but the code is separated to the point where some form of shared memory is required. The code in question is performance-critical, so an efficient method is required.
I was initially thinking about a shared memory segment that is accessible by both Python and C++. Therefore I could set a flag in the python code and check it from C++. Since I just need a simple flag to pause the C++ code, would a semaphore suffice?
To be clear, I need to set a flag from Python and the C++ code will simply check this flag, and if it is set enter a busy loop.
So would trying to implement a shared memory segment between Python/C++ be a reasonable approach? How about a semaphore? On Linux, which is easier to implement?
Thanks!
Assuming this is two separate applications on one machine and you need decent real-time performance, you don't want to go with sockets. I would use a flag in shared memory, and probably a semaphore to make sure both programs can't access the flag at once. This library provides access to POSIX semaphores and shared memory from Python and supports Python versions 2.4-3.1 (not 3.0): http://semanchuk.com/philip/posix_ipc
EDIT: Changed recommendation to using a semaphore protecting the flag in shared memory
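Roughly, the C++ side could look like the sketch below, using plain POSIX shared memory plus a named semaphore. The names "/gr_flag" and "/gr_flag_sem" are made up for this example; the Python side (e.g. via posix_ipc) would open the same names. Link with -lrt -pthread on older Linux toolchains.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <semaphore.h>
    #include <unistd.h>

    int main() {
        int fd = shm_open("/gr_flag", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, 1);                                  // one byte is enough for a flag
        void *addr = mmap(nullptr, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        auto *flag = static_cast<volatile unsigned char *>(addr);
        sem_t *sem = sem_open("/gr_flag_sem", O_CREAT, 0600, 1);

        // In the performance-critical loop: check the flag under the semaphore.
        for (;;) {
            sem_wait(sem);
            bool paused = (*flag != 0);
            sem_post(sem);
            if (!paused)
                break;              // or: do the real work while not paused
            usleep(1000);           // spin gently while the Python side holds the flag
        }

        munmap(addr, 1);
        close(fd);
        sem_close(sem);
        return 0;
    }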
Why not open a Unix socket? Or use DBus?
If Boost is an option, you could use Boost.Python and Boost.Interprocess. Boost.Python gives you a way for Python & C++ objects to interact and Boost.Interprocess gives you plenty of options for shared memory or synchronization primitives across process boundaries.
DBus looks promising. It supports signals, so you should be able to stop an application on demand. However, I'm not sure if its performance will be enough for you.
You can try using custom signals. I don't know offhand whether Python code can send custom signals, but your C/C++ code can certainly install handlers for signals such as SIGIO.
If you have stringent response-time requirements, you might need to look beyond your application code and into some kind of OS with support for real-time signals (rt-linux, muOs, etc.).

C++: Most common way for one application to talk to another

In bare outlines, I've got an application which looks through directories at startup and builds an index of 'special' files - after that it runs like a daemon. The other application creates such 'special' files and places them in some directory. What way of informing the first application about a new file (so it can index it) is the most common, simple (the first one runs continuously, so it shouldn't be slowed down too much), and cross-platform if possible?
I've looked through RPC and IPC, but they are too heavy (also non-cross-platform and, probably, slow - they need a lot of machinery to work, and I need a simple, light, reliable way).
Pipes would be one option: see Network Programming with Pipes and Remote Procedure Calls (Windows) or Creating Pipes in C (Unix).
I haven't done this in a while, but from my experience with RPC, DCOM, COM, .NET Remoting, and socket programming, I think pipes are the most straightforward and efficient option.
For Windows (NTFS) you can get a notification from the OS that a directory has changed. But that is not cross-platform, and it isn't really about two applications talking to each other.
"IPC but them are too heavy" - no no, they are not heavy at all. You should look at named pipes - this IPC is fastest and it is in both Win/Unix-like with slight differences. Or sockets!
eisbaw suggested TCP. I'd say, to make it even simpler, use UDP.
Create a listening thread that will receive packets and handle them from there - in all the applications.
Since it is all on the same PC you are very unlikely to lose a packet, something UDP can otherwise do on a real network.
Each application instance will need its own port, but that is easy to configure with the configuration files that you (I assume) already have.
Keep it simple (:
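Something along these lines (untested sketch; the port number is arbitrary and would come from your configuration):

    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        // Receiver (run in the listening thread of the daemon):
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(40123);                      // per-instance port from config
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);     // only listen on 127.0.0.1
        bind(sock, reinterpret_cast<sockaddr *>(&addr), sizeof addr);

        char buf[1024];
        ssize_t n = recvfrom(sock, buf, sizeof buf - 1, 0, nullptr, nullptr);
        if (n > 0) {
            buf[n] = '\0';
            std::printf("new file announced: %s\n", buf);  // sender sends the file path
        }
        close(sock);
        return 0;
        // The sender side is just socket() + sendto() with the same address.
    }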
Local TCP sockets are guaranteed to work - as already mentioned by Andrey.
Shared memory would be another option, take a look at
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2044.html
As Andrey noted, if you agree on the full path ahead of time, you can just have the OS tell you when it's added. All major platforms actually support this in some form. You can use a cross-platform library for this, such as QFileSystemWatcher.
EDIT: I don't think QFileSystemWatcher will cause too much of a performance hit. It definitely relies on the underlying OS for notifications on Linux, FreeBSD, and Mac OS (and I think Windows). See http://qtnode.net/wiki/QFileSystemWatcher
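A minimal QFileSystemWatcher sketch, with a placeholder directory:

    #include <QCoreApplication>
    #include <QFileSystemWatcher>
    #include <QDebug>

    int main(int argc, char *argv[]) {
        QCoreApplication app(argc, argv);

        QFileSystemWatcher watcher;
        watcher.addPath("/data/special_files");            // directory to monitor

        QObject::connect(&watcher, &QFileSystemWatcher::directoryChanged,
                         [](const QString &path) {
                             // Rescan the directory and index any new files here.
                             qDebug() << "directory changed:" << path;
                         });

        return app.exec();                                  // event loop delivers the signals
    }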
Memory-mapped files, sockets, and named pipes are all highly efficient, cross-platform IPC mechanisms. Well, the APIs to access named pipes and memory-mapped files differ between POSIX and Win32, but the basic mechanisms are similar enough that it's easy to make a cross-platform wrapper. Sockets and named pipes tend to be fast because, in inter-process situations, the OS developers (of most common OSs) have built in shortcuts that essentially make a socket or named-pipe write a rather simple wrapper over a memory section.

System() calls in C++ and their roles in programming

I've often heard that using system("PAUSE") is bad practice and that std::cin.get() should be used instead. My understanding of system() calls is that they take a string which is passed to the system command line to talk to the OS, so PAUSE is a DOS command that pauses the output in the command window. I assume this works similarly on Mac and Unix with different keywords, and that using system() calls is discouraged because of the lack of cross-OS compatibility. (If I'm wrong about any of this, please correct me.)
My question is this: When is it appropriate to use system() calls? How should they be applied? When should they NOT be applied?
system("PAUSE") is certainly less than ideal. using a call to system creates a subprocess, which on windows is fairly expensive and in any case not terribly cheap on any operating system. On embedded systems the memory overhead is significant.
If there is any way to do it natively without much pain, then do that. In the case of waiting for the user to press a single key, cin.get() will be very hard to beat. Your application's process will just block on stdin, setting only a few flags visible to the kernel, and, most importantly, it allocates no new memory and creates no new scheduling entities, not even an interrupt handler.
Additionally, it will work the same on all operating systems with all C++ compilers, since it uses only a very basic feature of a very standard part of the language rather than depending on anything the OS provides.
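For completeness, the in-process alternative is just:

    #include <iostream>

    int main() {
        std::cout << "Press Enter to continue..." << std::flush;
        std::cin.get();        // blocks on stdin; no subprocess, no shell
        return 0;
    }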
EDIT: You might object that it doesn't matter whether it's expensive, because the whole idea is to pause. Well, first off, if it's expensive, it will hurt the performance of anything else that might be going on. Ever notice (on Windows) that when one application is launching, other already-open apps become less responsive too? Additionally, your user might not be a live human but rather another program working on behalf of a human user (say, a shell script). The script already knows what to do next and can pre-fill stdin with a character to skip over the wait. If you have used a subprocess here, the script will experience a delay noticeable to a human. If the script does this hundreds (or hundreds of millions!) of times, a script that could take seconds to run now takes days or years.
EDIT2: When to use system(): when you need to do something that another program does and that you can't do easily yourself. system() isn't always the best candidate because it does two things that are somewhat limiting. First, the only way to communicate with the subprocess is via command-line arguments as input and the return value as output. Second, the parent process blocks until the child process has completed. These two factors limit the cases in which system() is usable.
On Unix-like systems, most subprocesses are created with fork, because it allows the same program to continue in the same place as two separate processes, one the child of the other (which is hardly noticeable unless you ask the OS). On Linux this is especially well optimized, about as cheap as creating a pthread. Even on systems where it is not as fast, it is still very useful (as demonstrated by the Apache process-pool methodology). (unavailable on windows/link to unix docs)
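A minimal fork() sketch:

    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        pid_t pid = fork();                // one process becomes two
        if (pid == 0) {
            std::printf("child: doing the subtask\n");
            _exit(0);
        }
        std::printf("parent: continuing while the child runs\n");
        waitpid(pid, nullptr, 0);          // reap the child when it finishes
        return 0;
    }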
Other cases (on Windows too!) are often handled with popen or the exec family of functions. popen creates a subprocess and a brand-new pipe connected to the subprocess's stdin or stdout. Both parent and child processes can then run concurrently and communicate quite easily. (link to windows docs/link to unix docs)
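For example (the command is arbitrary; on Windows the calls are _popen/_pclose):

    #include <cstdio>

    int main() {
        FILE *p = popen("ls -l", "r");      // run the command, read its stdout
        if (!p) return 1;
        char line[256];
        while (std::fgets(line, sizeof line, p))
            std::fputs(line, stdout);
        return pclose(p);                   // returns the command's exit status info
    }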
The exec* family of functions (there are several: execl, execv, and so on), on the other hand, causes the current program to be replaced by the new program. The original program exits invisibly and the new process takes over. When the new process returns, it returns to whatever called the original process, as if that process had returned at that point instead of vanishing. The advantage of this over exit(system("command")) is that no new process is created, saving time and memory (though not always terribly much). (link to windows docs /link to unix docs)
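For example ("/bin/ls" is just a placeholder program):

    #include <unistd.h>
    #include <cstdio>

    int main() {
        std::printf("about to become ls...\n");
        execl("/bin/ls", "ls", "-l", static_cast<char *>(nullptr));
        std::perror("execl failed");        // reached only if the exec itself failed
        return 1;
    }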
system() could plausibly be used by some scripted tool to invoke several steps in some recipe. For example, at a certain point a program could use system() to invoke a text editor to edit some configuration file. It need not concern itself too much with what happens, but it should certainly wait until the user has saved and closed the editor before continuing. It can then use the return value to find out whether the editing session was successful, in the sense that the editor actually opened the requested file (and that the editor itself existed at all!), but it will read the actual results of the session from the edited file directly, rather than communicating with the subprocess. (link to windows docs/link to unix docs)
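A sketch of that pattern (the editor command and config path are placeholders, and the return-value check is simplified):

    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        int rc = std::system("nano /tmp/app.conf");   // blocks until the editor exits
        if (rc != 0) {
            std::cerr << "editor failed or was not found\n";
            return 1;
        }
        std::ifstream conf("/tmp/app.conf");          // read the result directly from the file
        std::string line;
        while (std::getline(conf, line))
            std::cout << line << '\n';
        return 0;
    }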
system() calls are passed to the shell or command-line interpreter of the OS (DOS, bash, etc.), and it's up to the shell to do what it wants with the command.
You should avoid these kinds of calls, as they reduce your program's portability to other operating systems. I would use them only when you are absolutely sure your code targets a specific OS.
But my question is this: When is it appropriate to use system() calls? How should they be applied?
When you can't do the thing you're trying to do with your own code or a library (or when the cost of implementing it yourself outweighs the cost of launching a new process to do it). system() is pretty costly in terms of system resources compared to cin.get(), and as such it should only be used when absolutely necessary. Remember that system() typically launches both an entire new shell and whatever program you asked it to run, so that's two new executables being launched.
By the way, system() should never be used in binaries with the SUID or SGID bit set; quoting from the man page:
Do not use system() from a program with set-user-ID or set-group-ID
privileges, because strange values for some environment variables
might be used to subvert system integrity. Use the exec(3) family of
functions instead, but not execlp(3) or execvp(3). system() will not,
in fact, work properly from programs with set-user-ID or set-group-ID
privileges on systems on which /bin/sh is bash version 2, since bash 2
drops privileges on startup.
system() is used to ask the operating system to run a program.
Why would your program want the operating system to run another program? Well, there are cases. Sometimes an external program or operating system command can perform a task that is hard to do in your own program. For example, an external program may operate with elevated privileges or access proprietary data formats.
The system() function, itself, is fairly portable but the command string you pass it is likely to be very platform-specific -- though the command string can be pulled from local configuration data to make it more platform-agnostic.
Other functions like fork(), exec*(), spawn*() and CreateProcess() will give you much more control over the way you run the external program, but are platform-specific and may not be available on your platform of choice.
system("PAUSE") is an old DOS trick and is generally considered to be fairly grotty style these days.
As far as I know, system("PAUSE") is a Windows-only thing, and that is why it is frowned upon.